woops, was obviously tired; what I said clearly doesn't make sense.

On 10 June 2016 at 14:52, kurt Greaves <k...@instaclustr.com> wrote:
> Sorry, I did mean a larger number of rows per partition.
>
> On 9 June 2016 at 10:12, John Thomas <jthom...@gmail.com> wrote:
>
>> The example I gave was for when N=1; if we need to save more values I
>> planned to just add more columns.
>>
>> On Thu, Jun 9, 2016 at 12:51 AM, kurt Greaves <k...@instaclustr.com>
>> wrote:
>>
>>> I would say it's probably due to a significantly larger number of
>>> partitions when using the overwrite method - but really you should be
>>> seeing similar performance unless one of the schemas ends up generating
>>> a lot more disk IO.
>>> If you're planning to read the last N values for an event at the same
>>> time, the widerow schema would be better; otherwise, reading N events
>>> using the overwrite schema will result in you hitting N partitions. You
>>> really need to take into account how you're going to read the data when
>>> you design a schema, not only how many writes you can push through.
>>>
>>> On 8 June 2016 at 19:02, John Thomas <jthom...@gmail.com> wrote:
>>>
>>>> We have a use case where we are storing event data for a given system
>>>> and only want to retain the last N values. Storing extra values for
>>>> some time, as long as it isn't too long, is fine, but never fewer
>>>> than N. We can't use TTLs to delete the data because we can't be sure
>>>> how frequently events will arrive and could end up losing everything.
>>>> Is there any built-in mechanism to accomplish this, or a known
>>>> pattern that we can follow? The events will be read and written at a
>>>> pretty high frequency, so the solution would have to be performant
>>>> and not fragile under stress.
>>>>
>>>> We've played with a schema that just has N distinct columns with one
>>>> value in each, but have found overwrites seem to perform much more
>>>> poorly than wide rows.
>>>> The use case we tested only required that we store the most recent
>>>> value:
>>>>
>>>> CREATE TABLE eventvalue_overwrite (
>>>>     system_name text,
>>>>     event_name text,
>>>>     event_time timestamp,
>>>>     event_value blob,
>>>>     PRIMARY KEY (system_name, event_name))
>>>>
>>>> CREATE TABLE eventvalue_widerow (
>>>>     system_name text,
>>>>     event_name text,
>>>>     event_time timestamp,
>>>>     event_value blob,
>>>>     PRIMARY KEY ((system_name, event_name), event_time))
>>>> WITH CLUSTERING ORDER BY (event_time DESC)
>>>>
>>>> We tested against the DataStax AMI on EC2 with 6 nodes, replication
>>>> factor 3, write consistency 2, and default settings, using a
>>>> write-only workload, and got 190K writes/s for the wide row schema
>>>> and 150K/s for the overwrite schema. Thinking through the write path,
>>>> it seems the performance should be pretty similar, with probably
>>>> smaller SSTables for the overwrite schema; can anyone explain the big
>>>> difference?
>>>>
>>>> The wide row solution is more complex in that it requires a separate
>>>> cleanup thread to handle deleting the extra values. If that's the
>>>> path we have to follow, we're thinking we'd add a bucket of some sort
>>>> so that we can delete an entire partition at a time after copying
>>>> some values forward, on the assumption that deleting the whole
>>>> partition is much better than deleting some slice of the partition.
>>>> Is that true? Also, is there any difference between setting a really
>>>> short TTL and doing a delete?
>>>>
>>>> I know there are a lot of questions in there, but we've been going
>>>> back and forth on this for a while and I'd really appreciate any help
>>>> you could give.
>>>> Thanks,
>>>> John

--
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
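[Editorial note: the bucketed wide-row pattern discussed above can be sketched in plain Python. This is a hypothetical simulation, not code from the thread: the names (`EventStore`, `bucket_for`, `BUCKET_SECONDS`) and the in-memory dict standing in for Cassandra partitions are all illustrative. The idea it demonstrates: each (system, event, bucket) would be one partition; reads scan the newest buckets until N values are found; cleanup drops whole old buckets (i.e. whole partitions, one tombstone each) once the newer buckets alone hold at least N values. This variant keeps whole old buckets until they are no longer needed rather than copying values forward, which avoids the copy step entirely.]

```python
# Hypothetical sketch of the bucketed wide-row retention pattern.
# In Cassandra terms the partition key would be (system_name, event_name,
# bucket), so cleanup deletes an entire partition at once instead of
# slicing rows out of a single ever-growing partition.

BUCKET_SECONDS = 3600  # assumed bucket width; tune to your event frequency


def bucket_for(event_time):
    """Map an event timestamp (epoch seconds) to its bucket id."""
    return event_time // BUCKET_SECONDS


class EventStore:
    def __init__(self, n_keep):
        self.n_keep = n_keep   # retain at least the last N values
        self.buckets = {}      # bucket_id -> list of (event_time, value)

    def write(self, event_time, value):
        self.buckets.setdefault(bucket_for(event_time), []).append(
            (event_time, value))

    def read_last_n(self):
        """Scan buckets newest-first until N values are collected."""
        out = []
        for b in sorted(self.buckets, reverse=True):
            out.extend(sorted(self.buckets[b], reverse=True))
            if len(out) >= self.n_keep:
                break
        return out[:self.n_keep]

    def cleanup(self):
        """Drop whole old buckets once newer buckets hold >= N values."""
        kept, count = set(), 0
        for b in sorted(self.buckets, reverse=True):
            kept.add(b)
            count += len(self.buckets[b])
            if count >= self.n_keep:
                break
        for b in list(self.buckets):
            if b not in kept:
                del self.buckets[b]  # one whole-partition delete per bucket
```

In Cassandra the `cleanup` step would issue one `DELETE ... WHERE system_name = ? AND event_name = ? AND bucket = ?` per dropped bucket, producing a single partition-level tombstone rather than many row tombstones, which is the intuition behind preferring whole-partition deletes.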