Re: Partition size, limits, recommendations for tables where all columns are part of the primary key
Hi

Yes, basically rows have no cells as everything is in the partition key/clustering columns. You can always look into the data using sstabledump (this is for DSE 6.7 that I have running):

sstabledump ac-1-bti-Data.db
[
  {
    "partition" : {
      "key" : [ "977eb1f1-aa5b-11ea-b91a-db426f6f892c", "977ed900-aa5b-11ea-b91a-db426f6f892c" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 78,
        "clustering" : [ "test", "977ed901-aa5b-11ea-b91a-db426f6f892c" ],
        "liveness_info" : { "tstamp" : "2020-06-09T14:14:54.863249Z" },
        "cells" : [ ]
      }
    ]
  }
]

P.S. You can play with your schema and do some performance tests using https://github.com/nosqlbench/

On Tue, Jun 9, 2020 at 3:51 PM Benjamin Christenson <ben.christen...@kineticdata.com> wrote:
> Hello all, I am doing some data modeling and want to make sure that I
> understand some nuances to cell counts, partition sizes, and related
> recommendations. Am I correct in my understanding that tables for which
> every column is in the primary key will always have 0 cells?
>
> For example, using https://cql-calculator.herokuapp.com/, I tested the
> following table definition with 1,000,000 (1 million) rows per partition
> and an average value size of 255 bytes, and it returned that there were
> 0 cells and the partition took up 32 bytes total:
>
> CREATE TABLE IF NOT EXISTS widgets (
>     id timeuuid,
>     key_id timeuuid,
>     parent_id timeuuid,
>     value text,
>     PRIMARY KEY ((parent_id, key_id), value, id)
> )
>
> Obviously the total amount of disk space for this table must be more than
> 32 bytes. In this situation, how should I be reasoning about partition
> sizes (in terms of the 2B cell limit, and the 100MB-400MB partition size
> limit)? Additionally, are there other limits / potential performance
> issues I should be concerned about?
>
> Ben Christenson
> Developer
>
> Kinetic Data, Inc.
> Your business. Your process.
> 651-556-0937 | ben.christen...@kineticdata.com
> www.kineticdata.com | community.kineticdata.com

--
With best wishes,    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Partition size, limits, recommendations for tables where all columns are part of the primary key
Hello all, I am doing some data modeling and want to make sure that I understand some nuances to cell counts, partition sizes, and related recommendations. Am I correct in my understanding that tables for which every column is in the primary key will always have 0 cells?

For example, using https://cql-calculator.herokuapp.com/, I tested the following table definition with 1,000,000 (1 million) rows per partition and an average value size of 255 bytes, and it returned that there were 0 cells and that the partition took up 32 bytes total:

CREATE TABLE IF NOT EXISTS widgets (
    id timeuuid,
    key_id timeuuid,
    parent_id timeuuid,
    value text,
    PRIMARY KEY ((parent_id, key_id), value, id)
)

Obviously the total amount of disk space for this table must be more than 32 bytes. In this situation, how should I be reasoning about partition sizes (in terms of the 2B cell limit, and the 100MB-400MB partition size limit)? Additionally, are there other limits / potential performance issues I should be concerned about?

Ben Christenson
Developer

Kinetic Data, Inc.
Your business. Your process.
651-556-0937 | ben.christen...@kineticdata.com
www.kineticdata.com | community.kineticdata.com
Re: how to check C* partition size
Hello,

You can also graph metrics using Datadog / Grafana or any other monitoring tool. Look at the max / mean partition size, I would say; see: http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics. There is also a metric called 'EstimatedPartitionSizeHistogram', yet it is a gauge... I am not too sure about how to use this specific metric.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 16:47 GMT+00:00 Ahmed Eljami :
> >Nodetool tablestats gives you a general idea.
>
> Since C* 3.X :)
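For what it's worth, if you go the JMX route for those table metrics, the per-table gauges should be exposed under object names along these lines (a sketch from memory; verify the exact names against your version, and note that my_ks / my_table are placeholders):

  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=MaxPartitionSize
  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=MeanPartitionSize
  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=EstimatedPartitionSizeHistogram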
Re: how to check C* partition size
>Nodetool tablestats gives you a general idea. Since C* 3.X :)
RE: how to check C* partition size
Nodetool tablestats gives you a general idea.

Meg Mara

From: Peng Xiao [mailto:2535...@qq.com]
Sent: Sunday, January 07, 2018 9:26 AM
To: user
Subject: how to check C* partition size

Hi guys, Could anyone please help on this simple question? How to check C* partition size and related information. looks nodetool ring only shows the token distribution. Thanks
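As a quick illustration (the keyspace/table names below are placeholders):

  nodetool tablestats my_keyspace.my_table

In the per-table section of the output, the lines to watch are "Compacted partition minimum bytes", "Compacted partition maximum bytes" and "Compacted partition mean bytes" (exact labels may vary slightly by version).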
Re: how to check C* partition size
nodetool cfstats
nodetool cfhistograms

--
Jeff Jirsa

> On Jan 7, 2018, at 7:26 AM, Peng Xiao <2535...@qq.com> wrote:
>
> Hi guys,
>
> Could anyone please help on this simple question?
> How to check C* partition size and related information.
> looks nodetool ring only shows the token distribution.
>
> Thanks
how to check C* partition size
Hi guys,

Could anyone please help on this simple question? How do we check C* partition size and related information? It looks like nodetool ring only shows the token distribution.

Thanks
Re: effect of partition size
Yes, that's LIKELY "better".

On Mon, Dec 11, 2017 at 8:10 AM, Micha wrote:
> ok, thanks for the answer.
>
> So the better approach here is to adjust the table schema to get the
> partition size to around 100MB max.
> This means using a partition key with multiple parts and making more
> selects instead of one when querying the data (which may increase
> parallelism).
>
> Michael
Re: effect of partition size
ok, thanks for the answer.

So the better approach here is to adjust the table schema to get the partition size to around 100MB max. This means using a partition key with multiple parts and making more selects instead of one when querying the data (which may increase parallelism).

Michael
Re: effect of partition size
There's a few, and there have been various proposals (some in progress) to deal with them. The two most obvious problems are:

The primary problem for most people is that wide partitions cause JVM heap pressure on reads (CASSANDRA-11206, CASSANDRA-9754). This is because we break the wide partitions into 64k chunks for indexing, and then load the entire index for a partition into memory at once. Your 820MB partition would then create ~12000 index objects, each with 2 clustering keys (start of index, end of index). When the read is done, the objects are released, and the JVM has to clean it up - that's expensive (and can lead to GC pauses). CASSANDRA-11206 lazily loads these objects for 3.6 and higher; CASSANDRA-9754 will make it a b-tree on disk - look for #9754 in the 4.0 era. Also in this category: you can end up with a huge addition to your key cache that is either immediately invalidated, or invalidates a number of other rows - the key cache is one of the most important caches in Cassandra, so having a huge row wipe it out is bad.

The second problem is repair, both anti-entropy and read repair. The unit we use for repair is a partition. If you have huge partitions, when you repair, you repair the whole partition. You've got 820MB of data, but maybe 100 bytes of difference? For anti-entropy repairs right now, we're streaming 820MB minus 100 bytes of data and letting compaction clean it up; CASSANDRA-8911 is a proposal to do that more efficiently. For read repairs, we'll end up reading most of the partition and sending mutations for the whole thing all at once, which can be a lot of updates if you're very out of sync.

The typical recommendation is to keep rows around 10-100MB. In your case, you're ~800. Whether or not that's "too big" is based on your read latency requirements, read concurrency, and whether or not 800MB is the upper bound. It may be ok if you're rarely reading it and it doesn't grow. Or it may be that you're reading it a lot and you need to re-model your data.

On Mon, Dec 11, 2017 at 5:44 AM, Micha wrote:
> Hi,
>
> What are the effects of large partitions?
>
> I have a few tables which have partitions sizes as:
>
> 95%  24000
> 98%  42000
> 99%  85000
> Max  82000
>
> So, should I redesign the schema to get this max smaller or doesn't it
> matter much, since 99% of the partitions are <= 85000 ?
>
> Thanks for answering
> Michael
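To put rough numbers on the indexing cost described above: 820 MB split into 64 KB chunks gives 820 * 1024 / 64 ≈ 13,000 index entries, each carrying two clustering keys, i.e. on the order of the ~12,000 objects mentioned, all materialized (and then discarded) for a single read of that partition.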
effect of partition size
Hi,

What are the effects of large partitions?

I have a few tables which have partitions sizes as:

95%  24000
98%  42000
99%  85000
Max  82000

So, should I redesign the schema to get this max smaller or doesn't it matter much, since 99% of the partitions are <= 85000 ?

Thanks for answering
Michael
Re: How to obtain partition size
How about this tool? https://github.com/instaclustr/cassandra-sstable-tools

> On 13 Mar 2017, at 17:56, Artur R wrote:
>
> Hello!
>
> I can't find where C* stores information about partitions size (if it stores
> it at all). So, the questions:
>
> 1. How to obtain the size (in rows or in bytes - doesn't matter) of some
> particular partition? I know that there is the system.size_estimates table
> with mean_partition_size, but it's only the mean size among all partitions.
>
> 2. How to obtain the size of an entire table? Again, does
> "mean_partition_size * partitions_count" (fields from the
> system.size_estimates table) == real size of the table?
>
> 3. Is it possible to obtain the size of rows by some clustering key within
> some partition?
>
> Maybe one can obtain this information using the Java driver or from C*
> system tables?
How to obtain partition size
Hello!

I can't find where C* stores information about partitions size (if it stores it at all). So, the questions:

1. How to obtain the size (in rows or in bytes - doesn't matter) of some particular partition? I know that there is the *system.size_estimates* table with *mean_partition_size*, but it's only the mean size among all partitions.

2. How to obtain the size of an entire table? Again, does "mean_partition_size * partitions_count" (fields from the *system.size_estimates* table) == real size of the table?

3. Is it possible to obtain the size of rows by some clustering key within some partition?

Maybe one can obtain this information using the Java driver or from C* system tables?
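On question 2, the estimates can at least be pulled per node with a plain CQL query; a minimal sketch ('my_ks' and 'my_table' are placeholders):

SELECT range_start, range_end, mean_partition_size, partitions_count
  FROM system.size_estimates
 WHERE keyspace_name = 'my_ks' AND table_name = 'my_table';

Keep in mind that system.size_estimates is local to each node, covers that node's token ranges, and is refreshed periodically, so summing mean_partition_size * partitions_count only gives a rough figure, not the real on-disk size.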
Re: Metric to monitor partition size
We're on 2.X so this information may not apply to your version, but you should see:

1) A log statement upon compaction, like "Writing large partition", including the primary partition key (see https://issues.apache.org/jira/browse/CASSANDRA-9643). Configurable threshold in cassandra.yaml.
2) Problematic partition distributions in nodetool cfhistograms, although without the primary partition key.
3) Potentially large partitions in sstables themselves using sstable parsing utilities. There's also a patch for sstablekeys here, but I've never used it (https://issues.apache.org/jira/browse/CASSANDRA-8720).

While you _could_ monitor partitions and stop writing to that partition key when the size reaches a certain threshold (roughly acquired through a method like above), I'm struggling to think of a case where you'd actually want to do that: pushing partitions to some maximum size is generally not a great idea. Ideally you'd want your partitions as small as you can manage them without making your queries absolutely neurotic.

On Thu, Jan 12, 2017 at 6:08 AM, Saumitra S wrote:
> Is there any metric or way to find out if any partition has grown beyond a
> certain size or certain row count?
>
> If a partition reaches a certain size or limit, I want to stop sending
> further write requests to it. Is it possible?
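For reference, the compaction-time warning from 1) is driven by a single cassandra.yaml setting (shown with its default value; adjust to taste):

  # warn in system.log when a compacted partition exceeds this size
  compaction_large_partition_warning_threshold_mb: 100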
Metric to monitor partition size
Is there any metric or way to find out if any partition has grown beyond a certain size or certain row count? If a partition reaches a certain size or limit, I want to stop sending further write requests to it. Is it possible?
Partition size estimation formula in 3.0
Hello,

Until 3.0, we had a nice formula to estimate partition size:

  sizeof(partition keys)
  + sizeof(static columns)
  + countof(rows) * sizeof(regular columns)
  + countof(rows) * countof(regular columns) * sizeof(clustering columns)
  + 8 * count(values in partition)

With the 3.0 storage engine, the size is supposed to be smaller, and I'm looking for the new formula. I reckon the formula becomes:

  sizeof(partition keys)
  + sizeof(static columns)
  + countof(rows) * sizeof(regular columns)
  + countof(rows) * sizeof(clustering columns)
  + 8 * count(values in partition)

That is, the clustering column values are no longer repeated for each regular column in the row. Could anyone confirm that new formula, or am I missing something?

Thank you,

--
Jérôme Mainaud
jer...@mainaud.com
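To make the difference concrete, a worked example with hypothetical numbers (16-byte partition key, no static columns, 10,000 rows, one 8-byte clustering column, three 8-byte regular columns, hence 30,000 values):

  pre-3.0:  16 + 0 + 10,000*24 + 10,000*3*8 + 8*30,000 = 720,016 bytes (~720 KB)
  3.0 (?):  16 + 0 + 10,000*24 + 10,000*8   + 8*30,000 = 560,016 bytes (~560 KB)

The only term that changes is the clustering one, which drops by a factor of the regular column count, matching the intuition that clustering values are written once per row instead of once per cell.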
Re: Partition size
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria. For example, if I have a time series use case, I might bucket by month or week. That presumes you can foresee it, though.

As far as limiting that capability, I can see that being in the ballpark of https://issues.apache.org/jira/browse/CASSANDRA-8303, but a bit trickier than the limits mentioned in that ticket.

> On Sep 12, 2016, at 12:17 PM, Anshu Vajpayee wrote:
>
> Thanks Jeff. I got the answer now.
> Is there any way to put a guardrail to avoid large partitions from the
> Cassandra side? I know it is a modeling problem and Cassandra writes a
> warning to system.log for large partitions. But I think there should be
> a way to put a restriction on it from the Cassandra side.
>
> On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" <jji...@apache.org> wrote:
>> On 2016-09-08 18:53 (-0700), Anshu Vajpayee <anshu.vajpa...@gmail.com> wrote:
>>> Is there any way to get partition size for a partition key ?
>>
>> Anshu,
>>
>> The simple answer to your question is that it is not currently possible to
>> get a partition size for an arbitrary key without quite a lot of work
>> (basically you'd have to write a tool that iterated over the data on disk,
>> which is nontrivial).
>>
>> There exists a ticket to expose this:
>> https://issues.apache.org/jira/browse/CASSANDRA-12367
>>
>> It's not clear when that ticket will land, but I expect you'll see an API
>> for getting the size of a partition key in the near future.
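As an illustration of the bucketing idea, a minimal CQL sketch (table and column names are hypothetical), bucketing a time series by month:

CREATE TABLE readings_by_month (
    source_id  text,
    month      text,          -- bucket component, e.g. '2016-09'
    reading_ts timestamp,
    value      double,
    PRIMARY KEY ((source_id, month), reading_ts)
);

Each (source_id, month) pair gets its own partition, so partition growth is capped by the bucket span; the trade-off is that queries crossing bucket boundaries must read one partition per month.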
Re: Partition size
On 2016-09-12 10:17 (-0700), Anshu Vajpayee wrote:
> Thanks Jeff. I got the answer now.
> Is there any way to put a guardrail to avoid large partitions from the
> Cassandra side? I know it is a modeling problem and Cassandra writes a
> warning to system.log for large partitions. But I think there should be
> a way to put a restriction on it from the Cassandra side.

Perhaps not surprisingly, folks active in the other ticket (for determining partition size) also have a ticket to blacklist large partitions: https://issues.apache.org/jira/browse/CASSANDRA-12106

Again, not complete, but it's an active topic of discussion and may appear in future versions. In the mean time, having your application maintain a list of 'blacklisted' partitions may be a suitable workaround.
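A minimal sketch of that application-side workaround (hypothetical names; the application consults, and ideally caches, this table before writing):

CREATE TABLE blacklisted_partitions (
    table_name    text,
    partition_key text,
    PRIMARY KEY (table_name, partition_key)
);

Entries would be added by whatever process watches partition sizes (for example, the large-partition warnings in system.log), since Cassandra itself won't enforce this.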
Re: Partition size
Thanks Jeff. I got the answer now.

Is there any way to put a guardrail to avoid large partitions from the Cassandra side? I know it is a modeling problem and Cassandra writes a warning to system.log for large partitions. But I think there should be a way to put a restriction on it from the Cassandra side.

On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" wrote:
> On 2016-09-08 18:53 (-0700), Anshu Vajpayee wrote:
> > Is there any way to get partition size for a partition key ?
>
> Anshu,
>
> The simple answer to your question is that it is not currently possible to
> get a partition size for an arbitrary key without quite a lot of work
> (basically you'd have to write a tool that iterated over the data on disk,
> which is nontrivial).
>
> There exists a ticket to expose this:
> https://issues.apache.org/jira/browse/CASSANDRA-12367
>
> It's not clear when that ticket will land, but I expect you'll see an API
> for getting the size of a partition key in the near future.
Re: Partition size
On 2016-09-08 18:53 (-0700), Anshu Vajpayee wrote:
> Is there any way to get partition size for a partition key ?

Anshu,

The simple answer to your question is that it is not currently possible to get a partition size for an arbitrary key without quite a lot of work (basically you'd have to write a tool that iterated over the data on disk, which is nontrivial).

There exists a ticket to expose this: https://issues.apache.org/jira/browse/CASSANDRA-12367

It's not clear when that ticket will land, but I expect you'll see an API for getting the size of a partition key in the near future.
Re: Partition size
Wrong. My reaction was based on the content of the message (a link to 3rd party docs in response to a question when an equivalent link to project hosted docs was available) not on who sent it or their employer.

> I was initially all for the ASF endeavour to counteract DataStax'
> outsized influence on the project, and was hopeful you might achieve
> some positive change. Perhaps you may well still do. But it seems to
> me that the ASF behaviour is beginning to cross from constructive
> criticism of the project participants to prejudicially hostile behaviour
> against certain community members - and that is unlikely to result in a
> better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.

It is not healthy if community members are directing users to 3rd party documentation in preference to the project's own documentation. If it is happening because the project's documentation is non-existent / wrong / poorly written / etc. then that is understandable (and would be an issue the project needed to address) but that was not the case in this instance.

There are many aspects to community health. In the grand scheme of things the single e-mail that started this particular discussion is in the noise. However, a consistent pattern of such e-mails would be much more troubling. My intent was to ensure that such a pattern did not form.

Whether people agree with my response or not, the community is hopefully more aware of the issue than it was previously.

Mark
Re: Partition size
I fully agree with Benedict here. I would much prefer to keep this sort of toxic behavior off the ML. People can link to whatever helpful docs / blogs they choose.

On Fri, Sep 9, 2016 at 1:12 PM Benedict Elliott Smith wrote:
> Come on. This kind of inconsistent 'policing' is not helpful.
>
> By all means, push the *committers* to improve the project docs as is
> happening, and to promote the internal resources over external ones.
>
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRA (all of which have so
> far been ignored by those of his colleagues who *are* active community
> members, I'll note!). Shaming him for not linking docs that describe
> something *other* than what he was even talking about is crossing the
> line IMO.
>
> Linking to third-party resources is commonplace, the only difference I can
> see here is that these have been called "docs" by the authors, instead of
> a blog post, and Mark has a DataStax email address.
>
> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle? Or a random user posted their own resources? Obviously not.
>
> I was initially all for the ASF endeavour to counteract DataStax' outsized
> influence on the project, and was hopeful you might achieve some positive
> change. Perhaps you may well still do. But it seems to me that the ASF
> behaviour is beginning to cross from constructive criticism of the project
> participants to prejudicially hostile behaviour against certain community
> members - and that is unlikely to result in a better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.
>
> On Friday, 9 September 2016, Mark Thomas wrote:
>> On 09/09/2016 16:46, Mark Curtis wrote:
>> > If your partition sizes are over 100MB iirc then you'll normally see
>> > warnings in your system.log, this will outline the partition key, at
>> > least in Cassandra 2.0 and 2.1 as I recall.
>> >
>> > Your best friend here is nodetool cfstats which shows you the
>> > min/mean/max partition sizes for your table. It's quite often used to
>> > pinpoint large partitons on nodes in a cluster.
>> >
>> > More info here:
>> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>>
>> Folks,
>>
>> It is *Apache* Cassandra. If you are going to point to docs, please
>> point to the official Apache docs unless there is a very good reason not
>> to.
>>
>> In this case:
>>
>> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>>
>> looks to be the place.
>>
>> Mark
>>
>> > Thanks
>> >
>> > Mark
>> >
>> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
>> > > Is there any way to get partition size for a partition key ?
Re: Partition size
Come on. This kind of inconsistent 'policing' is not helpful.

By all means, push the *committers* to improve the project docs as is happening, and to promote the internal resources over external ones.

But Mark has absolutely no formal connection with the project, and his contributions have only been to file a couple of JIRA (all of which have so far been ignored by those of his colleagues who *are* active community members, I'll note!). Shaming him for not linking docs that describe something *other* than what he was even talking about is crossing the line IMO.

Linking to third-party resources is commonplace, the only difference I can see here is that these have been called "docs" by the authors, instead of a blog post, and Mark has a DataStax email address.

Would you have reacted this way if Aaron Morton linked a blog post by thelastpickle? Or a random user posted their own resources? Obviously not.

I was initially all for the ASF endeavour to counteract DataStax' outsized influence on the project, and was hopeful you might achieve some positive change. Perhaps you may well still do. But it seems to me that the ASF behaviour is beginning to cross from constructive criticism of the project participants to prejudicially hostile behaviour against certain community members - and that is unlikely to result in a better project.

You should be treating everyone consistently, in a manner that promotes project health.

On Friday, 9 September 2016, Mark Thomas wrote:
> On 09/09/2016 16:46, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at
> > least in Cassandra 2.0 and 2.1 as I recall.
> >
> > Your best friend here is nodetool cfstats which shows you the
> > min/mean/max partition sizes for your table. It's quite often used to
> > pinpoint large partitons on nodes in a cluster.
> >
> > More info here:
> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>
> Folks,
>
> It is *Apache* Cassandra. If you are going to point to docs, please
> point to the official Apache docs unless there is a very good reason not
> to.
>
> In this case:
>
> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>
> looks to be the place.
>
> Mark
>
> > Thanks
> >
> > Mark
> >
> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> > > Is there any way to get partition size for a partition key ?
Re: Partition size
On 9/9/16, 12:14 PM, "Mark Thomas" wrote:
> If you are going to point to docs, please point to the official Apache
> docs unless there is a very good reason not to.

(And if the good reason is that there's a deficiency in the Apache Cassandra docs, please make it known on the list or in a jira so someone can write what's missing)
Re: Partition size
On 9/9/16, 8:47 AM, "Rakesh Kumar" wrote:
>> If your partition sizes are over 100MB iirc then you'll normally see
>> warnings in your system.log, this will outline the partition key, at least
>> in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

In modern versions (2.1 and newer), the "real" risk of large partitions is that they generate a lot of garbage on read - it's not a 1:1 equivalence, but it's linear, and a partition that's 10x as large generates 10x as much garbage. You can tune around it (very large new gen, for example), but it's best fixed at the data model most of the time.

The long term fix will be CASSANDRA-9754, which is a work in progress. The short term fix for 3.x was http://issues.apache.org/jira/browse/CASSANDRA-11206, which went into 3.6 and higher.

In the notes on 11206, you'll see that Robert Stupp tested up to an 8GB partition - while nobody's going to recommend you create a data model with 8GB partitions, I imagine you may find partitions in that rough order of magnitude acceptable.
Re: Partition size
On 09/09/2016 16:46, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at
> least in Cassandra 2.0 and 2.1 as I recall.
>
> Your best friend here is nodetool cfstats which shows you the
> min/mean/max partition sizes for your table. It's quite often used to
> pinpoint large partitons on nodes in a cluster.
>
> More info here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Folks,

It is *Apache* Cassandra. If you are going to point to docs, please point to the official Apache docs unless there is a very good reason not to.

In this case:

http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb

looks to be the place.

Mark

> Thanks
>
> Mark
>
> On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> > Is there any way to get partition size for a partition key ?
Re: Partition size
On 9 September 2016 at 16:47, Rakesh Kumar wrote:
> On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at least
> > in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

The 100MB is just a default setting; you can set it up or down as you need:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__compaction_large_partition_warning_threshold_mb

There isn't really a "good" or "bad" value; it all depends on the data model, your query patterns and required response times as to what's acceptable for your application. The 100MB default is just a guide. If you're seeing partitions of 1GB and above then you may very well start to see problems. Again, cfstats is your friend here!

-Mark
Re: Partition size
On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at least
> in Cassandra 2.0 and 2.1 as I recall.

Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?
Re: Partition size
If your partition sizes are over 100MB iirc then you'll normally see warnings in your system.log; this will outline the partition key, at least in Cassandra 2.0 and 2.1 as I recall.

Your best friend here is nodetool cfstats, which shows you the min/mean/max partition sizes for your table. It's quite often used to pinpoint large partitions on nodes in a cluster.

More info here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Thanks

Mark

On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> Is there any way to get partition size for a partition key ?
Partition size
Is there any way to get partition size for a partition key ?
Estimating partition size for C*2.X and C*3.X and Time Series Data Modelling.
Hello,

I'm currently enrolled in a master's degree and my thesis project involves the usage of Big Data tools in the context of Smart Grid applications. I explored several storage solutions and found Cassandra to be fitting for my problem.

The data is mostly time series data, incoming from multiple PLCs, currently being captured and stored by a proprietary SCADA software connected to a MSSQL server. Reading into the C* storage engine and how time series should be modelled, it is inevitable that I have to use a sort of time bucketing for splitting into multiple partitions.

Here is the issue: in the MSSQL server, each PLC has very wide tables (5 at the moment for one building) with around 36 columns of data being collected every 10 seconds. Data is being queried as much as 15 columns at a time, with time ranges varying between 1 hour and a whole month. A simple mapping of the same tables in MSSQL to C* is not recommended due to the way C* 2.X stores its data.

I took the DS220: Data Modelling course, which showcases two formulas for estimating a partition size based on the table design.

[image attachments: the two DS220 partition-size estimation formulas - the number of values in a partition, and the partition size Ps in bytes]

Note: This Ps formula does not account for column name length, TTLs, counter columns, and additional overhead.

If my calculations are correct, with a table such as the one below and a time resolution of 10 seconds, the Ps (partition size) would be shy of 10 MB (the value often recommended) if I partitioned it weekly.

CREATE TABLE TEST (
    BuildingAnalyzer text,
    Time timestamp,
    P1 double,
    P2 double,
    P3 double,
    Acte1 int,
    Acte2 int,
    Acte3 int,
    PRIMARY KEY (BuildingAnalyzer, Time)
)

However, as of C* 3.0, a major refactor of the storage engine brought efficiency in storage costs. From what I could gather in [1], clustering columns and column names are no longer repeated for each value in a record and, among other things, the timestamps for conflict resolution (the 8 × Nv of the 2nd formula) can be stored only once per record if they have the same value, and are encoded as varints.

I also read [2], which explains the storage in intricate detail, adding too much complexity for a simple estimation formula.

Is there any way to estimate the partition size of a table with similar formulas to the ones above? Should I just model my tables similar to what is done with metric collection (a table with columns "parametername" and "value")?

[1] http://www.datastax.com/2015/12/storage-engine-30
[2] http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Sorry for the long wall of text,

Best regards,
Gil Pinheiro.
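On the last question, one possible shape for the metric-collection style model mentioned above (a minimal CQL sketch with hypothetical names; the week column is a time bucket chosen so a partition stays under the recommended size):

CREATE TABLE analyzer_metrics (
    analyzer  text,
    week      text,        -- time bucket, e.g. '2016-W26'
    parameter text,        -- e.g. 'P1', 'Acte1'
    time      timestamp,
    value     double,
    PRIMARY KEY ((analyzer, week), parameter, time)
);

Reading 15 of the 36 parameters for a time range then becomes a query with parameter IN (...) plus a range on time within each weekly partition, rather than a single wide-table read.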
Re: on-disk size vs partition-size in cfhistograms
Hi Joseph,

> The approach I took was to insert increasing number of rows into a replica
> of the table to be sized, watch the size of the "data" directory (after doing
> nodetool flush and compact), and calculate the average size per row (total
> directory size/count of rows). Can this be considered a valid approach to
> extrapolate for future growth of data?

You also need to consider the replication factor you are going to use and the percentage of the data this node you are looking at is owning. Also, when you run "nodetool compact" you get the minimal possible size, while in real conditions you will probably never be in this state. If you update the same row again and again, shards of the row will be spread in multiple sstables, with more overhead. Plus, if you plan to TTL data or to delete some, you will always have some tombstones in there too, and maybe for long, depending on how you tune Cassandra and on your use case, I guess.

So I would say this approach is not very accurate. My guess is you will end up using more space than you think. But it is also harder to do capacity planning from nothing than from a working system.

> It seems the size in cfhisto has a wide variation with the calculated value
> using the approach detailed above (avg 2KB/row). Could this difference be
> due to compression, or are there any other factors at play here?

It could be compression indeed. To check that, you need to dig into the code. What Cassandra version are you planning to use? By the way, if disk space matters to you, as it seems to me, you might want to use Cassandra 3.0+: http://www.datastax.com/2015/12/storage-engine-30, http://www.planetcassandra.org/blog/this-week-in-cassandra-3-0-storage-engine-deep-dive-3112016/, http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

> What would be the typical use/interpretation of the "partition size" metric?

I guess people use that to spot wide rows mainly, but if you are happy summing those, it should be good as long as you know what you are summing. Each Cassandra operator has his tips and own usage of the tools available and might have a distinct way of performing operations depending on his needs and own experience :-). So if it looks relevant to you, go ahead. For example, if you find out that this is the data before compression, then just applying the compression ratio to your sum should be good. Still, take care of my first point above.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-06 13:27 GMT+02:00 Joseph Tech:
> Hi,
>
> I am trying to get some baselines for capacity planning. The approach I
> took was to insert increasing number of rows into a replica of the table
> to be sized, watch the size of the "data" directory (after doing nodetool
> flush and compact), and calculate the average size per row (total
> directory size/count of rows). Can this be considered a valid approach to
> extrapolate for future growth of data?
>
> Related to this, is there any information we can gather from
> partition-size of cfhistograms (snipped output for my table below):
>
> Partition Size (bytes)
>    642 bytes: 221
>    770 bytes: 2328
>    924 bytes: 328858
> ..
>   8239 bytes: 153178
> ...
>  24601 bytes: 16973
>  29521 bytes: 10805
> ...
> 219342 bytes: 23
> 263210 bytes: 6
> 315852 bytes: 4
>
> It seems the size in cfhisto has a wide variation with the calculated
> value using the approach detailed above (avg 2KB/row). Could this
> difference be due to compression, or are there any other factors at play
> here? What would be the typical use/interpretation of the "partition
> size" metric?
>
> The table definition is like:
>
> CREATE TABLE abc (
>   key1 text,
>   col1 text,
>   PRIMARY KEY ((key1))
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.00 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> Thanks,
> Joseph
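A rough back-of-the-envelope way to fold these points into the extrapolation (a sketch; overhead_factor is a guess you should calibrate against a working system):

  per-node disk ≈ rows × avg_row_size × RF / node_count × overhead_factor

where RF is the replication factor, and overhead_factor (> 1) absorbs compaction headroom, tombstones and row shards spread across sstables. If the 2KB/row baseline was measured on compressed data, the compression ratio is already included; if not, multiply it back in.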
on-disk size vs partition-size in cfhistograms
Hi,

I am trying to get some baselines for capacity planning. The approach I took was to insert increasing number of rows into a replica of the table to be sized, watch the size of the "data" directory (after doing nodetool flush and compact), and calculate the average size per row (total directory size/count of rows). Can this be considered a valid approach to extrapolate for future growth of data?

Related to this, is there any information we can gather from partition-size of cfhistograms (snipped output for my table below):

Partition Size (bytes)
   642 bytes: 221
   770 bytes: 2328
   924 bytes: 328858
..
  8239 bytes: 153178
...
 24601 bytes: 16973
 29521 bytes: 10805
...
219342 bytes: 23
263210 bytes: 6
315852 bytes: 4

It seems the size in cfhisto has a wide variation with the calculated value using the approach detailed above (avg 2KB/row). Could this difference be due to compression, or are there any other factors at play here? What would be the typical use/interpretation of the "partition size" metric?

The table definition is like:

CREATE TABLE abc (
  key1 text,
  col1 text,
  PRIMARY KEY ((key1))
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Thanks,
Joseph
Re: Data Modeling: Partition Size and Query Efficiency
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote: > You could keep a "num_buckets" value associated with the client's account, > which can be adjusted accordingly as usage increases. > Yes, but the adjustment problem is tricky when there are multiple concurrent writers. What happens when you change the number of buckets? Does existing data have to be re-written into new buckets? If so, how do you make sure that's only done once for each bucket size increase? Or perhaps I'm misunderstanding your suggestion? Jim > On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > >> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < >> clintlmar...@coolfiretechnologies.com> wrote: >> >>> What sort of data is your clustering key composed of? That might help >>> some in determining a way to achieve what you're looking for. >>> >> Just a UUID that acts as an object identifier. >> >>> >>> Clint >>> On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: >>> Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen much less frequently than a customer filling a partition. Jim On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall wrote: > >> In this case, 99% of my data could fit in a single 50 MB partition. >> But if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old >> one fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> > Hi Jim - good to see you around again. > > If you can segment this upstream by customer/account/whatever, > handling the outliers as an entirely different code path (potentially > different cluster as the workload will be quite different at that point > and > have different tuning requirements) would be your best bet. Then a > read-before-write makes sense given it is happening on such a small number > of API queries. > > > -- > - > Nate McCall > Austin, TX > @zznate > > Co-Founder & Sr. Technical Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com >
Re: Data Modeling: Partition Size and Query Efficiency
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases. On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> What sort of data is your clustering key composed of? That might help >> some in determining a way to achieve what you're looking for. >> > Just a UUID that acts as an object identifier. > >> >> Clint >> On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: >> >>> Hi Nate, >>> >>> Yes, I've been thinking about treating customers as either small or big, >>> where "small" ones have a single partition and big ones have 50 (or >>> whatever number I need to keep sizes reasonable). There's still the problem >>> of how to handle a small customer who becomes too big, but that will happen >>> much less frequently than a customer filling a partition. >>> >>> Jim >>> >>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall >>> wrote: >>> > In this case, 99% of my data could fit in a single 50 MB partition. > But if I use the standard approach, I have to split my partitions into 50 > pieces to accommodate the largest data. That means that to query the 700 > rows for my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old > one fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com >>> >>>
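A minimal sketch of how the num_buckets idea could look (hypothetical names; bucket could be derived client-side as hash(object_id) % num_buckets):

CREATE TABLE account_settings (
    account_id  uuid PRIMARY KEY,
    num_buckets int
);

CREATE TABLE objects_by_account (
    account_id uuid,
    bucket     int,
    object_id  uuid,
    payload    text,
    PRIMARY KEY ((account_id, bucket), object_id)
);

Reads fan out over num_buckets partitions per account; the hard part, as the reply above points out, is safely re-bucketing existing data when num_buckets changes.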
Re: Data Modeling: Partition Size and Query Efficiency
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > What sort of data is your clustering key composed of? That might help some > in determining a way to achieve what you're looking for. > Just a UUID that acts as an object identifier. > > Clint > On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > >> Hi Nate, >> >> Yes, I've been thinking about treating customers as either small or big, >> where "small" ones have a single partition and big ones have 50 (or >> whatever number I need to keep sizes reasonable). There's still the problem >> of how to handle a small customer who becomes too big, but that will happen >> much less frequently than a customer filling a partition. >> >> Jim >> >> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall >> wrote: >> >>> In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one. If you try to deal with this by starting a new partition when an old one fills up, you have a nasty distributed consensus problem, along with read-before-write. Cassandra LWT wasn't available the last time I dealt with this, but might help with the consensus part today. But there are still some nasty corner cases. I have some thoughts on other ways to solve this, but they all have drawbacks. So I thought I'd ask here and hope that someone has a better approach. >>> Hi Jim - good to see you around again. >>> >>> If you can segment this upstream by customer/account/whatever, handling >>> the outliers as an entirely different code path (potentially different >>> cluster as the workload will be quite different at that point and have >>> different tuning requirements) would be your best bet. Then a >>> read-before-write makes sense given it is happening on such a small number >>> of API queries. >>> >>> >>> -- >>> - >>> Nate McCall >>> Austin, TX >>> @zznate >>> >>> Co-Founder & Sr. Technical Consultant >>> Apache Cassandra Consulting >>> http://www.thelastpickle.com >>> >> >>
Re: Data Modeling: Partition Size and Query Efficiency
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for. Clint On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > Hi Nate, > > Yes, I've been thinking about treating customers as either small or big, > where "small" ones have a single partition and big ones have 50 (or > whatever number I need to keep sizes reasonable). There's still the problem > of how to handle a small customer who becomes too big, but that will happen > much less frequently than a customer filling a partition. > > Jim > > On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall > wrote: > >> >>> In this case, 99% of my data could fit in a single 50 MB partition. But >>> if I use the standard approach, I have to split my partitions into 50 >>> pieces to accommodate the largest data. That means that to query the 700 >>> rows for my median case, I have to read 50 partitions instead of one. >>> >>> If you try to deal with this by starting a new partition when an old one >>> fills up, you have a nasty distributed consensus problem, along with >>> read-before-write. Cassandra LWT wasn't available the last time I dealt >>> with this, but might help with the consensus part today. But there are >>> still some nasty corner cases. >>> >>> I have some thoughts on other ways to solve this, but they all have >>> drawbacks. So I thought I'd ask here and hope that someone has a better >>> approach. >>> >>> >> Hi Jim - good to see you around again. >> >> If you can segment this upstream by customer/account/whatever, handling >> the outliers as an entirely different code path (potentially different >> cluster as the workload will be quite different at that point and have >> different tuning requirements) would be your best bet. Then a >> read-before-write makes sense given it is happening on such a small number >> of API queries. >> >> >> -- >> - >> Nate McCall >> Austin, TX >> @zznate >> >> Co-Founder & Sr. Technical Consultant >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > >
Re: Data Modeling: Partition Size and Query Efficiency
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen much less frequently than a customer filling a partition. Jim On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall wrote: > >> In this case, 99% of my data could fit in a single 50 MB partition. But >> if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old one >> fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> > Hi Jim - good to see you around again. > > If you can segment this upstream by customer/account/whatever, handling > the outliers as an entirely different code path (potentially different > cluster as the workload will be quite different at that point and have > different tuning requirements) would be your best bet. Then a > read-before-write makes sense given it is happening on such a small number > of API queries. > > > -- > - > Nate McCall > Austin, TX > @zznate > > Co-Founder & Sr. Technical Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com >
Re: Data Modeling: Partition Size and Query Efficiency
Hi Jack, Thanks for your response. My answers inline... On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky wrote: > Jim, I don't quite get why you think you would need to query 50 partitions > to return merely hundreds or thousands of rows. Please elaborate. I mean, > sure, for that extreme 100th percentile, yes, you would query a lot of > partitions, but for the 90th percentile it would be just one. Even the 99th > percentile would just be one or at most a few. > Exactly, but, as I mentioned in my email, the normal way of segmenting large partitions is to use some deterministic bucketing mechanism to bucket rows into different partitions. If you know of a way to make the number of buckets vary with the number of rows, I'd love to hear about it. > It would help if you could elaborate on the actual access pattern - how > rapidly is the data coming in and from where. You can do just a little more > work at the app level and use Cassandra more effectively. > The write pattern is batches of inserts/updates mixed with some single row inserts/updates. Not surprisingly, the customers with more data also do more writes. > As always, we look to queries to determine what the Cassandra data model > should look like, so elaborate on what your app needs to see. What exactly is > the app querying for - a single key, a slice, or... what? > The use case here is sequential access to some or all of a customer's rows in order to filter based on other criteria. The order doesn't matter much, as long as it's well-defined. > And, as always, you commonly need to store the data in multiple query > tables so that the data model matches the desired query pattern. > > Are the row sizes very dynamic, with some extremely large, or is it just > the number of rows that is making size an issue? > No, row sizes don't vary much, just the number of rows per customer. > > Maybe let the app keep a small cache of active partitions and their > current size so that the app can decide when to switch to a new bucket. Do > a couple of extra queries when a key is not in that cache to determine > the partition size and count to initialize the cache entry for a key. If > necessary, keep a separate table that tracks the partition size or maybe > just the (rough) row count to use to determine when a new partition is > needed. > I've done almost exactly what you suggest in a previous application. The issue is that the cache of active partitions needs to be consistent for multiple writers and the transition from one bucket to the next really wants to be transactional. Hence my reference to a "nasty distributed consensus problem" and Clint's reference to an "anti-pattern". I'd like to avoid it if I can. Jim > > -- Jack Krupansky > > On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona wrote: >> Thanks for responding! >> >> My natural partition key is a customer id. Our customers have widely >> varying amounts of data. Since the vast majority of them have data that's >> small enough to fit in a single partition, I'd like to avoid imposing >> unnecessary overhead on the 99% just to avoid issues with the largest 1%. >> >> The approach to querying across multiple partitions you describe is >> pretty much what I have in mind. The trick is to avoid having to query 50 >> partitions to return a few hundred or thousand rows. >> >> I agree that sequentially filling partitions is something to avoid. >> That's why I'm hoping someone can suggest a good alternative. 
>> >> Jim >> >> >> >> >> >> On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < >> clintlmar...@coolfiretechnologies.com> wrote: >> >>> You should endeavor to use a repeatable method of segmenting your data. >>> Swapping partitions every time you "fill one" seems like an anti-pattern to >>> me, but I suppose it really depends on what your primary key is. Can you >>> share some more information on this? >>> >>> In the past I have utilized the consistent hash method you described >>> (add an artificial row key segment by modulo some part of the clustering >>> key by a fixed partition count) combined with a lazy evaluation cursor. >>> >>> The lazy evaluation cursor essentially is set up to query X number of >>> partitions simultaneously, but to execute those queries only as needed to >>> fill the page size. To perform paging you have to know the last primary key >>> that was returned so you can use that to limit the next iteration. >>> >>> You can trade latency for additional workload by controlling the number >>> of concurrent executions you do as the iterating occurs. Or you can minimize >>> the work on your cluster by querying each partition one at a time.
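Since Jim mentions LWT as a possible fix for the consensus part, here is a hedged sketch of a compare-and-set on the bucket count using the DataStax Python driver. The bucket_state table and keyspace are assumptions, not anything from the thread, and the remaining corner cases (e.g. writers still acting on a stale count while the CAS happens) are exactly the ones Jim warns about.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # illustrative contact point
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Hypothetical table:
# CREATE TABLE bucket_state (customer_id uuid PRIMARY KEY, num_buckets int);

def grow_buckets(customer_id, observed, proposed):
    # LWT compare-and-set: only one writer wins the transition from
    # `observed` to `proposed`; losers must re-read and retry.
    result = session.execute(
        "UPDATE bucket_state SET num_buckets = %s "
        "WHERE customer_id = %s IF num_buckets = %s",
        (proposed, customer_id, observed),
    )
    return result.was_applied  # False means another writer raced us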
Re: Data Modeling: Partition Size and Query Efficiency
> > > In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Data Modeling: Partition Size and Query Efficiency
Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile, yes, you would query a lot of partitions, but for the 90th percentile it would be just one. Even the 99th percentile would just be one or at most a few. It would help if you could elaborate on the actual access pattern - how rapidly is the data coming in and from where. You can do just a little more work at the app level and use Cassandra more effectively. As always, we look to queries to determine what the Cassandra data model should look like, so elaborate on what your app needs to see. What exactly is the app querying for - a single key, a slice, or... what? And, as always, you commonly need to store the data in multiple query tables so that the data model matches the desired query pattern. Are the row sizes very dynamic, with some extremely large, or is it just the number of rows that is making size an issue? Maybe let the app keep a small cache of active partitions and their current size so that the app can decide when to switch to a new bucket. Do a couple of extra queries when a key is not in that cache to determine the partition size and count to initialize the cache entry for a key. If necessary, keep a separate table that tracks the partition size or maybe just the (rough) row count to use to determine when a new partition is needed. -- Jack Krupansky On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona wrote: > Thanks for responding! > > My natural partition key is a customer id. Our customers have widely > varying amounts of data. Since the vast majority of them have data that's > small enough to fit in a single partition, I'd like to avoid imposing > unnecessary overhead on the 99% just to avoid issues with the largest 1%. > > The approach to querying across multiple partitions you describe is pretty > much what I have in mind. The trick is to avoid having to query 50 > partitions to return a few hundred or thousand rows. > > I agree that sequentially filling partitions is something to avoid. That's > why I'm hoping someone can suggest a good alternative. > > Jim > > > > > > On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> You should endeavor to use a repeatable method of segmenting your data. >> Swapping partitions every time you "fill one" seems like an anti-pattern to >> me, but I suppose it really depends on what your primary key is. Can you >> share some more information on this? >> >> In the past I have utilized the consistent hash method you described (add >> an artificial row key segment by modulo some part of the clustering key by >> a fixed partition count) combined with a lazy evaluation cursor. >> >> The lazy evaluation cursor essentially is set up to query X number of >> partitions simultaneously, but to execute those queries only as needed to >> fill the page size. To perform paging you have to know the last primary key >> that was returned so you can use that to limit the next iteration. >> >> You can trade latency for additional workload by controlling the number >> of concurrent executions you do as the iterating occurs. Or you can >> minimize the work on your cluster by querying each partition one at a time. 
>> >> Unfortunately due to the artificial partition key segment you cannot >> iterate or page in any particular order (at least across partitions), >> unless your hash function can also provide some ordering guarantees. >> >> It all just depends on your requirements. >> >> Clint >> On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: >> >>> A problem that I have run into repeatedly when doing schema design is >>> how to control partition size while still allowing for efficient multi-row >>> queries. >>> >>> We want to limit partition size to some number between 10 and 100 >>> megabytes to avoid operational issues. The standard way to do that is to >>> figure out the maximum number of rows that your "natural partition key" >>> will ever need to support and then add an additional artificial partition >>> key that segments the rows sufficiently to keep the partition size >>> under the maximum. In the case of time series data, this is often done by >>> bucketing by time period, i.e. creating a new partition every minute, hour >>> or day. For non-time-series data, it's done with something like >>> Hash(clustering-key) mod desired-number-of-partitions. >>> >>> In my case, multi-row queries to support a REST API typically return a >>> page of results, where the page size might be anywhere from a few dozen up >>> to thousands. For query efficiency I want the average number of rows per >>> partition to be large enough that a query can be satisfied by reading a >>> small number of partitions--ideally one.
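One hedged way to realize Jack's "separate table that tracks ... the (rough) row count" is a Cassandra counter table that writers bump on each insert. The schema and threshold below are assumptions; counters can over- or under-count on client retries, so the value is only an estimate, which is all this use needs.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Hypothetical tracking table:
# CREATE TABLE partition_rows (customer_id uuid, bucket int, n counter,
#                              PRIMARY KEY (customer_id, bucket));

ROWS_PER_BUCKET_LIMIT = 50_000  # ~50 MB at the ~1 KB/row figure in the thread

def record_write(customer_id, bucket):
    # Counter updates need no read-before-write, so concurrent writers are fine.
    session.execute(
        "UPDATE partition_rows SET n = n + 1 "
        "WHERE customer_id = %s AND bucket = %s",
        (customer_id, bucket),
    )

def bucket_is_full(customer_id, bucket):
    row = session.execute(
        "SELECT n FROM partition_rows "
        "WHERE customer_id = %s AND bucket = %s",
        (customer_id, bucket),
    ).one()
    return row is not None and row.n >= ROWS_PER_BUCKET_LIMIT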
Re: Data Modeling: Partition Size and Query Efficiency
Thanks for responding! My natural partition key is a customer id. Our customers have widely varying amounts of data. Since the vast majority of them have data that's small enough to fit in a single partition, I'd like to avoid imposing unnecessary overhead on the 99% just to avoid issues with the largest 1%. The approach to querying across multiple partitions you describe is pretty much what I have in mind. The trick is to avoid having to query 50 partitions to return a few hundred or thousand rows. I agree that sequentially filling partitions is something to avoid. That's why I'm hoping someone can suggest a good alternative. Jim On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > You should endeavor to use a repeatable method of segmenting your data. > Swapping partitions every time you "fill one" seems like an anti-pattern to > me, but I suppose it really depends on what your primary key is. Can you > share some more information on this? > > In the past I have utilized the consistent hash method you described (add > an artificial row key segment by modulo some part of the clustering key by > a fixed partition count) combined with a lazy evaluation cursor. > > The lazy evaluation cursor essentially is set up to query X number of > partitions simultaneously, but to execute those queries only as needed to > fill the page size. To perform paging you have to know the last primary key > that was returned so you can use that to limit the next iteration. > > You can trade latency for additional workload by controlling the number > of concurrent executions you do as the iterating occurs. Or you can > minimize the work on your cluster by querying each partition one at a time. > > Unfortunately due to the artificial partition key segment you cannot > iterate or page in any particular order (at least across partitions), > unless your hash function can also provide some ordering guarantees. > > It all just depends on your requirements. > > Clint > On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: > >> A problem that I have run into repeatedly when doing schema design is how >> to control partition size while still allowing for efficient multi-row >> queries. >> >> We want to limit partition size to some number between 10 and 100 >> megabytes to avoid operational issues. The standard way to do that is to >> figure out the maximum number of rows that your "natural partition key" >> will ever need to support and then add an additional artificial partition >> key that segments the rows sufficiently to keep the partition size >> under the maximum. In the case of time series data, this is often done by >> bucketing by time period, i.e. creating a new partition every minute, hour >> or day. For non-time-series data, it's done with something like >> Hash(clustering-key) mod desired-number-of-partitions. >> >> In my case, multi-row queries to support a REST API typically return a >> page of results, where the page size might be anywhere from a few dozen up >> to thousands. For query efficiency I want the average number of rows per >> partition to be large enough that a query can be satisfied by reading a >> small number of partitions--ideally one. >> >> So I want to simultaneously limit the maximum number of rows per >> partition and yet maintain a large enough average number of rows per >> partition to make my queries efficient. But with my data the ratio between >> maximum and average can be very large (up to four orders of magnitude). 
>> Here is an example:
>>
>>                     Rows per Partition   Partition Size
>> Mode                                 1             1 KB
>> Median                             500           500 KB
>> 90th percentile                  5,000             5 MB
>> 99th percentile                 50,000            50 MB
>> Maximum                      2,500,000           2.5 GB
>>
>> In this case, 99% of my data could fit in a single 50 MB partition. But >> if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old one >> fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> Thanks in advance, >> >> Jim >> >>
Re: Data Modeling: Partition Size and Query Efficiency
You should endeavor to use a repeatable method of segmenting your data. Swapping partitions every time you "fill one" seems like an anti-pattern to me, but I suppose it really depends on what your primary key is. Can you share some more information on this? In the past I have utilized the consistent hash method you described (add an artificial row key segment by modulo some part of the clustering key by a fixed partition count) combined with a lazy evaluation cursor. The lazy evaluation cursor essentially is set up to query X number of partitions simultaneously, but to execute those queries only as needed to fill the page size. To perform paging you have to know the last primary key that was returned so you can use that to limit the next iteration. You can trade latency for additional workload by controlling the number of concurrent executions you do as the iterating occurs. Or you can minimize the work on your cluster by querying each partition one at a time. Unfortunately due to the artificial partition key segment you cannot iterate or page in any particular order (at least across partitions), unless your hash function can also provide some ordering guarantees. It all just depends on your requirements. Clint On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: > A problem that I have run into repeatedly when doing schema design is how > to control partition size while still allowing for efficient multi-row > queries. > > We want to limit partition size to some number between 10 and 100 > megabytes to avoid operational issues. The standard way to do that is to > figure out the maximum number of rows that your "natural partition key" > will ever need to support and then add an additional artificial partition > key that segments the rows sufficiently to keep the partition size > under the maximum. In the case of time series data, this is often done by > bucketing by time period, i.e. creating a new partition every minute, hour > or day. For non-time-series data, it's done with something like > Hash(clustering-key) mod desired-number-of-partitions. > > In my case, multi-row queries to support a REST API typically return a > page of results, where the page size might be anywhere from a few dozen up > to thousands. For query efficiency I want the average number of rows per > partition to be large enough that a query can be satisfied by reading a > small number of partitions--ideally one. > > So I want to simultaneously limit the maximum number of rows per partition > and yet maintain a large enough average number of rows per partition to > make my queries efficient. But with my data the ratio between maximum and > average can be very large (up to four orders of magnitude). > > Here is an example:
>
>                     Rows per Partition   Partition Size
> Mode                                 1             1 KB
> Median                             500           500 KB
> 90th percentile                  5,000             5 MB
> 99th percentile                 50,000            50 MB
> Maximum                      2,500,000           2.5 GB
>
> In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. 
But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Thanks in advance, > > Jim > >
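A sketch of Clint's lazy evaluation cursor in its simplest form, querying one partition at a time to minimize cluster work. The widgets_by_customer table is hypothetical; the driver's fetch_size handles paging within each partition.

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

def scan_customer(customer_id, num_buckets, page_size=500):
    # Lazily walk every hash bucket for one customer. Rows come back
    # bucket-by-bucket, so, as Clint notes, there is no total order
    # across partitions.
    query = SimpleStatement(
        "SELECT id, value FROM widgets_by_customer "
        "WHERE customer_id = %s AND bucket = %s",
        fetch_size=page_size,  # driver pages transparently per partition
    )
    for bucket in range(num_buckets):
        for row in session.execute(query, (customer_id, bucket)):
            yield row

To resume across API requests you would also capture where you stopped: the last primary key seen (as Clint describes) or the driver's paging_state, plus the bucket you were in. Running several buckets concurrently via session.execute_async is the latency-for-workload trade-off he mentions.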
Data Modeling: Partition Size and Query Efficiency
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to figure out the maximum number of rows that your "natural partition key" will ever need to support and then add an additional artificial partition key that segments the rows sufficiently to keep the partition size under the maximum. In the case of time series data, this is often done by bucketing by time period, i.e. creating a new partition every minute, hour or day. For non-time-series data, it's done with something like Hash(clustering-key) mod desired-number-of-partitions. In my case, multi-row queries to support a REST API typically return a page of results, where the page size might be anywhere from a few dozen up to thousands. For query efficiency I want the average number of rows per partition to be large enough that a query can be satisfied by reading a small number of partitions--ideally one. So I want to simultaneously limit the maximum number of rows per partition and yet maintain a large enough average number of rows per partition to make my queries efficient. But with my data the ratio between maximum and average can be very large (up to four orders of magnitude). Here is an example:

                    Rows per Partition   Partition Size
Mode                                 1             1 KB
Median                             500           500 KB
90th percentile                  5,000             5 MB
99th percentile                 50,000            50 MB
Maximum                      2,500,000           2.5 GB

In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one. If you try to deal with this by starting a new partition when an old one fills up, you have a nasty distributed consensus problem, along with read-before-write. Cassandra LWT wasn't available the last time I dealt with this, but might help with the consensus part today. But there are still some nasty corner cases. I have some thoughts on other ways to solve this, but they all have drawbacks. So I thought I'd ask here and hope that someone has a better approach. Thanks in advance, Jim
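For concreteness, the standard approach described above as a small Python sketch (the 50-bucket figure is the thread's own worst-case number; helper names are illustrative):

import hashlib
import uuid
from datetime import datetime, timezone

NUM_BUCKETS = 50  # fixed up front to fit the 2.5M-row maximum

def hash_bucket(clustering_key: uuid.UUID) -> int:
    # Hash(clustering-key) mod desired-number-of-partitions
    digest = hashlib.md5(clustering_key.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def time_bucket(ts: datetime) -> str:
    # Time-series variant: one partition per day
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

# The partition key becomes (customer_id, bucket), so every multi-row
# read for a customer fans out over all 50 partitions -- which is the
# median-case inefficiency this thread is about.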