Re: Partition size, limits, recommendations for tables where all columns are part of the primary key
Hi

Yes, basically rows have no cells as everything is in the partition key/clustering columns. You can always look into the data using sstabledump (this is for DSE 6.7 that I have running):

sstabledump ac-1-bti-Data.db
[
  {
    "partition" : {
      "key" : [ "977eb1f1-aa5b-11ea-b91a-db426f6f892c", "977ed900-aa5b-11ea-b91a-db426f6f892c" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 78,
        "clustering" : [ "test", "977ed901-aa5b-11ea-b91a-db426f6f892c" ],
        "liveness_info" : { "tstamp" : "2020-06-09T14:14:54.863249Z" },
        "cells" : [ ]
      }
    ]
  }
]

P.S. You can play with your schema and do some performance tests using https://github.com/nosqlbench/

On Tue, Jun 9, 2020 at 3:51 PM Benjamin Christenson <ben.christen...@kineticdata.com> wrote:
> Hello all, I am doing some data modeling and want to make sure that I
> understand some nuances to cell counts, partition sizes, and related
> recommendations. Am I correct in my understanding that tables for which
> every column is in the primary key will always have 0 cells?
>
> For example, using https://cql-calculator.herokuapp.com/, I tested the
> following table definition with 1,000,000 (1 million) rows per partition
> and an average value size of 255 bytes, and it returned that there were
> 0 cells and the partition took up 32 bytes total:
>
> CREATE TABLE IF NOT EXISTS widgets (
>     id timeuuid,
>     key_id timeuuid,
>     parent_id timeuuid,
>     value text,
>     PRIMARY KEY ((parent_id, key_id), value, id)
> )
>
> Obviously the total amount of disk space for this table must be more than
> 32 bytes. In this situation, how should I be reasoning about partition
> sizes (in terms of the 2B cell limit, and the 100MB-400MB partition size
> limit)? Additionally, are there other limits / potential performance
> issues I should be concerned about?
>
> Ben Christenson
> Developer
>
> Kinetic Data, Inc.
> Your business. Your process.
> 651-556-0937 | ben.christen...@kineticdata.com
> www.kineticdata.com | community.kineticdata.com

--
With best wishes,    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Partition size, limits, recommendations for tables where all columns are part of the primary key
Hello all, I am doing some data modeling and want to make sure that I understand some nuances to cell counts, partition sizes, and related recommendations. Am I correct in my understanding that tables for which every column is in the primary key will always have 0 cells?

For example, using https://cql-calculator.herokuapp.com/, I tested the following table definition with 1,000,000 (1 million) rows per partition and an average value size of 255 bytes, and it returned that there were 0 cells and that the partition took up 32 bytes total:

CREATE TABLE IF NOT EXISTS widgets (
    id timeuuid,
    key_id timeuuid,
    parent_id timeuuid,
    value text,
    PRIMARY KEY ((parent_id, key_id), value, id)
)

Obviously the total amount of disk space for this table must be more than 32 bytes. In this situation, how should I be reasoning about partition sizes (in terms of the 2B cell limit, and the 100MB-400MB partition size limit)? Additionally, are there other limits / potential performance issues I should be concerned about?

Ben Christenson
Developer

Kinetic Data, Inc.
Your business. Your process.
651-556-0937 | ben.christen...@kineticdata.com
www.kineticdata.com | community.kineticdata.com
Re: how to check C* partition size
Hello,

You can also graph metrics using Datadog / Grafana or any other monitoring tool. Look at the max / mean partition size, I would say; see: http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics. There is also a metric called 'EstimatedPartitionSizeHistogram', yet it is a gauge... I am not too sure about how to use this specific metric.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 16:47 GMT+00:00 Ahmed Eljami :
> >Nodetool tablestats gives you a general idea.
>
> Since C* 3.X :)
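For what it's worth, if you go the JMX route for those table metrics, the per-table gauges should be exposed under object names along these lines (a sketch from memory; verify the exact names against your version, and note that my_ks / my_table are placeholders):

  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=MaxPartitionSize
  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=MeanPartitionSize
  org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=EstimatedPartitionSizeHistogram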
Re: how to check C* partition size
>Nodetool tablestats gives you a general idea. Since C* 3.X :)
RE: how to check C* partition size
Nodetool tablestats gives you a general idea.

Meg Mara

From: Peng Xiao [mailto:2535...@qq.com]
Sent: Sunday, January 07, 2018 9:26 AM
To: user
Subject: how to check C* partition size

Hi guys, Could anyone please help on this simple question? How to check C* partition size and related information. looks nodetool ring only shows the token distribution. Thanks
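As a quick illustration (the keyspace/table names below are placeholders):

  nodetool tablestats my_keyspace.my_table

In the per-table section of the output, the lines to watch are "Compacted partition minimum bytes", "Compacted partition maximum bytes" and "Compacted partition mean bytes" (exact labels may vary slightly by version).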
Re: how to check C* partition size
nodetool cfstats
nodetool cfhistograms

--
Jeff Jirsa

> On Jan 7, 2018, at 7:26 AM, Peng Xiao <2535...@qq.com> wrote:
>
> Hi guys,
>
> Could anyone please help on this simple question?
> How to check C* partition size and related information.
> looks nodetool ring only shows the token distribution.
>
> Thanks
how to check C* partition size
Hi guys,

Could anyone please help on this simple question? How do we check C* partition size and related information? It looks like nodetool ring only shows the token distribution.

Thanks
Re: effect of partition size
Yes, that's LIKELY "better".

On Mon, Dec 11, 2017 at 8:10 AM, Micha wrote:
> ok, thanks for the answer.
>
> So the better approach here is to adjust the table schema to get the
> partition size to around 100MB max.
> This means using a partition key with multiple parts and making more
> selects instead of one when querying the data (which may increase
> parallelism).
>
> Michael
Re: effect of partition size
ok, thanks for the answer.

So the better approach here is to adjust the table schema to get the partition size to around 100MB max. This means using a partition key with multiple parts and making more selects instead of one when querying the data (which may increase parallelism).

Michael
Re: effect of partition size
There's a few, and there have been various proposals (some in progress) to deal with them. The two most obvious problems are:

The primary problem for most people is that wide partitions cause JVM heap pressure on reads (CASSANDRA-11206, CASSANDRA-9754). This is because we break the wide partitions into 64k chunks for indexing, and then load the entire index for a partition into memory at once. Your 820MB partition would then create ~12000 index objects, each with 2 clustering keys (start of index, end of index). When the read is done, the objects are released, and the JVM has to clean it up - that's expensive (and can lead to GC pauses). CASSANDRA-11206 lazily loads these objects for 3.6 and higher; CASSANDRA-9754 will make it a b-tree on disk - look for #9754 in the 4.0 era. Also in this category: you can end up with a huge addition to your key cache that is either immediately invalidated, or invalidates a number of other rows - the key cache is one of the most important caches in Cassandra, so having a huge row wipe it out is bad.

The second problem is repair, both anti-entropy and read repair. The unit we use for repair is a partition. If you have huge partitions, when you repair, you repair the whole partition. You've got 820MB of data, but maybe 100 bytes of difference? For anti-entropy repairs right now, we're streaming 820MB minus 100 bytes of data and letting compaction clean it up; CASSANDRA-8911 is a proposal to do that more efficiently. For read repairs, we'll end up reading most of the partition and sending mutations for the whole thing all at once, which can be a lot of updates if you're very out of sync.

The typical recommendation is to keep rows around 10-100MB. In your case, you're ~800. Whether or not that's "too big" is based on your read latency requirements, read concurrency, and whether or not 800MB is the upper bound. It may be ok if you're rarely reading it and it doesn't grow. Or it may be that you're reading it a lot and you need to re-model your data.

On Mon, Dec 11, 2017 at 5:44 AM, Micha wrote:
> Hi,
>
> What are the effects of large partitions?
>
> I have a few tables which have partitions sizes as:
>
> 95%  24000
> 98%  42000
> 99%  85000
> Max  82000
>
> So, should I redesign the schema to get this max smaller or doesn't it
> matter much, since 99% of the partitions are <= 85000 ?
>
> Thanks for answering
> Michael
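To put rough numbers on the indexing cost described above: 820 MB split into 64 KB chunks gives 820 * 1024 / 64 ≈ 13,000 index entries, each carrying two clustering keys, i.e. on the order of the ~12,000 objects mentioned, all materialized (and then discarded) for a single read of that partition.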
effect of partition size
Hi,

What are the effects of large partitions?

I have a few tables which have partitions sizes as:

95%  24000
98%  42000
99%  85000
Max  82000

So, should I redesign the schema to get this max smaller or doesn't it matter much, since 99% of the partitions are <= 85000 ?

Thanks for answering
Michael
Re: How to obtain partition size
How about this tool? https://github.com/instaclustr/cassandra-sstable-tools

> On 13 Mar 2017, at 17:56, Artur R wrote:
>
> Hello!
>
> I can't find where C* stores information about partitions size (if it stores
> it at all). So, the questions:
>
> 1. How to obtain the size (in rows or in bytes - doesn't matter) of some
> particular partition? I know that there is the system.size_estimates table
> with mean_partition_size, but it's only the mean size among all partitions.
>
> 2. How to obtain the size of an entire table? Again, does
> "mean_partition_size * partitions_count" (fields from the
> system.size_estimates table) == real size of the table?
>
> 3. Is it possible to obtain the size of rows by some clustering key within
> some partition?
>
> Maybe one can obtain this information using the Java driver or from C*
> system tables?
How to obtain partition size
Hello!

I can't find where C* stores information about partitions size (if it stores it at all). So, the questions:

1. How to obtain the size (in rows or in bytes - doesn't matter) of some particular partition? I know that there is the *system.size_estimates* table with *mean_partition_size*, but it's only the mean size among all partitions.

2. How to obtain the size of an entire table? Again, does "mean_partition_size * partitions_count" (fields from the *system.size_estimates* table) == real size of the table?

3. Is it possible to obtain the size of rows by some clustering key within some partition?

Maybe one can obtain this information using the Java driver or from C* system tables?
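On question 2, the estimates can at least be pulled per node with a plain CQL query; a minimal sketch ('my_ks' and 'my_table' are placeholders):

SELECT range_start, range_end, mean_partition_size, partitions_count
  FROM system.size_estimates
 WHERE keyspace_name = 'my_ks' AND table_name = 'my_table';

Keep in mind that system.size_estimates is local to each node, covers that node's token ranges, and is refreshed periodically, so summing mean_partition_size * partitions_count only gives a rough figure, not the real on-disk size.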
Re: Metric to monitor partition size
We're on 2.X so this information may not apply to your version, but you should see:

1) A log statement upon compaction, like "Writing large partition", including the primary partition key (see https://issues.apache.org/jira/browse/CASSANDRA-9643). Configurable threshold in cassandra.yaml.
2) Problematic partition distributions in nodetool cfhistograms, although without the primary partition key.
3) Potentially large partitions in sstables themselves using sstable parsing utilities. There's also a patch for sstablekeys here, but I've never used it (https://issues.apache.org/jira/browse/CASSANDRA-8720).

While you _could_ monitor partitions and stop writing to that partition key when the size reaches a certain threshold (roughly acquired through a method like above), I'm struggling to think of a case where you'd actually want to do that: pushing partitions to some maximum size is generally not a great idea. Ideally you'd want your partitions as small as you can manage them without making your queries absolutely neurotic.

On Thu, Jan 12, 2017 at 6:08 AM, Saumitra S wrote:
> Is there any metric or way to find out if any partition has grown beyond a
> certain size or certain row count?
>
> If a partition reaches a certain size or limit, I want to stop sending
> further write requests to it. Is it possible?
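For reference, the compaction-time warning from 1) is driven by a single cassandra.yaml setting (shown with its default value; adjust to taste):

  # warn in system.log when a compacted partition exceeds this size
  compaction_large_partition_warning_threshold_mb: 100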
Metric to monitor partition size
Is there any metric or way to find out if any partition has grown beyond a certain size or certain row count? If a partition reaches a certain size or limit, I want to stop sending further write requests to it. Is it possible?
Partition size estimation formula in 3.0
Hello,

Until 3.0, we had a nice formula to estimate partition size:

  sizeof(partition keys)
  + sizeof(static columns)
  + countof(rows) * sizeof(regular columns)
  + countof(rows) * countof(regular columns) * sizeof(clustering columns)
  + 8 * count(values in partition)

With the 3.0 storage engine, the size is supposed to be smaller, and I'm looking for the new formula. I reckon the formula becomes:

  sizeof(partition keys)
  + sizeof(static columns)
  + countof(rows) * sizeof(regular columns)
  + countof(rows) * sizeof(clustering columns)
  + 8 * count(values in partition)

That is, the clustering column values are no longer repeated for each regular column in the row. Could anyone confirm that new formula, or am I missing something?

Thank you,

--
Jérôme Mainaud
jer...@mainaud.com
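To make the difference concrete, a worked example with hypothetical numbers (16-byte partition key, no static columns, 10,000 rows, one 8-byte clustering column, three 8-byte regular columns, hence 30,000 values):

  pre-3.0:  16 + 0 + 10,000*24 + 10,000*3*8 + 8*30,000 = 720,016 bytes (~720 KB)
  3.0 (?):  16 + 0 + 10,000*24 + 10,000*8   + 8*30,000 = 560,016 bytes (~560 KB)

The only term that changes is the clustering one, which drops by a factor of the regular column count, matching the intuition that clustering values are written once per row instead of once per cell.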
Re: Partition size
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria. For example, if I have a time series use case, I might bucket by month or week. That presumes you can foresee it, though.

As far as limiting that capability, I can see that being in the ballpark of https://issues.apache.org/jira/browse/CASSANDRA-8303, but a bit trickier than the limits mentioned in that ticket.

> On Sep 12, 2016, at 12:17 PM, Anshu Vajpayee wrote:
>
> Thanks Jeff. I got the answer now.
> Is there any way to put a guardrail to avoid large partitions from the
> Cassandra side? I know it is a modeling problem and Cassandra writes a
> warning to system.log for large partitions. But I think there should be
> a way to put a restriction on it from the Cassandra side.
>
> On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" <jji...@apache.org> wrote:
>> On 2016-09-08 18:53 (-0700), Anshu Vajpayee <anshu.vajpa...@gmail.com> wrote:
>>> Is there any way to get partition size for a partition key ?
>>
>> Anshu,
>>
>> The simple answer to your question is that it is not currently possible to
>> get a partition size for an arbitrary key without quite a lot of work
>> (basically you'd have to write a tool that iterated over the data on disk,
>> which is nontrivial).
>>
>> There exists a ticket to expose this:
>> https://issues.apache.org/jira/browse/CASSANDRA-12367
>>
>> It's not clear when that ticket will land, but I expect you'll see an API
>> for getting the size of a partition key in the near future.
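As an illustration of the bucketing idea, a minimal CQL sketch (table and column names are hypothetical), bucketing a time series by month:

CREATE TABLE readings_by_month (
    source_id  text,
    month      text,          -- bucket component, e.g. '2016-09'
    reading_ts timestamp,
    value      double,
    PRIMARY KEY ((source_id, month), reading_ts)
);

Each (source_id, month) pair gets its own partition, so partition growth is capped by the bucket span; the trade-off is that queries crossing bucket boundaries must read one partition per month.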
Re: Partition size
On 2016-09-12 10:17 (-0700), Anshu Vajpayee wrote:
> Thanks Jeff. I got the answer now.
> Is there any way to put a guardrail to avoid large partitions from the
> Cassandra side? I know it is a modeling problem and Cassandra writes a
> warning to system.log for large partitions. But I think there should be
> a way to put a restriction on it from the Cassandra side.

Perhaps not surprisingly, folks active in the other ticket (for determining partition size) also have a ticket to blacklist large partitions: https://issues.apache.org/jira/browse/CASSANDRA-12106

Again, not complete, but it's an active topic of discussion and may appear in future versions. In the mean time, having your application maintain a list of 'blacklisted' partitions may be a suitable workaround.
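A minimal sketch of that application-side workaround (hypothetical names; the application consults, and ideally caches, this table before writing):

CREATE TABLE blacklisted_partitions (
    table_name    text,
    partition_key text,
    PRIMARY KEY (table_name, partition_key)
);

Entries would be added by whatever process watches partition sizes (for example, the large-partition warnings in system.log), since Cassandra itself won't enforce this.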
Re: Partition size
Thanks Jeff. I got the answer now.

Is there any way to put a guardrail to avoid large partitions from the Cassandra side? I know it is a modeling problem and Cassandra writes a warning to system.log for large partitions. But I think there should be a way to put a restriction on it from the Cassandra side.

On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" wrote:
> On 2016-09-08 18:53 (-0700), Anshu Vajpayee wrote:
> > Is there any way to get partition size for a partition key ?
>
> Anshu,
>
> The simple answer to your question is that it is not currently possible to
> get a partition size for an arbitrary key without quite a lot of work
> (basically you'd have to write a tool that iterated over the data on disk,
> which is nontrivial).
>
> There exists a ticket to expose this:
> https://issues.apache.org/jira/browse/CASSANDRA-12367
>
> It's not clear when that ticket will land, but I expect you'll see an API
> for getting the size of a partition key in the near future.
Re: Partition size
On 2016-09-08 18:53 (-0700), Anshu Vajpayee wrote:
> Is there any way to get partition size for a partition key ?

Anshu,

The simple answer to your question is that it is not currently possible to get a partition size for an arbitrary key without quite a lot of work (basically you'd have to write a tool that iterated over the data on disk, which is nontrivial).

There exists a ticket to expose this: https://issues.apache.org/jira/browse/CASSANDRA-12367

It's not clear when that ticket will land, but I expect you'll see an API for getting the size of a partition key in the near future.
Re: Partition size
Wrong. My reaction was based on the content of the message (a link to 3rd party docs in response to a question when an equivalent link to project hosted docs was available) not on who sent it or their employer.

> I was initially all for the ASF endeavour to counteract DataStax'
> outsized influence on the project, and was hopeful you might achieve
> some positive change. Perhaps you may well still do. But it seems to
> me that the ASF behaviour is beginning to cross from constructive
> criticism of the project participants to prejudicially hostile behaviour
> against certain community members - and that is unlikely to result in a
> better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.

It is not healthy if community members are directing users to 3rd party documentation in preference to the project's own documentation. If it is happening because the project's documentation is non-existent / wrong / poorly written / etc. then that is understandable (and would be an issue the project needed to address) but that was not the case in this instance.

There are many aspects to community health. In the grand scheme of things the single e-mail that started this particular discussion is in the noise. However, a consistent pattern of such e-mails would be much more troubling. My intent was to ensure that such a pattern did not form.

Whether people agree with my response or not, the community is hopefully more aware of the issue than it was previously.

Mark
Re: Partition size
I fully agree with Benedict here. I would much prefer to keep this sort of toxic behavior off the ML. People can link to whatever helpful docs / blogs they choose.

On Fri, Sep 9, 2016 at 1:12 PM Benedict Elliott Smith wrote:
> Come on. This kind of inconsistent 'policing' is not helpful.
>
> By all means, push the *committers* to improve the project docs as is
> happening, and to promote the internal resources over external ones.
>
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRA (all of which have so
> far been ignored by those of his colleagues who *are* active community
> members, I'll note!). Shaming him for not linking docs that describe
> something *other* than what he was even talking about is crossing the
> line IMO.
>
> Linking to third-party resources is commonplace, the only difference I can
> see here is that these have been called "docs" by the authors, instead of
> a blog post, and Mark has a DataStax email address.
>
> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle? Or a random user posted their own resources? Obviously not.
>
> I was initially all for the ASF endeavour to counteract DataStax' outsized
> influence on the project, and was hopeful you might achieve some positive
> change. Perhaps you may well still do. But it seems to me that the ASF
> behaviour is beginning to cross from constructive criticism of the project
> participants to prejudicially hostile behaviour against certain community
> members - and that is unlikely to result in a better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.
>
> On Friday, 9 September 2016, Mark Thomas wrote:
>> On 09/09/2016 16:46, Mark Curtis wrote:
>> > If your partition sizes are over 100MB iirc then you'll normally see
>> > warnings in your system.log, this will outline the partition key, at
>> > least in Cassandra 2.0 and 2.1 as I recall.
>> >
>> > Your best friend here is nodetool cfstats which shows you the
>> > min/mean/max partition sizes for your table. It's quite often used to
>> > pinpoint large partitons on nodes in a cluster.
>> >
>> > More info here:
>> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>>
>> Folks,
>>
>> It is *Apache* Cassandra. If you are going to point to docs, please
>> point to the official Apache docs unless there is a very good reason not
>> to.
>>
>> In this case:
>>
>> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>>
>> looks to be the place.
>>
>> Mark
>>
>> > Thanks
>> >
>> > Mark
>> >
>> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
>> > > Is there any way to get partition size for a partition key ?
Re: Partition size
Come on. This kind of inconsistent 'policing' is not helpful.

By all means, push the *committers* to improve the project docs as is happening, and to promote the internal resources over external ones.

But Mark has absolutely no formal connection with the project, and his contributions have only been to file a couple of JIRA (all of which have so far been ignored by those of his colleagues who *are* active community members, I'll note!). Shaming him for not linking docs that describe something *other* than what he was even talking about is crossing the line IMO.

Linking to third-party resources is commonplace, the only difference I can see here is that these have been called "docs" by the authors, instead of a blog post, and Mark has a DataStax email address.

Would you have reacted this way if Aaron Morton linked a blog post by thelastpickle? Or a random user posted their own resources? Obviously not.

I was initially all for the ASF endeavour to counteract DataStax' outsized influence on the project, and was hopeful you might achieve some positive change. Perhaps you may well still do. But it seems to me that the ASF behaviour is beginning to cross from constructive criticism of the project participants to prejudicially hostile behaviour against certain community members - and that is unlikely to result in a better project.

You should be treating everyone consistently, in a manner that promotes project health.

On Friday, 9 September 2016, Mark Thomas wrote:
> On 09/09/2016 16:46, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at
> > least in Cassandra 2.0 and 2.1 as I recall.
> >
> > Your best friend here is nodetool cfstats which shows you the
> > min/mean/max partition sizes for your table. It's quite often used to
> > pinpoint large partitons on nodes in a cluster.
> >
> > More info here:
> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>
> Folks,
>
> It is *Apache* Cassandra. If you are going to point to docs, please
> point to the official Apache docs unless there is a very good reason not
> to.
>
> In this case:
>
> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>
> looks to be the place.
>
> Mark
>
> > Thanks
> >
> > Mark
> >
> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> > > Is there any way to get partition size for a partition key ?
Re: Partition size
On 9/9/16, 12:14 PM, "Mark Thomas" wrote:
> If you are going to point to docs, please point to the official Apache
> docs unless there is a very good reason not to.

(And if the good reason is that there's a deficiency in the Apache Cassandra docs, please make it known on the list or in a jira so someone can write what's missing)
Re: Partition size
On 9/9/16, 8:47 AM, "Rakesh Kumar" wrote:
>> If your partition sizes are over 100MB iirc then you'll normally see
>> warnings in your system.log, this will outline the partition key, at least
>> in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

In modern versions (2.1 and newer), the "real" risk of large partitions is that they generate a lot of garbage on read - it's not a 1:1 equivalence, but it's linear, and a partition that's 10x as large generates 10x as much garbage. You can tune around it (very large new gen, for example), but it's best fixed at the data model most of the time.

The long term fix will be CASSANDRA-9754, which is a work in progress. The short term fix for 3.x was http://issues.apache.org/jira/browse/CASSANDRA-11206, which went into 3.6 and higher.

In the notes on 11206, you'll see that Robert Stupp tested up to an 8GB partition - while nobody's going to recommend you create a data model with 8GB partitions, I imagine you may find partitions in that rough order of magnitude acceptable.
Re: Partition size
On 09/09/2016 16:46, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at
> least in Cassandra 2.0 and 2.1 as I recall.
>
> Your best friend here is nodetool cfstats which shows you the
> min/mean/max partition sizes for your table. It's quite often used to
> pinpoint large partitons on nodes in a cluster.
>
> More info here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Folks,

It is *Apache* Cassandra. If you are going to point to docs, please point to the official Apache docs unless there is a very good reason not to.

In this case:

http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb

looks to be the place.

Mark

> Thanks
>
> Mark
>
> On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> > Is there any way to get partition size for a partition key ?
Re: Partition size
On 9 September 2016 at 16:47, Rakesh Kumar wrote:
> On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at least
> > in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

The 100MB is just a default setting; you can set it up or down as you need:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__compaction_large_partition_warning_threshold_mb

There isn't really a "good" or "bad" value; it all depends on the data model, your query patterns and required response times as to what's acceptable for your application. The 100MB default is just a guide. If you're seeing partitions of 1GB and above then you may very well start to see problems. Again, cfstats is your friend here!

-Mark
Re: Partition size
On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at least
> in Cassandra 2.0 and 2.1 as I recall.

Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?
Re: Partition size
If your partition sizes are over 100MB iirc then you'll normally see warnings in your system.log; this will outline the partition key, at least in Cassandra 2.0 and 2.1 as I recall.

Your best friend here is nodetool cfstats, which shows you the min/mean/max partition sizes for your table. It's quite often used to pinpoint large partitions on nodes in a cluster.

More info here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Thanks

Mark

On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> Is there any way to get partition size for a partition key ?
Partition size
Is there any way to get partition size for a partition key ?
Estimating partition size for C*2.X and C*3.X and Time Series Data Modelling.
Hello,

I'm currently enrolled in a master's degree and my thesis project involves the usage of Big Data tools in the context of Smart Grid applications. I explored several storage solutions and found Cassandra to be fitting for my problem.

The data is mostly time series data, incoming from multiple PLCs, currently being captured and stored by a proprietary SCADA software connected to a MSSQL server. Reading into the C* storage engine and how time series should be modelled, it is inevitable that I have to use a sort of time bucketing for splitting into multiple partitions.

Here is the issue: in the MSSQL server, each PLC has very wide tables (5 at the moment for one building) with around 36 columns of data being collected every 10 seconds. Data is being queried as much as 15 columns at a time, with time ranges varying between 1 hour and a whole month. A simple mapping of the same tables in MSSQL to C* is not recommended due to the way C* 2.X stores its data.

I took the DS220: Data Modelling course, which showcases two formulas for estimating a partition size based on the table design.

[image attachments: the two DS220 partition-size estimation formulas - the number of values in a partition, and the partition size Ps in bytes]

Note: This Ps formula does not account for column name length, TTLs, counter columns, and additional overhead.

If my calculations are correct, with a table such as the one below and a time resolution of 10 seconds, the Ps (partition size) would be shy of 10 MB (the value often recommended) if I partitioned it weekly.

CREATE TABLE TEST (
    BuildingAnalyzer text,
    Time timestamp,
    P1 double,
    P2 double,
    P3 double,
    Acte1 int,
    Acte2 int,
    Acte3 int,
    PRIMARY KEY (BuildingAnalyzer, Time)
)

However, as of C* 3.0, a major refactor of the storage engine brought efficiency in storage costs. From what I could gather in [1], clustering columns and column names are no longer repeated for each value in a record and, among other things, the timestamps for conflict resolution (the 8 × Nv of the 2nd formula) can be stored only once per record if they have the same value, and are encoded as varints.

I also read [2], which explains the storage in intricate detail, adding too much complexity for a simple estimation formula.

Is there any way to estimate the partition size of a table with similar formulas to the ones above? Should I just model my tables similar to what is done with metric collection (a table with columns "parametername" and "value")?

[1] http://www.datastax.com/2015/12/storage-engine-30
[2] http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Sorry for the long wall of text,

Best regards,
Gil Pinheiro.
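On the last question, one possible shape for the metric-collection style model mentioned above (a minimal CQL sketch with hypothetical names; the week column is a time bucket chosen so a partition stays under the recommended size):

CREATE TABLE analyzer_metrics (
    analyzer  text,
    week      text,        -- time bucket, e.g. '2016-W26'
    parameter text,        -- e.g. 'P1', 'Acte1'
    time      timestamp,
    value     double,
    PRIMARY KEY ((analyzer, week), parameter, time)
);

Reading 15 of the 36 parameters for a time range then becomes a query with parameter IN (...) plus a range on time within each weekly partition, rather than a single wide-table read.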
Re: on-disk size vs partition-size in cfhistograms
Hi Joseph,

> The approach I took was to insert increasing number of rows into a replica
> of the table to be sized, watch the size of the "data" directory (after doing
> nodetool flush and compact), and calculate the average size per row (total
> directory size/count of rows). Can this be considered a valid approach to
> extrapolate for future growth of data?

You also need to consider the replication factor you are going to use and the percentage of the data this node you are looking at is owning. Also, when you run "nodetool compact" you get the minimal possible size, while in real conditions you will probably never be in this state. If you update the same row again and again, shards of the row will be spread in multiple sstables, with more overhead. Plus, if you plan to TTL data or to delete some, you will always have some tombstones in there too, and maybe for long, depending on how you tune Cassandra and on your use case, I guess.

So I would say this approach is not very accurate. My guess is you will end up using more space than you think. But it is also harder to do capacity planning from nothing than from a working system.

> It seems the size in cfhisto has a wide variation with the calculated value
> using the approach detailed above (avg 2KB/row). Could this difference be
> due to compression, or are there any other factors at play here?

It could be compression indeed. To check that, you need to dig into the code. What Cassandra version are you planning to use? By the way, if disk space matters to you, as it seems to me, you might want to use Cassandra 3.0+: http://www.datastax.com/2015/12/storage-engine-30, http://www.planetcassandra.org/blog/this-week-in-cassandra-3-0-storage-engine-deep-dive-3112016/, http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

> What would be the typical use/interpretation of the "partition size" metric?

I guess people use that to spot wide rows mainly, but if you are happy summing those, it should be good as long as you know what you are summing. Each Cassandra operator has his tips and own usage of the tools available and might have a distinct way of performing operations depending on his needs and own experience :-). So if it looks relevant to you, go ahead. For example, if you find out that this is the data before compression, then just applying the compression ratio to your sum should be good. Still, take care of my first point above.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-06 13:27 GMT+02:00 Joseph Tech:
> Hi,
>
> I am trying to get some baselines for capacity planning. The approach I
> took was to insert increasing number of rows into a replica of the table
> to be sized, watch the size of the "data" directory (after doing nodetool
> flush and compact), and calculate the average size per row (total
> directory size/count of rows). Can this be considered a valid approach to
> extrapolate for future growth of data?
>
> Related to this, is there any information we can gather from
> partition-size of cfhistograms (snipped output for my table below):
>
> Partition Size (bytes)
>    642 bytes: 221
>    770 bytes: 2328
>    924 bytes: 328858
> ..
>   8239 bytes: 153178
> ...
>  24601 bytes: 16973
>  29521 bytes: 10805
> ...
> 219342 bytes: 23
> 263210 bytes: 6
> 315852 bytes: 4
>
> It seems the size in cfhisto has a wide variation with the calculated
> value using the approach detailed above (avg 2KB/row). Could this
> difference be due to compression, or are there any other factors at play
> here? What would be the typical use/interpretation of the "partition
> size" metric?
>
> The table definition is like:
>
> CREATE TABLE abc (
>   key1 text,
>   col1 text,
>   PRIMARY KEY ((key1))
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.00 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> Thanks,
> Joseph
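A rough back-of-the-envelope way to fold these points into the extrapolation (a sketch; overhead_factor is a guess you should calibrate against a working system):

  per-node disk ≈ rows × avg_row_size × RF / node_count × overhead_factor

where RF is the replication factor, and overhead_factor (> 1) absorbs compaction headroom, tombstones and row shards spread across sstables. If the 2KB/row baseline was measured on compressed data, the compression ratio is already included; if not, multiply it back in.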
on-disk size vs partition-size in cfhistograms
Hi,

I am trying to get some baselines for capacity planning. The approach I took was to insert increasing number of rows into a replica of the table to be sized, watch the size of the "data" directory (after doing nodetool flush and compact), and calculate the average size per row (total directory size/count of rows). Can this be considered a valid approach to extrapolate for future growth of data?

Related to this, is there any information we can gather from partition-size of cfhistograms (snipped output for my table below):

Partition Size (bytes)
   642 bytes: 221
   770 bytes: 2328
   924 bytes: 328858
..
  8239 bytes: 153178
...
 24601 bytes: 16973
 29521 bytes: 10805
...
219342 bytes: 23
263210 bytes: 6
315852 bytes: 4

It seems the size in cfhisto has a wide variation with the calculated value using the approach detailed above (avg 2KB/row). Could this difference be due to compression, or are there any other factors at play here? What would be the typical use/interpretation of the "partition size" metric?

The table definition is like:

CREATE TABLE abc (
  key1 text,
  col1 text,
  PRIMARY KEY ((key1))
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Thanks,
Joseph
Re: Data Modeling: Partition Size and Query Efficiency
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote: > You could keep a "num_buckets" value associated with the client's account, > which can be adjusted accordingly as usage increases. > Yes, but the adjustment problem is tricky when there are multiple concurrent writers. What happens when you change the number of buckets? Does existing data have to be re-written into new buckets? If so, how do you make sure that's only done once for each bucket size increase? Or perhaps I'm misunderstanding your suggestion? Jim > On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > >> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < >> clintlmar...@coolfiretechnologies.com> wrote: >> >>> What sort of data is your clustering key composed of? That might help >>> some in determining a way to achieve what you're looking for. >>> >> Just a UUID that acts as an object identifier. >> >>> >>> Clint >>> On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: >>> Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen much less frequently than a customer filling a partition. Jim On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall wrote: > >> In this case, 99% of my data could fit in a single 50 MB partition. >> But if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old >> one fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> > Hi Jim - good to see you around again. > > If you can segment this upstream by customer/account/whatever, > handling the outliers as an entirely different code path (potentially > different cluster as the workload will be quite different at that point > and > have different tuning requirements) would be your best bet. Then a > read-before-write makes sense given it is happening on such a small number > of API queries. > > > -- > - > Nate McCall > Austin, TX > @zznate > > Co-Founder & Sr. Technical Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com >
Re: Data Modeling: Partition Size and Query Efficiency
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases. On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> What sort of data is your clustering key composed of? That might help >> some in determining a way to achieve what you're looking for. >> > Just a UUID that acts as an object identifier. > >> >> Clint >> On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: >> >>> Hi Nate, >>> >>> Yes, I've been thinking about treating customers as either small or big, >>> where "small" ones have a single partition and big ones have 50 (or >>> whatever number I need to keep sizes reasonable). There's still the problem >>> of how to handle a small customer who becomes too big, but that will happen >>> much less frequently than a customer filling a partition. >>> >>> Jim >>> >>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall >>> wrote: >>> > In this case, 99% of my data could fit in a single 50 MB partition. > But if I use the standard approach, I have to split my partitions into 50 > pieces to accommodate the largest data. That means that to query the 700 > rows for my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old > one fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com >>> >>>
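A minimal sketch of how the num_buckets idea could look (hypothetical names; bucket could be derived client-side as hash(object_id) % num_buckets):

CREATE TABLE account_settings (
    account_id  uuid PRIMARY KEY,
    num_buckets int
);

CREATE TABLE objects_by_account (
    account_id uuid,
    bucket     int,
    object_id  uuid,
    payload    text,
    PRIMARY KEY ((account_id, bucket), object_id)
);

Reads fan out over num_buckets partitions per account; the hard part, as the reply above points out, is safely re-bucketing existing data when num_buckets changes.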
Re: Data Modeling: Partition Size and Query Efficiency
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > What sort of data is your clustering key composed of? That might help some > in determining a way to achieve what you're looking for. > Just a UUID that acts as an object identifier. > > Clint > On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > >> Hi Nate, >> >> Yes, I've been thinking about treating customers as either small or big, >> where "small" ones have a single partition and big ones have 50 (or >> whatever number I need to keep sizes reasonable). There's still the problem >> of how to handle a small customer who becomes too big, but that will happen >> much less frequently than a customer filling a partition. >> >> Jim >> >> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall >> wrote: >> >>> In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one. If you try to deal with this by starting a new partition when an old one fills up, you have a nasty distributed consensus problem, along with read-before-write. Cassandra LWT wasn't available the last time I dealt with this, but might help with the consensus part today. But there are still some nasty corner cases. I have some thoughts on other ways to solve this, but they all have drawbacks. So I thought I'd ask here and hope that someone has a better approach. >>> Hi Jim - good to see you around again. >>> >>> If you can segment this upstream by customer/account/whatever, handling >>> the outliers as an entirely different code path (potentially different >>> cluster as the workload will be quite different at that point and have >>> different tuning requirements) would be your best bet. Then a >>> read-before-write makes sense given it is happening on such a small number >>> of API queries. >>> >>> >>> -- >>> - >>> Nate McCall >>> Austin, TX >>> @zznate >>> >>> Co-Founder & Sr. Technical Consultant >>> Apache Cassandra Consulting >>> http://www.thelastpickle.com >>> >> >>
Re: Data Modeling: Partition Size and Query Efficiency
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for. Clint On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > Hi Nate, > > Yes, I've been thinking about treating customers as either small or big, > where "small" ones have a single partition and big ones have 50 (or > whatever number I need to keep sizes reasonable). There's still the problem > of how to handle a small customer who becomes too big, but that will happen > much less frequently than a customer filling a partition. > > Jim > > On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall > wrote: > >> >>> In this case, 99% of my data could fit in a single 50 MB partition. But >>> if I use the standard approach, I have to split my partitions into 50 >>> pieces to accommodate the largest data. That means that to query the 700 >>> rows for my median case, I have to read 50 partitions instead of one. >>> >>> If you try to deal with this by starting a new partition when an old one >>> fills up, you have a nasty distributed consensus problem, along with >>> read-before-write. Cassandra LWT wasn't available the last time I dealt >>> with this, but might help with the consensus part today. But there are >>> still some nasty corner cases. >>> >>> I have some thoughts on other ways to solve this, but they all have >>> drawbacks. So I thought I'd ask here and hope that someone has a better >>> approach. >>> >>> >> Hi Jim - good to see you around again. >> >> If you can segment this upstream by customer/account/whatever, handling >> the outliers as an entirely different code path (potentially different >> cluster as the workload will be quite different at that point and have >> different tuning requirements) would be your best bet. Then a >> read-before-write makes sense given it is happening on such a small number >> of API queries. >> >> >> -- >> - >> Nate McCall >> Austin, TX >> @zznate >> >> Co-Founder & Sr. Technical Consultant >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > >
Re: Data Modeling: Partition Size and Query Efficiency
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen much less frequently than a customer filling a partition. Jim On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall wrote: > >> In this case, 99% of my data could fit in a single 50 MB partition. But >> if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old one >> fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> > Hi Jim - good to see you around again. > > If you can segment this upstream by customer/account/whatever, handling > the outliers as an entirely different code path (potentially different > cluster as the workload will be quite different at that point and have > different tuning requirements) would be your best bet. Then a > read-before-write makes sense given it is happening on such a small number > of API queries. > > > -- > - > Nate McCall > Austin, TX > @zznate > > Co-Founder & Sr. Technical Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com >
Re: Data Modeling: Partition Size and Query Efficiency
Hi Jack, Thanks for your response. My answers inline... On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky wrote: > Jim, I don't quite get why you think you would need to query 50 partitions > to return merely hundreds or thousands of rows. Please elaborate. I mean, > sure, for that extreme 100th percentile, yes, you would query a lot of > partitions, but for the 90th percentile it would be just one. Even the 99th > percentile would just be one or at most a few. > Exactly, but, as I mentioned in my email, the normal way of segmenting large partitions is to use some deterministic bucketing mechanism to bucket rows into different partitions. If you know of a way to make the number of buckets vary with the number of rows, I'd love to hear about it. > It would help if you could elaborate on the actual access pattern - how > rapidly is the data coming in and from where. You can do just a little more > work at the app level and use Cassandra more effectively. > The write pattern is batches of inserts/updates mixed with some single row inserts/updates. Not surprisingly, the customers with more data also do more writes. > As always, we look to queries to determine what the Cassandra data model > should look like, so elaborate on what your app needs to see. What exactly is > the app querying for - a single key, a slice, or... what? > The use case here is sequential access to some or all of a customer's rows in order to filter based on other criteria. The order doesn't matter much, as long as it's well-defined. > And, as always, you commonly need to store the data in multiple query > tables so that the data model matches the desired query pattern. > > Are the row sizes very dynamic, with some extremely large, or is it just > the number of rows that is making size an issue? > No, row sizes don't vary much, just the number of rows per customer. > > Maybe let the app keep a small cache of active partitions and their > current size so that the app can decide when to switch to a new bucket. Do > a couple of extra queries when a key is not in that cache to determine > the partition size and count to initialize the cache entry for a key. If > necessary, keep a separate table that tracks the partition size or maybe > just the (rough) row count to use to determine when a new partition is > needed. > I've done almost exactly what you suggest in a previous application. The issue is that the cache of active partitions needs to be consistent for multiple writers and the transition from one bucket to the next really wants to be transactional. Hence my reference to a "nasty distributed consensus problem" and Clint's reference to an "anti-pattern". I'd like to avoid it if I can. Jim > > -- Jack Krupansky > > On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona wrote: >> Thanks for responding! >> >> My natural partition key is a customer id. Our customers have widely >> varying amounts of data. Since the vast majority of them have data that's >> small enough to fit in a single partition, I'd like to avoid imposing >> unnecessary overhead on the 99% just to avoid issues with the largest 1%. >> >> The approach to querying across multiple partitions you describe is >> pretty much what I have in mind. The trick is to avoid having to query 50 >> partitions to return a few hundred or thousand rows. >> >> I agree that sequentially filling partitions is something to avoid. >> That's why I'm hoping someone can suggest a good alternative. 
>> >> Jim >> >> >> >> >> >> On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < >> clintlmar...@coolfiretechnologies.com> wrote: >> >>> You should endeavor to use a repeatable method of segmenting your data. >>> Swapping partitions every time you "fill one" seems like an anti-pattern to >>> me, but I suppose it really depends on what your primary key is. Can you >>> share some more information on this? >>> >>> In the past I have utilized the consistent hash method you described >>> (add an artificial row key segment by modulo some part of the clustering >>> key by a fixed partition count) combined with a lazy evaluation cursor. >>> >>> The lazy evaluation cursor essentially is set up to query X number of >>> partitions simultaneously, but to execute those queries only as needed to >>> fill the page size. To perform paging you have to know the last primary key >>> that was returned so you can use that to limit the next iteration. >>> >>> You can trade latency for additional workload by controlling the number >>> of concurrent executions you do as the iterating occurs. Or you can minimize >>> the work on your cluster by querying each partition one at a time.
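Since Jim mentions LWT as a possible fix for the consensus part, here is a hedged sketch of a compare-and-set on the bucket count using the DataStax Python driver. The bucket_state table and keyspace are assumptions, not anything from the thread, and the remaining corner cases (e.g. writers still acting on a stale count while the CAS happens) are exactly the ones Jim warns about.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # illustrative contact point
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Hypothetical table:
# CREATE TABLE bucket_state (customer_id uuid PRIMARY KEY, num_buckets int);

def grow_buckets(customer_id, observed, proposed):
    # LWT compare-and-set: only one writer wins the transition from
    # `observed` to `proposed`; losers must re-read and retry.
    result = session.execute(
        "UPDATE bucket_state SET num_buckets = %s "
        "WHERE customer_id = %s IF num_buckets = %s",
        (proposed, customer_id, observed),
    )
    return result.was_applied  # False means another writer raced us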
Re: Data Modeling: Partition Size and Query Efficiency
> > > In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Hi Jim - good to see you around again. If you can segment this upstream by customer/account/whatever, handling the outliers as an entirely different code path (potentially different cluster as the workload will be quite different at that point and have different tuning requirements) would be your best bet. Then a read-before-write makes sense given it is happening on such a small number of API queries. -- - Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Data Modeling: Partition Size and Query Efficiency
Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile, yes, you would query a lot of partitions, but for the 90th percentile it would be just one. Even the 99th percentile would just be one or at most a few. It would help if you could elaborate on the actual access pattern - how rapidly is the data coming in and from where. You can do just a little more work at the app level and use Cassandra more effectively. As always, we look to queries to determine what the Cassandra data model should look like, so elaborate on what your app needs to see. What exactly is the app querying for - a single key, a slice, or... what? And, as always, you commonly need to store the data in multiple query tables so that the data model matches the desired query pattern. Are the row sizes very dynamic, with some extremely large, or is it just the number of rows that is making size an issue? Maybe let the app keep a small cache of active partitions and their current size so that the app can decide when to switch to a new bucket. Do a couple of extra queries when a key is not in that cache to determine the partition size and count to initialize the cache entry for a key. If necessary, keep a separate table that tracks the partition size or maybe just the (rough) row count to use to determine when a new partition is needed. -- Jack Krupansky On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona wrote: > Thanks for responding! > > My natural partition key is a customer id. Our customers have widely > varying amounts of data. Since the vast majority of them have data that's > small enough to fit in a single partition, I'd like to avoid imposing > unnecessary overhead on the 99% just to avoid issues with the largest 1%. > > The approach to querying across multiple partitions you describe is pretty > much what I have in mind. The trick is to avoid having to query 50 > partitions to return a few hundred or thousand rows. > > I agree that sequentially filling partitions is something to avoid. That's > why I'm hoping someone can suggest a good alternative. > > Jim > > > > > > On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> You should endeavor to use a repeatable method of segmenting your data. >> Swapping partitions every time you "fill one" seems like an anti-pattern to >> me, but I suppose it really depends on what your primary key is. Can you >> share some more information on this? >> >> In the past I have utilized the consistent hash method you described (add >> an artificial row key segment by modulo some part of the clustering key by >> a fixed partition count) combined with a lazy evaluation cursor. >> >> The lazy evaluation cursor essentially is set up to query X number of >> partitions simultaneously, but to execute those queries only as needed to >> fill the page size. To perform paging you have to know the last primary key >> that was returned so you can use that to limit the next iteration. >> >> You can trade latency for additional workload by controlling the number >> of concurrent executions you do as the iterating occurs. Or you can >> minimize the work on your cluster by querying each partition one at a time. 
>> >> Unfortunately due to the artificial partition key segment you cannot >> iterate or page in any particular order (at least across partitions), >> unless your hash function can also provide some ordering guarantees. >> >> It all just depends on your requirements. >> >> Clint >> On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: >> >>> A problem that I have run into repeatedly when doing schema design is >>> how to control partition size while still allowing for efficient multi-row >>> queries. >>> >>> We want to limit partition size to some number between 10 and 100 >>> megabytes to avoid operational issues. The standard way to do that is to >>> figure out the maximum number of rows that your "natural partition key" >>> will ever need to support and then add an additional artificial partition >>> key that segments the rows sufficiently to keep the partition size >>> under the maximum. In the case of time series data, this is often done by >>> bucketing by time period, i.e. creating a new partition every minute, hour >>> or day. For non-time-series data, it's done with something like >>> Hash(clustering-key) mod desired-number-of-partitions. >>> >>> In my case, multi-row queries to support a REST API typically return a >>> page of results, where the page size might be anywhere from a few dozen up >>> to thousands. For query efficiency I want the average number of rows per >>> partition to be large enough that a query can be satisfied by reading a >>> small number of partitions--ideally one.
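One hedged way to realize Jack's "separate table that tracks ... the (rough) row count" is a Cassandra counter table that writers bump on each insert. The schema and threshold below are assumptions; counters can over- or under-count on client retries, so the value is only an estimate, which is all this use needs.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Hypothetical tracking table:
# CREATE TABLE partition_rows (customer_id uuid, bucket int, n counter,
#                              PRIMARY KEY (customer_id, bucket));

ROWS_PER_BUCKET_LIMIT = 50_000  # ~50 MB at the ~1 KB/row figure in the thread

def record_write(customer_id, bucket):
    # Counter updates need no read-before-write, so concurrent writers are fine.
    session.execute(
        "UPDATE partition_rows SET n = n + 1 "
        "WHERE customer_id = %s AND bucket = %s",
        (customer_id, bucket),
    )

def bucket_is_full(customer_id, bucket):
    row = session.execute(
        "SELECT n FROM partition_rows "
        "WHERE customer_id = %s AND bucket = %s",
        (customer_id, bucket),
    ).one()
    return row is not None and row.n >= ROWS_PER_BUCKET_LIMIT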
Re: Data Modeling: Partition Size and Query Efficiency
Thanks for responding! My natural partition key is a customer id. Our customers have widely varying amounts of data. Since the vast majority of them have data that's small enough to fit in a single partition, I'd like to avoid imposing unnecessary overhead on the 99% just to avoid issues with the largest 1%. The approach to querying across multiple partitions you describe is pretty much what I have in mind. The trick is to avoid having to query 50 partitions to return a few hundred or thousand rows. I agree that sequentially filling partitions is something to avoid. That's why I'm hoping someone can suggest a good alternative. Jim On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > You should endeavor to use a repeatable method of segmenting your data. > Swapping partitions every time you "fill one" seems like an anti-pattern to > me, but I suppose it really depends on what your primary key is. Can you > share some more information on this? > > In the past I have utilized the consistent hash method you described (add > an artificial row key segment by modulo some part of the clustering key by > a fixed partition count) combined with a lazy evaluation cursor. > > The lazy evaluation cursor essentially is set up to query X number of > partitions simultaneously, but to execute those queries only as needed to > fill the page size. To perform paging you have to know the last primary key > that was returned so you can use that to limit the next iteration. > > You can trade latency for additional workload by controlling the number > of concurrent executions you do as the iterating occurs. Or you can > minimize the work on your cluster by querying each partition one at a time. > > Unfortunately due to the artificial partition key segment you cannot > iterate or page in any particular order (at least across partitions), > unless your hash function can also provide some ordering guarantees. > > It all just depends on your requirements. > > Clint > On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: > >> A problem that I have run into repeatedly when doing schema design is how >> to control partition size while still allowing for efficient multi-row >> queries. >> >> We want to limit partition size to some number between 10 and 100 >> megabytes to avoid operational issues. The standard way to do that is to >> figure out the maximum number of rows that your "natural partition key" >> will ever need to support and then add an additional artificial partition >> key that segments the rows sufficiently to keep the partition size >> under the maximum. In the case of time series data, this is often done by >> bucketing by time period, i.e. creating a new partition every minute, hour >> or day. For non-time-series data, it's done with something like >> Hash(clustering-key) mod desired-number-of-partitions. >> >> In my case, multi-row queries to support a REST API typically return a >> page of results, where the page size might be anywhere from a few dozen up >> to thousands. For query efficiency I want the average number of rows per >> partition to be large enough that a query can be satisfied by reading a >> small number of partitions--ideally one. >> >> So I want to simultaneously limit the maximum number of rows per >> partition and yet maintain a large enough average number of rows per >> partition to make my queries efficient. But with my data the ratio between >> maximum and average can be very large (up to four orders of magnitude). 
>> Here is an example:
>>
>>                     Rows per Partition   Partition Size
>> Mode                                 1             1 KB
>> Median                             500           500 KB
>> 90th percentile                  5,000             5 MB
>> 99th percentile                 50,000            50 MB
>> Maximum                      2,500,000           2.5 GB
>>
>> In this case, 99% of my data could fit in a single 50 MB partition. But >> if I use the standard approach, I have to split my partitions into 50 >> pieces to accommodate the largest data. That means that to query the 700 >> rows for my median case, I have to read 50 partitions instead of one. >> >> If you try to deal with this by starting a new partition when an old one >> fills up, you have a nasty distributed consensus problem, along with >> read-before-write. Cassandra LWT wasn't available the last time I dealt >> with this, but might help with the consensus part today. But there are >> still some nasty corner cases. >> >> I have some thoughts on other ways to solve this, but they all have >> drawbacks. So I thought I'd ask here and hope that someone has a better >> approach. >> >> Thanks in advance, >> >> Jim >> >>
Re: Data Modeling: Partition Size and Query Efficiency
You should endeavor to use a repeatable method of segmenting your data. Swapping partitions every time you "fill one" seems like an anti-pattern to me, but I suppose it really depends on what your primary key is. Can you share some more information on this? In the past I have utilized the consistent hash method you described (add an artificial row key segment by modulo some part of the clustering key by a fixed partition count) combined with a lazy evaluation cursor. The lazy evaluation cursor essentially is set up to query X number of partitions simultaneously, but to execute those queries only as needed to fill the page size. To perform paging you have to know the last primary key that was returned so you can use that to limit the next iteration. You can trade latency for additional workload by controlling the number of concurrent executions you do as the iterating occurs. Or you can minimize the work on your cluster by querying each partition one at a time. Unfortunately due to the artificial partition key segment you cannot iterate or page in any particular order (at least across partitions), unless your hash function can also provide some ordering guarantees. It all just depends on your requirements. Clint On Jan 4, 2016 10:13 AM, "Jim Ancona" wrote: > A problem that I have run into repeatedly when doing schema design is how > to control partition size while still allowing for efficient multi-row > queries. > > We want to limit partition size to some number between 10 and 100 > megabytes to avoid operational issues. The standard way to do that is to > figure out the maximum number of rows that your "natural partition key" > will ever need to support and then add an additional artificial partition > key that segments the rows sufficiently to keep the partition size > under the maximum. In the case of time series data, this is often done by > bucketing by time period, i.e. creating a new partition every minute, hour > or day. For non-time-series data, it's done with something like > Hash(clustering-key) mod desired-number-of-partitions. > > In my case, multi-row queries to support a REST API typically return a > page of results, where the page size might be anywhere from a few dozen up > to thousands. For query efficiency I want the average number of rows per > partition to be large enough that a query can be satisfied by reading a > small number of partitions--ideally one. > > So I want to simultaneously limit the maximum number of rows per partition > and yet maintain a large enough average number of rows per partition to > make my queries efficient. But with my data the ratio between maximum and > average can be very large (up to four orders of magnitude). > > Here is an example:
>
>                     Rows per Partition   Partition Size
> Mode                                 1             1 KB
> Median                             500           500 KB
> 90th percentile                  5,000             5 MB
> 99th percentile                 50,000            50 MB
> Maximum                      2,500,000           2.5 GB
>
> In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. > > If you try to deal with this by starting a new partition when an old one > fills up, you have a nasty distributed consensus problem, along with > read-before-write. Cassandra LWT wasn't available the last time I dealt > with this, but might help with the consensus part today. 
But there are > still some nasty corner cases. > > I have some thoughts on other ways to solve this, but they all have > drawbacks. So I thought I'd ask here and hope that someone has a better > approach. > > Thanks in advance, > > Jim > >
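A sketch of Clint's lazy evaluation cursor in its simplest form, querying one partition at a time to minimize cluster work. The widgets_by_customer table is hypothetical; the driver's fetch_size handles paging within each partition.

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

def scan_customer(customer_id, num_buckets, page_size=500):
    # Lazily walk every hash bucket for one customer. Rows come back
    # bucket-by-bucket, so, as Clint notes, there is no total order
    # across partitions.
    query = SimpleStatement(
        "SELECT id, value FROM widgets_by_customer "
        "WHERE customer_id = %s AND bucket = %s",
        fetch_size=page_size,  # driver pages transparently per partition
    )
    for bucket in range(num_buckets):
        for row in session.execute(query, (customer_id, bucket)):
            yield row

To resume across API requests you would also capture where you stopped: the last primary key seen (as Clint describes) or the driver's paging_state, plus the bucket you were in. Running several buckets concurrently via session.execute_async is the latency-for-workload trade-off he mentions.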
Data Modeling: Partition Size and Query Efficiency
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to figure out the maximum number of rows that your "natural partition key" will ever need to support and then add an additional artificial partition key that segments the rows sufficiently to keep the partition size under the maximum. In the case of time series data, this is often done by bucketing by time period, i.e. creating a new partition every minute, hour or day. For non-time-series data, it's done with something like Hash(clustering-key) mod desired-number-of-partitions. In my case, multi-row queries to support a REST API typically return a page of results, where the page size might be anywhere from a few dozen up to thousands. For query efficiency I want the average number of rows per partition to be large enough that a query can be satisfied by reading a small number of partitions--ideally one. So I want to simultaneously limit the maximum number of rows per partition and yet maintain a large enough average number of rows per partition to make my queries efficient. But with my data the ratio between maximum and average can be very large (up to four orders of magnitude). Here is an example:

                    Rows per Partition   Partition Size
Mode                                 1             1 KB
Median                             500           500 KB
90th percentile                  5,000             5 MB
99th percentile                 50,000            50 MB
Maximum                      2,500,000           2.5 GB

In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one. If you try to deal with this by starting a new partition when an old one fills up, you have a nasty distributed consensus problem, along with read-before-write. Cassandra LWT wasn't available the last time I dealt with this, but might help with the consensus part today. But there are still some nasty corner cases. I have some thoughts on other ways to solve this, but they all have drawbacks. So I thought I'd ask here and hope that someone has a better approach. Thanks in advance, Jim
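For concreteness, the standard approach described above as a small Python sketch (the 50-bucket figure is the thread's own worst-case number; helper names are illustrative):

import hashlib
import uuid
from datetime import datetime, timezone

NUM_BUCKETS = 50  # fixed up front to fit the 2.5M-row maximum

def hash_bucket(clustering_key: uuid.UUID) -> int:
    # Hash(clustering-key) mod desired-number-of-partitions
    digest = hashlib.md5(clustering_key.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def time_bucket(ts: datetime) -> str:
    # Time-series variant: one partition per day
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

# The partition key becomes (customer_id, bucket), so every multi-row
# read for a customer fans out over all 50 partitions -- which is the
# median-case inefficiency this thread is about.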