Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
Great explanation, thanks Jeff!

On 7 Mar 2018 17:49, "Javier Pareja"  wrote:

> Thank you for your time Jeff, very helpful. I couldn't find anything out
> there about the subject, and I suspected that this could be the case.
>
> Regarding the clustering key in this case:
> Back in the RDBMS world, you will always assign a sequential (or as
> sequential as possible) clustering key to a table to minimize fragmentation
> and increase the speed of the insertions. In the Cassandra world, does the
> same apply to the clustering key? For example, is it a good idea to assign
> a UUID to a clustering key, or would a timestamp be a better choice? I am
> thinking that partitions need to keep some sort of binary index for the
> clustering keys, and for large partitions it can be relatively
> expensive to maintain.
>
> F Javier Pareja
>
> On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa  wrote:
>
>>
>>
>> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:
>>
>>> Hi Jeff,
>>>
>>> Could you expand: "Tables without clustering keys are often deceptively
>>> expensive to compact, as a lot of work (relative to the other cell
>>> boundaries) happens on partition boundaries." This is something I didn't
>>> know and highly interesting to know more about!
>>>
>>>
>>>
>> We do a lot of work "by partition". We build column indexes by partition. We
>> update the partition index on each partition. We invalidate key cache by
>> partition. They're not super expensive, but they take time, and tables with
>> tiny partitions can actually be slower to compact.
>>
>> There's no magic cutoff where it does/doesn't make sense, my comment is
>> mostly a warning that the edges of the "normal" use cases tend to be less
>> optimized than the common case. A table with a hundred billion
>> records, where the key is numeric and the value is a single byte (let's say
>> you're keeping track of whether or not a specific sensor has ever detected
>> some magic event, and you have 100B sensors) will be close to
>> the worst-case example of this behavior.
>>
>
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you for your time Jeff, very helpful. I couldn't find anything out
there about the subject, and I suspected that this could be the case.

Regarding the clustering key in this case:
Back in the RDBMS world, you will always assign a sequential (or as
sequential as possible) clustering key to a table to minimize fragmentation
and increase the speed of the insertions. In the Cassandra world, does the
same apply to the clustering key? For example, is it a good idea to assign
a UUID to a clustering key, or would a timestamp be a better choice? I am
thinking that partitions need to keep some sort of binary index for the
clustering keys, and for large partitions it can be relatively
expensive to maintain.

F Javier Pareja

On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa  wrote:

>
>
> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:
>
>> Hi Jeff,
>>
>> Could you expand: "Tables without clustering keys are often deceptively
>> expensive to compact, as a lot of work (relative to the other cell
>> boundaries) happens on partition boundaries." This is something I didn't
>> know and highly interesting to know more about!
>>
>>
>>
> We do a lot of work "by partition". We build column indexes by partition. We
> update the partition index on each partition. We invalidate key cache by
> partition. They're not super expensive, but they take time, and tables with
> tiny partitions can actually be slower to compact.
>
> There's no magic cutoff where it does/doesn't make sense, my comment is
> mostly a warning that the edges of the "normal" use cases tend to be less
> optimized than the common case. A table with a hundred billion
> records, where the key is numeric and the value is a single byte (let's say
> you're keeping track of whether or not a specific sensor has ever detected
> some magic event, and you have 100B sensors) will be close to
> the worst-case example of this behavior.
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:

> Hi Jeff,
>
> Could you expand: "Tables without clustering keys are often deceptively
> expensive to compact, as a lot of work (relative to the other cell
> boundaries) happens on partition boundaries." This is something I didn't
> know and highly interesting to know more about!
>
>
>
We do a lot of work "by partition". We build column indexes by partition. We update
the partition index on each partition. We invalidate key cache by
partition. They're not super expensive, but they take time, and tables with
tiny partitions can actually be slower to compact.

There's no magic cutoff where it does/doesn't make sense, my comment is
mostly a warning that the edges of the "normal" use cases tend to be less
optimized than the common case. A table with a hundred billion
records, where the key is numeric and the value is a single byte (let's say
you're keeping track of whether or not a specific sensor has ever detected
some magic event, and you have 100B sensors) will be close to
the worst-case example of this behavior.
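Jeff's point above can be sketched as a toy cost model. The constants below are illustrative assumptions, not measured Cassandra costs; the sketch only shows how fixed per-partition work (index entries, key-cache invalidation) dominates when every partition holds a single tiny cell.

```python
# Toy compaction cost model: illustrative assumptions, not measured numbers.
PER_PARTITION_OVERHEAD = 1.0   # relative cost per partition boundary
PER_CELL_COST = 0.1            # relative cost of merging one cell

def compaction_cost(partitions: int, cells_per_partition: int) -> float:
    # Total cost = fixed per-partition work + per-cell merge work
    return partitions * (PER_PARTITION_OVERHEAD + cells_per_partition * PER_CELL_COST)

# Same one million cells, two layouts:
tiny = compaction_cost(1_000_000, 1)      # one cell per partition
wide = compaction_cost(1_000, 1_000)      # 1,000 cells per partition
assert tiny > wide  # per-partition overhead dominates when partitions are tiny
```

With these (made-up) constants the tiny-partition layout costs roughly ten times the wide layout, which is the intuition behind "tables with tiny partitions can actually be slower to compact".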


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Jeff,

So, if I understood your email correctly, there is no hard restriction, but I
should be using a clustering key for performance reasons.
I am expecting to store 10B rows per year in this table and each row will
have a user defined type with an approx size of 1500 bytes.
Access to the data in this table will be random, as it stores the "raw"
data. Other tables will hold processed data, organized by time in a
clustering column, for sequential access by the system.
Each row represents an event and it has a UUID which I am planning to use
as the partition key. Should I find another field for the partition and use
the UUID for the clustering instead?
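One common pattern for this (a sketch under my own assumptions, not something prescribed in this thread) is to derive a synthetic bucket from the event UUID itself, so each partition holds many rows and the UUID becomes the clustering column. `NUM_BUCKETS` is a hypothetical tuning knob; because the bucket is computed from the UUID, a point lookup by UUID still hits exactly one partition.

```python
import uuid

NUM_BUCKETS = 1024  # hypothetical knob: rows per partition ~= total_rows / NUM_BUCKETS

def bucket_for(event_id: uuid.UUID) -> int:
    # Stable bucket derived from the UUID's 128-bit integer value, so a
    # reader can recompute the partition key from the UUID alone.
    return event_id.int % NUM_BUCKETS

# The table would then be keyed as PRIMARY KEY ((bucket), event_id):
# partition key = bucket, clustering key = event_id.
event_id = uuid.uuid4()
assert 0 <= bucket_for(event_id) < NUM_BUCKETS
```

The trade-off is that partitions grow without bound over time unless the bucket also incorporates a time component, so this only fits when per-bucket volume is known to stay reasonable.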


F Javier Pareja

On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa  wrote:

> There is no limit
>
> The token range of murmur3 is 2^64, but Cassandra properly handles token
> overlaps (we use a key that’s effectively a tuple of the token/hash and the
> underlying key itself), so having more than 2^64 partitions won’t hurt
> anything in theory
>
> That said, having that many partitions would be an incredibly huge data
> set, and unless modeled properly, would be very likely to be unwieldy in
> practice.
>
> Tables without clustering keys are often deceptively expensive to compact,
> as a lot of work (relative to the other cell boundaries) happens on
> partition boundaries.
>
> --
> Jeff Jirsa
>
>
> > On Mar 7, 2018, at 3:06 AM, Javier Pareja 
> wrote:
> >
> > Hello all,
> >
> > I have been trying to find an answer to the following but I have had no
> luck so far:
> > Is there any limit to the number of partitions that a table can have?
> > Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
> >
> > Regards,
> >
> > F Javier Pareja
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
Hi Jeff,

Could you expand: "Tables without clustering keys are often deceptively
expensive to compact, as a lot of work (relative to the other cell
boundaries) happens on partition boundaries." This is something I didn't
know and highly interesting to know more about!

--
Carlos Rolo

On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa  wrote:

> There is no limit
>
> The token range of murmur3 is 2^64, but Cassandra properly handles token
> overlaps (we use a key that’s effectively a tuple of the token/hash and the
> underlying key itself), so having more than 2^64 partitions won’t hurt
> anything in theory
>
> That said, having that many partitions would be an incredibly huge data
> set, and unless modeled properly, would be very likely to be unwieldy in
> practice.
>
> Tables without clustering keys are often deceptively expensive to compact,
> as a lot of work (relative to the other cell boundaries) happens on
> partition boundaries.
>
> --
> Jeff Jirsa
>
>
> > On Mar 7, 2018, at 3:06 AM, Javier Pareja 
> wrote:
> >
> > Hello all,
> >
> > I have been trying to find an answer to the following but I have had no
> luck so far:
> > Is there any limit to the number of partitions that a table can have?
> > Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
> >
> > Regards,
> >
> > F Javier Pareja
>
>
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
There is no limit

The token range of murmur3 is 2^64, but Cassandra properly handles token 
overlaps (we use a key that’s effectively a tuple of the token/hash and the 
underlying key itself), so having more than 2^64 partitions won’t hurt anything 
in theory

That said, having that many partitions would be an incredibly huge data set, 
and unless modeled properly, would be very likely to be unwieldy in practice.

Tables without clustering keys are often deceptively expensive to compact, as a 
lot of work (relative to the other cell boundaries) happens on partition 
boundaries.
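The (token, key) tuple mentioned above can be illustrated with a deliberately tiny toy hash (an assumption for the demo; this is not murmur3 or Cassandra's actual code). Even when two keys collide on the token, ordering by the (token, key) pair keeps them distinct, which is why exceeding the token range is harmless in principle.

```python
def toy_token(key: str) -> int:
    # Tiny 8-bit "token" range so collisions actually happen in a demo;
    # Cassandra's murmur3 range is 2^64, but the principle is the same.
    return sum(key.encode()) % 256

# "sensor-1" and "rosnes-1" contain the same bytes, so this toy hash collides.
keys = ["sensor-1", "sensor-257", "rosnes-1"]
index = sorted((toy_token(k), k) for k in keys)

assert toy_token("sensor-1") == toy_token("rosnes-1")  # token collision
assert len(set(index)) == len(keys)  # but the (token, key) pairs stay distinct
```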

-- 
Jeff Jirsa


> On Mar 7, 2018, at 3:06 AM, Javier Pareja  wrote:
> 
> Hello all,
> 
> I have been trying to find an answer to the following but I have had no luck 
> so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have? 
> Is it recommended to have a clustering key to reduce this number by storing 
> several rows in each partition instead of one row per partition?
> 
> Regards,
> 
> F Javier Pareja




Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Rahul, but is it a good practice to use a large range here? Or
would it be better to create partitions with more than 1 row (by using a
clustering key)?
From a data query point of view, I will be accessing the rows by a UID, one
at a time.

F Javier Pareja

On Wed, Mar 7, 2018 at 11:12 AM, Rahul Singh 
wrote:

> The range is 2 × 2^63 (i.e. 2^64 token values, from -2^63 to 2^63 - 1)
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 7, 2018, 6:06 AM -0500, Javier Pareja ,
> wrote:
>
> Hello all,
>
> I have been trying to find an answer to the following but I have had no
> luck so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja
>
>


Re: [External] Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Tom van der Woerdt
Hi Javier,

When our users ask this question, I tend to answer "keep it above a
billion". More partitions is better.

I'm not aware of any actual limits on partition count. Practically it's
almost always limited by the disk space in a server.

Tom van der Woerdt
Site Reliability Engineer

Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
The world's #1 accommodation site
43 languages, 198+ offices worldwide, 120,000+ global destinations,
1,550,000+ room nights booked every day
No booking fees, best price always guaranteed
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)

On Wed, Mar 7, 2018 at 12:06 PM, Javier Pareja 
wrote:

> Hello all,
>
> I have been trying to find an answer to the following but I have had no
> luck so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Rahul Singh
The range is 2 × 2^63 (i.e. 2^64 token values, from -2^63 to 2^63 - 1)

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 7, 2018, 6:06 AM -0500, Javier Pareja , wrote:
> Hello all,
>
> I have been trying to find an answer to the following but I have had no luck 
> so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have? 
> Is it recommended to have a clustering key to reduce this number by storing 
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja


Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Hello all,

I have been trying to find an answer to the following but I have had no
luck so far:
Is there any limit to the number of partitions that a table can have?
Let's say a table has a partition key and no clustering key, is there a
recommended limit on the number of values that this partition key can have?
Is it recommended to have a clustering key to reduce this number by storing
several rows in each partition instead of one row per partition?

Regards,

F Javier Pareja