Re: Partition size

2016-09-09 Thread Jonathan Haddad
I fully agree with Benedict here.  I would much prefer to keep this sort of
toxic behavior off the ML.  People can link to whatever helpful docs /
blogs they choose.

On Fri, Sep 9, 2016 at 1:12 PM Benedict Elliott Smith wrote:

> Come on. This kind of inconsistent 'policing' is not helpful.
>
> By all means, push the *committers* to improve the project docs as is
> happening, and to promote the internal resources over external ones.
>
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRAs (all of which have so
> far been ignored by those of his colleagues who *are* active community
> members, I'll note!).  Shaming him for not linking docs that describe
> something *other* than what he was even talking about is crossing the
> line IMO.
>
> Linking to third-party resources is commonplace; the only difference I can
> see here is that these have been called "docs" by the authors, instead of
> a blog post, and Mark has a DataStax email address.
>
> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle?  Or a random user posted their own resources?  Obviously not.
>
> I was initially all for the ASF endeavour to counteract DataStax' outsized
> influence on the project, and was hopeful you might achieve some positive
> change.  Perhaps you may well still do.  But it seems to me that the ASF
> behaviour is beginning to cross from constructive criticism of the project
> participants to prejudicially hostile behaviour against certain community
> members - and that is unlikely to result in a better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.
>
>
>
> On Friday, 9 September 2016, Mark Thomas wrote:
>
>> On 09/09/2016 16:46, Mark Curtis wrote:
>> > If your partition sizes are over 100MB iirc then you'll normally see
>> > warnings in your system.log, this will outline the partition key, at
>> > least in Cassandra 2.0 and 2.1 as I recall.
>> >
>> > Your best friend here is nodetool cfstats which shows you the
>> > min/mean/max partition sizes for your table. It's quite often used to
>> > pinpoint large partitions on nodes in a cluster.
>> >
>> > More info here:
>> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>>
>> Folks,
>>
>> It is *Apache* Cassandra. If you are going to point to docs, please
>> point to the official Apache docs unless there is a very good reason not
>> to.
>>
>> In this case:
>>
>>
>> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>>
>> looks to be the place.
>>
>> Mark
>>
>>
>> >
>> > Thanks
>> >
>> > Mark
>> >
>> >
>> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
>> >
>> > Is there any way to get the partition size for a partition key?
>> >
>> >
>>
>>


Re: Partition size

2016-09-09 Thread Benedict Elliott Smith
Come on. This kind of inconsistent 'policing' is not helpful.

By all means, push the *committers* to improve the project docs as is
happening, and to promote the internal resources over external ones.

But Mark has absolutely no formal connection with the project, and his
contributions have only been to file a couple of JIRAs (all of which have so
far been ignored by those of his colleagues who *are* active community
members, I'll note!).  Shaming him for not linking docs that describe
something *other* than what he was even talking about is crossing the line
IMO.

Linking to third-party resources is commonplace; the only difference I can
see here is that these have been called "docs" by the authors, instead of
a blog post, and Mark has a DataStax email address.

Would you have reacted this way if Aaron Morton linked a blog post by
thelastpickle?  Or a random user posted their own resources?  Obviously not.

I was initially all for the ASF endeavour to counteract DataStax' outsized
influence on the project, and was hopeful you might achieve some positive
change.  Perhaps you may well still do.  But it seems to me that the ASF
behaviour is beginning to cross from constructive criticism of the project
participants to prejudicially hostile behaviour against certain community
members - and that is unlikely to result in a better project.

You should be treating everyone consistently, in a manner that promotes
project health.



On Friday, 9 September 2016, Mark Thomas wrote:

> On 09/09/2016 16:46, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at
> > least in Cassandra 2.0 and 2.1 as I recall.
> >
> > Your best friend here is nodetool cfstats which shows you the
> > min/mean/max partition sizes for your table. It's quite often used to
> > pinpoint large partitions on nodes in a cluster.
> >
> > More info here:
> > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>
> Folks,
>
> It is *Apache* Cassandra. If you are going to point to docs, please
> point to the official Apache docs unless there is a very good reason not
> to.
>
> In this case:
>
> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>
> looks to be the place.
>
> Mark
>
>
> >
> > Thanks
> >
> > Mark
> >
> >
> > On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> >
> > Is there any way to get the partition size for a partition key?
> >
> >
>
>


Re: Partition size

2016-09-09 Thread Jeff Jirsa


On 9/9/16, 12:14 PM, "Mark Thomas" wrote:

> If you are going to point to docs, please
>point to the official Apache docs unless there is a very good reason not to.
>

(And if the good reason is that there’s a deficiency in the Apache Cassandra
docs, please make it known on the list or in a JIRA so someone can write what’s
missing.)






Re: Partition size

2016-09-09 Thread Jeff Jirsa


On 9/9/16, 8:47 AM, "Rakesh Kumar" wrote:

>> If your partition sizes are over 100MB iirc then you'll normally see
>> warnings in your system.log, this will outline the partition key, at least
>> in Cassandra 2.0 and 2.1 as I recall.
>
>Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

In modern versions (2.1 and newer), the “real” risk of large partitions is that 
they generate a lot of garbage on read – it’s not a 1:1 equivalence, but it’s 
linear, and a partition that’s 10x as large generates 10x as much garbage.

You can tune around it (very large new gen, for example), but it’s best fixed 
at the data model most of the time.

The long-term fix will be CASSANDRA-9754, which is a work in progress. The
short-term fix for 3.x was http://issues.apache.org/jira/browse/CASSANDRA-11206,
which went into 3.6 and higher.

In the notes on 11206, you’ll see that Robert Stupp tested up to an 8GB 
partition – while nobody’s going to recommend you create a data model with 8GB
partitions, I imagine you may find partitions in that rough order of magnitude 
acceptable.




Re: Partition size

2016-09-09 Thread Mark Thomas
On 09/09/2016 16:46, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at
> least in Cassandra 2.0 and 2.1 as I recall.
> 
> Your best friend here is nodetool cfstats which shows you the
> min/mean/max partition sizes for your table. It's quite often used to
> pinpoint large partitions on nodes in a cluster.
> 
> More info here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Folks,

It is *Apache* Cassandra. If you are going to point to docs, please
point to the official Apache docs unless there is a very good reason not to.

In this case:

http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb

looks to be the place.

Mark


> 
> Thanks
> 
> Mark
> 
> 
> On 9 September 2016 at 02:53, Anshu Vajpayee wrote:
> 
> Is there any way to get the partition size for a partition key?
> 
> 



Re: Partition size

2016-09-09 Thread Mark Curtis
On 9 September 2016 at 16:47, Rakesh Kumar wrote:

> On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at least
> > in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?
>

The 100MB is just a default setting; you can set it up or down as you need:

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__compaction_large_partition_warning_threshold_mb

There isn't really a "good" or "bad" value; what's acceptable for your
application depends on the data model, your query patterns and your required
response times. The 100MB default is just a guide.

If you're seeing partitions of 1GB and above, you may very well start
to see problems. Again, cfstats is your friend here!
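As a rough sketch of those rules of thumb (not limits that Cassandra enforces), a partition size can be triaged against the tunable warning threshold. It assumes the 100MB default and that the *_mb setting means binary megabytes (1024 * 1024 bytes); the function name and the labels are hypothetical:

```python
MIB = 1024 * 1024  # assumption: the *_mb setting is read as binary megabytes


def classify_partition(size_bytes: int, warn_threshold_mb: int = 100) -> str:
    """Rough triage of one partition size, following the rules of thumb in this thread.

    The labels are hypothetical guidance, not limits enforced by Cassandra.
    """
    if size_bytes >= 1024 * MIB:
        return "likely problematic"       # "1GB and above" per the post above
    if size_bytes > warn_threshold_mb * MIB:
        return "above warning threshold"  # roughly where the log warning kicks in
    return "ok"


if __name__ == "__main__":
    for size_mb in (50, 150, 2048):
        print(size_mb, "MB ->", classify_partition(size_mb * MIB))
    # Raising the threshold (as the yaml setting allows) changes the middle case.
    print(150, "MB ->", classify_partition(150 * MIB, warn_threshold_mb=200))
```

Because the threshold is only a guide, it is a parameter rather than a constant.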

-Mark


Re: Partition size

2016-09-09 Thread Rakesh Kumar
On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at least
> in Cassandra 2.0 and 2.1 as I recall.

Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?


Re: Partition size

2016-09-09 Thread Mark Curtis
If your partition sizes are over 100MB iirc then you'll normally see
warnings in your system.log, this will outline the partition key, at least
in Cassandra 2.0 and 2.1 as I recall.
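To pull those warnings out of a log, a minimal sketch could scan for the "large partition" message. The exact wording and size format have changed between versions, so the pattern below (which expects keyspace/table:key followed by a size in parentheses) is an assumption to check against your own system.log; the sample lines are hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of the warning: "... large partition <keyspace>/<table>:<key> (<size>) ..."
# The leading verb and the size format have varied between versions; adjust as needed.
LARGE_PARTITION_RE = re.compile(
    r"large partition (?P<ks>[^/\s]+)/(?P<table>[^:\s]+):(?P<key>.*?) \((?P<size>[^)]*)\)"
)


def find_large_partitions(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (keyspace, table, partition key, size text) for every matching log line."""
    hits = []
    for line in lines:
        match = LARGE_PARTITION_RE.search(line)
        if match:
            hits.append((match["ks"], match["table"], match["key"], match["size"]))
    return hits


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "WARN  [CompactionExecutor:2] Writing large partition ks/events:user42 (157286400 bytes)",
        "INFO  [main] Some unrelated message",
    ]
    for ks, table, key, size in find_large_partitions(sample):
        print(f"{ks}.{table} key={key} size={size}")
```

The function accepts any iterable of lines, so an open file object for system.log can be passed directly.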

Your best friend here is nodetool cfstats, which shows you the min/mean/max
partition sizes for your table. It's quite often used to pinpoint large
partitions on nodes in a cluster.
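A minimal sketch of reading those figures programmatically, assuming cfstats prints per-table lines such as "Compacted partition maximum bytes: N" (verify the labels against your version's output). The figures describe the node nodetool is talking to, so the check would be repeated per node; the helper names and the sample are hypothetical:

```python
import re
from typing import Dict

# Assumed label format, as printed per table by nodetool cfstats:
#   Compacted partition minimum bytes: 87
_STAT_RE = re.compile(r"Compacted partition (minimum|maximum|mean) bytes:\s*(\d+)")


def partition_size_stats(table_section: str) -> Dict[str, int]:
    """Extract min/max/mean partition sizes (bytes) from ONE table's cfstats section."""
    return {m.group(1): int(m.group(2)) for m in _STAT_RE.finditer(table_section)}


def max_exceeds(stats: Dict[str, int], threshold_mb: int = 100) -> bool:
    """True if the largest partition on this node is above threshold_mb (MiB assumed)."""
    return stats.get("maximum", 0) > threshold_mb * 1024 * 1024


if __name__ == "__main__":
    # Hypothetical excerpt; real output contains many more lines per table.
    section = """
        Table: events
        Compacted partition minimum bytes: 87
        Compacted partition maximum bytes: 268650950
        Compacted partition mean bytes: 204
    """
    stats = partition_size_stats(section)
    print(stats, max_exceeds(stats))
```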

More info here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Thanks

Mark


On 9 September 2016 at 02:53, Anshu Vajpayee wrote:

> Is there any way to get the partition size for a partition key?
>


Re: Is the blob storage cost in Cassandra the same as the bigint storage cost for long variables?

2016-09-09 Thread Romain Hardouin
Note that LZ4 compression is used by default. If you want to disable
compression, you can do this:

CREATE/ALTER TABLE ... WITH compression = { 'sstable_compression' : '' };
Best,
Romain
 

On Friday, 9 September 2016 8:12, Alexandr Porunov wrote:
 

Hello Romain,
Thank you very much for the explanation!
I have just run a simple test to compare both situations. I ran two equivalent
VMs.

Machine 1:
CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE test.simple (id bigint PRIMARY KEY);

Machine 2:
CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE test.simple (id blob PRIMARY KEY);

Then I put 13421772 primary keys, from 1 to 13421772, on both machines.

Results:
Machine 1: size of the data folder: 495864 bytes
Machine 2: size of the data folder: 495004 bytes

So there is almost no difference between them (the blob storage even came out
1 MB smaller).
I am happy about it because I need to store specially encoded primary keys of
80 bits each, so I can use blob as a primary key without hesitation.
Best regards,
Alexandr
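Since 80 bits is 10 bytes, such a key cannot fit in a 64-bit bigint, which is why a blob makes sense here. A minimal sketch of packing one into a fixed-width blob value, assuming the key is an unsigned integer and big-endian byte order is acceptable (the helper names are hypothetical):

```python
KEY_BITS = 80
KEY_BYTES = KEY_BITS // 8  # 10 bytes, wider than the 8-byte bigint


def encode_key(key: int) -> bytes:
    """Pack an unsigned 80-bit integer into a fixed-width, big-endian 10-byte value."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key does not fit in 80 unsigned bits")
    return key.to_bytes(KEY_BYTES, "big")


def decode_key(raw: bytes) -> int:
    """Inverse of encode_key."""
    if len(raw) != KEY_BYTES:
        raise ValueError("expected a 10-byte value")
    return int.from_bytes(raw, "big")


def blob_literal(raw: bytes) -> str:
    """Render bytes as a CQL blob constant (0x followed by hex digits)."""
    return "0x" + raw.hex()


if __name__ == "__main__":
    key = (1 << 79) | 42  # hypothetical key that needs the full 80 bits
    raw = encode_key(key)
    print(len(raw), blob_literal(raw))
```

Keeping the width fixed means the same key always produces identical bytes, which matters when the blob is used as a key.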
On Fri, Sep 9, 2016 at 1:20 AM, Romain Hardouin wrote:

Hi,
Disk-wise it's the same: a bigint is serialized as an 8-byte ByteBuffer, and if
you want to store a Long as bytes in a blob column, it will take 8 bytes too,
right? The difference is the validation. The blob ByteBuffer is stored as is,
whereas the bigint is validated. So technically the Long is slower, but I guess
that's not noticeable.
Yes, you can use a blob as a partition key. I would use bigint, though, both
for validation and for clarity.
Best,
Romain 
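Romain's 8-byte point can be checked with a small sketch. It assumes CQL bigint uses an 8-byte big-endian two's-complement encoding (the same bytes Java's ByteBuffer.putLong writes with its default big-endian order) and contrasts that with keeping the Long's hex representation as text, which the original question mentions; the helper names are hypothetical:

```python
import struct


def bigint_bytes(value: int) -> bytes:
    """Encode like a CQL bigint: 8 bytes, big-endian, signed two's complement."""
    return struct.pack(">q", value)


def hex_text_bytes(value: int) -> bytes:
    """The hex *string* of the same 64-bit value, if stored as characters instead of raw bytes."""
    return format(value & 0xFFFFFFFFFFFFFFFF, "016x").encode("ascii")


if __name__ == "__main__":
    v = 13421772                       # the largest key from Alexandr's test
    raw = bigint_bytes(v)
    print(len(raw), "0x" + raw.hex())  # a blob written with this 0x literal holds the same 8 bytes
    print(len(hex_text_bytes(v)))      # 16: twice the size when kept as hex text
```

A CQL 0x... blob constant is decoded to raw bytes, so only the text form pays the 2x cost.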

On Wednesday, 7 September 2016 22:54, Alexandr Porunov wrote:
 

Hello,

I need to store a "Long" Java variable. The question is: is the storage cost
the same for storing the hex representation of a "Long" variable in a blob and
for storing the "Long" variable in a bigint? Are there any performance pros or
cons? Is it OK to use a blob as a primary key?
Sincerely,
Alexandr

   



   

Duplicate fields in Cassandra type

2016-09-09 Thread Олег Краюшкин
Hi everyone,

I hope you're having a nice day; I'd like to ask for your help.

I'm using Cassandra 2.2.7. Since yesterday I have been getting the following
exception when connecting to the cluster via the Java driver (v3.0.3):

[main] ERROR c.datastax.driver.core.SchemaParser - Error parsing schema from Cassandra system tables: the schema in Cluster#getMetadata() will appear incomplete or stale
java.lang.IllegalArgumentException: Multiple entries with same key: type=[I@c86b9e3 and type=[I@10aa41f2
    at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136) ~[guava-19.0.jar:na]
    at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:98) ~[guava-19.0.jar:na]
    at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:84) ~[guava-19.0.jar:na]
    at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:295) ~[guava-19.0.jar:na]
    at com.datastax.driver.core.UserType.(UserType.java:62) ~[cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.UserType.build(UserType.java:85) ~[cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.SchemaParser.buildUserTypes(SchemaParser.java:196) ~[cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.SchemaParser.buildKeyspaces(SchemaParser.java:127) ~[cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.SchemaParser.refresh(SchemaParser.java:64) ~[cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:337) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:277) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:201) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.Cluster.init(Cluster.java:163) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:334) [cassandra-driver-core-3.0.3.jar:na]
    at com.datastax.driver.core.Cluster.connect(Cluster.java:284) [cassandra-driver-core-3.0.3.jar:na]

I discovered that the problem is related to some of my User Defined Types.
For example, I had a UDT comment with the definition:

CREATE TYPE ks.comment (
id int,
from_id int,
date bigint,
text text,
attachments frozen>>
)

but now, when I query it through cqlsh with DESC TYPE ks.comment, I can
see that all the fields have been duplicated:

CREATE TYPE veil.comment (
id int,
from_id int,
date bigint,
text text,
likes frozen,
id int,
from_id int,
date bigint,
text text,
likes frozen
)
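The duplication in DESC TYPE output like the one above can be checked mechanically. A minimal sketch, assuming one field per line as cqlsh prints it (the helper names are hypothetical, and this only detects the symptom; it does not repair the schema):

```python
from collections import Counter
from typing import List


def udt_field_names(create_type: str) -> List[str]:
    """Field names, in order, from a 'CREATE TYPE ks.name ( ... )' statement.

    Assumes one field per line, as cqlsh's DESC TYPE prints it.
    """
    body = create_type[create_type.index("(") + 1 : create_type.rindex(")")]
    names = []
    for line in body.splitlines():
        line = line.strip()
        if line:
            names.append(line.split()[0])  # first token on the line is the field name
    return names


def duplicate_fields(create_type: str) -> List[str]:
    """Field names that appear more than once, in first-seen order."""
    counts = Counter(udt_field_names(create_type))
    return [name for name, n in counts.items() if n > 1]
```

Running duplicate_fields on the DESC TYPE text of each type would list the affected types before deciding how to handle them.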

Some of my other observations:

   - the data seems fine: SELECT comments FROM post through cqlsh returns data
   without duplicates or empty slots for the duplicated fields.
   - only the UDTs that contain other UDTs as a field got messed up
   (field likes in the example above).


*First question:* how is that even possible? If you try to create such a
type, you get an error like Duplicate field name .. in type ...

*Second question:* what are the possible ways to handle this? Can I just
ALTER these types to their initial definitions without risk of data loss?


Thanks a lot for your time, any suggestion would be appreciated.

P.S. Also, please excuse my poor grammar and any mistakes in how I've
addressed the mailing list; I'm quite new to this.


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-09 Thread Benedict Elliott Smith
Yes, each partition modified by a batch has its modifications applied
altogether, atomically (at the node level).
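The all-or-nothing visibility described here can be pictured with a toy model of a single node. This is only an illustration, not how Cassandra implements it: each partition is published as a snapshot, and a batch builds a new snapshot and swaps it in with one assignment, so a reader gets either all-old or all-new rows. The class and method names are hypothetical:

```python
import threading
from typing import Dict, Mapping

Row = Dict[str, object]
Partition = Dict[str, Row]  # clustering key -> row


class ToyReplica:
    """Toy model of one node, not Cassandra's implementation.

    A batch builds a fresh snapshot of the partition and publishes it with a single
    assignment, so a reader sees either the whole batch or none of it.
    """

    def __init__(self) -> None:
        self._write_lock = threading.Lock()  # serialises writers only
        self._partitions: Dict[str, Partition] = {}

    def apply_batch(self, pk: str, updates: Mapping[str, Row]) -> None:
        with self._write_lock:
            # Copy rows so a snapshot already handed to readers is never mutated.
            merged = {ck: dict(row) for ck, row in self._partitions.get(pk, {}).items()}
            for ck, row in updates.items():
                merged.setdefault(ck, {}).update(row)
            self._partitions[pk] = merged  # one assignment publishes every change at once

    def read_partition(self, pk: str) -> Partition:
        # No lock: in CPython a dict lookup is atomic, so readers never see half a batch.
        # Callers should treat the returned snapshot as read-only.
        return self._partitions.get(pk, {})
```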

On Friday, 9 September 2016, Bhuvan Rawal wrote:

> As per this doc, conditional batches can only contain statements belonging to
> a single partition.
> When I tried it in 3.6, I got this exception, as expected:
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Batch with conditions cannot span multiple partitions"
>
> When I tried a single-partition batch with multiple LWT statements, Cassandra
> sometimes accepted it and sometimes rejected the entire batch, depending on
> the other LWT. For example, in the batch below:
> BEGIN BATCH
> Statement 1 IF SOME CONDITION;
> Statement 2 IF SOME CONDITION2;
> Statement 3;
> APPLY BATCH;
>
> The condition of either Statement 1 or Statement 2 was observed to decide
> whether the batch succeeded or failed. As per this doc: "If one statement
> in a batch is a conditional update, the conditional logic must return true,
> or the entire batch fails." That must essentially be what is happening, so
> having more than one LWT in a batch may not make a lot of sense.
>
> One question still remains, though: can a single-partition batch be
> considered isolated per replica? Say there are 5 rows in a partition and we
> update all of them using an LWT; clients should read either all of them old
> or all of them new during the batch update.
>
> I would be glad if someone could clarify this.
>
>
>
> On Tue, Sep 6, 2016 at 11:18 PM, Bhuvan Rawal wrote:
>
>> Hi,
>>
>> We are working on a multi-threaded distributed design in which a thread
>> reads the current state from Cassandra (a single partition of ~20 rows),
>> does some computation and saves it back. But we need to ensure that,
>> between that thread's read and its write, no other thread has saved any
>> change to that partition.
>>
>> We have thought of a solution: *having a write_time column* in the
>> schema and making it static. Every time a thread picks up a job, a read is
>> performed at LOCAL_QUORUM. When writing to Cassandra, the batch contains an
>> LWT (IF write_time equals the value that was read); if it fails, the read
>> and the computation are performed again, and so on. This ensures that,
>> when saving, the partition is still in the state it was read in.
>>
>> In order to avoid race conditions we need to confirm a couple of things:
>>
>> 1. When saving data in a single-partition batch (*rows may be updates,
>> deletes or inserts*), is the batch isolated per replica node (not
>> necessarily across the cluster as a whole)? Is there a possibility of a
>> client reading partial rows?
>>
>> 2. If we do LOCAL_QUORUM reads and LOCAL_QUORUM writes, could there be a
>> chance of inconsistency in this case (when LWT is used in batches)?
>>
>> 3. Is it possible to use multiple LWTs in a single batch? In general, how
>> do LWTs perform within a batch, and is Paxos run before the batch executes?
>>
>> Can someone help us with this?
>>
>> Thanks & Regards,
>> Bhuvan
>>
>>
>
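Question 2 in the quoted message (LOCAL_QUORUM reads plus LOCAL_QUORUM writes) comes down to quorum overlap. As a sketch of the usual arithmetic only (it says nothing about failed writes, hints, or the Paxos/SERIAL side of LWTs), LOCAL_QUORUM needs floor(RF/2) + 1 replicas of the local DC's replication factor, and a read set must share a replica with a completed write set when R + W > RF. The function names are hypothetical:

```python
def local_quorum(local_rf: int) -> int:
    """Replicas required by LOCAL_QUORUM in a DC with replication factor local_rf."""
    return local_rf // 2 + 1


def overlaps(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = local_quorum(rf)
        print(f"RF={rf}: LOCAL_QUORUM={q}, read/write overlap={overlaps(rf, q, q)}")
```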


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-09 Thread Bhuvan Rawal
As per this doc, conditional batches can only contain statements belonging to a
single partition.
When I tried it in 3.6, I got this exception, as expected:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Batch with conditions cannot span multiple partitions"

When I tried a single-partition batch with multiple LWT statements, Cassandra
sometimes accepted it and sometimes rejected the entire batch, depending on
the other LWT. For example, in the batch below:
BEGIN BATCH
Statement 1 IF SOME CONDITION;
Statement 2 IF SOME CONDITION2;
Statement 3;
APPLY BATCH;

The condition of either Statement 1 or Statement 2 was observed to decide
whether the batch succeeded or failed. As per this doc: "If one statement in
a batch is a conditional update, the conditional logic must return true, or
the entire batch fails." That must essentially be what is happening, so
having more than one LWT in a batch may not make a lot of sense.
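The rule quoted from the doc can be sketched as a toy model: every IF clause is checked against the partition's state before the batch, and if any is false, no statement is applied. This is a simplified illustration under that assumption, not Cassandra's code; the names and types are hypothetical:

```python
from typing import Callable, Dict, List

Partition = Dict[str, Dict[str, object]]  # clustering key -> row
Condition = Callable[[Partition], bool]
Mutation = Callable[[Partition], None]


def conditional_batch(partition: Partition,
                      conditions: List[Condition],
                      mutations: List[Mutation]) -> bool:
    """Toy model of a single-partition batch with IF clauses.

    Every condition is checked against the state *before* the batch; if any is
    false, no statement is applied (the [applied] = False case).
    """
    if not all(check(partition) for check in conditions):
        return False
    for mutate in mutations:
        mutate(partition)
    return True
```

Under this model, extra IF clauses only add more ways for the whole batch to be rejected, which matches the behaviour observed above.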

One question still remains, though: can a single-partition batch be considered
isolated per replica? Say there are 5 rows in a partition and we update all of
them using an LWT; clients should read either all of them old or all of them
new during the batch update.

I would be glad if someone could clarify this.



On Tue, Sep 6, 2016 at 11:18 PM, Bhuvan Rawal wrote:

> Hi,
>
> We are working on a multi-threaded distributed design in which a thread
> reads the current state from Cassandra (a single partition of ~20 rows),
> does some computation and saves it back. But we need to ensure that,
> between that thread's read and its write, no other thread has saved any
> change to that partition.
>
> We have thought of a solution: *having a write_time column* in the
> schema and making it static. Every time a thread picks up a job, a read is
> performed at LOCAL_QUORUM. When writing to Cassandra, the batch contains an
> LWT (IF write_time equals the value that was read); if it fails, the read
> and the computation are performed again, and so on. This ensures that,
> when saving, the partition is still in the state it was read in.
>
> In order to avoid race conditions we need to confirm a couple of things:
>
> 1. When saving data in a single-partition batch (*rows may be updates,
> deletes or inserts*), is the batch isolated per replica node (not
> necessarily across the cluster as a whole)? Is there a possibility of a
> client reading partial rows?
>
> 2. If we do LOCAL_QUORUM reads and LOCAL_QUORUM writes, could there be a
> chance of inconsistency in this case (when LWT is used in batches)?
>
> 3. Is it possible to use multiple LWTs in a single batch? In general, how
> do LWTs perform within a batch, and is Paxos run before the batch executes?
>
> Can someone help us with this?
>
> Thanks & Regards,
> Bhuvan
>
>
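The read, compute, conditionally-write loop in the quoted design is a form of optimistic concurrency control. A minimal in-memory sketch, where FakePartition is a hypothetical stand-in for one partition and cas_write mimics a batch guarded by IF write_time = <value read>. One deliberate substitution: write_time here is an incrementing version rather than a wall-clock time, so two writers can never produce the same value:

```python
import threading
from typing import Callable, Dict, Tuple

Rows = Dict[str, int]


class FakePartition:
    """Hypothetical in-memory stand-in for one partition with a static write_time column."""

    def __init__(self, rows: Rows) -> None:
        self._lock = threading.Lock()
        self._write_time = 0
        self._rows = dict(rows)

    def read(self) -> Tuple[int, Rows]:
        """Return (static write_time, copy of the rows), like reading the whole partition."""
        with self._lock:
            return self._write_time, dict(self._rows)

    def cas_write(self, expected_write_time: int, new_rows: Rows) -> bool:
        """Mimic a batch guarded by IF write_time = expected; all rows change or none do."""
        with self._lock:
            if self._write_time != expected_write_time:
                return False  # someone else wrote since our read: [applied] = False
            self._rows = dict(new_rows)
            self._write_time += 1  # must change on every successful write
            return True


def update_with_retry(p: FakePartition, compute: Callable[[Rows], Rows],
                      max_attempts: int = 10) -> bool:
    """Read, compute, conditionally write; on conflict, re-read and recompute."""
    for _ in range(max_attempts):
        seen, rows = p.read()
        if p.cas_write(seen, compute(rows)):
            return True
    return False  # gave up under contention
```

Bounding the retries keeps a heavily contended partition from spinning forever; the caller decides what to do when the function returns False.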