Re: Re: Time serial column family design

2018-04-17 Thread Eric Plowe
Jon,

Great article. Thank you. (I have nothing to do with this issue, but I
appreciate nuggets of information I glean from the list)

Regards,

Eric
On Tue, Apr 17, 2018 at 10:57 PM Jonathan Haddad  wrote:

> To add to what Nate suggested, we have an entire blog post on scaling time
> series data models:
>
>
> http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
>
> Jon
>
>
> On Tue, Apr 17, 2018 at 7:39 PM Nate McCall 
> wrote:
>
>> I disagree. Create date as a raw integer is an excellent surrogate for
>> controlling time series "buckets" as it gives you complete control over the
>> granularity. You can even have multiple granularities in the same table -
>> remember that partition key "misses" in Cassandra are pretty lightweight as
>> they won't make it past the bloom filter on the read path.
>>
>> On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja 
>> wrote:
>>
>>> Hi David,
>>>
>>> Could you describe why you chose to include the create date in the
>>> partition key? If the vin is enough "partitioning", meaning that the size
>>> (number of rows x size of row) of each partition is less than 100MB, then
>>> remove the date and just use the create_time, because the date is already
>>> included in that column anyway.
>>>
>>> For example if columns "a" and "b" (from your table) are of max 256 UTF8
>>> characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows
>>> per partition. You can actually have many more but you don't want to go
>>> much higher for performance reasons.
>>>
>>> If this is not enough you could use create_month instead of create_date,
>>> for example, to reduce the partition size while not being too granular.
>>>
>>>
>>> On Tue, 17 Apr 2018, 22:17 Nate McCall,  wrote:
>>>
 Your table design will work fine as you have appropriately bucketed by
 an integer-based 'create_date' field.

 Your goal for this refactor should be to remove the "IN" clause from
 your code. This will move the rollup of multiple partition keys being
 retrieved into the client instead of relying on the coordinator assembling
 the results. You have to do more work and add some complexity, but the
 trade off will be much higher performance as you are removing the single
 coordinator as the bottleneck.

 On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni 
 wrote:

> Hi Nate,
>
> Thanks for your reply!
>
> Is there other way to design this table to meet this requirement?
>
>
>
> Best Regards,
>
>
>
> 倪项菲*/ **David Ni*
>
> 中移德电网络科技有限公司
>
> Virtue Intelligent Network Ltd, co.
>
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
>
>
> *From:* Nate McCall 
> *Sent:* April 17, 2018 7:12
> *To:* Cassandra Users 
> *Subject:* Re: Time serial column family design
>
>
>
>
>
> Select * from test where vin =“ZD41578123DSAFWE12313” and create_date
> in (20180416, 20180415, 20180414, 20180413, 20180412….);
>
> But this causes the CQL query to be very long, and I don’t know whether
> there is a limit on the length of the CQL.
>
> Please give me some advice,thanks in advance.
>
>
>
> Using the SELECT ... IN syntax  means that:
>
> - the driver will not be able to route the queries to the nodes which
> have the partition
>
> - a single coordinator must scatter-gather the query and results
>
>
>
> Break this up into a series of single statements using the
> executeAsync method and gather the results via something like Futures in
> Guava or similar.
>



 --
 -
 Nate McCall
 Wellington, NZ
 @zznate

 CTO
 Apache Cassandra Consulting
 http://www.thelastpickle.com

>>>
>>
>>
>> --
>> -
>> Nate McCall
>> Wellington, NZ
>> @zznate
>>
>> CTO
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
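The per-partition fan-out Nate recommends replaces the multi-key IN with one single-partition statement per day bucket. A sketch, using the table and column names from the query quoted above:

```sql
-- Instead of one coordinator-bound query:
--   SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313'
--     AND create_date IN (20180416, 20180415, 20180414, ...);
-- issue one token-aware, single-partition statement per bucket and let the
-- client execute them asynchronously and merge the results:
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date = 20180416;
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date = 20180415;
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date = 20180414;
```

Each statement hits exactly one partition, so the driver can route it straight to a replica instead of funneling everything through a single coordinator.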


Re: Re: Time serial column family design

2018-04-17 Thread Jonathan Haddad
To add to what Nate suggested, we have an entire blog post on scaling time
series data models:

http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Jon




Re: Re: Time serial column family design

2018-04-17 Thread Xiangfei Ni
Hi Javier,
VIN is the Vehicle Identification Number. Each vehicle uploads data from the
CAN bus every 10 seconds, and this table contains about 20 columns. If we used
just the VIN as the partition key, every vehicle would have a single partition
that grows without bound and never stops increasing; that is why we put
create_date in the partition key, which sounds good.
But we also have a requirement to query the history data for a vehicle, for
example all data from 2018-01-01 until now. If we used create_month in the
partition key, we could only fetch whole months, not exact days.
I found an article:
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
So your suggestion is to fetch the data with asynchronous per-partition
queries, as in that article. [inline image omitted]
We need to test it.
Is there another design pattern that meets this requirement with better
performance?
Best Regards,

倪项菲/ David Ni
中移德电网络科技有限公司
Virtue Intelligent Network Ltd, co.
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516
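A month-sized bucket does not have to give up exact-day reads if the full event time is a clustering column; a day then becomes a range slice inside the month partition. A sketch under that assumption (table name and columns "a"/"b" are illustrative; the real table has about 20 columns):

```sql
CREATE TABLE vehicle_data_by_month (
    vin          text,
    create_month int,        -- month bucket, e.g. 201801
    create_time  timestamp,  -- full event time, clustering column
    a            text,
    b            text,
    PRIMARY KEY ((vin, create_month), create_time)
) WITH CLUSTERING ORDER BY (create_time DESC);

-- An exact day is a clustering-range slice within the month partition:
SELECT * FROM vehicle_data_by_month
WHERE vin = 'ZD41578123DSAFWE12313'
  AND create_month = 201801
  AND create_time >= '2018-01-01' AND create_time < '2018-01-02';
```

A "from 2018-01-01 until now" history query then touches one partition per month rather than one per day.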


Re: Re: Time serial column family design

2018-04-17 Thread Nate McCall
I disagree. Create date as a raw integer is an excellent surrogate for
controlling time series "buckets" as it gives you complete control over the
granularity. You can even have multiple granularities in the same table -
remember that partition key "misses" in Cassandra are pretty lightweight as
they won't make it past the bloom filter on the read path.
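A sketch of what "multiple granularities in the same table" can look like when the bucket column is a plain int (bucket values are illustrative):

```sql
-- Day-level bucket for recent, fine-grained reads:
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date = 20180416;

-- A month-level bucket (201804) can coexist in the same int column for coarse
-- historical rollups. Probing a bucket that was never written is cheap: the
-- bloom filter rejects the partition key before any disk read happens.
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date = 201804;
```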


-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Is it safe to use paxos protocol in LWT from patent perspective ?

2018-04-17 Thread Hiroyuki Yamada
Hi all,

I'm wondering if it is safe to use the Paxos protocol in LWT from a patent
perspective.
I found some paxos-related patents here.


Does anyone know about this?

Best regards,
Hiroyuki
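For context, LWT (lightweight transactions) is the conditional-write syntax in CQL; each such statement runs a Paxos round among the replicas of the partition. A minimal illustration (the users table here is hypothetical):

```sql
-- INSERT ... IF NOT EXISTS triggers a Paxos round:
INSERT INTO users (username, email)
VALUES ('hyamada', 'h@example.org')
IF NOT EXISTS;

-- A conditional UPDATE is also an LWT:
UPDATE users SET email = 'new@example.org'
WHERE username = 'hyamada'
IF email = 'h@example.org';
```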

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: where does c* store the schema?

2018-04-17 Thread Blake Eggleston
Rahul, none of that is true at all. 

 

Each node stores schema locally in a non-replicated system table. Schema 
changes are disseminated directly to live nodes (not via the write path), and the 
schema version is gossiped to other nodes. If a node misses a schema update, it 
will figure this out when it notices that its local schema version is behind 
the one being gossiped by the rest of the cluster, and will pull the updated 
schema from the other nodes in the cluster.
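The schema agreement mechanism described above can be observed directly from cqlsh; these system tables and columns exist in stock Cassandra:

```sql
-- The local node's current schema version:
SELECT schema_version FROM system.local;

-- The versions this node has heard via gossip from its peers; once a schema
-- change has fully propagated, every row converges to the same UUID:
SELECT peer, schema_version FROM system.peers;
```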

 

From: Rahul Singh 
Reply-To: 
Date: Tuesday, April 17, 2018 at 4:13 PM
To: 
Subject: Re: where does c* store the schema?

 

It uses an "everywhere" replication strategy, and it's recommended to do all alter 
/ create / drop statements with consistency level ALL — meaning it wouldn't 
make the change to the schema unless all the nodes are up.


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation


On Apr 17, 2018, 12:31 AM -0500, Jinhua Luo , wrote:


Yes, I know it must be in system schema.

But how c* replicates the user defined schema to all nodes? If it
applies the same RWN model to them, then what's the R and W?
And when a failed node comes back to the cluster, how to recover the
schema updates it may miss during the outage?

2018-04-16 17:01 GMT+08:00 DuyHai Doan :


There is a system_schema keyspace to store all the schema information

https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v

On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo  wrote:



Hi All,

Does c* use predefined keyspace/tables to store the user defined schema?
If so, what's the RWN of those meta schema? And what's the procedure
to update them?


 





Re: multiple table directories for system_schema keyspace

2018-04-17 Thread Rahul Singh
This happens to any keyspace — not just system. If there are competing processes 
initializing the system, creating or altering things without CL=ALL, it may 
do this. I ran into a scenario where, when permissions were flipped to a non-Cassandra 
user, the Cassandra daemon lost access to the data, so it 
reinitialized the system.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 17, 2018, 2:25 PM -0500, John Sanda , wrote:
> On a couple different occasions I have run into this exception at start up:
>
> Exception (org.apache.cassandra.exceptions.InvalidRequestException) 
> encountered during startup: Unknown type 
> org.apache.cassandra.exceptions.InvalidRequestException: Unknown type  type>
>         at 
> org.apache.cassandra.cql3.CQL3Type$Raw$RawUT.prepare(CQL3Type.java:745)
>         at 
> org.apache.cassandra.cql3.CQL3Type$Raw.prepareInternal(CQL3Type.java:533)
>         at 
> org.apache.cassandra.schema.CQLTypeParser.parse(CQLTypeParser.java:53)
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.createColumnFromRow(SchemaKeyspace.java:1052)
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.lambda$fetchColumns$12(SchemaKeyspace.java:1038)
>
> This was with Cassandra 3.0.12 running in Kubernetes, which means that IP 
> address changes for the Cassandra node can and will happen. Nowhere in client 
> code does the UDT get dropped. I came across 
> https://issues.apache.org/jira/browse/CASSANDRA-13739 which got me wondering 
> if this particular Cassandra node wound up with another version of the 
> system_schema.types table which did not have the UDT.
>
> In what circumstances could I end up with multiple table directories for the 
> tables in system_schema? Right now I am just guessing that I wound up with a 
> newer (or different) version of the system_schema.types table. Unfortunately, 
> I no longer have access to the environment to confirm/deny what was 
> happening. I just want to better understand so I can avoid it in the future.
>
>
> - John


Re: Cassandra read process

2018-04-17 Thread Rahul Singh
Did you look at the answer the guy gave?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 17, 2018, 5:12 AM -0500, vishal1.sha...@ril.com, wrote:
> Dear Community,
>
> Can you please help in answering the question below:
>
> https://stackoverflow.com/questions/49769643/cassandra-read-process
>
> Thanks and regards,
> Vishal Sharma
>
> "Confidentiality Warning: This message and any attachments are intended only 
> for the use of the intended recipient(s), are confidential and may be 
> privileged. If you are not the intended recipient, you are hereby notified 
> that any review, re-transmission, conversion to hard copy, copying, 
> circulation or other use of this message and any attachments is strictly 
> prohibited. If you are not the intended recipient, please notify the sender 
> immediately by return email and delete this message and any attachments from 
> your system.
> Virus Warning: Although the company has taken reasonable precautions to 
> ensure no viruses are present in this email. The company cannot accept 
> responsibility for any loss or damage arising from the use of this email or 
> attachment."


Re: where does c* store the schema?

2018-04-17 Thread Rahul Singh
It uses an "everywhere" replication strategy, and it's recommended to do all alter 
/ create / drop statements with consistency level ALL — meaning it wouldn't 
make the change to the schema unless all the nodes are up.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation



Re: copy from one table to another

2018-04-17 Thread Rahul Singh
1. Make a new table with the same schema.
Then, for each node:
2. Shut down the node.
3. Copy data from the source table's sstable dir to the new table's sstable dir.

This will do what you want.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 16, 2018, 4:21 PM -0500, Kyrylo Lebediev , 
wrote:
> Thanks, Ali.
> I just need to copy a large table in production without actually copying the 
> data, by using hardlinks. After this, both tables should be used independently 
> (RW). Is this a supported way or not?
>
> Regards,
> Kyrill
> From: Ali Hubail 
> Sent: Monday, April 16, 2018 6:51:51 PM
> To: user@cassandra.apache.org
> Subject: Re: copy from one table to another
>
> If you want to copy a portion of the data to another table, you can also use 
> sstable cql writer. It is more of an advanced feature and can be tricky, but 
> doable.
> once you write the new sstables, you can then use the sstableloader to stream 
> the new data into the new table.
> check this out:
> https://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated
>
> I have recently used this to clean up 500 GB worth of sstable data in order 
> to purge tombstones that were mistakenly generated by the client.
> obviously this is not as fast as hardlinks + refresh, but it's much faster 
> and more efficient than using cql to copy data accross the tables.
> take advantage of CQLSSTableWriter.builder.sorted() if you can, and utilize 
> writetime if you have to.
>
> Ali Hubail
>
> Confidentiality warning: This message and any attachments are intended only 
> for the persons to whom this message is addressed, are confidential, and may 
> be privileged. If you are not the intended recipient, you are hereby notified 
> that any review, retransmission, conversion to hard copy, copying, 
> modification, circulation or other use of this message and any attachments is 
> strictly prohibited. If you receive this message in error, please notify the 
> sender immediately by return email, and delete this message and any 
> attachments from your system. Petrolink International Limited its 
> subsidiaries, holding companies and affiliates disclaims all responsibility 
> from and accepts no liability whatsoever for the consequences of any 
> unauthorized person acting, or refraining from acting, on any information 
> contained in this message. For security purposes, staff training, to assist 
> in resolving complaints and to improve our customer service, email 
> communications may be monitored and telephone calls may be recorded.
>
>
> From: Kyrylo Lebediev
> Date: 04/16/2018 10:37 AM
> Reply-To: user@cassandra.apache.org
> To: "user@cassandra.apache.org"
> Subject: Re: copy from one table to another
>
>
>
>
>
> Any issues if we:
>
> 1) create an new empty table with the same structure as the old one
> 2) create hardlinks ("ln without -s"): 
> .../-/--* ---> 
> .../-/--*
> 3) run nodetool refresh -- newkeyspacename newtable
>
> and then query/modify both tables independently/simultaneously?
>
> In theory, as SSTables are immutable, this should work, but could there be 
> some hidden issues?
>
> Regards,
> Kyrill
>
> From: Dmitry Saprykin 
> Sent: Sunday, April 8, 2018 7:33:03 PM
> To: user@cassandra.apache.org
> Subject: Re: copy from one table to another
>
> You can copy hardlinks to ALL SSTables from old to new table and then delete 
> part of data you do not need in a new one.
>
> On Sun, Apr 8, 2018 at 10:20 AM, Nitan Kainth  wrote:
> If it for testing and you don’t need any specific data, just copy a set of 
> sstables with all files of that sequence and move to target tables directory 
> and rename it.
>
> Restart target node or run nodetool refresh
>
> Sent from my iPhone
>
> On Apr 8, 2018, at 4:15 AM, onmstester onmstester  wrote:
>
> Is there any way to copy some part of a table to another table in cassandra? 
> A large amount of data should be copied so i don't want to fetch data to 
> client and stream it back to cassandra using cql.
>
> Sent using Zoho Mail
>
>
>


Re: Re: Time serial column family design

2018-04-17 Thread Javier Pareja
Hi David,

Could you describe why you chose to include the create date in the
partition key? If the vin is enough "partitioning", meaning that the size
(number of rows x size of row) of each partition is less than 100MB, then
remove the date and just use the create_time, because the date is already
included in that column anyway.

For example if columns "a" and "b" (from your table) are of max 256 UTF8
characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows
per partition. You can actually have many more but you don't want to go
much higher for performance reasons.

If this is not enough you could use create_month instead of create_date,
for example, to reduce the partition size while not being too granular.
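A sketch of the day-bucketed layout this thread implies, assuming the event time is a timestamp clustering column ("a"/"b" are the placeholder columns from the example above; the real table has about 20 columns):

```sql
CREATE TABLE test (
    vin         text,
    create_date int,         -- day bucket, e.g. 20180416
    create_time timestamp,   -- full event time, clustering column
    a           text,
    b           text,
    PRIMARY KEY ((vin, create_date), create_time)
) WITH CLUSTERING ORDER BY (create_time DESC);
```

Swapping create_date for a create_month int is the only change needed to move to the coarser bucket discussed above.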




Re: Re: Time serial column family design

2018-04-17 Thread Nate McCall
Your table design will work fine as you have appropriately bucketed by an
integer-based 'create_date' field.

Your goal for this refactor should be to remove the "IN" clause from your
code. This will move the rollup of multiple partition keys being retrieved
into the client instead of relying on the coordinator assembling the
results. You have to do more work and add some complexity, but the trade
off will be much higher performance as you are removing the single
coordinator as the bottleneck.



-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: inexistent columns familes

2018-04-17 Thread Luigi Tagliamonte
got it thanks! will have to tackle this in another way.
Thank you.
Regards
L.

On Tue, Apr 17, 2018 at 1:58 PM, Jeff Jirsa  wrote:

> I imagine 3.0.16 has THIS bug, but it has far fewer other real bugs.
>
>
>
> On Tue, Apr 17, 2018 at 1:56 PM, Luigi Tagliamonte <
> luigi.tagliamont...@gmail.com> wrote:
>
>> Thank you Jeff,
>> my backup scripts works using the cf folders on disk :)
>> it parses all the keyspaces and for each performs: nodetool flush
>> ${keyspace} ${cf} and then nodetool snapshot ${keyspace} -cf ${cf}
>> Does 3.0.16 not have this "bug"?
>> Regards
>> L.
>>
>> On Tue, Apr 17, 2018 at 1:50 PM, Jeff Jirsa  wrote:
>>
>>> It's probably not ideal, but also not really a bug. We need to create
>>> the table in the schema to see if it exists on disk so we know whether or
>>> not to migrate it, and when we learn it's empty, we remove it from the
>>> schema but we don't delete the directory. It's not great, but it's not going
>>> to cause you any problems.
>>>
>>> That said: 3.0.11 may cause you problems, you should strongly consider
>>> 3.0.16 instead.
>>>
>>> On Tue, Apr 17, 2018 at 1:47 PM, Luigi Tagliamonte <
>>> luigi.tagliamont...@gmail.com> wrote:
>>>
 Hello everybody,
 i'm having a problem with a brand new cassandra:3.0.11 node. The
 following tables belonging to the system keyspace:

 - schema_aggregates
 - schema_columnfamilies
 - schema_columns
 - schema_functions
 - schema_keyspaces
 - schema_triggers
 - schema_usertypes


 get initialised on disk:

 *root@ip-10-48-93-149:/var/lib/cassandra/data/system# pwd*
 /var/lib/cassandra/data/system

 *root@ip-10-48-93-149:/var/lib/cassandra/data/system# ls -1*
 IndexInfo-9f5c6374d48532299a0a5094af9ad1e3
 available_ranges-c539fcabd65a31d18133d25605643ee3
 batches-919a4bc57a333573b03e13fc3f68b465
 batchlog-0290003c977e397cac3efdfdc01d626b
 built_views-4b3c50a9ea873d7691016dbc9c38494a
 compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca
 hints-2666e20573ef38b390fefecf96e8f0c7
 local-7ad54392bcdd35a684174e047860b377
 paxos-b7b7f0c2fd0a34108c053ef614bb7c2d
 peer_events-59dfeaea8db2334191ef109974d81484
 peers-37f71aca7dc2383ba70672528af04d4f
 range_xfers-55d764384e553f8b9f6e676d4af3976d
 schema_aggregates-a5fc57fc9d6c3bfda3fc01ad54686fea
 schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697
 schema_columns-296e9c049bec3085827dc17d3df2122a
 schema_functions-d1b675fe2b503ca48e49c0f81989dcad
 schema_keyspaces-b0f2235744583cdb9631c43e59ce3676
 schema_triggers-0359bc7171233ee19a4ab9dfb11fc125
 schema_usertypes-3aa752254f82350b8d5c430fa221fa0a
 size_estimates-618f817b005f3678b8a453f3930b8e86
 sstable_activity-5a1ff267ace03f128563cfae6103c65e
 views_builds_in_progress-b7f2c10878cd3c809cd5d609b2bd149c



 but if I describe the system keyspace those cf are not present.

 cassandra@cqlsh> DESCRIBE KEYSPACE system;

 CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
 AND durable_writes = true;

 CREATE TABLE system.available_ranges (
 keyspace_name text PRIMARY KEY,
 ranges set
 ) WITH bloom_filter_fp_chance = 0.01
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = 'available keyspace/ranges during bootstrap/replace
 that are ready to be served'
 AND compaction = {'class': 'org.apache.cassandra.db.compa
 ction.SizeTieredCompactionStrategy', 'max_threshold': '32',
 'min_threshold': '4'}
 AND compression = {'chunk_length_in_kb': '64', 'class': '
 org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.0
 AND default_time_to_live = 0
 AND gc_grace_seconds = 0
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 360
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99PERCENTILE';

 CREATE TABLE system.batches (
 id timeuuid PRIMARY KEY,
 mutations list,
 version int
 ) WITH bloom_filter_fp_chance = 0.01
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = 'batches awaiting replay'
 AND compaction = {'class': 'org.apache.cassandra.db.compa
 ction.SizeTieredCompactionStrategy', 'max_threshold': '32',
 'min_threshold': '2'}
 AND compression = {'chunk_length_in_kb': '64', 'class': '
 org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.0
 AND default_time_to_live = 0
 AND gc_grace_seconds = 0
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 360
 AND min_index_interval = 128
 AND read_repair_chance 

Re: inexistent columns familes

2018-04-17 Thread Jeff Jirsa
I imagine 3.0.16 has THIS bug, but it has far fewer other real bugs.



On Tue, Apr 17, 2018 at 1:56 PM, Luigi Tagliamonte <
luigi.tagliamont...@gmail.com> wrote:

> Thank you Jeff,
> my backup scripts works using the cf folders on disk :)
> it parses all the keyspaces and for each performs: nodetool flush
> ${keyspace} ${cf} and then nodetool snapshot ${keyspace} -cf ${cf}
> Does 3.0.16 not having this "bug"?
> Regards
> L.

Re: inexistent columns familes

2018-04-17 Thread Luigi Tagliamonte
Thank you Jeff,
my backup script works from the cf folders on disk :)
It parses all the keyspaces and for each one performs nodetool flush
${keyspace} ${cf} and then nodetool snapshot ${keyspace} -cf ${cf}.
Does 3.0.16 not have this "bug"?
Regards
L.
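Since a directory can exist for a table that is no longer in the schema, a backup script is safer when it filters directory names against the live schema instead of trusting the filesystem. A minimal, hypothetical sketch (assuming the usual `<table>-<32-hex-id>` directory naming; `tables_to_snapshot` is an illustrative helper, not a real tool — in practice `schema_tables` would come from querying the schema tables or parsing cqlsh output):

```python
import re

# Cassandra table directories are named "<table>-<32-hex-digit id>".
DIR_RE = re.compile(r"^(?P<table>.+)-[0-9a-f]{32}$")

def tables_to_snapshot(data_dir_entries, schema_tables):
    """Return table names found on disk that still exist in the live schema.

    data_dir_entries: directory names under /var/lib/cassandra/data/<keyspace>
    schema_tables: table names reported by the live schema (the source of truth)
    """
    found = []
    for entry in data_dir_entries:
        m = DIR_RE.match(entry)
        if m and m.group("table") in schema_tables:
            found.append(m.group("table"))
    return sorted(found)

# Example: 'schema_columns' has a leftover directory but is not in the schema.
on_disk = [
    "peers-37f71aca7dc2383ba70672528af04d4f",
    "local-7ad54392bcdd35a684174e047860b377",
    "schema_columns-296e9c049bec3085827dc17d3df2122a",
]
live = {"peers", "local"}
print(tables_to_snapshot(on_disk, live))  # -> ['local', 'peers']
```

With this filter in place, the leftover empty directories Jeff describes are simply skipped rather than causing flush/snapshot calls for nonexistent tables.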

On Tue, Apr 17, 2018 at 1:50 PM, Jeff Jirsa  wrote:

> It's probably not ideal, but also not really a bug. We need to create the
> table in the schema to see if it exists on disk so we know whether or not
> to migrate it, and when we learn it's empty, we remove it from the schema
> but we dont delete the directory. It's not great, but it's not going to
> cause you any problems.
>
> That said: 3.0.11 may cause you problems, you should strongly consider
> 3.0.16 instead.

Re: inexistent columns familes

2018-04-17 Thread Jeff Jirsa
It's probably not ideal, but also not really a bug. We need to create the
table in the schema to see if it exists on disk so we know whether or not
to migrate it, and when we learn it's empty, we remove it from the schema
but we don't delete the directory. It's not great, but it's not going to
cause you any problems.

That said: 3.0.11 may cause you problems, you should strongly consider
3.0.16 instead.


inexistent columns familes

2018-04-17 Thread Luigi Tagliamonte
Hello everybody,
I'm having a problem with a brand new cassandra:3.0.11 node. The following
tables belonging to the system keyspace:

- schema_aggregates
- schema_columnfamilies
- schema_columns
- schema_functions
- schema_keyspaces
- schema_triggers
- schema_usertypes


get initialised on disk:

*root@ip-10-48-93-149:/var/lib/cassandra/data/system# pwd*
/var/lib/cassandra/data/system

*root@ip-10-48-93-149:/var/lib/cassandra/data/system# ls -1*
IndexInfo-9f5c6374d48532299a0a5094af9ad1e3
available_ranges-c539fcabd65a31d18133d25605643ee3
batches-919a4bc57a333573b03e13fc3f68b465
batchlog-0290003c977e397cac3efdfdc01d626b
built_views-4b3c50a9ea873d7691016dbc9c38494a
compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca
hints-2666e20573ef38b390fefecf96e8f0c7
local-7ad54392bcdd35a684174e047860b377
paxos-b7b7f0c2fd0a34108c053ef614bb7c2d
peer_events-59dfeaea8db2334191ef109974d81484
peers-37f71aca7dc2383ba70672528af04d4f
range_xfers-55d764384e553f8b9f6e676d4af3976d
schema_aggregates-a5fc57fc9d6c3bfda3fc01ad54686fea
schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697
schema_columns-296e9c049bec3085827dc17d3df2122a
schema_functions-d1b675fe2b503ca48e49c0f81989dcad
schema_keyspaces-b0f2235744583cdb9631c43e59ce3676
schema_triggers-0359bc7171233ee19a4ab9dfb11fc125
schema_usertypes-3aa752254f82350b8d5c430fa221fa0a
size_estimates-618f817b005f3678b8a453f3930b8e86
sstable_activity-5a1ff267ace03f128563cfae6103c65e
views_builds_in_progress-b7f2c10878cd3c809cd5d609b2bd149c



but if I describe the system keyspace those cf are not present.

cassandra@cqlsh> DESCRIBE KEYSPACE system;

CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}  AND
durable_writes = true;

CREATE TABLE system.available_ranges (
keyspace_name text PRIMARY KEY,
ranges set
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'available keyspace/ranges during bootstrap/replace that
are ready to be served'
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 360
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

CREATE TABLE system.batches (
id timeuuid PRIMARY KEY,
mutations list,
version int
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'batches awaiting replay'
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '2'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 360
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

CREATE TABLE system."IndexInfo" (
table_name text,
index_name text,
PRIMARY KEY (table_name, index_name)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (index_name ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'built column indexes'
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 360
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

CREATE TABLE system.views_builds_in_progress (
keyspace_name text,
view_name text,
generation_number int,
last_token text,
PRIMARY KEY (keyspace_name, view_name)
) WITH CLUSTERING ORDER BY (view_name ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'views builds current progress'
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND 

multiple table directories for system_schema keyspace

2018-04-17 Thread John Sanda
On a couple different occasions I have run into this exception at start up:

Exception (org.apache.cassandra.exceptions.InvalidRequestException)
encountered during startup: Unknown type 
org.apache.cassandra.exceptions.InvalidRequestException: Unknown type 
at
org.apache.cassandra.cql3.CQL3Type$Raw$RawUT.prepare(CQL3Type.java:745)
at
org.apache.cassandra.cql3.CQL3Type$Raw.prepareInternal(CQL3Type.java:533)
at
org.apache.cassandra.schema.CQLTypeParser.parse(CQLTypeParser.java:53)
at
org.apache.cassandra.schema.SchemaKeyspace.createColumnFromRow(SchemaKeyspace.java:1052)
at
org.apache.cassandra.schema.SchemaKeyspace.lambda$fetchColumns$12(SchemaKeyspace.java:1038)

This was with Cassandra 3.0.12 running in Kubernetes, which means that IP
address changes for the Cassandra node can and will happen. Nowhere in
client code does the UDT get dropped. I came across
https://issues.apache.org/jira/browse/CASSANDRA-13739 which got me
wondering if this particular Cassandra node wound up with another version
of the system_schema.types table which did not have the UDT.

In what circumstances could I end up with multiple table directories for
the tables in system_schema? Right now I am just guessing that I wound up
with a newer (or different) version of the system_schema.types table.
Unfortunately, I no longer have access to the environment to confirm/deny
what was happening. I just want to better understand so I can avoid it in
the future.


- John


Re: DigestMismatchException after upgrade from c*-2.1.17 to c*-3.0.15

2018-04-17 Thread Jeff Jirsa
This isn’t really an error and shouldn’t be logged because so few people 
understand it well enough to find it useful. Some number of digest mismatches 
are expected if you read as you write. I wouldn’t worry about it unless you’re 
having a problem.

-- 
Jeff Jirsa


> On Apr 17, 2018, at 12:29 AM, techpyaasa  wrote:
> 
> Hi,
> 
> We have recently upgraded our cassandra production cluster(2 datacenters , 
> each with 6 nodes, 3 groups) from c*-2.1.17 to c*-3.0.15.
> 
> After which we are getting too many exceptions as below.
> 
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
>> DecoratedKey(-1032881015386111041, 03c099b9959871a9) 
>> (a613f5fd9fc797b252e26fe9b9b1ed4e vs 15b7d82a9b454f5fd433317f68de435f)
>>  at 
>> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
>>  at 
>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>  at 
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>>  at java.lang.Thread.run(Thread.java:745)
>> 
> 
> No hints present /no mutation dropped, but still the above exception is 
> thrown quite frequently.
> 
> Could someone help us out in finding out the root cause.
> 
> Thanks in advance
> TechPyaasa


Cassandra read process

2018-04-17 Thread Vishal1.Sharma
Dear Community,

Can you please help in answering the question below:

https://stackoverflow.com/questions/49769643/cassandra-read-process

Thanks and regards,
Vishal Sharma
"Confidentiality Warning: This message and any attachments are intended only
for the use of the intended recipient(s), are confidential, and may be
privileged. If you are not the intended recipient, you are hereby notified
that any review, re-transmission, conversion to hard copy, copying,
circulation or other use of this message and any attachments is strictly
prohibited. If you are not the intended recipient, please notify the sender
immediately by return email, and delete this message and any attachments
from your system.

Virus Warning: Although the company has taken reasonable precautions to
ensure no viruses are present in this email, the company cannot accept
responsibility for any loss or damage arising from the use of this email or
attachment."


SSTable count in Nodetool tablestats(LevelCompactionStrategy)

2018-04-17 Thread Vishal1.Sharma
Dear Community,

One of the tables in my keyspace is using LeveledCompactionStrategy, and when I
used the nodetool tablestats keyspace.table_name command, I found a mismatch in
the SSTable count displayed at 2 different places. Please refer to the
attached image.

The command reports SSTable count = 6, but if you add the numbers shown
against SSTables in each level, it comes out as 5. Why is there a
difference?

Thanks and regards,
Vishal Sharma

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Memtable type and size allocation

2018-04-17 Thread Vishal1.Sharma
Dear Community,

In Cassandra 3.11.2, there are 3 choices for the type of Memtable allocation,
and as per my understanding, if I want to keep Memtables on the JVM heap I can
use heap_buffers, and if I want to store Memtables outside of the JVM heap then
I've got 2 options: offheap_buffers and offheap_objects.

What exactly is the difference between the 2 choices given for off-heap 
allocation?

Also, the permitted memory space to be used for Memtables can be set at 2 
places in the YAML file, i.e. memtable_heap_space_in_mb and 
memtable_offheap_space_in_mb.

Do I need to configure some space in both heap and offheap, irrespective of the 
Memtable allocation type or do I need to set only one of them based on my 
Memtable allocation type i.e. memtable_heap_space_in_mb when using heap buffers 
and memtable_offheap_space_in_mb only when using either of the other 2 offheap 
options?

https://stackoverflow.com/questions/49874917/memtable-type-and-size-allocation
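For reference, the settings in question sit together in cassandra.yaml; a hedged fragment (the values below are illustrative, not recommendations):

```yaml
# cassandra.yaml (fragment) -- illustrative values only
# One of: heap_buffers | offheap_buffers | offheap_objects
memtable_allocation_type: offheap_objects

# Limits for memtable storage. With heap_buffers only the heap limit
# applies; with the offheap types, some bookkeeping can still remain on
# heap, so keeping both limits configured is the conservative choice.
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
```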

Thanks and regards,
Vishal Sharma


Re: Time serial column family design

2018-04-17 Thread Xiangfei Ni
Hi Nate,
Thanks for your reply!
Is there another way to design this table to meet this requirement?

Best Regards,

倪项菲/ David Ni
中移德电网络科技有限公司
Virtue Intelligent Network Ltd, co.
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516

From: Nate McCall 
Sent: April 17, 2018 7:12
To: Cassandra Users 
Subject: Re: Time serial column family design


Select * from test where vin = 'ZD41578123DSAFWE12313' and create_date in
(20180416, 20180415, 20180414, 20180413, 20180412, ...);
But this makes the CQL query very long, and I don't know whether there is a
limit on the length of the CQL.
Please give me some advice,thanks in advance.

Using the SELECT ... IN syntax means that:
- the driver will not be able to route the queries to the nodes which have the 
partition
- a single coordinator must scatter-gather the query and results

Break this up into a series of single statements using the executeAsync method 
and gather the results via something like Futures in Guava or similar.
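The same scatter-gather pattern, sketched in Python with a stubbed query function standing in for the driver's async execute (with the DataStax Java driver you would collect the futures returned by executeAsync, e.g. via Guava's Futures; `fetch_day` here is a hypothetical stand-in, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_day(vin, day):
    # Stand-in for something like:
    #   session.executeAsync(
    #       "SELECT * FROM test WHERE vin = ? AND create_date = ?", vin, day)
    return [(vin, day)]  # pretend each day-bucket partition returns one row

vin = "ZD41578123DSAFWE12313"
days = [20180416, 20180415, 20180414]

# One single-partition statement per bucket; each can be token-aware
# routed to a replica instead of funnelling through one coordinator.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_day, vin, day) for day in days]
    rows = [row for f in futures for row in f.result()]

print(len(rows))  # -> 3
```

The point is structural: N small single-partition reads in flight at once, gathered client-side, rather than one large IN query held by a single coordinator.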


A Cassandra Storage Estimation Mechanism

2018-04-17 Thread onmstester onmstester
I was going to estimate Hardware requirements for a project which mainly uses 
Apache Cassandra. 

Because of rule "Cassandra nodes size better be  2 TB", the total disk 
usage determines number of nodes,

and in most cases the result of this calculation would be so OK for satisfying 
the required input rate.

So IMHO the storage estimation is the most important part of requirement 
analysis in this kind of projects.

There are some formula's on the net for theorotical storage estimation but the 
results would be some KB on each row while actual inserts shows a few hundred 
bytes!

So It seems like that the best estimation would be insert alot of real data in 
real schema of real production server.

But i can't have the real data and production cluster before the estimation!

So i came up with an estimation idea:



1. I'm using the real schema +  3 nodes cluster

2. Required assumptions: the real input rate (200K rows per second, 150 
billion rows in total) and the real partition count (1.5 million unique 
partition keys in total)

3. Instead of 150 billion rows, I write 1, 10 and 100 million, using 10, 100 
and 1,000 partitions proportionally. After each run, I run 'nodetool flush'

and, using du -sh on the keyspace directory, I check the total disk usage at 
that rate. For example, at the 1 million rate, disk usage was 90 MB, so for 
150 billion it would be about 13 TB. Then I drop the schema and run the next rate.

I would continue this until the difference between two consecutive results 
becomes a tiny number.

I got a good estimate at the 100 million rate. Actually, I was doing the 
estimation for an already running production cluster and knew the answer 
beforehand (I just wanted to validate the idea), and the estimate matched the 
answer in the end! But I'm worried that this was accidental.

Finally, the question: is my estimation mechanism correct, and would it be 
applicable to any estimation and any project?

If not, how should storage be estimated (how do you estimate it)?



Thanks in advance










DigestMismatchException after upgrade from c*-2.1.17 to c*-3.0.15

2018-04-17 Thread techpyaasa
Hi,

We have recently upgraded our Cassandra production cluster (2 datacenters,
each with 6 nodes, 3 groups) from c*-2.1.17 to c*-3.0.15.

Since then we have been getting frequent exceptions like the one below.

org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(-1032881015386111041, 03c099b9959871a9)
> (a613f5fd9fc797b252e26fe9b9b1ed4e vs 15b7d82a9b454f5fd433317f68de435f) at
> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
> at
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at java.lang.Thread.run(Thread.java:745)
>

No hints are present and no mutations were dropped, but the above exception is
still thrown quite frequently.

Could someone help us find the root cause?

Thanks in advance
TechPyaasa