Re: UDF/UDA for json data aggregation

2016-11-16 Thread DuyHai Doan
"Can we do something like this" --> Yes but requires a lot of coding

"can we import java classes from thirdparty jars for parsing/updating json
in UDF defined" --> No because the reasoning behind this limitation is: IF
you need to import third-party jars for your UDF/UDA, it means that the
computation is quite complex and consequently it may be a bad idea to
execute complex processing server-side.
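For what it's worth, since the advice is to keep complex processing client-side, here is a minimal sketch of the requested aggregation done in application code instead of a UDA. This is illustrative only (plain Python, standard library; in practice the rows would come from a normal SELECT through your driver of choice, and `aggregate_json_rows` is a name I made up):

```python
import json
from collections import defaultdict

def aggregate_json_rows(rows):
    """Sum the string-encoded numeric attributes across JSON documents."""
    totals = defaultdict(lambda: defaultdict(float))
    for doc in rows:
        for node, attrs in json.loads(doc).items():
            for attr, value in attrs.items():
                totals[node][attr] += float(value)
    # Re-encode the sums as strings, matching the input format
    return {node: {attr: format(total, "g") for attr, total in attrs.items()}
            for node, attrs in totals.items()}

# The three example rows from the original post
rows = [
    '{"node1":{"attr1":"91","attr2":"1","attr3":"333"},"node2":{"attr4":"1.01","attr5":"1.231","attr6":"1.12"}}',
    '{"node1":{"attr1":"22","attr2":"4","attr3":"111"},"node2":{"attr4":"2.01","attr5":"3.231","attr6":"2.112"}}',
    '{"node1":{"attr1":"17","attr2":"56","attr3":"167"},"node2":{"attr4":"1.11","attr5":"2.31","attr6":"3.112"}}',
]
print(json.dumps(aggregate_json_rows(rows)))
```

This same merge logic is roughly what a Java UDA would have to re-implement using only JDK classes, which is the coding effort mentioned above.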

On Wed, Nov 16, 2016 at 10:16 PM, techpyaasa .  wrote:

> Hi all,
>
> I'd like to use UDF/UDA in c*-2.2 and above to aggregate attribute values
> in JSON data.
>
> For example, consider the table below.
>
>
> *"CREATE  TABLE test ( id bigint , time1 bigint , jsonData text , PRIMARY
> KEY(id,time1));*
>
>
>
> *cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 1, 123,
> '{"node1":{"attr1":"91","attr2":"1","attr3":"333"},"node2":{"attr4":"1.01","attr5":"1.231","attr6":"1.12"}}');*
>
>
> *cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 2, 345,
> '{"node1":{"attr1":"22","attr2":"4","attr3":"111"},"node2":{"attr4":"2.01","attr5":"3.231","attr6":"2.112"}}');*
> *cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 3, 333,
> '{"node1":{"attr1":"17","attr2":"56","attr3":"167"},"node2":{"attr4":"1.11","attr5":"2.31","attr6":"3.112"}}');"*
>
>
> Using UDF/UDA , I want attributes values of json data to be aggregated
> something like below
>
> *"select json_aggr(jsonData) from test; //SUM*
>
>
> *{"node1":{"attr1":"130","attr2":"61","attr3":"611"},"node2":{"attr4":"4.13","attr5":"6.772","attr6":"6.344"}}"*
>
> //130=91+22+17 etc.,
>
>
> Can we do something like this? Can we import java classes from third-party
> jars for parsing/updating JSON in the defined UDF? And how?
> Can somebody please provide outline code for this?
>
> Thanks,
> Techpyaasa
>


UDF/UDA for json data aggregation

2016-11-16 Thread techpyaasa .
Hi all,

I'd like to use UDF/UDA in c*-2.2 and above to aggregate attribute values
in JSON data.

For example, consider the table below.


*"CREATE  TABLE test ( id bigint , time1 bigint , jsonData text , PRIMARY
KEY(id,time1));*



*cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 1, 123,
'{"node1":{"attr1":"91","attr2":"1","attr3":"333"},"node2":{"attr4":"1.01","attr5":"1.231","attr6":"1.12"}}');*


*cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 2, 345,
'{"node1":{"attr1":"22","attr2":"4","attr3":"111"},"node2":{"attr4":"2.01","attr5":"3.231","attr6":"2.112"}}');*
*cqlsh:test> INSERT INTO test (id , time1 , jsonData ) VALUES ( 3, 333,
'{"node1":{"attr1":"17","attr2":"56","attr3":"167"},"node2":{"attr4":"1.11","attr5":"2.31","attr6":"3.112"}}');"*


Using UDF/UDA , I want attributes values of json data to be aggregated
something like below

*"select json_aggr(jsonData) from test; //SUM*

*{"node1":{"attr1":"130","attr2":"61","attr3":"611"},"node2":{"attr4":"4.13","attr5":"6.772","attr6":"6.344"}}"*

//130=91+22+17 etc.,


Can we do something like this? Can we import java classes from third-party
jars for parsing/updating JSON in the defined UDF? And how?
Can somebody please provide outline code for this?

Thanks,
Techpyaasa


Clarify Support for 2.2 on Download Page

2016-11-16 Thread Derek Burdick
Hi, is it possible to update the language on the Apache Cassandra Download
page to reflect that version 2.2 will enter Critical Fix Only support after
November 21st?

The current language creates quite a bit of confusion in the community with
how long 2.2 and 2.1 will receive fixes from the community.

http://cassandra.apache.org/download/

Specifically these three lines:

   - Apache Cassandra 3.0 is supported until May 2017. The latest release
     is 3.0.9 (pgp, md5 and sha1), released on 2016-09-20.
   - Apache Cassandra 2.2 is supported until November 2016. The latest
     release is 2.2.8 (pgp, md5 and sha1), released on 2016-09-28.
   - Apache Cassandra 2.1 is supported until November 2016 with critical
     fixes only. The latest release is 2.1.16 (pgp, md5 and sha1),
     released on 2016-10-10.


What would be the best approach to help get this changed?

-Derek


Re: Can nodes in c* cluster run different versions ?

2016-11-16 Thread techpyaasa .
Thank you @Alain

On Wed, Nov 16, 2016 at 9:13 PM, Alain RODRIGUEZ  wrote:

> Hey Techpyaasa,
>
> Are you aware of this documentation?
>
> https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/
> upgrdCassandraDetails.html
>
> Basically yes, you can have multiple versions, but you want to make this
> multi-version window as short as possible.
>
> As it might take a few days because of 'upgrade sstables', I just
>> wanted to know: could there be any problem from this mismatch of c*
>> versions among nodes in the cluster during the upgrade process?
>
>
> To make the upgrade as short as possible, I usually migrate the whole
> cluster and only afterwards upgrade sstables. This minimises the time
> you're running multiple versions.
>
> The only (soft) limitation is that you shouldn't be streaming data around
> (do not bootstrap, repair or remove a node). It might work, that's why I
> say "soft limitation" but it might not work, there is no guarantee, it
> mainly depends on changes that were made between versions I believe.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2016-11-16 11:12 GMT+01:00 techpyaasa . :
>
>> Hi all,
>>
>> We are currently running c*-2.0.17 with 2 datacenters each with 18 nodes.
>>
>> We would like to upgrade to c*-2.1.16. Can we upgrade all nodes (one by
>> one) in one DC first and then go to the next data center?
>>
>> As it might take a few days because of 'upgrade sstables', I just
>> wanted to know: could there be any problem from this mismatch of c*
>> versions among nodes in the cluster during the upgrade process?
>>
>> Thanks
>> Techpyaasa
>>
>
>


[RELEASE] Apache Cassandra 3.0.10 released

2016-11-16 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 3.0.10.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] in the 3.0 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/qplvTw
[2]: (NEWS.txt) https://goo.gl/whvLcU
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra Node Restart Stuck in STARTING?

2016-11-16 Thread Daniel Subak
We're on Cassandra 3.7, running on Ubuntu 14.04.
In terms of system utilization, we saw one Cassandra process which was
using 100% CPU, but overall load was very low on the box. Disk utilization
was largely nominal.

On Wed, Nov 16, 2016 at 2:19 PM, Jeff Jirsa 
wrote:

> What version?
>
> Is the system doing anything (do you see high CPU / disk usage)?
>
>
>
> Sometimes restarts will trigger some changes to files on disk that are
> mostly invisible in the logs (https://issues.apache.org/
> jira/browse/CASSANDRA-11163 for example), but it’s usually during a
> different part of the startup process (you’d be seeing different log
> messages), and would eventually complete.
>
>
>
>
>
>
>
> *From: *Daniel Subak 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Wednesday, November 16, 2016 at 11:05 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Cassandra Node Restart Stuck in STARTING?
>
>
>
> Hey everyone,
>
> Ran into an issue running a node restart where "nodetool netstats"
> reported the node as "STARTING" with no streams when run locally. "nodetool
> status" run on other nodes reported that node as "DN". Both of those were
> expected. However, tailing the logs, there didn't seem to be anything
> noteworthy happening (below are the last few log lines in system.log.) Has
> anyone seen this behavior before? We'd love to be able to better monitor
> what is happening during a restart if anyone has some information on what
> happens during this phase! Happy to provide more info if needed, but even a
> high level general explanation would provide some clarity
>
> Thanks,
>
> Dan
>
>
> INFO  [main] 2016-11-16 18:07:55,907 ColumnFamilyStore.java:405 -
> Initializing system_schema.keyspaces
> INFO  [main] 2016-11-16 18:07:55,942 ColumnFamilyStore.java:405 -
> Initializing system_schema.tables
> INFO  [main] 2016-11-16 18:07:55,971 ColumnFamilyStore.java:405 -
> Initializing system_schema.columns
> INFO  [main] 2016-11-16 18:07:55,992 ColumnFamilyStore.java:405 -
> Initializing system_schema.triggers
> INFO  [main] 2016-11-16 18:07:56,010 ColumnFamilyStore.java:405 -
> Initializing system_schema.dropped_columns
> INFO  [main] 2016-11-16 18:07:56,026 ColumnFamilyStore.java:405 -
> Initializing system_schema.views
> INFO  [main] 2016-11-16 18:07:56,047 ColumnFamilyStore.java:405 -
> Initializing system_schema.types
> INFO  [main] 2016-11-16 18:07:56,066 ColumnFamilyStore.java:405 -
> Initializing system_schema.functions
> INFO  [main] 2016-11-16 18:07:56,081 ColumnFamilyStore.java:405 -
> Initializing system_schema.aggregates
> INFO  [main] 2016-11-16 18:07:56,093 ColumnFamilyStore.java:405 -
> Initializing system_schema.indexes
> INFO  [main] 2016-11-16 18:07:56,102 ViewManager.java:139 - Not submitting
> build tasks for views in keyspace system_schema as storage service is not
> initialized
>


Re: Cassandra Node Restart Stuck in STARTING?

2016-11-16 Thread Jeff Jirsa
What version? 

Is the system doing anything (do you see high CPU / disk usage)?

 

Sometimes restarts will trigger some changes to files on disk that are mostly 
invisible in the logs (https://issues.apache.org/jira/browse/CASSANDRA-11163 
for example), but it’s usually during a different part of the startup process 
(you’d be seeing different log messages), and would eventually complete. 

 

 

 

From: Daniel Subak 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, November 16, 2016 at 11:05 AM
To: "user@cassandra.apache.org" 
Subject: Cassandra Node Restart Stuck in STARTING?

 

Hey everyone,

Ran into an issue running a node restart where "nodetool netstats" reported the 
node as "STARTING" with no streams when run locally. "nodetool status" run on 
other nodes reported that node as "DN". Both of those were expected. However, 
tailing the logs, there didn't seem to be anything noteworthy happening (below 
are the last few log lines in system.log.) Has anyone seen this behavior 
before? We'd love to be able to better monitor what is happening during a 
restart if anyone has some information on what happens during this phase! Happy 
to provide more info if needed, but even a high level general explanation would 
provide some clarity

Thanks,

Dan


INFO  [main] 2016-11-16 18:07:55,907 ColumnFamilyStore.java:405 - Initializing 
system_schema.keyspaces
INFO  [main] 2016-11-16 18:07:55,942 ColumnFamilyStore.java:405 - Initializing 
system_schema.tables
INFO  [main] 2016-11-16 18:07:55,971 ColumnFamilyStore.java:405 - Initializing 
system_schema.columns
INFO  [main] 2016-11-16 18:07:55,992 ColumnFamilyStore.java:405 - Initializing 
system_schema.triggers
INFO  [main] 2016-11-16 18:07:56,010 ColumnFamilyStore.java:405 - Initializing 
system_schema.dropped_columns
INFO  [main] 2016-11-16 18:07:56,026 ColumnFamilyStore.java:405 - Initializing 
system_schema.views
INFO  [main] 2016-11-16 18:07:56,047 ColumnFamilyStore.java:405 - Initializing 
system_schema.types
INFO  [main] 2016-11-16 18:07:56,066 ColumnFamilyStore.java:405 - Initializing 
system_schema.functions
INFO  [main] 2016-11-16 18:07:56,081 ColumnFamilyStore.java:405 - Initializing 
system_schema.aggregates
INFO  [main] 2016-11-16 18:07:56,093 ColumnFamilyStore.java:405 - Initializing 
system_schema.indexes
INFO  [main] 2016-11-16 18:07:56,102 ViewManager.java:139 - Not submitting 
build tasks for views in keyspace system_schema as storage service is not 
initialized





Re: Cassandra Node Restart Stuck in STARTING?

2016-11-16 Thread Surbhi Gupta
Attaching the system.log can give more details ...

On 16 November 2016 at 11:05, Daniel Subak  wrote:

> Hey everyone,
>
> Ran into an issue running a node restart where "nodetool netstats"
> reported the node as "STARTING" with no streams when run locally. "nodetool
> status" run on other nodes reported that node as "DN". Both of those were
> expected. However, tailing the logs, there didn't seem to be anything
> noteworthy happening (below are the last few log lines in system.log.) Has
> anyone seen this behavior before? We'd love to be able to better monitor
> what is happening during a restart if anyone has some information on what
> happens during this phase! Happy to provide more info if needed, but even a
> high level general explanation would provide some clarity
>
> Thanks,
> Dan
>
> INFO  [main] 2016-11-16 18:07:55,907 ColumnFamilyStore.java:405 -
> Initializing system_schema.keyspaces
> INFO  [main] 2016-11-16 18:07:55,942 ColumnFamilyStore.java:405 -
> Initializing system_schema.tables
> INFO  [main] 2016-11-16 18:07:55,971 ColumnFamilyStore.java:405 -
> Initializing system_schema.columns
> INFO  [main] 2016-11-16 18:07:55,992 ColumnFamilyStore.java:405 -
> Initializing system_schema.triggers
> INFO  [main] 2016-11-16 18:07:56,010 ColumnFamilyStore.java:405 -
> Initializing system_schema.dropped_columns
> INFO  [main] 2016-11-16 18:07:56,026 ColumnFamilyStore.java:405 -
> Initializing system_schema.views
> INFO  [main] 2016-11-16 18:07:56,047 ColumnFamilyStore.java:405 -
> Initializing system_schema.types
> INFO  [main] 2016-11-16 18:07:56,066 ColumnFamilyStore.java:405 -
> Initializing system_schema.functions
> INFO  [main] 2016-11-16 18:07:56,081 ColumnFamilyStore.java:405 -
> Initializing system_schema.aggregates
> INFO  [main] 2016-11-16 18:07:56,093 ColumnFamilyStore.java:405 -
> Initializing system_schema.indexes
> INFO  [main] 2016-11-16 18:07:56,102 ViewManager.java:139 - Not submitting
> build tasks for views in keyspace system_schema as storage service is not
> initialized
>


Cassandra Node Restart Stuck in STARTING?

2016-11-16 Thread Daniel Subak
Hey everyone,

Ran into an issue running a node restart where "nodetool netstats" reported
the node as "STARTING" with no streams when run locally. "nodetool status"
run on other nodes reported that node as "DN". Both of those were expected.
However, tailing the logs, there didn't seem to be anything noteworthy
happening (below are the last few log lines in system.log). Has anyone seen
this behavior before? We'd love to be able to better monitor what is
happening during a restart, if anyone has some information on what happens
during this phase. Happy to provide more info if needed, but even a
high-level general explanation would provide some clarity.

Thanks,
Dan

INFO  [main] 2016-11-16 18:07:55,907 ColumnFamilyStore.java:405 -
Initializing system_schema.keyspaces
INFO  [main] 2016-11-16 18:07:55,942 ColumnFamilyStore.java:405 -
Initializing system_schema.tables
INFO  [main] 2016-11-16 18:07:55,971 ColumnFamilyStore.java:405 -
Initializing system_schema.columns
INFO  [main] 2016-11-16 18:07:55,992 ColumnFamilyStore.java:405 -
Initializing system_schema.triggers
INFO  [main] 2016-11-16 18:07:56,010 ColumnFamilyStore.java:405 -
Initializing system_schema.dropped_columns
INFO  [main] 2016-11-16 18:07:56,026 ColumnFamilyStore.java:405 -
Initializing system_schema.views
INFO  [main] 2016-11-16 18:07:56,047 ColumnFamilyStore.java:405 -
Initializing system_schema.types
INFO  [main] 2016-11-16 18:07:56,066 ColumnFamilyStore.java:405 -
Initializing system_schema.functions
INFO  [main] 2016-11-16 18:07:56,081 ColumnFamilyStore.java:405 -
Initializing system_schema.aggregates
INFO  [main] 2016-11-16 18:07:56,093 ColumnFamilyStore.java:405 -
Initializing system_schema.indexes
INFO  [main] 2016-11-16 18:07:56,102 ViewManager.java:139 - Not submitting
build tasks for views in keyspace system_schema as storage service is not
initialized


Re: Tombstones impact on repairs both anti-entropy and read repair

2016-11-16 Thread Alain RODRIGUEZ
Hi,


> My question to the community is: will tombstones cause issues in data
> consistency across the DCs?


It might, if your repairs are not succeeding for some reason or not running
fully (over all the token ranges) within gc_grace_seconds (a parameter at
the table level).

I wrote a blog post and talked about that at the summit, see:

http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
https://www.youtube.com/watch?v=lReTEcnzl7Y

Let me know if you still have questions after reading / listening to this.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-11-14 17:52 GMT+01:00 K F :

> Hi Folks,
>
> I have a table that has a lot of tombstones generated, which has caused
> inconsistent data across various datacenters. We run anti-entropy repairs
> and also have read_repair_chance tuned up during our non-busy hours. Yet
> when we try to compare data residing in various replicas across DCs, we
> see inconsistency.
>
> My question to the community is: will tombstones cause issues in data
> consistency across the DCs?
>
> Thanks.
>


Re: Can nodes in c* cluster run different versions ?

2016-11-16 Thread Alain RODRIGUEZ
Hey Techpyaasa,

Are you aware of this documentation?

https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgrdCassandraDetails.html

Basically yes, you can have multiple versions, but you want to make this
multi-version window as short as possible.

As it might take a few days because of 'upgrade sstables', I just
> wanted to know: could there be any problem from this mismatch of c*
> versions among nodes in the cluster during the upgrade process?


To make the upgrade as short as possible, I usually migrate the whole cluster
and only afterwards upgrade sstables. This minimises the time you're running
multiple versions.

The only (soft) limitation is that you shouldn't be streaming data around
(do not bootstrap, repair or remove a node). It might work, that's why I
say "soft limitation" but it might not work, there is no guarantee, it
mainly depends on changes that were made between versions I believe.
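The two-phase order described above can be sketched as follows. This is an illustrative outline only, not a tested runbook: the `ssh_run` transport and the package-install command are placeholders to replace with your own tooling, and per the soft limitation above you should pause repairs and topology changes for the duration:

```python
import subprocess

def ssh_run(host, command):
    """Placeholder transport: run a command on a remote node over ssh."""
    subprocess.run(["ssh", host, command], check=True)

def rolling_upgrade(nodes_by_dc, run=ssh_run):
    # Phase 1: move every node to the new binary, one node at a time,
    # one data center after the other.
    for dc, nodes in nodes_by_dc.items():
        for node in nodes:
            run(node, "nodetool drain")                     # flush memtables, stop accepting writes
            run(node, "sudo service cassandra stop")
            run(node, "sudo <install the 2.1.16 package>")  # placeholder: your package manager here
            run(node, "sudo service cassandra start")
    # Phase 2: only once the whole cluster runs the new binary, rewrite
    # the sstables. This keeps the mixed-version window as short as possible.
    for dc, nodes in nodes_by_dc.items():
        for node in nodes:
            run(node, "nodetool upgradesstables")
```

Injecting `run` also makes the ordering easy to check without touching real nodes.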

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-11-16 11:12 GMT+01:00 techpyaasa . :

> Hi all,
>
> We are currently running c*-2.0.17 with 2 datacenters each with 18 nodes.
>
> We would like to upgrade to c*-2.1.16. Can we upgrade all nodes (one by
> one) in one DC first and then go to the next data center?
>
> As it might take a few days because of 'upgrade sstables', I just
> wanted to know: could there be any problem from this mismatch of c*
> versions among nodes in the cluster during the upgrade process?
>
> Thanks
> Techpyaasa
>


Re: Some questions to updating and tombstone

2016-11-16 Thread Alain RODRIGUEZ
Hi Boying,

Old value is not tombstone, but remains until compaction


Be careful: the above is generally true, but not necessarily.

Tombstones can actually be generated while using UPDATE in some corner
cases, e.g. when using collections or prepared statements.

I wrote a detailed blog post about deletes and tombstones in Cassandra
precisely to avoid answering this kind of question again and again on the
mailing list, as explaining correctly is a bit hard and I am a lazy guy. I
also talked about it at the last Cassandra summit. If you are going to use
Cassandra (and deletes) I think one of these might be of interest to you:

http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
https://www.youtube.com/watch?v=lReTEcnzl7Y

If you still have questions after reading it, I would be very pleased to
help you further, but I believe this should be helpful.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-11-16 10:15 GMT+01:00 Shalom Sagges :

> Hi Fabrice,
>
> Just a small (off-topic) question I couldn't find an answer to.
> What is a slice in Cassandra? (e.g. "Maximum tombstones per slice")
>
> Thanks!
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>  
>  We Create Meaningful Connections
>
> 
>
>
> On Tue, Nov 15, 2016 at 6:38 PM, Fabrice Facorat <
> fabrice.faco...@gmail.com> wrote:
>
>> If you don't want tombstones, don't generate them ;)
>>
>> More seriously, tombstones are generated when:
>> - doing a DELETE
>> - TTL expiration
>> - set a column to NULL
>>
>> However, tombstones are an issue only if, for the same value, you have many
>> tombstones (i.e. you keep overwriting the same values with data and
>> tombstones). Having 1 tombstone for 1 value is not an issue; having 1000
>> tombstones for 1 value is a problem. Does your use case really overwrite
>> data with DELETE or NULL?
>>
>> So what you may want to know is how many tombstones you read
>> on average per value. This is available in:
>> - nodetool cfstats ks.cf : Average tombstones per slice / Maximum
>> tombstones per slice
>> - JMX : org.apache.cassandra.metrics:keyspace=,name=TombstoneScannedHistogram,scope=,type=ColumnFamily
>> Max/Count/99thPercentile/Mean
>>
>>
>> 2016-11-15 10:05 GMT+01:00 Lu, Boying :
>>
>>> Thanks a lot for your help.
>>>
>>>
>>>
>>> We are using STCS strategy and not using TTL
>>>
>>>
>>>
>>> Is there any API that we can use to query the current number of
>>> tombstones in a CF?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
>>> *Sent:* 2016年11月14日 22:20
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Some questions to updating and tombstone
>>>
>>>
>>>
>>> Hi Boying,
>>>
>>>
>>>
>>> I agree with Vladimir. If compaction is not compacting the two sstables
>>> with updates soon, disk space will be wasted. For example, if the
>>> updates are not close in time, the first update might be in a big sstable
>>> by the time the second update is being written to a new small sstable.
>>> STCS won't compact them together soon.
>>>
>>>
>>>
>>> Just adding column values with a new timestamp shouldn't create any
>>> tombstones. But if data is not merged for long, disk space issues may
>>> arise. If you are using STCS, just to get an idea of the extent of the
>>> problem, you can run a major compaction and see the amount of disk space
>>> freed by that (don't do this in production, as major compaction has its
>>> own side effects).
>>>
>>>
>>>
>>> Which compaction strategy are you using?
>>>
>>> Are these updates done with TTL?
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>>
>>>
>>> On Mon, 14 Nov, 2016 at 1:54 PM, Vladimir Yudovin
>>>
>>>  wrote:
>>>
>>> Hi Boying,
>>>
>>>
>>>
>>> UPDATE writes a new value with a new timestamp. The old value is not a
>>> tombstone, but remains until compaction. gc_grace_period is not related
>>> to this.
>>>
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>>
>>> *Winguzone  - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 14 Nov 2016 03:02:21 -0500, Lu, Boying wrote:
>>>
>>>
>>>
>>> Hi, All,
>>>
>>>
>>>
>>> Will Cassandra generate a new tombstone when updating a column using a
>>> CQL UPDATE statement?
>>>
>>>
>>>
>>> And is there any way to get the number of tombstones of a column family
>>> since we want to avoid generating
>>>
>>> too many tombstones within gc_grace_period?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Boying
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Close the World, Open the Net
>> http://www.linux-wizard.net
>>
>
>

Can nodes in c* cluster run different versions ?

2016-11-16 Thread techpyaasa .
Hi all,

We are currently running c*-2.0.17 with 2 datacenters each with 18 nodes.

We would like to upgrade to c*-2.1.16. Can we upgrade all nodes (one by one)
in one DC first and then go to the next data center?

As it might take a few days because of 'upgrade sstables', I just wanted to
know: could there be any problem from this mismatch of c* versions among
nodes in the cluster during the upgrade process?

Thanks
Techpyaasa


Re: Some questions to updating and tombstone

2016-11-16 Thread Shalom Sagges
Hi Fabrice,

Just a small (off-topic) question I couldn't find an answer to. What
is a slice in Cassandra? (e.g. "Maximum tombstones per slice")

Thanks!


Shalom Sagges
DBA
T: +972-74-700-4035
 
 We Create Meaningful Connections



On Tue, Nov 15, 2016 at 6:38 PM, Fabrice Facorat 
wrote:

> If you don't want tombstones, don't generate them ;)
>
> More seriously, tombstones are generated when:
> - doing a DELETE
> - TTL expiration
> - set a column to NULL
>
> However, tombstones are an issue only if, for the same value, you have many
> tombstones (i.e. you keep overwriting the same values with data and
> tombstones). Having 1 tombstone for 1 value is not an issue; having 1000
> tombstones for 1 value is a problem. Does your use case really overwrite
> data with DELETE or NULL?
>
> So what you may want to know is how many tombstones you read on
> average per value. This is available in:
> - nodetool cfstats ks.cf : Average tombstones per slice / Maximum
> tombstones per slice
> - JMX : org.apache.cassandra.metrics:keyspace=,name=TombstoneScannedHistogram,scope=,type=ColumnFamily
> Max/Count/99thPercentile/Mean
>
>
> 2016-11-15 10:05 GMT+01:00 Lu, Boying :
>
>> Thanks a lot for your help.
>>
>>
>>
>> We are using STCS strategy and not using TTL
>>
>>
>>
>> Is there any API that we can use to query the current number of
>> tombstones in a CF?
>>
>>
>>
>>
>>
>>
>>
>> *From:* Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
>> *Sent:* 2016年11月14日 22:20
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Some questions to updating and tombstone
>>
>>
>>
>> Hi Boying,
>>
>>
>>
>> I agree with Vladimir. If compaction is not compacting the two sstables
>> with updates soon, disk space will be wasted. For example, if the
>> updates are not close in time, the first update might be in a big sstable
>> by the time the second update is being written to a new small sstable.
>> STCS won't compact them together soon.
>>
>>
>>
>> Just adding column values with a new timestamp shouldn't create any
>> tombstones. But if data is not merged for long, disk space issues may
>> arise. If you are using STCS, just to get an idea of the extent of the
>> problem, you can run a major compaction and see the amount of disk space
>> freed by that (don't do this in production, as major compaction has its
>> own side effects).
>>
>>
>>
>> Which compaction strategy are you using?
>>
>> Are these updates done with TTL?
>>
>>
>>
>> Thanks
>> Anuj
>>
>>
>>
>> On Mon, 14 Nov, 2016 at 1:54 PM, Vladimir Yudovin
>>
>>  wrote:
>>
>> Hi Boying,
>>
>>
>>
>> UPDATE writes a new value with a new timestamp. The old value is not a
>> tombstone, but remains until compaction. gc_grace_period is not related
>> to this.
>>
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone  - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>>
>>
>>
>> On Mon, 14 Nov 2016 03:02:21 -0500, Lu, Boying wrote:
>>
>>
>>
>> Hi, All,
>>
>>
>>
>> Will Cassandra generate a new tombstone when updating a column using a
>> CQL UPDATE statement?
>>
>>
>>
>> And is there any way to get the number of tombstones of a column family
>> since we want to avoid generating
>>
>> too many tombstones within gc_grace_period?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Boying
>>
>>
>>
>>
>
>
> --
> Close the World, Open the Net
> http://www.linux-wizard.net
>
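The cfstats metrics Fabrice lists can also be collected programmatically. A rough sketch, assuming the `nodetool cfstats` output format of C* 2.x (double-check the metric labels against your version; `parse_tombstones_per_slice` and `tombstones_per_slice` are names I made up):

```python
import re
import subprocess

# Matches lines like "Average tombstones per slice (last five minutes): 1.5"
TOMBSTONE_RE = re.compile(r"(Average|Maximum) tombstones per slice[^:]*:\s*([\d.]+)")

def parse_tombstones_per_slice(cfstats_output):
    """Extract the average/maximum tombstones-per-slice figures from cfstats text."""
    return {m.group(1).lower(): float(m.group(2))
            for m in TOMBSTONE_RE.finditer(cfstats_output)}

def tombstones_per_slice(keyspace, table):
    """Run 'nodetool cfstats ks.table' on this node and parse the tombstone metrics."""
    out = subprocess.run(["nodetool", "cfstats", "%s.%s" % (keyspace, table)],
                         capture_output=True, text=True, check=True).stdout
    return parse_tombstones_per_slice(out)
```

Polling this from a cron job gives a cheap way to watch whether tombstones per read are creeping up on a suspect table.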

-- 
This message may contain confidential and/or privileged information. 
If you are not the addressee or authorized to receive this on behalf of the 
addressee you must not use, copy, disclose or take action based on this 
message or any information herein. 
If you have received this message in error, please advise the sender 
immediately by reply email and delete this message. Thank you.


Does recovery continue after truncating a table?

2016-11-16 Thread Yuji Ito
Hi,

I found stale data after truncating a table.
It seems that truncating starts while recovery is still being executed just
after a node restarts.
Does recovery continue even after the truncate finishes?
Is that expected?

I use C* 2.2.8 and can reproduce it as below.

 [create table] 
cqlsh $ip -e "drop keyspace testdb;"
cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class':
'SimpleStrategy', 'replication_factor': '3'};"
cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"

 [script] 
#!/bin/sh

node1_ip=
node2_ip=
node3_ip=
node3_user=
rows=1

echo "consistency quorum;" > init_data.cql
for key in $(seq 0 $(expr $rows - 1))
do
echo "insert into testdb.testtbl (key, val) values($key, ) IF NOT
EXISTS;" >> init_data.cql
done

while true
do
echo "truncate the table"
cqlsh $node1_ip -e "truncate table testdb.testtbl"
if [ $? -ne 0 ]; then
echo "truncating failed"
continue
else
break
fi
done

echo "kill C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon |
awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"

echo "insert $rows rows"
cqlsh $node1_ip -f init_data.cql > insert_log 2>&1

echo "restart C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"

while true
do
echo "truncate the table again"
cqlsh $node1_ip -e "truncate table testdb.testtbl"
if [ $? -ne 0 ]; then
echo "truncating failed"
continue
else
break
fi
done

cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
count(*) from testdb.testtbl;"
sleep 10
cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
count(*) from testdb.testtbl;"


 [result] 
truncate the table
kill C* process on node3
insert 1 rows
restart C* process on node3
10.91.145.27: Starting Cassandra: OK
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
:1:TruncateError: Error during truncate: Cannot achieve consistency
level ALL
truncating failed
truncate the table again
Consistency level set to SERIAL.

 count
---
   300

(1 rows)

Warnings :
Aggregation query used without partition key

Consistency level set to SERIAL.

 count
---
  2304

(1 rows)

Warnings :
Aggregation query used without partition key


I found this while investigating a data loss problem (see the "failure node
rejoin" thread).
I'm not sure whether this problem is related to that data loss.

Thanks,
yuji


Re: Too High resident memory of cassandra 2.2.8

2016-11-16 Thread ankit tyagi
Hi Jeff,

I used the below command to find out the total off-heap usage.

bin/nodetool cfstats  | grep 'Off heap memory used' | cut -d ":"  -f2 | awk
'{sum+=$1} END{print sum}'
1417134538

As per your suggestion, it is only around 1 GB. We have around 50 tables
and 2 column families, but it still doesn't make sense to have so much
resident memory.



On Mon, Nov 14, 2016 at 7:15 PM, Jeff Jirsa 
wrote:

> nodetool cfstats will show it per table.
>
>
>
> The bloom filter / compression data is typically (unless you have very
> unusual settings in your schema) 1-3GB each per TB of data, so with 235’ish
> GB/server, it’s unlikely bloom filter or compression data.
>
>
>
> The memTable is AT LEAST 1MB per columnfamily/table, so if you know how
> many tables you have, that may be an initial lower bound guess.
>
>
>
>
>
>
>
> *From: *ankit tyagi 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Sunday, November 13, 2016 at 11:33 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Too High resident memory of cassandra 2.2.8
>
>
>
> Hi Jeff,
>
> Below is the output of nodetool staus command.
>
>
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address Load   Tokens   OwnsHost ID
> Rack
>
> UN  192.168.68.156  *235.79 GB*  256  ?
> e7b1a44d-0cd2-4b60-b322-4f989933fc51  rack1
>
> UN  192.168.68.157  *234.65 GB*  256  ?
> 70406f0b-3620-401e-beaa-15deb4b799ce  rack1
>
> UN  192.168.69.146   256  ?   d32e1e4d-ec86-4c3f-9397-11f37ff7b4d3
>  rack1
>
> UN  192.168.69.147  *242.77 GB * 256  ?
> 646d9416-a467-4526-9656-959aa98404d0  rack1
>
> UN  192.168.69.148  *249.84 GB * 256  ?
> 9b0ab632-75f4-4781-a987-a00b8246ae97  rack1
>
> UN  192.168.69.149  *240.62 GB*  256  ?
> 406c4d3e-0933-4cba-935f-bfba16e6d878  rack1
>
>
>
>
>
> is there any command to find out the size of the offheap memtables?
>
>
>
> On Mon, Nov 14, 2016 at 12:30 PM, Jeff Jirsa 
> wrote:
>
> Cassandra keeps certain data structures offheap, including bloom filters
> (scales with data size), compression metadata (scales with data size), and
> potentially memtables (scales with # of keyspaces/tables).
>
>
>
> How much data on your node? Onheap or offheap memtables?
>
>
>
>
>
>
>
> *From: *ankit tyagi 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Sunday, November 13, 2016 at 10:55 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Too High resident memory of cassandra 2.2.8
>
>
>
> Hi,
>
>
>
> we are using cassandra version 2.2.8 in production. We are seeing that the
> resident memory of the cassandra process is very high (40 GB) while the heap
> size is only 8 GB.
>
>
>
> root  23339  1 80 Nov11 ?2-09:38:08 /opt/java8/bin/java
> -ea -javaagent:bin/../lib/jamm-0.3.0.jar -XX:+CMSClassUnloadingEnabled
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42* -Xms8192M -Xmx8192M
> -Xmn2048M* -XX:+HeapDumpOnOutOfMemoryError -Xss256k
> -XX:StringTableSize=103 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB -XX:+PerfDisableSharedMem 
> -XX:CompileCommandFile=bin/../conf/hotspot_compiler
> -XX:CMSWaitDuration=1 -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=1
> -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure -Xloggc:bin/../logs/gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
> -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199
> -XX:+DisableExplicitGC -Djava.library.path=bin/../lib/sigar-bin
> -javaagent:/myntra/currentCassandra/lib/agent-1.2.jar=
> statsd.myntra.com:8125
> 
> -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=bin/../logs
> -Dcassandra.storagedir=bin/../data -cp bin/../conf:bin/../build/
> classes/main:bin/../build/classes/thrift:bin/../lib/
> agent-1.2.jar:bin/../lib/airline-0.6.jar:bin/../lib/
> antlr-runtime-3.5.2.jar:bin/../lib/apache-cassandra-2.2.8.
> jar:bin/../lib/apache-cassandra-clientutil-2.2.8.jar:bin/../lib/apache-
> cassandra-thrift-2.2.8.jar:bin/../lib/cassandra-driver-
> core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:bin/../
> lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:
> bin/../lib/commons-lang3-3.1.jar:bin/../lib/commons-math3-
> 3.2.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/
> concurrentlinkedhashmap-lru-1.4.jar:bin/../lib/crc32ex-0.1.
> 1.jar:bin/../lib/disruptor-3.0.1.jar:bin/../lib/ecj-4.4.2.
> jar:bin/../lib/guava-16.0.jar:bin/../lib/high-scale-l