RE: Inconsistent Reads after Restoring Snapshot

2016-04-28 Thread Anuj Wadehra
Sean,
I meant that commit log archival was never part of the "restoring a snapshot" 
DataStax documentation. How is commitlog archival related to my concern? Please 
elaborate.
Thanks
Anuj

Sent from Yahoo Mail on Android

On Thu, 28 Apr, 2016 at 9:24 PM, sean_r_dur...@homedepot.com wrote:
https://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configLogArchive_t.html
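
For reference, commitlog archiving is configured in conf/commitlog_archiving.properties. 
A minimal sketch, with illustrative commands and paths (not from this thread):

# Archive each commitlog segment as it is closed; %path is the segment file,
# %name its file name.
archive_command=/bin/cp %path /backup/commitlog/%name
# On restore, copy archived segments back; %from is the archived file, %to the
# target in the commitlog restore location.
restore_command=/bin/cp -f %from %to
restore_directories=/backup/commitlog
# Replay archived mutations only up to a point in time (yyyy:MM:dd HH:mm:ss).
restore_point_in_time=2016:04:28 10:00:00

With segments archived between the 10 AM snapshot and the 1 PM crash, a restore 
can replay the missing mutations instead of depending solely on repair.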
 
  
 
Sean Durity
 
  
 
From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Wednesday, April 27, 2016 10:44 PM
To: user@cassandra.apache.org
Subject: RE: Inconsistent Reads after Restoring Snapshot
 
  
 
No, we are not saving them. I have never read that in the DataStax documentation.

Thanks
Anuj

Sent from Yahoo Mail on Android
 
  
 

On Thu, 28 Apr, 2016 at 12:45 AM, sean_r_dur...@homedepot.com wrote:
 
What about the commitlogs? Are you saving those off anywhere in between the 
snapshot and the crash?
 
 
 
 
 
Sean Durity
 
 
 
From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Monday, April 25, 2016 10:26 PM
To: User
Subject: Inconsistent Reads after Restoring Snapshot
 
 
 
Hi,

We are running 2.0.14. We use RF=3 and read/write at QUORUM. Moreover, we don't 
use incremental backups. As per the documentation at 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
 , if I need to restore a snapshot on a SINGLE node in a cluster, I would run 
repair at the end. But while the repair is going on, reads may be inconsistent.

Consider the following scenario:

10 AM: Daily snapshot taken of node A and moved to the backup location.

11 AM: A record is inserted such that nodes A and B insert the record, but the 
mutation is dropped on node C.

1 PM: Node A crashes and its data is restored from the latest 10 AM snapshot. 
Now, only node B has the record.

Now, my question is:

Till the repair is completed on node A, a read at QUORUM may return an 
inconsistent result depending on which nodes are read. If data is read from 
nodes A and C, nothing is returned; if data is read from nodes A and B, the 
record is returned. This is a vital point which is not highlighted anywhere.

Please confirm my understanding. If my understanding is right, how do I make 
sure that my reads are not inconsistent while a node is being repaired after 
restoring a snapshot?

I think autobootstrapping the node without joining the ring till the repair is 
completed is an alternative option. But snapshots save a lot of streaming as 
compared to bootstrap.
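
A hedged sketch of that alternative (the flag and commands exist in 2.0-era 
Cassandra, but verify the exact semantics on 2.0.14 before relying on this):

# Illustrative sequence, not a verified 2.0.14 procedure.
# 1. Restore the snapshot files into the data directory, then start the node
#    without joining the ring so it serves no reads:
cassandra -Dcassandra.join_ring=false
# 2. Repair it while it is out of the ring:
nodetool repair
# 3. Join the ring only once repair completes:
nodetool join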
 
 
 
Will incremental backups guarantee that

Thanks
Anuj

Sent from Yahoo Mail on Android





tombstone_failure_threshold being ignored?

2016-04-28 Thread Rick Gunderson
We are running Cassandra 2.2.3, 2 data centers, 3 nodes in each. The 
replication factor per datacenter is 3. The Xmx setting on the Cassandra 
JVMs is 4GB.

We have a workload that generates lots of tombstones, and Cassandra goes 
OOM in about 24 hours. We've adjusted the tombstone_failure_threshold down 
to 25000, but we never see the TombstoneOverwhelmingException before the 
nodes start going OOM.

The table operation that looks to be the culprit is a scan of partition 
keys (i.e. we are scanning across narrow rows, not scanning within a wide 
row). The heapdump shows we have a RangeSliceReply containing an ArrayList 
with 1,823,230 org.apache.cassandra.db.Row objects with a retained heap 
size of 441MiB.  A look inside one of the Row objects shows an 
org.apache.cassandra.db.DeletionInfo object so I assume that means the row 
has been tombstoned.

If all of the 1,823,230 Row objects are tombstoned (and it is likely that 
most of them are), is there a reason that the 
TombstoneOverwhelmingException never gets thrown? 
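
One hedged explanation, worth verifying against the 2.2 code: the thresholds are 
enforced while counting cell tombstones collected within a single partition read. 
A scan across ~1.8 million narrow partitions that have each been deleted outright 
(row/partition tombstones rather than cell tombstones) may never trip that 
per-read counter, yet the coordinator still materializes every tombstoned row in 
the RangeSliceReply you observed. The relevant cassandra.yaml settings (values 
illustrative):

# cassandra.yaml
tombstone_warn_threshold: 1000       # warn past this many tombstones in one read
tombstone_failure_threshold: 25000   # abort the read past this many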



Regards,

Rick (R.) Gunderson 
Software Engineer
IBM Commerce, B2B Development - GDHA


Phone: 1-250-220-1053
E-mail: rgunder...@ca.ibm.com

1803 Douglas St
Victoria, BC V8T 5C3
Canada




RE: Inconsistent Reads after Restoring Snapshot

2016-04-28 Thread SEAN_R_DURITY
https://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configLogArchive_t.html

Sean Durity



[Announcement] Achilles 4.2.0 released

2016-04-28 Thread DuyHai Doan
Hello all

 I am pleased to announce the release of Achilles 4.2.0.

 The biggest change is the support for type-safe function calls in the
SELECT DSL, as well as UDF/UDA declaration in Achilles.

 The generated DSL code enforces the type of each function call so that the
parameter types and return type of each function match those of the column.

 For more details, see the doc:
https://github.com/doanduyhai/Achilles/wiki/Functions-Mapping

Regards

Duy Hai DOAN


Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Anyways, thanks for your reply.


On Thu, Apr 28, 2016 at 1:59 PM, Hannu Kröger  wrote:

> Ok, then I don’t understand the problem.
>
> Hannu
>


Re: Query regarding spark on cassandra

2016-04-28 Thread Hannu Kröger
Ok, then I don’t understand the problem.

Hannu

> On 28 Apr 2016, at 11:19, Siddharth Verma wrote:
> 
> Hi Hannu,
> 
> Had the issue been caused by the read, the insert and delete statements would 
> have been erroneous.
> "I saw the stdout from web-ui of spark, and the query along with true was 
> printed for both the queries.".
> The statements were correct as seen on the UI.
> Thanks,
> Siddharth Verma
> 
> 



Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Hi Hannu,

Had the issue been caused by the read, the insert and delete statements
would have been erroneous.
"I saw the stdout from web-ui of spark, and the query along with true was
printed for both the queries.".
The statements were correct as seen on the UI.
Thanks,
Siddharth Verma



On Thu, Apr 28, 2016 at 1:22 PM, Hannu Kröger  wrote:

> Hi,
>
> could it be a consistency level issue? If you use ONE for reads and writes,
> it might be that sometimes you don't get what you are writing.
>
> See:
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
> Br,
> Hannu
>
>


Re: Discrepancy while paging through table, and static column updated in between

2016-04-28 Thread Siddharth Verma
Hi Tyler,
I have created a JIRA for another issue which we have encountered. It is not
limited to our speculation about the static column update.
https://issues.apache.org/jira/browse/CASSANDRA-11680

Thanks


On Tue, Apr 19, 2016 at 10:37 PM, Tyler Hobbs  wrote:

> This sounds similar to
> https://issues.apache.org/jira/browse/CASSANDRA-10010, but that only
> affected 2.x.  Can you open a Jira ticket with your table schema, the
> problematic query, and the details you posted here?
>
> On Tue, Apr 19, 2016 at 10:25 AM, Siddharth Verma <verma.siddha...@snapdeal.com> wrote:
>
>> Hi,
>>
>> We are using Cassandra (dsc 3.0.3) in production.
>>
>> For some purpose, we were doing a full table scan (setPagingState and
>> getPagingState used on the ResultSet in a Java program), and there has been
>> some discrepancy when we ran the same job multiple times.
>> Each time some new data was added to the output, and some was left out.
>>
>> Side Note 1 :
>> Table structure
>> col1, col2, col3, col4, col5, col6
>> Primary key(col1, col2)
>> col5 is a static column
>> col6 is a static column, used to explicitly store the updated time when
>> col5 changed
>>
>>
>> Sample Data
>> 1,A,AA,AAA,STATIC,T1
>> 1,B,BB,BBB,STATIC,T1
>> 1,C,CC,CCC,STATIC,T1
>> 1,D,DD,DDD,STATIC,T1
>>
>> For some keys, col6 was sometimes updated while the job was running, so
>> some values were not printed for that partition key.
>>
>> Side Note 2 :
>> we did -> select col6, writetime(col6) from ... where col1=... and
>> col2=...
>> For the data that was missed out to make sure that particular entry
>> wasn't added later.
>>
>>
>> Side Note 3:
>> The above scenario (that some col6 was updated while the job was running,
>> and therefore some entries for that partition key were ignored) is an
>> assumption from our end.
>> We can't understand why some entries were not printed in the table scan.
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Query regarding spark on cassandra

2016-04-28 Thread Hannu Kröger
Hi,

could it be a consistency level issue? If you use ONE for reads and writes,
it might be that sometimes you don't get what you are writing.

See:
https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Br,
Hannu


2016-04-27 20:41 GMT+03:00 Siddharth Verma :

> Hi,
> I don't know if someone has faced this problem or not.
> I am running a job where some data is loaded from a Cassandra table. From
> that data, I make some insert and delete statements
> and execute them (using forEach).
>
> Code snippet:
> boolean deleteStatus= connector.openSession().execute(delete).wasApplied();
> boolean  insertStatus =
> connector.openSession().execute(insert).wasApplied();
> System.out.println(delete+":"+deleteStatus);
> System.out.println(insert+":"+insertStatus);
>
> When I run it locally, I see the respective results in the table.
>
> However, when I run it on a cluster, sometimes the result is displayed and
> sometimes the changes don't take place.
> I saw the stdout from the Spark web UI, and the query along with "true" was
> printed for both the queries.
>
> I can't understand what the issue could be.
>
> Any help would be appreciated.
>
> Thanks,
> Siddharth Verma
>
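
If Hannu's consistency theory is right, a hedged way to test it is to pin both 
statements to QUORUM explicitly (DataStax Java driver 3.x sketch; the delete and 
insert strings are the ones from the thread). Note also that wasApplied() is 
only meaningful for lightweight transactions (queries with IF); for a plain 
INSERT or DELETE it always returns true, so the "true" printed in the Spark 
stdout does not prove the mutation reached the replicas you expect:

import com.datastax.driver.core.*;

// Sketch: reuse one session and execute both statements at QUORUM.
// `connector`, `delete`, and `insert` are as in the thread's snippet.
Session session = connector.openSession();
Statement del = new SimpleStatement(delete)
        .setConsistencyLevel(ConsistencyLevel.QUORUM);
Statement ins = new SimpleStatement(insert)
        .setConsistencyLevel(ConsistencyLevel.QUORUM);
session.execute(del);
session.execute(ins);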


Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Edit:
1. The dc2 node has been removed.
nodetool status shows only active nodes.
2. Repair done on all nodes.
3. Cassandra restarted.

Still, it doesn't solve the problem.

On Thu, Apr 28, 2016 at 9:00 AM, Siddharth Verma <verma.siddha...@snapdeal.com> wrote:

> Hi, in case this info is useful:
> we are using two DCs
> dc1 - 3 nodes
> dc2 - 1 node
> however, dc2 has been down for 3-4 weeks, and we haven't removed it yet.
>
> The Spark slaves are on the same machines as the Cassandra nodes.
> Each node has two slave instances.
>
> The Spark master is on a separate machine.
>
> If anyone could provide insight into the problem, it would be helpful.
>
> Thanks
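
Given the dc2 node that was down for weeks, one hedged thing to check is which 
consistency level and local DC the connector jobs use. The spark-cassandra-connector 
exposes these as Spark properties (names as I recall them from the connector 
docs; verify for your connector version, values illustrative):

spark.cassandra.connection.local_dc=dc1
spark.cassandra.input.consistency.level=LOCAL_QUORUM
spark.cassandra.output.consistency.level=LOCAL_QUORUM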
>