Re: Impact of removing compactions_in_progress folder

2015-04-13 Thread Anuj Wadehra
Any comments on the exceptions related to unfinished compactions at Cassandra 
startup? What is the best way to deal with them? Are there side effects of deleting 
the compactions_in_progress folder to resolve the issue?


Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From: Anuj Wadehra anujw_2...@yahoo.co.in
Date: Mon, 13 Apr, 2015 at 12:32 am
Subject: Impact of removing compactions_in_progress folder

Often we face errors on Cassandra start regarding unfinished compactions, 
particularly when Cassandra was abruptly shut down. The problem gets resolved when 
we delete the /var/lib/cassandra/data/system/compactions_in_progress folder. Does 
deletion of the folder have any impact on the integrity of data or any other aspect?
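
For reference, this is roughly the sequence involved (a sketch, assuming a packaged 
install with the default /var/lib/cassandra data directory and a service named cassandra):

    # stop the node first; never delete data files while Cassandra is running
    sudo service cassandra stop
    # remove only the system table directory that tracks unfinished compactions
    sudo rm -rf /var/lib/cassandra/data/system/compactions_in_progress
    sudo service cassandra start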



Thanks

Anuj Wadehra



Re: Impact of removing compactions_in_progress folder

2015-04-13 Thread Robert Coli
On Sun, Apr 12, 2015 at 12:02 PM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Often we face errors on Cassandra start regarding unfinished compactions,
 particularly when Cassandra was abruptly shut down. The problem gets resolved
 when we delete the /var/lib/cassandra/data/system/compactions_in_progress
 folder. Does deletion of the folder have any impact on the integrity of data or
 any other aspect?


While I have no specific knowledge about this case, it is difficult to
imagine how canceling a compaction could have any meaningful negative
effect other than the normal penalty one pays for uncompacted data.

nodetool stop can also cancel compactions, probably by approximately the
same mechanism as removing an in-progress-compactions file.
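
For example, something like this cancels in-flight compactions on a live node (a 
sketch; nodetool stop takes the type of operation to cancel):

    # cancel any compactions currently running on this node
    nodetool stop COMPACTION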

However, if you can reproduce this reliably, you should:

1) file a ticket on http://issues.apache.org
2) respond to this mail letting the list know the JIRA # of the ticket

=Rob


Re: Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
Unfortunately, I’ve switched email systems and don’t have my emails from that 
time period. I did not file a Jira, and I don’t remember who made the patch for 
me or if he filed a Jira on my behalf.

I vaguely recall seeing the fix in the Cassandra change logs, but I just went 
and read them and I don’t see it. I’m probably remembering wrong.

My suspicion is that the original patch did not make it into the main branch, 
and I just have always had enough concurrent writing to keep Cassandra happy.

Hopefully the author of the patch will read this and be able to chime in.

This issue is very reproducible. I’ll try to come up with some time to write a 
simple program that illustrates the problem and file a Jira.

Thanks

Robert

On Apr 13, 2015, at 10:39 AM, Philip Thompson 
philip.thomp...@datastax.com wrote:

Did the original patch make it into upstream? That's unclear. If so, what was 
the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille 
rwi...@fold3.com wrote:
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and clean up their tables.

Robert





Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Any comments on the side effects of a major compaction, especially when the
 sstable generated is 100+ GB?


I have no idea how this interacts with the automatic compaction stuff; if
you find out, let us know?

But if you want to do a major and don't want to deal with One Big SSTable
afterwards, stop the node and then run the sstablesplit utility.
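
Something along these lines, as a sketch (the node must be stopped; the keyspace, 
table, and file names below are placeholders, and -s is the target maximum size per 
output sstable in MB):

    # run against the one big Data.db file while the node is down
    sstablesplit --no-snapshot -s 50 \
      /var/lib/cassandra/data/mykeyspace/mytable/mykeyspace-mytable-jb-1234-Data.db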

=Rob


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Anuj Wadehra
Any comments on the side effects of a major compaction, especially when the 
sstable generated is 100+ GB? 


After Cassandra 1.2, automatic tombstone compaction occurs even on a single 
sstable if the tombstone percentage exceeds the tombstone_threshold sub-property 
specified in the compaction strategy. So, even if the huge sstable is not compacted 
with any new sstable, tombstones will still be collected. Is there any other 
disadvantage of having a giant sstable of hundreds of GB? I understand that sstables 
have a summary and an index which help locate the correct data blocks directly in a 
large data file. Are there still any disadvantages?
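
For reference, the sub-property in question is set per table as part of the 
compaction options; a sketch against a hypothetical table (0.2 is the documented 
default ratio, shown here only to make the setting explicit):

    ALTER TABLE mykeyspace.mytable
      WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                         'tombstone_threshold': '0.2'};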


Thanks

Anuj Wadehra


Sent from Yahoo Mail on Android

From: Anuj Wadehra anujw_2...@yahoo.co.in
Date: Mon, 13 Apr, 2015 at 12:33 am
Subject: Re: Drawbacks of Major Compaction now that Automatic Tombstone 
Compaction Exists

No.


Anuj Wadehra




On Monday, 13 April 2015 12:23 AM, Sebastian Estevez 
sebastian.este...@datastax.com wrote:



Have you tried user defined compactions via JMX?

On Apr 12, 2015 1:40 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

Recently we faced an issue where every repair operation caused the addition of 
hundreds of sstables (CASSANDRA-9146). In order to bring the situation under 
control and make sure reads were not impacted, we were left with no option but 
to run a major compaction to ensure that the thousands of tiny sstables were compacted.

Queries:
Does major compaction have any drawbacks now that automatic tombstone compaction has 
been implemented in 1.2 via the tombstone_threshold sub-property (CASSANDRA-3442)? 
I understand that the huge SSTable created after a major compaction won't be 
compacted with new data any time soon, but is that a problem if purged data is 
removed via automatic tombstone compaction? If a major compaction results in a 
huge file, say 500 GB, what are the drawbacks of that?

If one big sstable is a problem, is there any way of solving it? We 
tried running sstablesplit after a major compaction to split the big sstable, but 
as the new sstables were all the same size, they were compacted back into a single 
huge sstable once Cassandra was started after executing sstablesplit.



Thanks

Anuj Wadehra





Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Rahul Neelakantan
Rob,
Does that mean once you split it back into small ones, automatic compaction 
will continue to happen on a more frequent basis, now that it's no longer a 
single large monolith?

Rahul

 On Apr 13, 2015, at 3:23 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra anujw_2...@yahoo.co.in 
 wrote:
 
 Any comments on the side effects of a major compaction, especially when the 
 sstable generated is 100+ GB? 
 
 I have no idea how this interacts with the automatic compaction stuff; if you 
 find out, let us know?
 
 But if you want to do a major and don't want to deal with One Big SSTable 
 afterwards, stop the node and then run the sstablesplit utility. 
 
 =Rob
 


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 12:26 PM, Rahul Neelakantan ra...@rahul.be wrote:

 Does that mean once you split it back into small ones, automatic
 compaction will continue to happen on a more frequent basis now that it's
 no longer a single large monolith?


That's what the words size tiered mean in the phrase size tiered
compaction, yes.

=Rob


Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Benyi Wang
What about incremental repair and sequential repair?

I ran nodetool repair -- keyspace table on one node. I found the repair
sessions running on different nodes. Will this command repair the whole
table?

In this page:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html#concept_ds_ebj_d3q_gk__opsRepairPrtRng

*Using the nodetool repair -pr (–partitioner-range) option repairs only the
first range returned by the partitioner for a node. Other replicas for that
range still have to perform the Merkle tree calculation, causing a
validation compaction.*

Does it sound like -pr runs on one node?
I still don't understand what the first range returned by the partitioner for
a node means.

On Mon, Apr 13, 2015 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang bewang.t...@gmail.com wrote:


- I need to run compaction on each node,

 In general, there is no requirement to manually run compaction. Minor
 compaction occurs in the background, automatically.


- To repair a table (column family), I only need to run repair on any one
of the nodes.

 It depends on whether you are doing -pr or non -pr repair.

 If you are doing -pr repair, you run repair on all nodes. If you do non
 -pr repair, you have to figure out what set of nodes to run it on. That's
 why -pr exists, to simplify this.

 =Rob




Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang bewang.t...@gmail.com wrote:


- I need to run compaction on each node,

 In general, there is no requirement to manually run compaction. Minor
compaction occurs in the background, automatically.


- To repair a table (column family), I only need to run repair on any one
of the nodes.

 It depends on whether you are doing -pr or non -pr repair.

If you are doing -pr repair, you run repair on all nodes. If you do non -pr
repair, you have to figure out what set of nodes to run it on. That's why
-pr exists, to simplify this.

=Rob


Do I need to run repair and compaction every node?

2015-04-13 Thread Benyi Wang
I have read the documentation several times, but I am still not quite sure how to
run repair and compaction.

To my understanding,

   - I need to run compaction on each node,
   - To repair a table (column family), I only need to run repair on any one of
   the nodes.

Am I right?

Thanks.


Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com wrote:

 Nodetool repair -par: covers all nodes, computes merkle trees for each
 node at the same time. Much higher IO load as every copy of a key range is
 scanned at once. Can be totally OK with SSDs and throughput limits. Only
 need to run the command on one node.


No? -par is just a performance (of repair) de-optimization, intended to
improve service time during repair. Doing -par without -pr on a single node
doesn't repair your entire cluster.

Consider the following 7-node cluster, without vnodes:

A B C D E F G
RF=3

You run a repair on node D, without -pr.

D is repaired against B's tertiary replicas.
D is repaired against C's secondary replicas.
E is repaired against D's secondary replicas.
F is repaired against D's tertiary replicas.
Nodes A and G are completely unaffected and unrepaired, because D does not
share any ranges with them.

repair with or without -par only covers all *replica* nodes. Even with
vnodes, you still have to run it on almost all nodes in most cases. Which
is why most users should save themselves the complexity and just do a
rolling -par -pr on all nodes, one by one.

=Rob


Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jon Haddad
Or use spotify's reaper and forget about it: 
https://github.com/spotify/cassandra-reaper
 On Apr 13, 2015, at 3:45 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com wrote:
 Nodetool repair -par: covers all nodes, computes merkle trees for each node 
 at the same time. Much higher IO load as every copy of a key range is scanned 
 at once. Can be totally OK with SSDs and throughput limits.  Only need to run 
 the command on one node.
 
 No? -par is just a performance (of repair) de-optimization, intended to 
 improve service time during repair. Doing -par without -pr on a single node 
 doesn't repair your entire cluster.
 
 Consider the following 7 node cluster, without vnodes :
 
 A B C D E F G
 RF=3
 
 You run a repair on node D, without -pr.
 
 D is repaired against B's tertiary replicas.
 D is repaired against C's secondary replicas.
 E is repaired against D's secondary replicas.
 F is repaired against D's tertiary replicas.
 Nodes A and G are completely unaffected and unrepaired, because D does not 
 share any ranges with them.
 
 repair with or without -par only covers all *replica* nodes. Even with 
 vnodes, you still have to run it on almost all nodes in most cases. Which is 
 why most users should save themselves the complexity and just do a rolling 
 -par -pr on all nodes, one by one.
 
 =Rob
 



Keyspace Replication changes not synchronized after adding Datacenter

2015-04-13 Thread Thunder Stumpges
Hi guys,

We have recently added two datacenters to our existing 2.0.6 cluster. We
followed the process here pretty much exactly:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

We are using GossipingPropertyFileSnitch and NetworkTopologyStrategy across
the board. All property files are identical in each of the three
datacenters, and we use two nodes from each DC in the seed list.

However, when we came to step 7a, we ran the ALTER KEYSPACE command on one
of the new datacenters (to add it as a replica). The change was reflected
on the new datacenter where it was run, as shown by DESCRIBE KEYSPACE.
However, the change was NOT propagated to either of the other two
datacenters. We effectively had to run the ALTER KEYSPACE command 3 times,
once in each datacenter. Is this expected? I could find no documentation
stating that this needed to be done, nor any documentation about how the
system keyspace is kept in sync across datacenters in general.
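
For context, the statement in question looks roughly like this (keyspace name, DC 
names, and replication factors are placeholders for ours):

    ALTER KEYSPACE mykeyspace
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 3, 'DC2': 3, 'DC3': 3};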

If this is indicative of a larger problem with our installation, how would
we go about troubleshooting it?

Thanks in advance!
Thunder


Binary Protocol Version and CQL version supported in 2.0.14

2015-04-13 Thread Anishek Agarwal
Hello,

I was trying to find what protocol versions are supported in Cassandra
2.0.14, and after reading multiple links I am very confused.

Please confirm whether my understanding is correct:

   - Binary Protocol version and CQL Spec version are different?
   - Cassandra 2.0.x supports CQL 3?
   - Is there a different Binary Protocol version between 2.0.x and 2.1.x?


Is there some link which states what version of Cassandra supports which
binary protocol version and CQL spec version? (Additionally, showing which
drivers support what would be great too.)

The link
http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3 shows
some info, but I am not sure whether the supported protocol versions it
mentions refer to the binary protocol or the CQL spec.
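
For what it's worth, cqlsh will at least report the CQL spec version a node speaks 
(a quick check, though it does not directly answer the binary protocol question):

    cqlsh> SHOW VERSION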

Thanks
Anishek


Re: Uderstanding Read after update

2015-04-13 Thread Graham Sanderson
Yes, it will look in each sstable that, according to the bloom filter, may have 
data for that partition key, and use timestamps to figure out the latest 
version (or none, in the case of a newer tombstone) to return for each clustering key.
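
A small cqlsh sketch of that per-cell, timestamp-based merge as seen from the 
outside (hypothetical keyspace/table; the flush just forces the two writes into 
different sstables):

    CREATE TABLE demo.t (pk int PRIMARY KEY, a int, b int);
    INSERT INTO demo.t (pk, a, b) VALUES (1, 10, 20);
    -- run 'nodetool flush demo t' here so the update lands in a different sstable
    UPDATE demo.t SET a = 11 WHERE pk = 1;
    -- the read merges the sstables cell by cell: a comes back as 11 (newer
    -- timestamp), b as 20, each with its own writetime
    SELECT a, writetime(a), b, writetime(b) FROM demo.t WHERE pk = 1;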

Sent from my iPhone

 On Apr 12, 2015, at 11:18 PM, Anishek Agarwal anis...@gmail.com wrote:
 
 Thanks Tyler for the validations, 
 
 I have a follow up question. 
 
  One SSTable doesn't have precedence over another.  Instead, when the same 
 cell exists in both sstables, the one with the higher write timestamp wins.
 
 if my table has 5 (non partition key) columns and I update only 1 of them, then 
 the new SSTable should have only that entry, which means that if I query 
 everything for that partition key, Cassandra has to match the timestamps 
 per column for a partition key across SSTables to get me the data?
 
 
 On Fri, Apr 10, 2015 at 10:52 PM, Tyler Hobbs ty...@datastax.com wrote:
 
 
 SSTable-level bloom filters have details as to what partition keys are in 
 that table. So to clear up my understanding: if I insert and then have an 
 update to the same row after some time (assuming both go to different 
 SSTables), then during a read Cassandra will read data from both SSTables and 
 merge them in timestamp order, with data in the second SSTable for the 
 row taking precedence over the first SSTable, and return the result?
 
 That's approximately correct.  The only part that's incorrect is how merging 
 works.  One SSTable doesn't have precedence over another.  Instead, when the 
 same cell exists in both sstables, the one with the higher write timestamp 
 wins.
  
 Does it mark the old column as a tombstone in the previous SSTable, or wait 
 for compaction to remove the old data?
 
 It just waits for compaction to remove the old data, there's no tombstone.
 
 
 When the data is in the memtable, does it also keep track of the unique keys in 
 that memtable, so that when it writes to disk it can use that to derive the right 
 size of the bloom filter for that SSTable?
 
 
 That's correct, it knows the number of keys before the bloom filter is 
 created.
 
 -- 
 Tyler Hobbs
 DataStax
 


Re: Delete-only work loads crash Cassandra

2015-04-13 Thread Philip Thompson
Did the original patch make it into upstream? That's unclear. If so, what
was the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille rwi...@fold3.com wrote:

 Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If
 I did lots of deletes and no upserts, Cassandra would report that the
 memtable was 0 bytes because of an accounting error. The memtable would never
 flush and Cassandra would eventually die. Someone was kind enough to create
 a patch, which seemed to have fixed the problem, but last night it reared
 its ugly head.

 I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes,
 RF=3, CL=1). The workload was pretty light, because this cleanup process is
 single-threaded and does everything synchronously. It was performing 4
 reads per second and about 3000 deletes per second. Over the course of many
 hours, heap slowly grew on all nodes. CPU utilization also increased as GC
 consumed an ever-increasing amount of time. Eventually a couple of nodes
 shed 3.5 GB of their 7.5 GB. Other nodes weren’t so fortunate and started
 flapping due to 30 second GC pauses.

 The workaround is pretty simple. This cleanup process can simply write a
 dummy record with a TTL periodically so that Cassandra can flush its
 memtables and function properly. However, I think this probably ought to be
 fixed. Delete-only workloads can’t be that rare. I can’t be the only one
 that needs to go through and clean up their tables.

 Robert




Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and clean up their tables.
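
A sketch of that workaround, with placeholder keyspace/table/column names (the TTL 
just keeps the dummy row from accumulating):

    -- issued periodically by the cleanup process to give the memtable real data
    INSERT INTO mykeyspace.mytable (pk, val)
      VALUES ('memtable-flush-dummy', 'x') USING TTL 60;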

Robert