Re: disk space issue

2014-10-01 Thread Dominic Letz
This is a shot in the dark, but you could check whether you have too many
snapshots lying around that you don't actually need. You can get rid of
those with a quick nodetool clearsnapshot.
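
A minimal Python sketch for seeing how much space snapshots hold before clearing them. It assumes the usual on-disk layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/ under each data_file_directories entry; snapshot_usage is a hypothetical helper, not part of Cassandra or nodetool.

```python
import os


def snapshot_usage(data_dir):
    """Return {(keyspace, table, tag): bytes} for every snapshot under data_dir.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    Snapshots are hard links, so these bytes may be shared with live SSTables;
    clearing a snapshot frees only files whose live copy was already compacted away.
    """
    usage = {}
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            snap_root = os.path.join(ks_path, table, "snapshots")
            if not os.path.isdir(snap_root):
                continue  # this table has no snapshots
            for tag in sorted(os.listdir(snap_root)):
                total = 0
                for root, _dirs, files in os.walk(os.path.join(snap_root, tag)):
                    for name in files:
                        total += os.path.getsize(os.path.join(root, name))
                usage[(keyspace, table, tag)] = total
    return usage
```

For example, snapshot_usage("/var/lib/cassandra/data") on a packaged install (that path is an assumption; check data_file_directories in cassandra.yaml), then nodetool clearsnapshot for the tags you no longer need.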

On Wed, Oct 1, 2014 at 5:49 AM, cem cayiro...@gmail.com wrote:

 Hi All,

 I have a 7-node cluster. One node ran out of disk space and the others are
 at around 80% disk utilization.
 The data has a 10-day TTL, but I think compaction wasn't fast enough to
 clean up the expired data. The gc_grace value is set to the default. I have a
 replication factor of 3. Do you think it would help if I deleted all the data
 on that node and ran repair? Does repair check the TTL value before
 retrieving data from other nodes? Do you have any other suggestions?

 Best Regards,
 Cem.




-- 
Dominic Letz
Director of R&D
Exosite http://exosite.com


Re: disk space issue

2014-10-01 Thread Sumod Pawgi
In similar situations in the past, it has helped us to check the partition
where Cassandra is installed and allocate more space to that partition. It may
be a genuine disk space issue, but it is worth checking whether the problem is
really how space was allocated to the partition. My 2 cents.
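
A minimal Python sketch of that check using only the standard library: it finds the mount point that holds a directory and reports the size and usage of that filesystem. partition_report is a hypothetical helper, and /var/lib/cassandra is only the usual packaged default location.

```python
import os
import shutil


def partition_report(path):
    """Describe the filesystem holding `path`: mount point, size, free space, usage.

    Helps spot a data directory sitting on a small partition even though
    the physical disk as a whole still has plenty of room.
    """
    real = os.path.realpath(path)
    mount = real
    while not os.path.ismount(mount):  # walk up until we hit the mount point
        mount = os.path.dirname(mount)
    usage = shutil.disk_usage(real)
    return {
        "mount_point": mount,
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
        "used_pct": 100.0 * usage.used / usage.total,
    }
```

For example, compare partition_report("/var/lib/cassandra/data") with the report for the commit log directory to see whether they share a small partition.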



Re: disk space issue

2014-10-01 Thread Nikolay Mihaylov
My 2 cents:

Try a major compaction on the column family with TTLs; it will certainly be
faster than a full rebuild.

Also check things that aren't Cassandra-related, such as old log files,
backups, etc., and remove them.
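
A minimal Python sketch for listing such candidates without deleting anything, so they can be reviewed first. The directory (for example /var/log/cassandra on a packaged install) and the age cut-off are assumptions to adjust; old_files is a hypothetical helper.

```python
import os
import time


def old_files(directory, max_age_days, now=None):
    """List (path, size_bytes) of regular files under `directory` whose
    modification time is older than `max_age_days`, largest first.

    Nothing is deleted; the result is meant for manual review.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.isfile(path):  # skip broken symlinks and the like
                continue
            st = os.stat(path)
            if st.st_mtime < cutoff:
                found.append((path, st.st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```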





Re: Not-Equals (!=) in Where Clause

2014-10-01 Thread Sylvain Lebresne
Right, my bad, thanks Tyler for the correction.

On Tue, Sep 30, 2014 at 5:44 PM, Tyler Hobbs ty...@datastax.com wrote:

 I think Sylvain may not have had his coffee yet. You can't use IF conditions
 in SELECT statements, but you can in INSERT/UPDATE/DELETE:

 UPDATE foo SET a = 0 WHERE k = 0 IF b != 0;

 On Tue, Sep 30, 2014 at 2:36 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:



 Is != supported as part of the where clause in Cassandra?


 It's not.

 Or is it the grammar for some other purpose?


 It's supported in 'IF' conditions. You can do something like:
   SELECT * FROM foo WHERE k = 0 IF v != 3;

 --
 Sylvain




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Regarding Cassandra-Stress tool

2014-10-01 Thread shahab
Hi,

I am trying to benchmark our custom schema in Cassandra, and I managed to
run it. However, there are a couple of settings and issues for which I couldn't
find any solution or explanation. I would appreciate any comments.

1- The default number of warm-up iterations in the stress tool is about 5.
I would like to reduce this number (due to my storage space limitations),
but I couldn't find any input parameter to do this. Is this setting
possible?

2- I did not understand well what the output of the cassandra-stress tool
means. I read this:
http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStressOutput_c.html
but, for example, what does latency mean here? Does it mean how long a
read/write operation is delayed until it is executed? In that case, what is
the measure of the actual read/write operation?
It also seems that the documentation is outdated: there is an output parameter,
partition_rate, which is not explained in the documentation.

best,
/Shahab
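
As a generic illustration (not cassandra-stress's own code): load tools usually measure latency per operation on the client, from issuing a request to receiving its response, and then summarise those timings as a mean plus percentiles. This sketch uses the nearest-rank percentile method; latency_summary is a hypothetical name, and whether cassandra-stress computes its percentiles exactly this way is an assumption.

```python
def latency_summary(latencies_ms):
    """Summarise a non-empty list of per-operation latencies (ms):
    mean, median, high percentiles and max, using nearest-rank percentiles."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def pct(p):
        # Nearest rank: smallest value with at least p% of samples <= it.
        # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
        rank = max(1, -(-p * n // 100))
        return ordered[rank - 1]

    return {
        "mean": sum(ordered) / n,
        "median": pct(50),
        "95th": pct(95),
        "99th": pct(99),
        "max": ordered[-1],
    }
```

Throughput figures (operations or partitions per second) are a separate measure: they count completed work per interval, whereas latency describes how long each individual operation took.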


Re: disk space issue

2014-10-01 Thread Ken Hancock
Major compaction is bad if you're using size-tiered compaction, especially if
you're already having capacity issues. Once you have one huge SSTable, with
default settings you'll need 4x that huge SSTable's worth of storage before it
will compact again and ever reclaim your TTL'd data.
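
A simplified Python model of why that happens, based on SizeTieredCompactionStrategy's bucketing with its default options (bucket_low 0.5, bucket_high 1.5, min_threshold 4, min_sstable_size 50 MB): an SSTable only joins a bucket whose average size is within 0.5x to 1.5x of its own, and a bucket becomes a compaction candidate only once it holds min_threshold SSTables. It is a sketch, not Cassandra's code, and it leaves out single-SSTable tombstone compactions (the tombstone_threshold subproperty), which can sometimes reclaim expired data without a bucket forming.

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * 2**20):
    """Group SSTable sizes (bytes) into size tiers, simplified from STCS bucketing."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])  # recompute average
                break
        else:
            buckets.append([size, [size]])  # no similar bucket: start a new tier
    return [members for _avg, members in buckets]


def compactable(sizes, min_threshold=4, **kwargs):
    """Buckets holding enough similar-sized SSTables to be compacted together."""
    return [b for b in stcs_buckets(sizes, **kwargs) if len(b) >= min_threshold]
```

With one 500 GiB SSTable left by a major compaction, compactable() never returns it until three more SSTables of roughly 250 to 750 GiB exist beside it.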

If you're running into space issues that are ultimately going to get your
system wedged and you're using columns with TTL, I'd recommend using the
JMX operation to compact individual tables. This will free the TTL'd data,
assuming that you've exceeded your gc_grace_seconds. This can probably be
scripted up relatively easily with a nice, Shellshock-vulnerable bash script
and jmxterm.
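
A small Python worked example of the timing involved, assuming Cassandra's default gc_grace_seconds of 864000 (10 days): a cell written with a TTL expires at write time plus TTL, and compaction may only purge it once gc_grace_seconds more have passed. With a 10-day TTL and the default gc_grace, each cell therefore stays on disk for at least about 20 days, and only goes away when an SSTable containing it is actually compacted. earliest_purge is a hypothetical helper.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)


def earliest_purge(write_time, ttl_seconds, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest moment a compaction may drop a TTL'd cell from disk.

    The cell expires at write_time + ttl and is then treated like a tombstone;
    it becomes purgeable only after a further gc_grace_seconds.
    """
    expires = write_time + timedelta(seconds=ttl_seconds)
    return expires + timedelta(seconds=gc_grace_seconds)


# A cell written on 2014-09-01 with a 10-day TTL is purgeable from 2014-09-21.
print(earliest_purge(datetime(2014, 9, 1), 10 * 86400))
```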







-- 
Ken Hancock | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx
Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
http://www.linkedin.com/in/kenhancock

http://www.schange.com/
This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.




CASSANDRA-7649 : upgrade existing db to 2.0.10

2014-10-01 Thread Desimpel, Ignace
I deploy/distribute the Cassandra database as an embedded service, which allows
me to create a basic cassandra.yaml file based on the global cluster of machines
(seeds, non-seeds, ports, disks, etc.). That lets me configure and upgrade both
my own software and the Cassandra software using the same cassandra.yaml. That
yaml file has no tokens specified in it, yet the cluster is still a vnode
cluster (thanks, Cassandra).

In previous versions this was fine, since the Cassandra code simply accepted
the tokens it had saved in its own database, disregarding any changes made in
the yaml file (there was no test like bootstrapTokens.size() !=
DatabaseDescriptor.getNumTokens()). I guess there was some logic to that,
since at that point the node is not bootstrapping and thus should/could use
the known token configuration without using the yaml token parameter.

Also, wasn't this small code change in CASSANDRA-7649 inspired by the balancing
problems of moving to vnodes with a random partitioner (CASSANDRA-7601)? In my
case I'm using a ByteOrdered partitioner, which forces me to balance/move/add
nodes/tokens myself.
And as the description says, it was meant to prevent changing the number of
tokens, but that test does a little more than that (from my point of view).

In short: I would be in favor of removing that test and clearly logging a
message that the saved tokens are used, not the tokens configured in the yaml.
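
A hypothetical Python model of the two behaviours being compared: strict=True stands for refusing to start when the number of saved tokens differs from num_tokens in cassandra.yaml, and strict=False for the proposal of keeping the saved tokens and logging that fact. It is not Cassandra's code; the function name and messages are made up for illustration, and it only models a restart of a node that already has saved tokens.

```python
import logging

log = logging.getLogger("startup")


def tokens_for_restart(saved_tokens, yaml_num_tokens, strict):
    """Pick the tokens a restarting node uses (hypothetical model).

    strict=True: refuse to start if the saved token count differs from num_tokens.
    strict=False: keep the saved tokens and say so in the log.
    """
    if saved_tokens and len(saved_tokens) != yaml_num_tokens:
        if strict:
            raise RuntimeError(
                "saved token count %d differs from num_tokens %d"
                % (len(saved_tokens), yaml_num_tokens))
        log.warning("using %d saved tokens; ignoring num_tokens=%d from yaml",
                    len(saved_tokens), yaml_num_tokens)
    return list(saved_tokens)
```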

Regards,
Ignace






Re: disk space issue

2014-10-01 Thread cem
Thanks for the answers!

Cem

On Wed, Oct 1, 2014 at 2:38 PM, Ken Hancock ken.hanc...@schange.com wrote:

 https://github.com/hancockks/cassandra-compact-cf




Cassandra Java 8

2014-10-01 Thread Tony Anecito
Hi All,
Has anyone done any performance testing of, say, Cassandra 2.1 on Java 8?
Thanks,
Tony


Question about incremental repair

2014-10-01 Thread John Sumsion
If you only run incremental repairs, does that mean that bitrot will go 
undetected for already repaired sstables?

If so, is there any other process that will detect bitrot for all the repaired 
sstables other than full repair (or an unfortunate user)?

John...


 NOTICE: This email message is for the sole use of the intended recipient(s) 
and may contain confidential and privileged information. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please contact the sender by reply email and destroy all 
copies of the original message.



Re: Question about incremental repair

2014-10-01 Thread Tyler Hobbs
Compressed SSTables store a checksum for every compressed block, which is
checked each time the block is decompressed. I believe there's a ticket
out there to add something similar for non-compressed SSTables.

We also store the SHA-1 hash of each SSTable in its own file on disk.
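
A small Python illustration of the two mechanisms described above: a checksum stored per compressed block and verified before the block is decompressed, plus a SHA-1 digest over the whole file. It uses zlib's CRC32 purely as an example; which checksum algorithm Cassandra uses for compressed chunks, and the exact format of its digest file, are not modelled here. The function names are hypothetical.

```python
import hashlib
import zlib


def write_blocks(data, block_size=65536):
    """Compress data in fixed-size blocks, keeping a CRC32 of each compressed block."""
    blocks = []
    for start in range(0, len(data), block_size):
        compressed = zlib.compress(data[start:start + block_size])
        blocks.append((compressed, zlib.crc32(compressed)))
    return blocks


def read_blocks(blocks):
    """Verify each block's checksum before decompressing it; fail on mismatch."""
    out = bytearray()
    for index, (compressed, stored_crc) in enumerate(blocks):
        if zlib.crc32(compressed) != stored_crc:
            raise IOError("checksum mismatch in block %d" % index)
        out += zlib.decompress(compressed)
    return bytes(out)


def sha1_hex(data):
    """Digest of the whole file, comparable against a digest stored beside it."""
    return hashlib.sha1(data).hexdigest()
```

Note that per-block checks only fire when a block is actually read, which is why rarely read, already-repaired data can sit corrupted unnoticed.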





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Question about incremental repair

2014-10-01 Thread Robert Coli

@OP: this came up a few weeks ago on the list; search for "bitrot" to find the
previous thread.

Expanding on the discussion further, I plan to file a JIRA on
you-must-mark-all-sstables-for-that-range-unrepaired-if-you-fail-CRC-on-read.
I'll try to remember to reply on this thread when I do.

Once there is a CRC on uncompressed read,
marking-all-sstables-unrepaired-on-failed-CRC would handle the bitrot case
for both uncompressed and compressed reads.
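
A hypothetical Python sketch of the proposed rule: when a read fails its checksum for some token range, every SSTable overlapping that range is reset to unrepaired, so the next incremental repair compares it again. It assumes 2.1-style repairedAt metadata where 0 means unrepaired and ignores token-range wraparound; the SSTable class is a stand-in, not Cassandra's.

```python
from dataclasses import dataclass

UNREPAIRED = 0  # repairedAt value meaning "not repaired" (assumed, 2.1-style)


@dataclass
class SSTable:
    name: str
    first_token: int
    last_token: int
    repaired_at: int


def mark_range_unrepaired(sstables, bad_first, bad_last):
    """After a checksum failure covering tokens [bad_first, bad_last], reset every
    overlapping SSTable to unrepaired. Returns the names that were changed."""
    changed = []
    for t in sstables:
        overlaps = t.first_token <= bad_last and bad_first <= t.last_token
        if overlaps and t.repaired_at != UNREPAIRED:
            t.repaired_at = UNREPAIRED
            changed.append(t.name)
    return changed
```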

=Rob


Re: cassandra stress tools

2014-10-01 Thread Sumod Pawgi
Not a direct answer to your post, but you could also take a look at YCSB.
