Long running nodetool repair

2013-02-19 Thread Haithem Jarraya
Hi,

I am new to Cassandra and I am not sure if this is the normal behavior but
nodetool repair runs for too long, even for a small dataset per node. As I am
writing, I started a nodetool repair last night at 18:41; it is now 9:18
and it's still running, and the size of my data is only ~500MB per node.
We have
3 Node cluster in DC1 with RF 3
1 Node Cluster in DC2 with RF 1
1 Node cluster in DC3 with RF 1

and running Cassandra V1.2.1 with 256 vNodes.

From the Cassandra logs I no longer see AntiEntropy entries, only CompactionTask
and FlushWriter.

Is this the normal behaviour of nodetool repair?
Does the running time grow linearly with the size of the data?

Any help or direction will be much appreciated.


Thanks,

H


Testing compaction strategies on a single production server?

2013-02-19 Thread Henrik Schröder
Hey,

Version 1.1 of Cassandra introduced live traffic sampling, which allows you
to measure the performance of a node without it really joining the cluster:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

That page mentions that you can change the compaction strategy through jmx
if you want to test out a different strategy on your survey node.

That's great, but it doesn't give you a complete view of how your
performance would change, since you're not doing reads from the survey
node. But what would happen if you used jmx to change the compaction
strategy of a column family on a single *production* node? Would that be a
safe way to test it out or are there side-effects of doing that live?

And if you do that, would running a major compaction transform the entire
column family to the new format?

Finally, if the test was a success, how do you proceed from there? Just
change the schema?


/Henrik


RE: Testing compaction strategies on a single production server?

2013-02-19 Thread Viktor Jevdokimov
Just turn off the dynamic snitch on the survey node and make read requests to it
directly with CL.ONE; watch the histograms, compare.

Regarding switching compaction strategies, there's a lot of info out there already.
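For example, a minimal sketch of that comparison (keyspace/column family names
below are placeholders):

nodetool -h surveynode cfhistograms MyKeyspace MyCF   # latencies on the survey node
nodetool -h prodnode cfhistograms MyKeyspace MyCF     # same CF on a production node

Comparing the two distributions before and after the strategy change shows its
effect without touching production.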


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite: http://vimeo.com/adform/richmedia



Re: cassandra vs. mongodb quick question(good additional info)

2013-02-19 Thread Edward Capriolo
The 40 TB use case you heard about is probably one 40TB mysql machine
that someone migrated to mongo so it would be "web scale". Cassandra is
NOT good with drives that big; get a blade center or a high density
chassis.

On Mon, Feb 18, 2013 at 8:00 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 I thought about this more, and even with a 10Gbit network, it would take 40 
 days to bring up a replacement node if mongodb did truly have a 42T / node 
 like I had heard.  I wrote the below email to the person I heard this from 
 going back to basics which really puts some perspective on it… (and a lot of
 people don't even have a 10Gbit network like we do)

 Nodes are hooked up by a 10G network at most right now, where that is
 10 gigabit.  We are talking about 10 Terabytes on disk per node recently.

 Googling "10 gigabit in gigabytes" gives me 1.25 gigabytes/second (yes, I could
 have divided by 8 in my head, but eh… of course when I saw the number, I went duh).

 So how long would it take to transfer 10 Terabytes, or 10,000 Gigabytes, to a
 node that we are bringing online to replace a dead node?

 This assumes no one else is using the bandwidth too ;).  10,000 Gigabytes * 1
 second/1.25 Gigabytes * 1 min/60 secs * 1 hr/60 mins = about 2.2 hours at full
 line rate, or roughly 4.5 hours if we only use 50% of the network.

 And that is the best case: actual streaming throughput usually runs well below
 line rate (see the disk and bus questions below), which is how bringing a
 crashed node back up to speed stretches into days.  I think this is the main
 reason the ~1 Terabyte soft limit exists to begin with, right?

 From an ops perspective, waiting days for a rebuild could sound like a
 nightmare scenario… maybe it is livable though.  Either way, I thought it
 would be good to share the numbers.  ALSO, that is assuming the bus with its 10
 disks can keep up with 10G.  Can it?  What is the limit of throughput on a
 bus / second on the computers we have, as on wikipedia there is a huge
 variance?

 What is the rate of the disks too (multiplied by 10 of course)?  Will they 
 keep up with a 10G rate for bringing a new node online?

 This all comes into play even more so when you want to double the size of
 your cluster, of course, as all nodes have to transfer half of what they have
 to all the new nodes that come online (Cassandra actually has a very data
 center/rack aware topology to transfer data correctly and not use up all
 bandwidth unnecessarily… I am not sure mongodb has that).  Anyways, just food
 for thought.
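 As a quick sanity check, the same back-of-envelope arithmetic as a runnable
 shell sketch (the 1.25 GB/s line rate and the 50% utilisation figure are the
 assumptions stated above):

 echo "scale=2; 10000 / 1.25 / 3600" | bc        # hours at 100% of 10GbE -> 2.22
 echo "scale=2; 10000 / 1.25 / 3600 / 0.5" | bc  # hours at 50% utilisation -> 4.44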

 From: aaron morton aa...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 Date: Monday, February 18, 2013 1:39 PM
 To: user@cassandra.apache.org, Vegard Berget p...@fantasista.no
 Subject: Re: cassandra vs. mongodb quick question

 My experience is repair of 300GB compressed data takes longer than 300GB of 
 uncompressed, but I cannot point to an exact number. Calculating the 
 differences is mostly CPU bound and works on the non compressed data.

 Streaming uses compression (after uncompressing the on disk data).

 So if you have 300GB of compressed data, take a look at how long repair takes 
 and see if you are comfortable with that. You may also want to test replacing 
 a node so you can get the procedure documented and understand how long it 
 takes.

 The idea of the soft 300GB to 500GB limit came about because of a number of
 cases where people had 1 TB on a single node and they were surprised it took 
 days to repair or replace. If you know how long things may take, and that 
 fits in your operations then go with it.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/02/2013, at 10:08 PM, Vegard Berget p...@fantasista.no wrote:



 Just out of curiosity :

 When using compression, does this affect this one way or another?  Is 300G 
 (compressed) SSTable size, or total size of data?

 .vegard,

 - Original Message -
 From: user@cassandra.apache.org
 To: user@cassandra.apache.org
 Cc:
 Sent: Mon, 18 Feb 2013 08:41:25 +1300
 Subject: Re: cassandra vs. mongodb quick question


 If you have spinning disk and 1G networking and no virtual nodes, I would 
 still say 300G to 500G is a soft limit.

 If you are using virtual nodes, SSD, JBOD disk configuration or faster 
 networking you may go higher.

 The limiting factors are the time it takes to repair, the time it takes to
 replace a node, and the memory considerations for hundreds of millions of rows.
 If the performance of those operations is acceptable to you, then go crazy.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2013, at 9:05 AM, 

Re: JNA not found.

2013-02-19 Thread Tim Dunphy
Hey Guys,

 I just wanted to follow up on this thread on how I got JNA to work with the
cassandra 1.2.1 tarball I downloaded.

On CentOS I did:

[root@cassandra-node01 ~]# yum provides */jna.jar

...

jna-3.4.0-4.el5.x86_64 : Pure Java access to native libraries
Repo: epel
Matched from:
Filename: /usr/share/java/jna.jar


[root@cassandra-node01 ~]#  yum install jna-3.4.0-4.el5.x86_64

[root@cassandra-node01 ~]# ln -s /usr/share/java/jna.jar
/usr/local/apache-cassandra-1.2.1/lib/jna.jar


Now when I start Cassandra I see this message:

INFO 10:11:11,852 JNA mlockall successful


That's a win! I can't for the life of me figure out why Cassandra 1.2 was
refusing to recognize the downloaded jna.jar file in its lib directory,
but the above trick seems to work every time.
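One hedged way to double-check that the memory locking actually took effect
(this assumes Cassandra is the only CassandraDaemon process on the box):

pid=$(pgrep -f CassandraDaemon)
grep VmLck /proc/$pid/status   # a non-zero VmLck means mlockall really locked memory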

Thanks for all your input.

Tim







On Tue, Jan 29, 2013 at 10:18 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hi Chandra,

  I'm using Cassandra 1.2.1 and jna/platform 3.5.1.

  One thing I should mention is that I tried putting the jar files into my
 java jre/lib directory. The theory being those jars would be available to
 all java apps. In that case Cassandra will start but still not recognize
 JNA. If I copy the jars to the cassandra/lib directory, I have the same
 crashing issue. Even if I symlink from the jre/lib directory to the
 cassandra/lib directory, the same issue occurs. It's like this version of
 Cassandra can't stand having the jna jar in its lib directory. I'm
 beginning to wonder if anyone has gotten JNA to work with this version
 of Cassandra, and if so, how. I've only tried a tarball install so far, I
 can't say about the package install which may well work.

 Thanks
 Tim


 On Tue, Jan 29, 2013 at 10:07 PM, chandra Varahala 
 hadoopandcassan...@gmail.com wrote:

 we had this issue before, but after adding those two jars the error was gone.
 We used Cassandra 1.0.8 (JNA 3.3.0, JNA platform 3.3.0). What version of
 Cassandra are you using?

 -chandra


 On Tue, Jan 29, 2013 at 12:19 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hi Chandra,

 Thanks for your reply. Well I have added both jna.jar and platform.jar
 to my lib directory (jna 3.3.0):

 [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar
 lib/platform.jar
 -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
 -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar

 But sadly I get the same result:


 [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
 xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
 -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
   INFO 12:14:52,493 Logging initialized
  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
 VM/1.6.0_34
  INFO 12:14:52,507 Heap size: 301727744/302776320
  INFO 12:14:52,508 Classpath:
 /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
 Killed

 And still when I remove those library files cassandra starts without a
 problem, except for the fact that it is not able to use JNA.

 I'd appreciate any input 

Re: JNA not found.

2013-02-19 Thread Edward Capriolo
Depending on your mount/SELinux settings, sometimes the OS is unwilling to
tolerate .so files outside certain directories.

Edward


Re: Long running nodetool repair

2013-02-19 Thread Michael Kjellman
This is very normal (unfortunately). Are you doing a repair -pr or a straight
up repair?

Does nodetool netstats show anything? I frequently see repair hang in 1.2.1,
and I haven't been able to figure out why yet. Feel free to take a stack
dump with jstack on the node doing the repair and see if there are any
deadlocks potentially occurring after the Merkle trees are received.
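For example, a minimal sketch of that check against a local node (the output
path is illustrative):

nodetool -h localhost netstats                    # any streams still moving?
pid=$(pgrep -f CassandraDaemon)
jstack $pid > /tmp/repair-stack-$(date +%s).txt   # inspect the dump for deadlocks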

And to help more, do you have the last logs after AntiEntropy? Any streaming
sessions from other nodes?

Bug is being tracked here: https://issues.apache.org/jira/browse/CASSANDRA-5146

Best,
Michael



Re: Testing compaction strategies on a single production server?

2013-02-19 Thread Henrik Schröder
Well, that answer didn't really help. I know how to make a survey node, and
I know how to simulate reads to it, it's just that that's a lot of work,
and I wouldn't be sure that the simulated load is the same as the
production load.

We gather a lot of metrics from our production servers, so we know exactly
how they perform over long periods of time. Changing a single server to run
a different compaction strategy would allow us to know in detail how a
different strategy would impact the cluster.

So, is it possible to modify org.apache.cassandra.db.[keyspace].[column
family].CompactionStrategyClass through jmx on a production server without
any ill effects? Or is this only possible to do on a survey node while it
is in a specific state?
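For what it's worth, here is a sketch of that change via the third-party
jmxterm CLI. The bean coordinates below are my guess at how the attribute path
above maps onto the MBean tree (verify them in jconsole first), and the
keyspace/CF names are placeholders:

java -jar jmxterm.jar -l localhost:7199 -n <<'EOF'
bean org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF
set CompactionStrategyClass org.apache.cassandra.db.compaction.LeveledCompactionStrategy
EOF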


/Henrik



Re: cassandra vs. mongodb quick question(good additional info)

2013-02-19 Thread Wei Zhu
From my limited experience with Mongo, it seems that Mongo only performs well when
the whole data set fits in memory, which makes me wonder how the 40TB data
would work...


Re: Long running nodetool repair

2013-02-19 Thread Wei Zhu
It should not take that long. For my 200G node, it takes about an hour to
calculate the Merkle trees, and then the data streaming starts.

By the way, how do you know the repair is not done?

If you run nodetool tpstats, it should give you the AntiEntropy session info:
active/pending/completed etc. While calculating the Merkle trees, you can see the
progress from nodetool compactionstats. While streaming data, you can see the
progress from nodetool netstats.

You can also grep the log for "Merkle" and "repair".
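Concretely (host and log path below are illustrative):

nodetool -h localhost tpstats          # AntiEntropy active/pending/completed
nodetool -h localhost compactionstats  # Merkle tree (validation) progress
nodetool -h localhost netstats         # streaming progress
grep -iE "merkle|repair" /var/log/cassandra/system.log | tail -20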





unsubscribe

2013-02-19 Thread Anurag Gujral
Unsubscribe me please.
Thanks
A


Re: unsubscribe

2013-02-19 Thread Alain RODRIGUEZ
Read the message you answered, and help yourself!

Alain


2013/2/19 Anurag Gujral anurag.guj...@gmail.com


 Unsubscribe me please.
 Thanks
 A




RE: Question on Cassandra Snapshot

2013-02-19 Thread S C
Thank you Aaron.

From: aa...@thelastpickle.com
Subject: Re: Question on Cassandra Snapshot
Date: Mon, 18 Feb 2013 06:37:34 +1300
To: user@cassandra.apache.org

With incremental_backup turned OFF in cassandra.yaml - are all SSTables under
/data/TestKeySpace/ColumnFamily at all times?
No. They are deleted when they are compacted and no internal operations are
referencing them.

With incremental_backup turned ON in cassandra.yaml - are current SSTables under
/data/TestKeySpace/ColumnFamily/ with a hardlink in
/data/TestKeySpace/ColumnFamily/backups?
Yes, sort of. *All* SSTables ever created are in the backups directory, not just
the ones currently live.

Let's say I have taken a snapshot and moved
/data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape; at what
point should I be backing up *.db files from the
/data/TestKeySpace/ColumnFamily/backups directory? Also, should I be deleting the
*.db files whose inode matches the files in the snapshot? Is that a correct
approach?
Back up all files in the snapshot. There may be files with non-.db extensions if
you use levelled compaction. When you are finished with the snapshot, delete it.
If the inode is no longer referenced from the live data dir it will be deleted.

I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ -
what are these timestamp directories?
Probably automatic snapshots from dropping a KS or CF.
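As a sketch of that flow, using the keyspace and paths from the question:

nodetool -h localhost snapshot TestKeySpace            # take a snapshot
ls -li /data/TestKeySpace/ColumnFamily/backups/        # incremental hardlinks
ls -li /data/TestKeySpace/ColumnFamily/snapshots/*/    # snapshot hardlinks
# matching inode numbers in the two listings are the same underlying file
nodetool -h localhost clearsnapshot TestKeySpace       # drop snapshots once archived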
Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 16/02/2013, at 4:41 AM, S C as...@outlook.com wrote:

I appreciate any advice or pointers on this.
Thanks in advance.

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: Question on Cassandra Snapshot
Date: Thu, 14 Feb 2013 20:47:14 -0600

I have been looking at incremental backups and snapshots. I have done some
experimentation but could not come to a conclusion. Can somebody please help me
understand it right?

/data is my data partition.

With incremental_backup turned OFF in cassandra.yaml - are all SSTables under
/data/TestKeySpace/ColumnFamily at all times?

With incremental_backup turned ON in cassandra.yaml - are current SSTables under
/data/TestKeySpace/ColumnFamily/ with a hardlink in
/data/TestKeySpace/ColumnFamily/backups?

Let's say I have taken a snapshot and moved
/data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape; at what
point should I be backing up *.db files from the backups directory? Also, should
I be deleting the *.db files whose inode matches the files in the snapshot? Is
that a correct approach?

I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ -
what are these timestamp directories?

Thanks in advance. SC

Cassandra network latency tuning

2013-02-19 Thread Brandon Walsh
I have a 5 node cluster currently running ver 1.2. Prior to full scale
deployment, I'm running some benchmarks using YCSB. On a hadoop cluster
deployment we saw an excellent improvement from higher speed networks. However,
Cassandra's metrics do not include network latencies, and I would like to
understand how we can capture network latencies between, for example, 1GbE and
10GbE. As of now all the graphs look the same. We will soon be adding SSDs and I
was wondering how Cassandra can utilize the 10GbE and the SSDs, and whether
specific tuning is required.
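One way to approximate the network's share of latency is to compare
coordinator-level and replica-local histograms (host, keyspace and table names
below are placeholders):

nodetool -h node1 proxyhistograms                # request latency incl. network hop
nodetool -h node1 cfhistograms MyKeyspace MyCF   # replica-local latency only

The gap between the two distributions is roughly coordinator plus network
overhead, so it should shrink going from 1GbE to 10GbE.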


How to limit query results like from row 50 to 100

2013-02-19 Thread Mateus Ferreira e Freitas




With CQL or an API.   
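CQL supports LIMIT but has no OFFSET, so "rows 50 to 100" means paging: fetch a
page, remember the last partition key, and restart from it. A sketch with cqlsh
(table and column names are made up):

echo "SELECT id, col FROM ks.tbl LIMIT 50;" | cqlsh localhost
# note the last id returned, then fetch the next page after it:
echo "SELECT id, col FROM ks.tbl WHERE token(id) > token('last-id') LIMIT 50;" | cqlsh localhost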

Re: Mutation dropped

2013-02-19 Thread aaron morton
 Does the rpc_timeout not control the client timeout?
No it is how long a node will wait for a response from other nodes before 
raising a TimedOutException if less than CL nodes have responded. 
Set the client side socket timeout using your preferred client. 

 Is there any param which is configurable to control the replication timeout
 between nodes?
There is no such thing.
rpc_timeout is roughly like that, but it's not right to think about it that 
way. 
i.e. if a message to a replica times out and CL nodes have already responded 
then we are happy to call the request complete. 
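A quick way to watch for this on each node (host name is a placeholder):

nodetool -h node1 tpstats   # the "Message type / Dropped" table at the end counts dropped MUTATIONs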

Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/02/2013, at 1:48 AM, Kanwar Sangha kan...@mavenir.com wrote:

 Thanks Aaron.

 Does the rpc_timeout not control the client timeout? Is there any param
 which is configurable to control the replication timeout between nodes? Or
 is the same param used to control that, since the other node is also like a
 client?

 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: 17 February 2013 11:26
 To: user@cassandra.apache.org
 Subject: Re: Mutation dropped

 You are hitting the maximum throughput on the cluster.

 The messages are dropped because the node fails to start processing them
 before rpc_timeout.

 However the request is still a success because the client requested CL was
 achieved.

 Testing with RF 2 and CL 1 really just tests the disks on one local machine.
 Both nodes replicate each row, and writes are sent to each replica, so the
 only thing the client is waiting on is the local node to write to its commit
 log.

 Testing with (and running in prod) RF 3 and CL QUORUM is a more real world
 scenario.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote:

 Hi - Is there a parameter which can be tuned to prevent the mutations from
 being dropped? Is this logic correct?

 Node A and B with RF=2, CL=1. Load balanced between the two.

 --  Address   Load   Tokens  Owns (effective)  Host ID
    Rack
 UN  10.x.x.x   746.78 GB  256 100.0%
 dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
 UN  10.x.x.x   880.77 GB  256 100.0%
 95d59054-be99-455f-90d1-f43981d3d778  rack1

 Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
 falling behind and we see the mutation dropped messages. But there are no
 failures on the client. Does that mean the other node is not able to persist
 the replicated data? Is there some timeout associated with replicated data
 persistence?

 Thanks,
 Kanwar

 From: Kanwar Sangha [mailto:kan...@mavenir.com]
 Sent: 14 February 2013 09:08
 To: user@cassandra.apache.org
 Subject: Mutation dropped

 Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing
 a lot of mutation dropped messages.  I understand that this is due to the
 replica not being written to the other node? RF=2, CL=1.

 From the wiki -
 For MUTATION messages this means that the mutation was not applied to all
 replicas it was sent to. The inconsistency will be repaired by Read Repair or
 Anti Entropy Repair.

 Thanks,
 Kanwar