Re: Stress test using Java-based stress utility
Have you checked the logs on the nodes to see if there are any errors? On 7/21/11 10:43 PM, Nilabja Banerjee wrote: Hi All, I am following this link http://www.datastax.com/docs/0.7/utilities/stress_java for a stress test. I am getting this output after running the command below (xxx.xxx.xxx.xx = my IP):

contrib/stress/bin/stress -d xxx.xxx.xxx.xx
Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))

Any idea why I am getting these errors? Thank You -- Kirk True, Founder, Principal Engineer
Re: b-tree
In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote: But how will you be able to maintain it while it evolves and new data is added without transactions? What is the situation you think you need transactions for? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 00:06, Eldad Yamin wrote: Aaron, Nested set is exactly what I had in mind. But how will you be able to maintain it while it evolves and new data is added without transactions? Thanks! On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote: Just throwing out a (half-baked) idea: perhaps the Nested Set Model of trees would work http://en.wikipedia.org/wiki/Nested_set_model * Every row would represent a set, with a left and right encoded into the key * Members are inserted as columns into *every* set / row they are a member of. So we are de-normalising and trading space for time. * May need to maintain a custom secondary index of the materialised sets, e.g. slice a row to get the first column >= the left value you are interested in; that is the key for the set. I've not thought it through much further than that; a lot would depend on your data. The top sets may get very big. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote: I'm not sure if I have an answer for you, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic, fundamental data structure; a b-tree is an approach to storing and indexing data on disk for a database. Which do you mean? On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote: Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving)? I'm asking because I want to store geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks! -- It's always darkest just before you are eaten by a grue.
Re: cassandra fatal error when compaction
ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) Internal error processing insert
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
    at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
    at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
    at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
    at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
    at org.apache.cassandra.thrift.CassandraServer.internal_insert(CassandraServer.java:436)
    at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:444)
    at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3286)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
The identical error and stack trace were logged again for pool-2-thread-6 and once more for pool-2-thread-3.
eliminate need to repair by using column TTL??
One of the main reasons for regularly running repair is to make sure deletes are propagated in the cluster, i.e., that data is not resurrected if a node never received the delete call. And read repair takes care of repairing inconsistencies on the fly. So if I were to set a universal TTL on all columns, so everything would only live to a certain age, would I be able to get away without having to do regular repairs with nodetool? I realize this scenario would not be applicable for everyone, but our data model would allow us to do this. So could this be an alternative to running the (resource-intensive, long-running) repairs with nodetool? Thanks.
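For illustration, a minimal sketch of the TTL write being described, using the raw Thrift API from the 0.8 line (the keyspace, column family, names and the 7-day TTL are hypothetical; the client is assumed to be connected and already bound to a keyspace):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class TtlInsertExample {
        public static void insertWithTtl(Cassandra.Client client) throws Exception {
            Column col = new Column();
            col.setName("status".getBytes("UTF-8"));
            col.setValue("active".getBytes("UTF-8"));
            col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds
            col.setTtl(7 * 24 * 3600); // column expires after 7 days, no delete call needed

            ByteBuffer key = ByteBuffer.wrap("user42".getBytes("UTF-8"));
            client.insert(key, new ColumnParent("MyColumnFamily"), col, ConsistencyLevel.QUORUM);
        }
    }

Once the TTL elapses the column is treated as deleted locally on every replica that has it, which is what makes the no-delete-propagation argument possible in the first place.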
Re: Stress test using Java-based stress utility
UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts. It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have, and have you made any changes to the RF? Also check the server side logs as Kirk says. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 18:37, Kirk True wrote: Have you checked the logs on the nodes to see if there are any errors? On 7/21/11 10:43 PM, Nilabja Banerjee wrote: (original message and stress output quoted above) -- Kirk True, Founder, Principal Engineer
Re: b-tree
You can use something like ZooKeeper to coordinate processes doing page splits. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:05, Eldad Yamin wrote: In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. (earlier messages quoted above)
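As a sketch of that coordination (assuming a running ZooKeeper ensemble; the znode path is hypothetical and its parent path is assumed to exist; a production recipe would add watches, retries and session handling), a process attempts the page split only if it wins an ephemeral lock node:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SplitLock {
        // Returns true if this process may perform the split for the given tree node.
        public static boolean tryLockSplit(ZooKeeper zk, String nodeId) throws Exception {
            String lockPath = "/btree/splits/" + nodeId; // hypothetical path
            try {
                // Ephemeral: the lock disappears automatically if this process dies.
                zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false; // another process is already splitting this node
            }
        }
        // After a successful split, release with: zk.delete(lockPath, -1);
    }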
Re: cassandra fatal error when compaction
Something has shut down the mutation stage thread pool. This happens during drain or decommission / move. Restart the service and it should be OK. If it happens again without anyone running something like drain, decommission or move, let us know. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:41, lebron james wrote: ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) Internal error processing insert java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down (same stack traces as quoted above)
Re: cassandra fatal error when compaction
It happened again. I turned off compaction by setting the max and min compaction thresholds to zero and ran 5 threads of inserts; after the database reached 27 GB, Cassandra fell over with the same error. OS: Windows Server 2008 Datacenter. The JVM has a 1.5 GB heap. Cassandra version 0.8.1; all parameters in the conf file are default. On Fri, Jul 22, 2011 at 12:18 PM, aaron morton aa...@thelastpickle.com wrote: Something has shut down the mutation stage thread pool. This happens during drain or decommission / move. Restart the service and it should be OK. If it happens again without anyone running something like drain, decommission or move, let us know. Cheers
Re: eliminate need to repair by using column TTL??
Read repair will only repair data that is read, on the nodes that are up at that time, and it does not guarantee that any changes it detects will be written back to the nodes. The diff mutations are async fire-and-forget messages which may go missing, or be dropped or ignored by the recipient, just like any other message. Also, getting hit with a bunch of read repair operations is pretty painful. The normal read runs, the coordinator detects the digest mismatch, the read runs again from all nodes and they all have to return their full data (no digests this time), the coordinator detects the diffs, and mutations are sent back to each node that needs them. All this happens sync to the read request when the CL is above ONE. That's 2 reads with more network IO, and up to RF mutations. The delete thing is important, but repair also reduces the chance of reads getting hit with RR, and gives me confidence when it's necessary to nuke a bad node. Your plan may work but it feels risky to me. You may end up with worse read performance and unpleasant emotions if you ever have to nuke a node. Others may disagree. Not ignoring the fact that repair can take a long time, fail, hurt performance etc. There are plans to improve it though. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:55, jonathan.co...@gmail.com wrote: (original question as above)
Re: Stress test using Java-based stress utility
Running only one node. I don't think it is coming from the replication factor... I will try to sort this out. Any other suggestions from your side are always helpful. :) Thank you. On 22 July 2011 14:36, aaron morton aa...@thelastpickle.com wrote: UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts. It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have, and have you made any changes to the RF? Also check the server side logs as Kirk says. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 18:37, Kirk True wrote: (earlier messages quoted above) -- Kirk True, Founder, Principal Engineer http://www.mustardgrain.com/ Expert Engineering Firepower About us: http://www.twitter.com/mustardgraininc http://www.linkedin.com/company/mustard-grain-inc.
Re: Re: eliminate need to repair by using column TTL??
Good points, Aaron. I realize now how expensive repair-on-read is. I'm going to keep doing repairs regularly, but still have a max TTL on all columns to make sure we don't have really old data we no longer need getting buried in the cluster. On 22 Jul 2011, aaron morton aa...@thelastpickle.com wrote: (Aaron's reply as above)
Re: Stress test using Java-based stress utility
What does nodetool ring say? On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote: (original message and stress output as above) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Is it safe to stop a read repair and any suggestion on speeding up repairs
Short answer: yes, it's safe to kill cassandra during a repair. It's one of the nice things about never mutating data. Longer answer: If nodetool compactionstats says there are no Validation compactions running (and the compaction queue is empty), and netstats says there is nothing streaming, there is a good chance the repair is finished or dead. If a neighbour dies during a repair, the node it was started on will wait for 48 hours(?) until it times out. Check the logs on the machines for errors, particularly from the AntiEntropyService, and see what compactionstats says on all the nodes involved in the repair. Thanks Aaron. One of the neighboring nodes did go down due to running out of memory, so I will make sure the repair is dead and start it again per column family. Even longer: um, 3 TB of data is *way* too much data per node; generally happy people have up to about 200 to 300 GB per node. The reason for this recommendation is so that things like repair, compaction, and node moves are manageable, and because the loss of a single node has less of an impact. I would not recommend running a live system with that much data per node. Thanks for the advice; this can be a separate discussion, but that would make a Cassandra cluster way too costly: we would have to buy 16 systems for the same amount of data as opposed to the 4 that we have now, and my IT director will strangle me. -Adi
Equalizing nodes storage load
Hi everyone. I've been struggling trying to get the data volume (load) to equalize across a balanced cluster, and I'm not sure what else I can try. Background: this was originally a 5-node cluster. We re-balanced the 3 faster machines across the ring and decommissioned the 2 older ones. We also upgraded cassandra a few times, from 0.7.4 through 0.7.5 and 0.7.6-2 to 0.7.7. The ring currently looks like so:

Address      Status State   Load      Owns    Token
                                             151236607520417094872610936636341427313
xx.xx.x.105  Up     Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up     Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up     Normal  74.65 GB  33.33%  151236607520417094872610936636341427313

What I've tried so far: 1. Running repair on each node (sequentially, of course). 2. Running cleanup on the largest node (.18), hoping it would shed unneeded data. The repairs helped a bit by slightly bumping up the load of the first 2 machines, but the cleanup on the 3rd failed to reduce its data volume. So, at this point, I'm out of ideas. In terms of tpstats metrics, each of the 3 nodes is serving roughly the same volume of ReadStage and MutationStage, so they're balanced in that respect. However, I'm concerned about the imbalance of the data load (24% / 34% / 42%) and being unable to equalize it. For the record, there's only 1 keyspace of meaningful data in the cluster, with the following schema settings:

Keyspace: ZZ:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Options: [DCMTL:2]
  Column Families:
    ColumnFamily: AA
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 256000.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      Built indexes: []
    ColumnFamily: B (Super)
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 75000.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.25
      Built indexes: []

Any tips or ideas to help get the nodes' load equalized would be highly appreciated. If this is normal behaviour and I shouldn't be trying too hard to get it equalized, I'd appreciate any notes/links explaining why. Thank you.
Counter consistency - are counters idempotent?
As of Cassandra 0.8.1, are counter increments and decrements idempotent? If, for example, a client sends an increment request and the increment occurs, but the network subsequently fails and reports a failure to the client, will Cassandra retry the increment (thus leading to an overcount and inconsistent data)? I have done some reading and I am getting conflicting sources about counter consistency. In this source ( http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html), it states that counters now have the same consistency as regular columns--does this imply that the above example will not lead to an overcount? If counters are not idempotent, are there examples of effective uses of counters that will prevent inconsistent counts? Thank you for your help.
Re: Equalizing nodes storage load
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote:

Address      Status State   Load      Owns    Token
xx.xx.x.105  Up     Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up     Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up     Normal  74.65 GB  33.33%  151236607520417094872610936636341427313
Re: Counter consistency - are counters idempotent?
On Fri, Jul 22, 2011 at 4:52 PM, Kenny Yu kenny...@knewton.com wrote: As of Cassandra 0.8.1, are counter increments and decrements idempotent? If, for example, a client sends an increment request and the increment occurs, but the network subsequently fails and reports a failure to the client, will Cassandra retry the increment (thus leading to an overcount and inconsistent data)? I have done some reading and I am getting conflicting sources about counter consistency. In this source (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html), it states that counters now have the same consistency as regular columns--does this imply that the above example will not lead to an overcount? That email thread was arguably a bit imprecise with its use of the word 'consistency', but what it was talking about is really the consistency level. That is, counters support all the usual consistency levels (ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) except ANY. Counters are still not idempotent. And one small clarification: if you get a TimeoutException, Cassandra never retries the increment on its own (your sentence suggests it does), but you won't know in that case whether the increment was persisted or not, and thus you won't know whether you should retry. And yes, this is still a limitation of counters. If counters are not idempotent, are there examples of effective uses of counters that will prevent inconsistent counts? Thank you for your help.
Re: Equalizing nodes storage load
I'm trying to balance Load (41.98 GB vs 59.4 GB vs 74.65 GB). Owns looks ok; they're all 33.33%, which is what I want. It was calculated simply by 2^127 / num_nodes. The only reason the first one doesn't start at 0 is that I've actually carved the ring planning for 9 machines (2 new data centers of 3 machines each). However, only 1 data center (DCMTL) is currently up. On 2011-07-22, at 10:56 AM, Sasha Dolgy wrote: are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? (ring output quoted above)
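For reference, a small sketch of that token calculation for the RandomPartitioner (evenly spaced initial tokens at i * 2^127 / N; the 9-node count below just mirrors the planned ring in this thread):

    import java.math.BigInteger;

    public class TokenGenerator {
        public static void main(String[] args) {
            int numNodes = 9; // planned ring size from the example above
            BigInteger ringSize = BigInteger.valueOf(2).pow(127);
            for (int i = 0; i < numNodes; i++) {
                // Evenly spaced initial_token values for RandomPartitioner
                BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                           .divide(BigInteger.valueOf(numNodes));
                System.out.println("node " + i + ": " + token);
            }
        }
    }

Note that equal token spacing balances key ownership, not bytes on disk; a few large rows or an uneven write pattern can still skew Load while Owns stays at 33.33%.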
CompositeType for row Keys
With the current implementation of CompositeType in Cassandra 0.8.1, is it recommended practice to use a CompositeType as the key? Or are both column and key equally well supported? The documentation on CompositeType is light (well, non-existent really). With key_validation_class set to CompositeType(UUIDType, IntegerType), can we query all matching rows just by using CompositeType(UUIDType)? In my specific use case, what would work best is rows keyed by a CompositeType, each with thousands of columns.
Re: CompositeType for row Keys
If you are using OPP, then you can use CompositeType for both the key and the column name; otherwise (RandomPartitioner), just use it for columns. On 22/07/2011 17:10, Patrick Julien wrote: (question as above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: CompositeType for row Keys
I can still use it for keys if I don't need ranges then? Because for what we are doing, we can always re-assemble keys. On Fri, Jul 22, 2011 at 11:38 AM, Donal Zang zan...@ihep.ac.cn wrote: If you are using OPP, then you can use CompositeType for both the key and the column name; otherwise (RandomPartitioner), just use it for columns. (earlier messages quoted above)
Fwd: Counter consistency - are counters idempotent?
btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same: if the client reports a write failure, the value may well be replicated to all replicas later. This is the same in all other systems too (ZooKeeper, Paxos), ultimately due to the FLP theoretical result that there is no guarantee of consensus in async systems. ---------- Forwarded message ---------- From: Sylvain Lebresne sylv...@datastax.com Date: Fri, Jul 22, 2011 at 8:03 AM Subject: Re: Counter consistency - are counters idempotent? To: user@cassandra.apache.org (Sylvain's reply as above)
Re: Counter consistency - are counters idempotent?
If that's the case, your client is being misleading. Cassandra distinguishes between Unavailable (we knew we couldn't achieve CL before we started, and nothing changed) and TimedOut (didn't get a reply in a timely fashion; it may or may not have gone through). TimedOut != Failed. On Fri, Jul 22, 2011 at 11:08 AM, Yang tedd...@gmail.com wrote: (Yang's message and the forwarded reply as above) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
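A sketch of how that distinction plays out client-side, using the raw 0.8 Thrift API and a counter, since that is where a blind retry is dangerous (the column family and key here are hypothetical):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.CounterColumn;
    import org.apache.cassandra.thrift.TimedOutException;
    import org.apache.cassandra.thrift.UnavailableException;

    public class CounterIncrementExample {
        public static void increment(Cassandra.Client client) throws Exception {
            CounterColumn col = new CounterColumn();
            col.setName("page_views".getBytes("UTF-8"));
            col.setValue(1L);
            ByteBuffer key = ByteBuffer.wrap("page42".getBytes("UTF-8"));
            try {
                client.add(key, new ColumnParent("Counters"), col, ConsistencyLevel.QUORUM);
            } catch (UnavailableException e) {
                // Rejected up front: nothing was applied, so retrying is safe.
            } catch (TimedOutException e) {
                // Outcome unknown: the increment may have been applied.
                // Retrying risks an overcount, because counter adds are not idempotent.
            }
        }
    }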
Re: CompositeType for row Keys
On 22/07/2011 17:56, Patrick Julien wrote: I can still use it for keys if I don't need ranges then? Because for what we are doing, we can always re-assemble keys. Yes, but why would you use CompositeType if you don't need range queries? (earlier messages quoted above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: [SPAM] Fwd: Counter consistency - are counters idempotent?
On 22/07/2011 18:08, Yang wrote: btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same... Yes, but with regular columns a retry is OK, while with counters it is not. (Sylvain's forwarded reply as above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: CompositeType for row Keys
Yes, but why would you use CompositeType if you don't need range queries? If you were doing composite keys anyway (a common approach with time series data, for example), you would not have to write parsing and concatenation code. Particularly useful if you have mixed types in the key.
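To make that concrete, a sketch of the parsing/concatenation work CompositeType absorbs: each component in its encoding is a 2-byte big-endian length, the raw component bytes, then one end-of-component byte. Hand-rolled composite keys would have to reproduce something like this (plain JDK, no client library assumed):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class CompositeEncoding {
        // Encode components in the CompositeType wire format:
        // <2-byte length><component bytes><end-of-component byte> per component.
        public static byte[] encode(byte[]... components) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (byte[] c : components) {
                out.writeShort(c.length); // big-endian, as DataOutputStream writes
                out.write(c);
                out.writeByte(0);         // 0 = exact match; slice bounds use other values
            }
            return bos.toByteArray();
        }
    }

Higher-level clients grew helpers for this (Hector's Composite class, for instance), so in practice you rarely write it by hand.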
Re: CompositeType for row Keys
Exactly. In any case, I just answered my own question: if I need ranges, I can just make another column family where the column names are these keys. On Fri, Jul 22, 2011 at 12:37 PM, Nate McCall n...@datastax.com wrote: (Nate's reply as above)
Predictable low RW latency, SLABS and STW GC
In order to be predictable at big-data scale, the intensity and periodicity of stop-the-world garbage collection has to be brought down. Assume that slab allocation (CASSANDRA-2252) will be available in the main line at some point, and assume that it will have the impact that other projects (HBase etc.) are reporting. I wonder whether avoiding GC by restarting the servers before a GC would be a feasible approach (of course, while knowing the workload). Regards, Milind
[RELEASE] Apache Cassandra 0.7.8 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.7.8. This version is a bug fix release[1]; in particular, it fixes a regression in Cassandra 0.7.7 that caused hinted handoff delivery to not be triggered automatically (you could still force delivery through JMX). For that reason, upgrading is highly encouraged. Please always pay attention to the release notes[2] before upgrading, though. This release can be downloaded as usual from http://cassandra.apache.org/. If you encounter any problem, please let us know[3]. Have fun! [1]: http://goo.gl/LrBBY (CHANGES.txt) [2]: http://goo.gl/rX1q0 (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: b-tree
On Fri, Jul 22, 2011 at 12:05 AM, Eldad Yamin elda...@gmail.com wrote: In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. You don't need a transaction, you just need consensus and/or idempotence. In this case both can be achieved fairly easily. Mike (earlier messages quoted above)
Re: how to stop the whole cluster, start the whole cluster like in hadoop/hbase?
Yes, I am wondering more about the yaml file and settings like the autobootstrap setting and such. I guess I will find out once they enable my amazon service and I can get running with it. NOTE: anyone doing 1.0 or a prototype, I think, constantly starts/stops the whole cluster to upgrade/install new stuff onto all the nodes in cassandra... yes, we don't plan on using that in production of course, as then we would prefer to do a rolling restart to get new code into the data grid if needed. thanks, Dean On Thu, Jul 21, 2011 at 2:24 PM, Eldad Yamin elda...@gmail.com wrote: I wonder if it won't cause problems... Anyone done it already? On Jul 21, 2011 10:39 PM, Jonathan Ellis jbel...@gmail.com wrote: dsh -c -g cassandra /etc/init.d/cassandra stop http://www.netfort.gr.jp/~dancer/software/dsh.html.en P.S. mostly people are concerned about making sure their entire cluster does NOT stop at the same time :) On Thu, Jul 21, 2011 at 2:23 PM, Dean Hiller d...@alvazan.com wrote: Is there a framework for stopping all nodes / starting all nodes for cassandra? I am okay with something like the password-less ssh setup that the hadoop scripts did... just something that allows me to start and stop the whole cluster. thanks, Dean -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
CL=N-1?
Is there such an option? In some cases I want to distribute some small lookup tables to all the nodes, so that every node has a local copy loaded in memory and lookups are fast. Supposedly I want to write to all N nodes, but that exposes me to failure if just one node is down, so I'd like to declare success at N-1 nodes. thanks Yang
Re: CL=N-1?
On Fri, Jul 22, 2011 at 3:24 PM, Yang tedd...@gmail.com wrote: (question as above) There is no N-1 CL. The numbered levels are ONE, TWO, THREE. There has been some talk of this. By pairing READ/WRITE levels you can normally get an effect close enough to what you are looking for. In your case, QUORUM will write and read with a single FAILED node. Also, if your lookup tables are not changing often, not having N-1 is negligible.
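A sketch of the pairing described above, with the raw 0.8 Thrift API (the column family and key are hypothetical): writing and reading at QUORUM makes the read and write replica sets overlap, so with RF = 3 both operations tolerate one failed node.

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class QuorumPairingExample {
        public static void writeThenRead(Cassandra.Client client) throws Exception {
            ByteBuffer key = ByteBuffer.wrap("lookup-row".getBytes("UTF-8"));
            Column col = new Column();
            col.setName("code".getBytes("UTF-8"));
            col.setValue("42".getBytes("UTF-8"));
            col.setTimestamp(System.currentTimeMillis() * 1000);
            // W = QUORUM and R = QUORUM gives R + W > RF: reads see the latest write.
            client.insert(key, new ColumnParent("Lookup"), col, ConsistencyLevel.QUORUM);

            ColumnPath path = new ColumnPath("Lookup");
            path.setColumn("code".getBytes("UTF-8"));
            ColumnOrSuperColumn result = client.get(key, path, ConsistencyLevel.QUORUM);
            System.out.println(new String(result.getColumn().getValue(), "UTF-8"));
        }
    }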
Re: Repair fails with java.io.IOError: java.io.EOFException
I don't see a JVM crash log (hs_err_pid[pid].log) in ~/brisk/resources/cassandra/bin or /tmp, so maybe the JVM didn't crash? We're running a pretty up-to-date Sun Java:

ubuntu@ip-10-2-x-x:/tmp$ java -version
java version 1.6.0_24
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

I'm gonna restart the Repair process in a few more hours. If there are any additional debug or troubleshooting logs you'd like me to enable first, please let me know. - Sameer On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis jbel...@gmail.com wrote: Did you check for a JVM crash log? You should make sure you're running the latest Sun JVM; older versions, and OpenJDK in particular, are prone to segfaulting. On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui cassandral...@gmail.com wrote: We are starting Cassandra with brisk cassandra, so as a stand-alone process, not a service. The syslog on the node doesn't show anything regarding the Cassandra Java process around the time the last entries were made in the Cassandra system.log (2011-07-21 13:01:51):

Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com"] (re)start

The last thing in the Cassandra log before "INFO ... Logging initialized" is: INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144. I can start Repair again, but am worried that it will crash Cassandra again, so I want to turn on any debugging or helpful logs to diagnose the crash if it happens again. - Sameer On Thu, Jul 21, 2011 at 4:30 PM, aaron morton aa...@thelastpickle.com wrote: The default init.d script will direct stdout/err to that file; how are you starting brisk / cassandra? Check the syslog and other logs in /var/log to see if the OS killed cassandra. Also, what was the last thing in the cassandra log before INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialised? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 10:50, Sameer Farooqui wrote: Hey Aaron, I don't have any output.log files in that folder:

ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
system.log    system.log.1   system.log.2  system.log.3
system.log.4  system.log.5   system.log.6  system.log.7
system.log.8  system.log.9   system.log.10 system.log.11

On Thu, Jul 21, 2011 at 3:40 PM, aaron morton aa...@thelastpickle.com wrote: Check /var/log/cassandra/output.log (assuming the default init scripts). A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 10:13, Sameer Farooqui wrote: Hmm. Just looked at the log more closely. So, what actually happened is that while Repair was running on this specific node, the Cassandra java process terminated itself. The last entries in the log are:

INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 193120 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144

When we came in this morning, nodetool ring from another node showed the 1st node as down, and OpsCenter also reported it as down. Next we ran sudo netstat -anp | grep 7199 from the 1st node to see the status of the Cassandra PID, and it was not running. We then started Cassandra: INFO [main]
Re: b-tree
I've not tried this, but a speculative implementation schema would probably look something like the following:

Super col family for structure:
  hash(nodeId): {
    root:    { left=nodeId1, right=nodeId2 }
    nodeId1: { left=nodeId3, right=nodeId4 }
    nodeId2: { left=nodeId5, right=nodeId6 }
  }

Col family for data:
  root:    { meta=blob, col1=val1, col2=val2 }
  nodeId1: { meta=blob, col3=val1, col4=val3 }
  etc.

Col family for the index (column names are node ids, values empty):
  val1: { root=, nodeId1= }
  val2: { root= }
  val3: { nodeId1= }

I think 3 multiget queries cover most of the logic, and splitting the tree etc. should be reasonably easy to manage, though you'd probably pull more data back to the client than is strictly needed for the logic. Something along these lines would probably work.
p

On 20/07/11 21:30, Eldad Yamin wrote:
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving)? I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!
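For the curious, here is a toy, untested sketch of that read pattern in plain Java, with HashMaps standing in for the column families so it isn't tied to any particular client library (all names are illustrative, and the comparison rule is a placeholder):

    import java.util.HashMap;
    import java.util.Map;

    public class TreeSketch {
        // Stand-ins for the structure and data column families above.
        static Map<String, Map<String, String>> structureCf = new HashMap<String, Map<String, String>>();
        static Map<String, Map<String, String>> dataCf = new HashMap<String, Map<String, String>>();

        // One lookup per level: fetch a node's children, pick a side, descend.
        // With a real client, each step would be one of the multigets mentioned above.
        static String descend(String nodeId, String wanted) {
            Map<String, String> children = structureCf.get(nodeId);
            if (children == null) {
                return nodeId; // leaf: caller then reads dataCf.get(nodeId)
            }
            // Placeholder comparison; a real tree would compare against keys
            // stored in the data column family to choose left vs right.
            String next = wanted.compareTo(nodeId) < 0 ? children.get("left") : children.get("right");
            return next == null ? nodeId : descend(next, wanted);
        }

        public static void main(String[] args) {
            Map<String, String> root = new HashMap<String, String>();
            root.put("left", "nodeId1");
            root.put("right", "nodeId2");
            structureCf.put("root", root);
            System.out.println(descend("root", "aValue")); // prints nodeId1
        }
    }

Splitting a full node then becomes: write the two new rows, then update the parent's left/right columns, which is exactly the multi-step change the thread is worried about doing without transactions.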
Re: [SPAM] Fwd: Counter consistency - are counters idempotent?
On Fri, Jul 22, 2011 at 9:27 AM, Donal Zang zan...@ihep.ac.cn wrote:
On 22/07/2011 18:08, Yang wrote:
btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same: if the client reports a write failure, the value may well be replicated to all replicas later. This is the same in all other systems (ZooKeeper, Paxos), ultimately due to the FLP impossibility result: there is no guaranteed consensus in an asynchronous system.

yes, but with regular columns a retry is OK, while with counters it is not.

I know I've heard that fixing this issue is hard. I've assumed this to mean "don't expect a fix anytime soon". Is that accurate? I'm beginning to have second thoughts about whether Cassandra is the right fit for my project, which would rely heavily on counters to roll up aggregates.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
carpe diem quam minimum credula postero
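To make the retry distinction concrete, a toy sketch in plain Java (nothing Cassandra-specific; the maps just stand in for a counter column and a regular column) showing why a blind retry after a timeout is safe for an overwrite but not for an increment:

    import java.util.HashMap;
    import java.util.Map;

    public class RetrySketch {
        static Map<String, Long> counterCf = new HashMap<String, Long>();
        static Map<String, String> regularCf = new HashMap<String, String>();

        // Counter add: NOT idempotent -- applying it twice changes the result.
        static void add(String key, long delta) {
            Long current = counterCf.get(key);
            counterCf.put(key, (current == null ? 0L : current) + delta);
        }

        // Regular column write: idempotent -- applying it twice is harmless.
        static void set(String key, String value) {
            regularCf.put(key, value);
        }

        public static void main(String[] args) {
            // Scenario: the first attempt actually applied, but the client
            // saw a timeout and retried blindly.
            add("pageviews", 1);
            add("pageviews", 1); // retry double-counts
            set("name", "bob");
            set("name", "bob"); // retry converges to the same state
            System.out.println(counterCf.get("pageviews")); // 2, for one logical event
            System.out.println(regularCf.get("name"));      // "bob" either way
        }
    }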
question on setup for writes into 2 datacenters
Ideally, we would want to have a replication factor of 4 and a minimum write consistency of 2 (which, looking at the default in cassandra.yaml, is to memory first with async to disk... perfect so far!!!). Now, obviously, I can get the partitioner set up to make sure I get 2 replicas in each data center. The next thing I would want to guarantee, however, is that if a write came into datacenter 1, it would write to the two nodes in datacenter 1 and asynchronously replicate to datacenter 2. Is this possible? Does Cassandra already handle that, or is there something I could do to get Cassandra to do that? In this mode, I believe I can have both datacenters be live as well as act as a backup for the other, without wasting resources.
thanks,
Dean
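For what it's worth, this is roughly what NetworkTopologyStrategy plus LOCAL_QUORUM gives you: two replicas per DC, local writes acknowledged by the local replicas, and the mutation shipped to the other DC without being waited on. A cassandra-cli sketch, assuming a snitch configured with datacenters named DC1 and DC2 (the keyspace and DC names are placeholders; they must match whatever your snitch reports):

    create keyspace MyKS
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = [{DC1:2, DC2:2}];

With RF 2 in each DC, a write at consistency level LOCAL_QUORUM blocks on 2 acks in the coordinator's own datacenter and replicates to the remote datacenter asynchronously, which matches the "both live, each a backup for the other" setup described above.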
CQL COUNT Not Accurate?
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?
Thanks,
Hefeng
Re: CQL COUNT Not Accurate?
On Fri, 2011-07-22 at 14:18 -0700, Hefeng Yuan wrote:
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?

It's the equivalent of get_count(), so it returns a column count instead of a row count.

--
Eric Evans
eev...@rackspace.com
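A quick illustration of that: with a hypothetical column family and three columns written to a single row, COUNT(*) over that one key reports 3, not 1, because it counts columns within the matched row(s), the same number get_count() would return (syntax per early CQL; untested, so treat it as a sketch):

    cqlsh> UPDATE Users SET 'a' = '1', 'b' = '2', 'c' = '3' WHERE KEY = 'k1';
    cqlsh> SELECT COUNT(*) FROM Users WHERE KEY IN ('k1');
    (3,)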
select * from A join B using(common_id) where A.id == a and B.id == b
this is a common pattern used in an RDBMS; is there an existing idiom for it in Cassandra? If the result of "select * from A where id == a" is very large, and similarly for B, while the join of A.id == a and B.id == b is small, then doing a get() for both and then merging client-side seems excessively slow. My fall-back approach is, since A and B do not change a lot, to pre-generate the join of A and B (not very large) keyed on A.id + B.id, then do a get(a+b).
thanks
Yang
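A toy sketch of that fall-back in plain Java, with in-memory maps standing in for the column families: pre-compute the join once (offline), key it on the pair of ids, and the read path collapses to a single get:

    import java.util.HashMap;
    import java.util.Map;

    public class JoinSketch {
        // Stand-ins for CFs A and B: row id -> common_id
        static Map<String, String> a = new HashMap<String, String>();
        static Map<String, String> b = new HashMap<String, String>();
        // Materialised join, keyed on "aId:bId"
        static Map<String, String> joined = new HashMap<String, String>();

        // Offline pass over A and B. (A nested loop for brevity; a real pass
        // would group rows by common_id instead of scanning all pairs.)
        static void materialise() {
            for (Map.Entry<String, String> ea : a.entrySet()) {
                for (Map.Entry<String, String> eb : b.entrySet()) {
                    if (ea.getValue().equals(eb.getValue())) {
                        joined.put(ea.getKey() + ":" + eb.getKey(), ea.getValue());
                    }
                }
            }
        }

        public static void main(String[] args) {
            a.put("a1", "c1");
            b.put("b1", "c1");
            materialise();
            // Read side: one get(a+b) instead of two large slices plus a merge.
            System.out.println(joined.get("a1:b1")); // c1
        }
    }

The cost, as with any materialised view, is re-generating (or incrementally updating) the joined CF when A or B changes.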
Re: CQL COUNT Not Accurate?
Yes, this is broken. We'll fix this for https://issues.apache.org/jira/browse/CASSANDRA-2474

On Fri, Jul 22, 2011 at 4:18 PM, Hefeng Yuan hfy...@rhapsody.com wrote:
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?
Thanks,
Hefeng

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Equalizing nodes storage load
Hi Peter,
That was precisely it. Thank you :) Doing a major compaction on the heaviest node (74.65 GB) reduced it to 33.55 GB. I'll compact the other 2 nodes as well; I anticipate they will also settle around that size.

On 2011-07-22, at 5:00 PM, Peter Tillotson wrote:
I'm not sure if this is the answer, but try a major compaction on each node for each column family. I suspect the data shuffle has left quite a few deleted keys which may get cleaned out on major compaction. As I remember, major compaction doesn't run automatically in 0.7.x; I'm not sure if it is triggered by repair.
p

On 22/07/11 16:08, Mina Naguib wrote:
I'm trying to balance Load (41.98 GB vs 59.4 GB vs 74.65 GB). Owns looks ok; they're all 33.33%, which is what I want. It was calculated simply by 2^127 / num_nodes. The only reason the first one doesn't start at 0 is that I've actually carved the ring planning for 9 machines (2 new data centers of 3 machines each). However, only 1 data center (DCMTL) is currently up.

On 2011-07-22, at 10:56 AM, Sasha Dolgy wrote:
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens?

On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote:

Address      Status  State   Load      Owns    Token
xx.xx.x.105  Up      Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up      Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up      Normal  74.65 GB  33.33%  151236607520417094872610936636341427313
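For anyone following along, the command in question is (run one node at a time; <host>, <keyspace>, and <cf> are placeholders):

    nodetool -h <host> compact <keyspace> [<cf> ...]

One caveat worth knowing: a major compaction merges everything into a single large SSTable, which minor compactions will then leave alone until several similarly sized SSTables accumulate, so some people prefer to rely on minor compactions plus repair instead of making major compactions routine.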