4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
We've been having issues recently where, as soon as we start doing heavy 
writes (via Hadoop), 4 nodes out of 20 get hammered.  We're using the random 
partitioner and we've set the initial tokens for our 20 nodes according to the 
general spacing formula, except for a few token offsets where we've replaced 
dead nodes.

When I say hammered: looking at nodetool tpstats, those 4 nodes have completed 
something like 70 million mutation stage events, whereas the rest of the 
cluster has completed 2-20 million.  On those 4 nodes we also find evidence in 
the logs of the mutation stage backing up and a lot of dropped read repair 
messages.  It looks like quite a bit of flushing is going on, and consequently 
auto minor compactions.
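A quick way to quantify this skew across the ring is to pull the completed MutationStage count from each node; a minimal sketch, assuming the 0.7-era tpstats column layout (Pool Name / Active / Pending / Completed) and placeholder host names:

```shell
# Print completed MutationStage events per node (hosts are placeholders;
# assumes tpstats rows look like "PoolName Active Pending Completed").
for h in node01 node02 node03; do
  printf '%s: ' "$h"
  nodetool -h "$h" tpstats | awk '/^MutationStage/ { print $4 }'
done
```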

We are running 0.7.8 and have about 34 column families (counting secondary 
indexes as column families), so we can't set our memtable throughput in MB 
very high.  We would like to upgrade to 0.8.4 (not least because of JAMM), but 
it seems that something else is going on with our cluster if we are using RP 
with balanced initial tokens and still have 4 hot nodes.

Do these symptoms and context sound familiar to anyone?  Does anyone have any 
suggestions as to how to address this kind of case - disproportionate write 
load?

Thanks,

Jeremy

Re: Avoid Simultaneous Minor Compactions?

2011-08-23 Thread aaron morton
Change one thing at a time and work out what metric it is you want to improve. 

I would start with reducing compaction_throughput_mb_per_sec. Have a look in 
your logs for the Enqueuing flush of Memtable… messages, count up how many 
serialised bytes you are flushing and then check it against the advice in the 
yaml file. 
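A hedged sketch of that counting step (the log path is a placeholder, and the regex assumes the flush line reports the serialized byte count as the first number in parentheses, as the 0.7/0.8 logs do):

```shell
# Sum the serialized bytes reported by "Enqueuing flush of Memtable..." lines.
LOG="${CASSANDRA_LOG:-/var/log/cassandra/system.log}"
grep 'Enqueuing flush of Memtable' "$LOG" \
  | sed 's/.*(\([0-9]*\).*/\1/' \
  | awk '{ total += $1 } END { printf "%d bytes (%.1f MB) flushed\n", total, total/1048576 }'
```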

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 11:36 AM, Hefeng Yuan wrote:

 Shall I lower this or increase it? Or, to put it another way: should we let 
 it run longer while using less CPU, or let it finish faster with more CPU?
 The problem we're facing is that with the default settings, compactions run 
 slowly and also eat a lot of CPU in the meantime.
 
 I'm thinking about the following changes, does this make sense?
 1. lower the compaction thread priority
 2. shorten the compaction threshold to 2~20
 3. lower compaction_throughput_mb_per_sec to 10
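 For change 3 specifically, this is a cassandra.yaml edit plus a rolling restart; a minimal sketch, assuming the stock yaml layout with an uncommented compaction_throughput_mb_per_sec line (the conf path is a placeholder):

```shell
# Lower the compaction throttle to 10 MB/s in cassandra.yaml
# (conf path is a placeholder; restart the node to pick it up).
sed -i 's/^compaction_throughput_mb_per_sec:.*/compaction_throughput_mb_per_sec: 10/' \
  conf/cassandra.yaml
```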
 
 Thanks,
 Hefeng
 
 On Aug 22, 2011, at 8:09 AM, Jonathan Ellis wrote:
 
 Specifically, look at compaction_throughput_mb_per_sec in cassandra.yaml
 
 On Mon, Aug 22, 2011 at 12:39 AM, Ryan King r...@twitter.com wrote:
 You should throttle your compactions to a sustainable level.
 
 -ryan
 
 On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan hfy...@rhapsody.com wrote:
 We just noticed that at one time, 4 nodes were doing minor compaction 
 together, each of them took 20~60 minutes.
 We're on 0.8.1, 6 nodes, RF5.
 These simultaneous compactions slowed down the whole cluster; we have the 
 local_quorum consistency level, therefore dynamic_snitch is not helping 
 us.
 
 Aside from lowering the compaction thread priority, is there any other 
 way to tell a node to hold off compacting if other nodes are already 
 compacting?
 
 Thanks,
 Hefeng
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 



Re: run Cassandra tutorial example

2011-08-23 Thread aaron morton
Did you sort this out ? The #cassandra IRC room is a good place to get help as 
well.

I tried to build it first using 
mvn compile and got this different error 
[ERROR] Failed to execute goal on project cassandra-tutorial: Could not 
resolve dependencies for project 
com.datastax.tutorial:cassandra-tutorial:jar:1.0-SNAPSHOT: Could not find 
artifact me.prettyprint:hector-core:jar:0.8.0-2-SNAPSHOT - [Help 1]

There is a fix here...
https://github.com/zznate/cassandra-tutorial/pull/1

After that I ran mvn compile again and it worked. 

Then I added the schema:
path-to-cassandra/bin/cassandra-cli --host localhost < 
~/code/github/datastax/cassandra-tutorial/npanxx_script.txt 

And ran:
mvn -e exec:java -Dexec.args="get" 
-Dexec.mainClass=com.datastax.tutorial.TutorialRunner

which printed:
HColumn(city=Austin)

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 10:30 AM, Alvin UW wrote:

 Hello,
 
 I'd like to try the cassandra tutorial example: 
 https://github.com/zznate/cassandra-tutorial
  by following the readme.
 
 After typing  mvn -e exec:java -Dexec.args=get 
 -Dexec.mainClass=com.datastax.tutorial.TutorialRunner
 I got the following errors.
 Should I do something before the above command?
 Thanks.
 
 + Error stacktraces are turned on.
 [INFO] Scanning for projects...
 [INFO] 
 
 [INFO] Building cassandra-tutorial
 [INFO]task-segment: [exec:java]
 [INFO] 
 
 [INFO] Preparing exec:java
 [INFO] No goals needed for project - skipping
 [INFO] [exec:java {execution: default-cli}]
 [INFO] 
 
 [ERROR] BUILD ERROR
 [INFO] 
 
 [INFO] An exception occured while executing the Java class. 
 com.datastax.tutorial.TutorialRunner
 
 [INFO] 
 
 [INFO] Trace
 org.apache.maven.lifecycle.LifecycleExecutionException: An exception occured 
 while executing the Java class. com.datastax.tutorial.TutorialRunner
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:719)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:569)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:539)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
 at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
 at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
 at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
 at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
 Caused by: org.apache.maven.plugin.MojoExecutionException: An exception 
 occured while executing the Java class. com.datastax.tutorial.TutorialRunner
 at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:346)
 at 
 org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
 at 
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
 ... 17 more
 Caused by: java.lang.ClassNotFoundException: 
 com.datastax.tutorial.TutorialRunner
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:284)
 at java.lang.Thread.run(Thread.java:619)
 [INFO] 
 
 [INFO] Total time:  1 second
 [INFO] Finished at: Mon Aug 22 18:24:05 EDT 2011
 [INFO] Final Memory: 

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Peter Schuller
 We've been having issues where as soon as we start doing heavy writes (via 
 hadoop) recently, it really hammers 4 nodes out of 20.  We're using random 
 partitioner and we've set the initial tokens for our 20 nodes according to 
 the general spacing formula, except for a few token offsets as we've replaced 
 dead nodes.

Is the hadoop job iterating over keys in the cluster in token order
perhaps, and you're generating writes to those keys? That would
explain a moving hotspot along the cluster.

-- 
/ Peter Schuller (@scode on twitter)


Re: Completely removing a node from the cluster

2011-08-23 Thread aaron morton
I'm running low on ideas for this one. Anyone else ? 

If the phantom node is not listed in the ring, other nodes should not be 
storing hints for it. You can see what nodes they are storing hints for via 
JConsole. 

You can try a rolling restart passing the JVM option 
-Dcassandra.load_ring_state=false. However, if the phantom node is being 
passed around in the gossip state it will probably just come back again. 
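Concretely, the flag can be added where the startup options are assembled, e.g. in conf/cassandra-env.sh; a sketch assuming the stock tarball layout, and the line should be removed again after the one-off restart:

```shell
# One restart with saved ring state ignored; delete this line afterwards
# so later restarts use the persisted ring state again.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```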

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 3:49 PM, Bryce Godfrey wrote:

 Could this ghost node be causing my hints column family to grow to this size? 
  I also crash after about 24 hours due to commit logs growth taking up all 
 the drive space.  A manual nodetool flush keeps it under control though.
 
 
Column Family: HintsColumnFamily
SSTable count: 6
Space used (live): 666480352
Space used (total): 666480352
Number of Keys (estimate): 768
Memtable Columns Count: 1043
Memtable Data Size: 461773
Memtable Switch Count: 3
Read Count: 38
Read Latency: 131.289 ms.
Write Count: 582108
Write Latency: 0.019 ms.
Pending Tasks: 0
Key cache capacity: 7
Key cache size: 6
Key cache hit rate: 0.8334
Row cache: disabled
Compacted row minimum size: 2816160
Compacted row maximum size: 386857368
Compacted row mean size: 120432714
 
 Is there a way for me to manually remove this dead node?
 
 -Original Message-
 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] 
 Sent: Sunday, August 21, 2011 9:09 PM
 To: user@cassandra.apache.org
 Subject: RE: Completely removing a node from the cluster
 
 It's been at least 4 days now.
 
 -Original Message-
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Sunday, August 21, 2011 3:16 PM
 To: user@cassandra.apache.org
 Subject: Re: Completely removing a node from the cluster
 
I see the mistake I made about ring: it gets the endpoint list from the same 
place but uses the tokens to drive the whole process. 
 
 I'm guessing here, don't have time to check all the code. But there is a 3 
 day timeout in the gossip system. Not sure if it applies in this case. 
 
 Anyone know ?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:
 
 Both .2 and .3 list the same from the mbean that Unreachable is empty 
 collection, and Live node lists all 3 nodes still:
 192.168.20.2
 192.168.20.3
 192.168.20.1
 
 The removetoken was done a few days ago, and I believe the remove was done 
 from .2
 
Here is what the ring output looks like; not sure why I get that token on the 
empty first line either:
 Address         DC           Rack   Status  State   Load      Owns    Token
                                                                       85070591730234615865843651857942052864
 192.168.20.2    datacenter1  rack1  Up      Normal  79.53 GB  50.00%  0
 192.168.20.3    datacenter1  rack1  Up      Normal  42.63 GB  50.00%  85070591730234615865843651857942052864
 
 Yes, both nodes show the same thing when doing a describe cluster, that .1 
 is unreachable.
 
 
 -Original Message-
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Sunday, August 21, 2011 4:23 AM
 To: user@cassandra.apache.org
 Subject: Re: Completely removing a node from the cluster
 
 Unreachable nodes either did not respond to the message or were known to 
 be down and were not sent a message. 
 The way the node lists are obtained for the ring command and describe 
 cluster are the same. So it's a bit odd. 
 
 Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean 
 ? What do the LiveNodes and UnreachableNodes attributes say ? 
 
 Also how long ago did you remove the token and on which machine? Do both 
 20.2 and 20.3 think 20.1 is still around ? 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
 
 I'm on 0.8.4
 
 I have removed a dead node from the cluster using nodetool removetoken 
 command, and moved one of the remaining nodes to rebalance the tokens.  
 Everything looks fine when I run nodetool ring now, as it only lists the 
 remaining 2 nodes and they both look fine, owning 50% of the tokens.
 
 However, I can still see it being considered as part of the cluster from 
 the Cassandra-cli (192.168.20.1 being the removed node) and I'm worried 
 that the cluster is still queuing up hints for the node, or any other 
 issues it may cause:
 
 Cluster Information:
 

Re: Completely removing a node from the cluster

2011-08-23 Thread Jonathan Colby
I ran into this.  I also tried load_ring_state=false, which did not help.   
The way I got through this was to stop the entire cluster and start the nodes 
one-by-one.   

I realize this is not a practical solution for everyone, but if you can afford 
to stop the cluster for a few minutes, it's worth a try.


On Aug 23, 2011, at 9:26 AM, aaron morton wrote:

 I'm running low on ideas for this one. Anyone else ? 
 
 If the phantom node is not listed in the ring, other nodes should not be 
 storing hints for it. You can see what nodes they are storing hints for via 
 JConsole. 
 
 You can try a rolling restart passing the JVM option 
 -Dcassandra.load_ring_state=false. However, if the phantom node is being 
 passed around in the gossip state it will probably just come back again. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna

On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:

 We've been having issues where as soon as we start doing heavy writes (via 
 hadoop) recently, it really hammers 4 nodes out of 20.  We're using random 
 partitioner and we've set the initial tokens for our 20 nodes according to 
 the general spacing formula, except for a few token offsets as we've 
 replaced dead nodes.
 
 Is the hadoop job iterating over keys in the cluster in token order
 perhaps, and you're generating writes to those keys? That would
 explain a moving hotspot along the cluster.

Yes - we're iterating over all the keys of particular column families, doing 
joins using pig as we enrich and perform measure calculations.  When we write, 
we're usually writing out for a certain small subset of keys which shouldn't 
have hotspots with RandomPartitioner afaict.

 
 -- 
 / Peter Schuller (@scode on twitter)



preloading entire CF with SEQ access on startup

2011-08-23 Thread Radim Kolar
Is there a way to preload an entire CF into the cache with sequential access 
when the server starts?


I think the standard cache preloader uses random access, and because of that 
it is so slow that we can't use it.


Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread aaron morton
Dropped messages in ReadRepair is odd. Are you also dropping mutations ? 

There are two tasks performed on the ReadRepair stage: first the digests are 
compared on this stage, and second the repair itself happens on it. Comparing 
digests is quick. Doing the repair can take a bit longer; all the CFs returned 
are collated, filtered, and deletes are removed.  

We don't do background Read Repair on range scans, they do have foreground 
digest checking though.

What CL are you using ? 

begin crazy theory:

Could there be a very big row that is out of sync ? The increased RR 
would result in mutations being sent back to the replicas, which would 
give you a hot spot in mutations.

Check max compacted row size on the hot nodes. 

Turn the logging up to DEBUG on the hot machines for 
o.a.c.service.RowRepairResolver and look for the resolve: … message; it 
includes the time taken.
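A sketch of that logging change (the conf path is a placeholder, and this assumes the stock log4j-server.properties setup, where a restart is the safe way to have the level picked up):

```shell
# Enable DEBUG for the read-repair resolver on a hot node.
CONF="${CASSANDRA_CONF:-conf}/log4j-server.properties"
echo 'log4j.logger.org.apache.cassandra.service.RowRepairResolver=DEBUG' >> "$CONF"
```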

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 7:52 PM, Jeremy Hanna wrote:

 
 On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
 
 We've been having issues where as soon as we start doing heavy writes (via 
 hadoop) recently, it really hammers 4 nodes out of 20.  We're using random 
 partitioner and we've set the initial tokens for our 20 nodes according to 
 the general spacing formula, except for a few token offsets as we've 
 replaced dead nodes.
 
 Is the hadoop job iterating over keys in the cluster in token order
 perhaps, and you're generating writes to those keys? That would
 explain a moving hotspot along the cluster.
 
 Yes - we're iterating over all the keys of particular column families, doing 
 joins using pig as we enrich and perform measure calculations.  When we 
 write, we're usually writing out for a certain small subset of keys which 
 shouldn't have hotspots with RandomPartitioner afaict.
 
 
 -- 
 / Peter Schuller (@scode on twitter)
 



Re: Completely removing a node from the cluster

2011-08-23 Thread aaron morton
I normally link to the DataStax article to avoid having to actually write 
those words :)

http://www.datastax.com/docs/0.8/troubleshooting/index#view-of-ring-differs-between-some-nodes
A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 7:45 PM, Jonathan Colby wrote:

 I ran into this.  I also tried load_ring_state=false, which did not help.  
  The way I got through this was to stop the entire cluster and start the 
 nodes one-by-one.   
 
 I realize this is not a practical solution for everyone, but if you can 
 afford to stop the cluster for a few minutes, it's worth a try.
 
 

multi-node cassandra config doubt

2011-08-23 Thread Thamizh
Hi All,

This is regarding multi-node cluster configuration doubt.

I have configured a 3-node cluster using Cassandra 0.8.4 and I am getting an 
error when I run a Map/Reduce job which uploads records from HDFS to Cassandra.

Here are the cassandra.yaml settings for my 3 nodes:

node01:
    seeds: node01,node02,node03
    auto_bootstrap: false
    listen_address: 192.168.0.1
    rpc_address: 192.168.0.1

node02:
    seeds: node01,node02,node03
    auto_bootstrap: true
    listen_address: 192.168.0.2
    rpc_address: 192.168.0.2

node03:
    seeds: node01,node02,node03
    auto_bootstrap: true
    listen_address: 192.168.0.3
    rpc_address: 192.168.0.3

When I run the M/R program, I get the error below:
11/08/23 04:37:00 INFO mapred.JobClient:  map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient:  map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient:  map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : 
attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
    at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
    at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:90)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)


Is anything wrong in my cassandra.yaml file?

I followed http://wiki.apache.org/cassandra/MultinodeCluster for cluster 
configuration.

Regards,
Thamizhannal

Re: nodetool repair caused high disk space usage

2011-08-23 Thread Héctor Izquierdo Seliva
El sáb, 20-08-2011 a las 01:22 +0200, Peter Schuller escribió:
  Is there any chance that the entire file from the source node got streamed to
  the destination node even though only a small amount of data in the file from
  the source node is supposed to be streamed to the destination node?
 
 Yes, but the thing that's annoying me is that even if so - you should
 not be seeing a 40 GB to hundreds-of-GB increase even if all
 neighbors sent all their data.
 

I'm having the very same issue. Trying to repair a node with 90GB of
data fills up the 1.5TB drive, and it's still trying to send more data.
This is on 0.8.1.




Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna

On Aug 23, 2011, at 3:43 AM, aaron morton wrote:

 Dropped messages in ReadRepair is odd. Are you also dropping mutations ? 
 
 There are two tasks performed on the ReadRepair stage. The digests are 
 compared on this stage, and secondly the repair happens on the stage. 
 Comparing digests is quick. Doing the repair could take a bit longer, all the 
 cf's returned are collated, filtered and deletes removed.  
 
 We don't do background Read Repair on range scans, they do have foreground 
 digest checking though.
 
 What CL are you using ? 

CL.ONE for hadoop writes, CL.QUORUM for hadoop reads

 
 begin crazy theory:
 
   Could there be a very big row that is out of sync ? The increased RR 
 would be resulting in mutations been sent back to the replicas. Which would 
 give you a hot spot in mutations.
   
   Check max compacted row size on the hot nodes. 
   
   Turn the logging up to DEBUG on the hot machines for 
 o.a.c.service.RowRepairResolver and look for the resolve:… message it has 
 the time taken.

The max compacted size didn't seem unreasonable - about a MB.  I turned up 
logging to DEBUG for that class and I get plenty of dropped READ_REPAIR 
messages, but nothing coming out at DEBUG in the logs to indicate the time 
taken that I can see.

 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/08/2011, at 7:52 PM, Jeremy Hanna wrote:
 
 
 On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
 
 We've been having issues where as soon as we start doing heavy writes (via 
 hadoop) recently, it really hammers 4 nodes out of 20.  We're using random 
 partitioner and we've set the initial tokens for our 20 nodes according to 
 the general spacing formula, except for a few token offsets as we've 
 replaced dead nodes.
 
 Is the hadoop job iterating over keys in the cluster in token order
 perhaps, and you're generating writes to those keys? That would
 explain a moving hotspot along the cluster.
 
 Yes - we're iterating over all the keys of particular column families, doing 
 joins using pig as we enrich and perform measure calculations.  When we 
 write, we're usually writing out for a certain small subset of keys which 
 shouldn't have hotspots with RandomPartitioner afaict.
 
 
 -- 
 / Peter Schuller (@scode on twitter)
 
 



Re: how to know if nodetool cleanup is safe?

2011-08-23 Thread Sam Overton
On 21 August 2011 12:34, Yan Chunlu springri...@gmail.com wrote:

 since nodetool cleanup could remove hinted handoffs, will it cause
 data loss?


Hi Yan,

Hints are not guaranteed to be delivered and nodetool cleanup is one of
the reasons for that. This will only cause data-loss if you are writing at
CL.ANY where a hint counts as a write. If you are writing at CL.ONE or above
then at least one replica must receive the data for the write to succeed, so
losing hints will not cause data-loss.

If a hint is not delivered then the replica to which it was intended will
become consistent after a read-repair, or after manual anti-entropy repair.

Sam

-- 
Sam Overton
Acunu | http://www.acunu.com | @acunu


Different Load values after stress test runs....

2011-08-23 Thread Chris Marino
Hi, we're running some performance tests against some clusters and I'm
curious about some of the numbers I see.

I'm running the stress test against two identically configured clusters, but
after I run a stress test I get different Load values across the clusters.

The difference between the two clusters is that one uses standard EC2
interfaces while the other runs on a virtual network. Do these differences
indicate something I should be aware of?

Here is a sample of the kinds of results I'm seeing.

Address DC  RackStatus State   LoadOwns
   Token

 12760588759xxx
10.0.0.17   DC1 RAC1Up Normal  94 MB
25.00%  0
10.0.0.18   DC1 RAC1Up Normal  104.52 MB
25.00%  42535295865xxx
10.0.0.19   DC1 RAC1Up Normal  78.58 MB
 25.00%  85070591730xxx
10.0.0.20   DC1 RAC1Up Normal  78.58 MB
 25.00%  12760588759xxx

Address DC  RackStatus State   LoadOwns
   Token

12760588759xxx
10.120.35.52DC1 RAC1Up Normal  103.74 MB
25.00%  0
10.120.6.124DC1 RAC1Up Normal  118.99 MB
25.00%  42535295865xxx
10.127.90.142   DC1 RAC1Up Normal  104.26 MB
25.00%  85070591730xxx
10.94.69.237DC1 RAC1Up Normal  75.74 MB
 25.00%  12760588759xxx

The first cluster with the vNet (10.0.0.0/28 addresses) consistently shows
smaller Load values. The total Load of 355MB vs. 402MB with native EC2
interfaces?? Is a total Load value even meaningful?? The stress test is the
very first thing that's run against the clusters.

[I'm also a little puzzled that these numbers are not uniform within the
clusters, but I suspect that's because the stress test is using a key
distribution that is Gaussian.  I'm not 100% sure of this either since I've
seen conflicting documentation. Haven't tried 'random' keys, but I presume
that would change them to be uniform]

Except for these curious Load numbers, things seem to be running just fine.
Getting good fast results. Over 10 iterations I'm getting more than 10-12K
inserts per sec. (default values for the stress test).

Should I expect the Load to be the same across different clusters?? What
might explain the differences I'm seeing???

Thanks in advance.
CM


Re: run Cassandra tutorial example

2011-08-23 Thread Alvin UW
Thanks.

By your solution, the problem is fixed.
But my output is like this.
I don't understand the meaning of "Error stacktraces are turned on."

And I would like only the results to be output, not the INFO messages.

 mvn -e exec:java -Dexec.args="get"
-Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
+ Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO]

[INFO] Building cassandra-tutorial
[INFO]task-segment: [exec:java]
[INFO]

[INFO] Preparing exec:java
[INFO] No goals needed for project - skipping
[INFO] [exec:java {execution: default-cli}]
HColumn(city=Austin)
[INFO]

[INFO] BUILD SUCCESSFUL
[INFO]

[INFO] Total time: 1 second
[INFO] Finished at: Tue Aug 23 12:02:27 EDT 2011
[INFO] Final Memory: 11M/100M


2011/8/23 aaron morton aa...@thelastpickle.com

 Did you sort this out ? The #cassandra IRC room is a good place to get help
 as well.

 I tried to build it first using
 mvn compile and got this different error
 [ERROR] Failed to execute goal on project cassandra-tutorial: Could not
 resolve dependencies for project
 com.datastax.tutorial:cassandra-tutorial:jar:1.0-SNAPSHOT: Could not find
 artifact me.prettyprint:hector-core:jar:0.8.0-2-SNAPSHOT - [Help 1]

 There is a fix here...
 https://github.com/zznate/cassandra-tutorial/pull/1

 After than I did mvn compile and it worked.

 So added the schema…
 path-to-cassandra/bin/cassandra-cli --host localhost <
 ~/code/github/datastax/cassandra-tutorial/npanxx_script.txt

 And ran
 mvn -e exec:java -Dexec.args="get"
 -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

 Which outputted
 HColumn(city=Austin)

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/08/2011, at 10:30 AM, Alvin UW wrote:

 Hello,

 I'd like to try the cassandra tutorial example:
 https://github.com/zznate/cassandra-tutorial
  by following the readme.

 After typing mvn -e exec:java -Dexec.args="get"
 -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
 I got the following errors.
 Should I do something before the above command?
 Thanks.

 + Error stacktraces are turned on.
 [INFO] Scanning for projects...
 [INFO]
 
 [INFO] Building cassandra-tutorial
 [INFO]task-segment: [exec:java]
 [INFO]
 
 [INFO] Preparing exec:java
 [INFO] No goals needed for project - skipping
 [INFO] [exec:java {execution: default-cli}]
 [INFO]
 
 [ERROR] BUILD ERROR
 [INFO]
 
 [INFO] An exception occured while executing the Java class.
 com.datastax.tutorial.TutorialRunner

 [INFO]
 
 [INFO] Trace
 org.apache.maven.lifecycle.LifecycleExecutionException: An exception
 occured while executing the Java class. com.datastax.tutorial.TutorialRunner
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:719)
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:569)
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:539)
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
 at
 org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
 at
 org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
 at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
 at
 org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
 at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
 Caused by: org.apache.maven.plugin.MojoExecutionException: An exception
 occured while 

Re: Different Load values after stress test runs....

2011-08-23 Thread Philippe
Have you run repair on the nodes ? Maybe some data was lost and not repaired
yet ?

Philippe

2011/8/23 Chris Marino ch...@vcider.com

 Hi, we're running some performance tests against some clusters and I'm
 curious about some of the numbers I see.

 I'm running the stress test against two identically configured clusters,
 but after I run a stress test, I get different Load values across the
 clusters?

 The difference between the two clusters is that one uses standard EC2
 interfaces, but the other runs on a virtual network. Are these differences
 indicating something that I should be aware of??

 Here is a sample of the kinds of results I'm seeing.

 Address DC  RackStatus State   LoadOwns
Token

  12760588759xxx
 10.0.0.17   DC1 RAC1Up Normal  94 MB
 25.00%  0
 10.0.0.18   DC1 RAC1Up Normal  104.52 MB
 25.00%  42535295865xxx
 10.0.0.19   DC1 RAC1Up Normal  78.58 MB
  25.00%  85070591730xxx
 10.0.0.20   DC1 RAC1Up Normal  78.58 MB
  25.00%  12760588759xxx

 Address DC  RackStatus State   LoadOwns
Token

 12760588759xxx
 10.120.35.52DC1 RAC1Up Normal  103.74 MB
 25.00%  0
 10.120.6.124DC1 RAC1Up Normal  118.99 MB
 25.00%  42535295865xxx
 10.127.90.142   DC1 RAC1Up Normal  104.26 MB
 25.00%  85070591730xxx
 10.94.69.237DC1 RAC1Up Normal  75.74 MB
  25.00%  12760588759xxx

 The first cluster with the vNet (10.0.0.0/28 addresses) consistently shows
 smaller Load values. The total Load of 355MB vs. 402MB with native EC2
 interfaces?? Is a total Load value even meaningful?? The stress test is the
 very first thing that's run against the clusters.

 [I'm also a little puzzled that these numbers are not uniform within the
 clusters, but I suspect that's because the stress test is using a key
 distribution that is Gaussian.  I'm not 100% sure of this either since I've
 seen conflicting documentation. Haven't tried 'random' keys, but I presume
 that would change them to be uniform]

 Except for these curious Load numbers, things seem to be running just fine.
 Getting good fast results. Over 10 iterations I'm getting more than 10-12K
 inserts per sec. (default values for the stress test).

 Should I expect the Load to be the same across different clusters?? What
 might explain the differences I'm seeing???

 Thanks in advance.
 CM



Customized Secondary Index Schema

2011-08-23 Thread Alvin UW
Hello,

As mentioned by Ed Anuff in his blog and slides, one way to build customized
secondary index is:
We use one CF, each row to represent a secondary index, with the secondary
index name as row key.
For example,

Indexes = {
User_Keys_By_Last_Name : {
adams : e5d61f2b-…,
alden : e80a17ba-…,
anderson : e5d61f2b-…,
davis : e719962b-…,
doe : e78ece0f-…,
franks : e66afd40-…,
… : …,
}
}

But the whole secondary index is placed on a single node, because of
the row key.
All the queries against this secondary index will go to that node (plus, of
course, its replica nodes).

Do you think this is a scalability problem, and is there a better solution
for it?
Thanks.
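One common mitigation for the single-hot-row problem described above (a sketch only; all names are hypothetical, not from the thread) is to shard the index row by a hash of the indexed value, so that writes and reads spread over several rows at the cost of fanning queries out across all shards and merging the results:

```java
// Hypothetical sketch: split one wide index row into N shard rows so a
// single secondary index no longer lives entirely on one replica set.
public class IndexShards {
    static int bucket(String indexedValue, int numShards) {
        // floorMod avoids the negative-bucket bug when hashCode() is negative
        return Math.floorMod(indexedValue.hashCode(), numShards);
    }

    static String rowKey(String indexName, String indexedValue, int numShards) {
        return indexName + ":" + bucket(indexedValue, numShards);
    }

    public static void main(String[] args) {
        // One of "User_Keys_By_Last_Name:0" .. ":7" instead of one giant row
        System.out.println(rowKey("User_Keys_By_Last_Name", "adams", 8));
    }
}
```

A range or "list all" query then has to read all N shard rows, so N is usually kept small relative to the cluster size.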


Re: how to know if nodetool cleanup is safe?

2011-08-23 Thread Edward Capriolo
On Tue, Aug 23, 2011 at 11:56 AM, Sam Overton sover...@acunu.com wrote:

 On 21 August 2011 12:34, Yan Chunlu springri...@gmail.com wrote:

 since nodetool cleanup could remove hinted handoff,  will it cause the
 data loss?


 Hi Yan,

 Hints are not guaranteed to be delivered and nodetool cleanup is one of
 the reasons for that. This will only cause data-loss if you are writing at
 CL.ANY where a hint counts as a write. If you are writing at CL.ONE or above
 then at least one replica must receive the data for the write to succeed, so
 losing hints will not cause data-loss.

 If a hint is not delivered then the replica to which it was intended will
 become consistent after a read-repair, or after manual anti-entropy repair.

 Sam

 --
 Sam Overton
 Acunu | http://www.acunu.com | @acunu


If you run nodetool tpstats on each node in your cluster and none of
them has active or pending threads in the Hinted stage, no hints are
currently being delivered. But as pointed out above, Hinted Handoff is a
best-effort system.


Re: Completely removing a node from the cluster

2011-08-23 Thread Brandon Williams
On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote:
 I'm running low on ideas for this one. Anyone else ?

 If the phantom node is not listed in the ring, other nodes should not be 
 storing hints for it. You can see what nodes they are storing hints for via 
 JConsole.

I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071

--Brandon


Re: How can I patch a single issue

2011-08-23 Thread Yi Yang
Thanks Jonathan, and thanks Peter.

How do you guys use the mailing list? I'm using a mail client and this e-mail
didn't get grouped into its thread until I found it today...

On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote:

 I think this is what you want:
 https://github.com/stuhood/cassandra/tree/file-format-and-promotion
 
 On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 https://issues.apache.org/jira/browse/CASSANDRA-674
 But when I downloaded the patch file I can't find the correct trunk to
 patch...
 
 Check it out from git (or svn) and apply to trunk. I'm not sure
 whether it still applies cleanly; given the size of the patch I
 wouldn't be surprised if some rebasing is necessary. You might try a
 trunk from further back in time (around the time Stu submitted the
 patch).
 
 I'm not quite sure what your actual problem is though, if it's
 source code access then the easiest route is probably to check it out
 from https://github.com/apache/cassandra
 
 --
 / Peter Schuller (@scode on twitter)
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Solandra error - spaces in search

2011-08-23 Thread Ashley Martens
We are getting an error in our Solandra search when the search string
contains a space. Is anyone else seeing this?


*Net::HTTPFatalError*: 500 null
java.lang.ArrayIndexOutOfBoundsException null
java.lang.ArrayIndexOutOfBoundsException request:
http://10.103.1.70:8983/solandra/users~27/select
org.apache.solr.common.SolrException: null
java.lang.ArrayIndexOutOfBoundsException null
java.lang.ArrayIndexOutOfBoundsException request:
http://10.103.1.70:8983/solandra/users~27/select \tat
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
\tat
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
\tat
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
\tat
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) \tat
java.util.concurrent.FutureTask.run(FutureTask.java:138) \tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) \tat
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) \tat
java.util.concurrent.FutureTask.run(FutureTask.java:138) \tat
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
\tat java.lang.Thread.run(Thread.java:662) 


RE: Completely removing a node from the cluster

2011-08-23 Thread Bryce Godfrey
Taking the cluster down completely did remove the phantom node.  The 
HintsColumnFamily is causing a lot of commit logs to back up and threatening 
to run the commit log drive out of space.  A manual flush of that column 
family always clears out the files, though.


-Original Message-
From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: Tuesday, August 23, 2011 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote:
 I'm running low on ideas for this one. Anyone else ?

 If the phantom node is not listed in the ring, other nodes should not be 
 storing hints for it. You can see what nodes they are storing hints for via 
 JConsole.

I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071

--Brandon


checksumming

2011-08-23 Thread Bill Hastings
Are checksum errors detected in Cassandra and if so how are they resolved?


cassandra unexpected shutdown

2011-08-23 Thread Ernst D Schoen-René

Hi,
  I'm running a 16-node cassandra cluster, with a reasonably large 
amount of data per node (~1TB).  Nodes have 16G ram, but heap is set to 8G.


The nodes keep stopping with this output in the log.  Any ideas?

ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java 
(line 113) Fatal exception in thread Thread[Thread-85,5,main]

java.lang.OutOfMemoryError: Java heap space
ERROR [ReadStage:568] 2011-08-23 21:00:38,723 
AbstractCassandraDaemon.java (line 113) Fatal exception in thread 
Thread[ReadStage:568,5,main]

java.lang.OutOfMemoryError: Java heap space
 INFO [HintedHandoff:1] 2011-08-23 21:00:38,720 
HintedHandOffManager.java (line 320) Started hinted handoff for endpoint 
/10.28.0.184
 INFO [GossipStage:2] 2011-08-23 21:00:50,751 Gossiper.java (line 606) 
InetAddress /10.29.20.67 is now UP
ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java 
(line 113) Fatal exception in thread Thread[Thread-34,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
shut down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
ERROR [Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java 
(line 113) Fatal exception in thread Thread[Thread-36,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
shut down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
 INFO [GossipTasks:1] 2011-08-23 21:00:50,466 Gossiper.java (line 620) 
InetAddress /10.29.20.67 is now dead.
 INFO [HintedHandoff:1] 2011-08-23 21:00:50,751 
HintedHandOffManager.java (line 376) Finished hinted handoff of 0 rows 
to endpoint /10.28.0.184
ERROR [Thread-33] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java 
(line 113) Fatal exception in thread Thread[Thread-33,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
shut down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
ERROR [Thread-128] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java 
(line 113) Fatal exception in thread Thread[Thread-128,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
shut down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)

root@cass1:~#



Re: Could Not connect to cassandra-cli on windows

2011-08-23 Thread Alaa Zubaidi

Hi Aaron,
We are using Thrift 0.5.
TSocket _tr = new TSocket(server.Host, server.Port); // localhost, 9160

_transport = new TFramedTransport(_tr);
_protocol = new TBinaryProtocol(_transport);
_client = new Cassandra.Client(_protocol);

Do you have any clue on what could cause the first exception?

Thanks and Regards.
Alaa

On 8/18/2011 3:59 AM, aaron morton wrote:

IIRC cassandra 0.7 needs thrift 0.5, are you using that version ?

Perhaps try grabbing the cassandra 0.7 version for one of the pre built clients 
(pycassa, hector etc) to check things work and then check you are using the 
same thrift version.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18/08/2011, at 4:03 PM, Alaa Zubaidi wrote:


Hi Aaron,
Thanks for the reply.
I am running 0.7.4 and NO client.
The error was reported by the application when it fails to connect, and it 
happens when 2 threads are trying to connect at the same time. When I 
checked the Cassandra log I found these errors:

Thanks
Alaa

On 8/17/2011 4:29 PM, aaron morton wrote:

What client, what version, what version of cassandra are you using ?

Looks like you are connecting with an old version of thrift, like the message 
says. Check the client you are using was made for cassandra 0.8.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18/08/2011, at 7:27 AM, Alaa Zubaidi wrote:


Hi,

I see this error while the application tries to connect to Cassandra at the 
same time from 2 different threads; any clues?

ERROR [pool-1-thread-13] 2011-07-29 06:46:45,718 CustomTThreadPoolServer.java 
(line 222) Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: 
-2147418111
at java.lang.String.checkBounds(String.java:397)
at java.lang.String.<init>(String.java:442)
at 
org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:210)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2543)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [pool-1-thread-11] 2011-07-29 06:53:21,921 CustomTThreadPoolServer.java 
(line 218) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in 
readMessageBegin, old client?
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2543)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Thanks,
Alaa




--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110  USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zuba...@pdf.com







--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110  USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zuba...@pdf.com




Re: cassandra unexpected shutdown

2011-08-23 Thread Ernst D Schoen-René

Thanks,
We had already been running Cassandra with a larger heap size, but 
it meant that Java took way too long between garbage collections.  The 
advice I'd found was to set the heap size at the 8GB we're running at.  It 
was ok for a while, but now some nodes crash.  It's definitely our 
experience that adding more memory per node eventually makes things worse, 
as Java starts eating up more resources than it can handle.



On 8/23/11 5:28 PM, Adi wrote:

2011/8/23 Ernst D Schoen-Renéer...@peoplebrowsr.com:

Hi,
  I'm running a 16-node cassandra cluster, with a reasonably large amount of
data per node (~1TB).  Nodes have 16G ram, but heap is set to 8G.

The nodes keep stopping with this output in the log.  Any ideas?

ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line
113) Fatal exception in thread Thread[Thread-85,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [ReadStage:568] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java
(line 113) Fatal exception in thread Thread[ReadStage:568,5,main]
java.lang.OutOfMemoryError: Java heap space
  INFO [HintedHandoff:1] 2011-08-23 21:00:38,720 HintedHandOffManager.java
(line 320) Started hinted handoff for endpoint /10.28.0.184
  INFO [GossipStage:2] 2011-08-23 21:00:50,751 Gossiper.java (line 606)
InetAddress /10.29.20.67 is now UP
ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java (line
113) Fatal exception in thread Thread[Thread-34,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
ERROR [Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java (line
113) Fatal exception in thread Thread[Thread-36,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
  INFO [GossipTasks:1] 2011-08-23 21:00:50,466 Gossiper.java (line 620)
InetAddress /10.29.20.67 is now dead.
  INFO [HintedHandoff:1] 2011-08-23 21:00:50,751 HintedHandOffManager.java
(line 376) Finished hinted handoff of 0 rows to endpoint /10.28.0.184
ERROR [Thread-33] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line
113) Fatal exception in thread Thread[Thread-33,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
ERROR [Thread-128] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java
(line 113) Fatal exception in thread Thread[Thread-128,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
root@cass1:~#




You can try the "cargo cult" solution of upping the heap to 12GB and
see if the nodes stabilize. We have a 4-node cluster with 2-3 TB data
per node and that was the heap at which it the nodes were managing to
serve requests without running out of memory. Ultimately we ordered
more memory and are running it with 24 GB heap and the cluster has
been stable without complains.
Other things you can do for reducing memory usage if they are
appropriate for your read/write profile:
a) reduce memtable throughput(most reduction in mem footprint)
b) disable row caching
c) reduce/disable key caching(least reduction)
Ultimately you will have to tune 
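For what it's worth, items (a)–(c) above map to per-column-family attributes that are set through cassandra-cli in 0.7.x. The attribute names below are from memory for that era and should be checked against your version; MyCF is a placeholder:

```
update column family MyCF with memtable_throughput=32;
update column family MyCF with rows_cached=0;
update column family MyCF with keys_cached=10000;
```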

Re: How can I patch a single issue

2011-08-23 Thread Yi Yang
@Jonathan:
I patched CASSANDRA-2530 onto this version and tested it for our 
finance-related use case.  It improved disk consumption a lot, using only 20% of 
the original space for our finance-related data.  Performance is better 
than MySQL, and it now consumes only about 1x more space than MySQL, much better than 
previous versions.

On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote:

 I think this is what you want:
 https://github.com/stuhood/cassandra/tree/file-format-and-promotion
 
 On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 https://issues.apache.org/jira/browse/CASSANDRA-674
 But when I downloaded the patch file I can't find the correct trunk to
 patch...
 
 Check it out from git (or svn) and apply to trunk. I'm not sure
 whether it still applies cleanly; given the size of the patch I
 wouldn't be surprised if some rebasing is necessary. You might try a
 trunk from further back in time (around the time Stu submitted the
 patch).
 
 I'm not quite sure what your actual problem is though, if it's
 source code access then the easiest route is probably to check it out
 from https://github.com/apache/cassandra
 
 --
 / Peter Schuller (@scode on twitter)
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Solandra error - spaces in search

2011-08-23 Thread Ashley Martens
 INFO [769787724@qtp-311722089-9825] 2011-08-23 22:07:53,750 SolrCore.java
(line 1370) [users] webapp=/solandra path=/select
params={fl=*,score&start=0&q=+(+(first_name:hatice^1.2)+(first_name:hatice~0.9^1.0)++)+AND+(+(last_name:ali^3.0)+(last_name:ali~0.9^2.1)++)&wt=ruby&qt=standard&rows=1}
status=500 QTime=465
ERROR [1745740420@qtp-311722089-9795] 2011-08-23 22:07:53,875
SolrException.java (line 151) java.lang.ArrayIndexOutOfBoundsException: 4
at
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:310)
at
org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:116)
at
org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:179)
at
org.apache.lucene.search.DisjunctionSumScorer.advance(DisjunctionSumScorer.java:229)
at
org.apache.lucene.search.BooleanScorer2.advance(BooleanScorer2.java:320)
at
org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:99)
at
org.apache.lucene.search.ConjunctionScorer.<init>(ConjunctionScorer.java:72)
at
org.apache.lucene.search.ConjunctionScorer.<init>(ConjunctionScorer.java:33)
at
org.apache.lucene.search.BooleanScorer2$2.<init>(BooleanScorer2.java:173)
at
org.apache.lucene.search.BooleanScorer2.countingConjunctionSumScorer(BooleanScorer2.java:173)
at
org.apache.lucene.search.BooleanScorer2.makeCountingSumScorerSomeReq(BooleanScorer2.java:234)
at
org.apache.lucene.search.BooleanScorer2.makeCountingSumScorer(BooleanScorer2.java:211)
at
org.apache.lucene.search.BooleanScorer2.<init>(BooleanScorer2.java:101)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:328)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:527)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:323)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1177)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1065)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:171)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:137)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Single node cassandra

2011-08-23 Thread Derek Andree
I'm looking for advice on running Cassandra 0.8+ on a single node.  I would love 
to hear stories about how much RAM you succeeded with, etc.

Currently we are running with a 4GB heap.  The hardware is 4 cores and 8GB of 
physical memory.  We're not opposed to going to 16GB of memory, or even 32GB.  
We have fast 6 Gbps SAS drives, although we do not currently have a dedicated 
drive for the commit log; we are thinking about trying that out.

Our application stores time series data, which means our data will grow over 
time, though not in all CFs.  We have 7 hot CFs, but they are probably not 
that hot by the standards of someone running a sizable cluster.  We have 15 CFs 
total in our keyspace.

I've already tweaked key cache and in some cases added a large enough row cache 
to keep an entire CF in memory where appropriate.

Basically, I'm just curious whether we can expect decent results with a single 
node under this sort of load as the data gets larger.  I know this isn't 
necessarily a binary answer; I'm just curious about people's experiences on a 
single node.


Thanks,
-Derek


cfstats:



Keyspace: test
Read Count: 2869777
Read Latency: 1.2266526890416922 ms.
Write Count: 7672036
Write Latency: 0.016663120975970395 ms.
Pending Tasks: 0
Column Family: DocData
SSTable count: 1
Space used (live): 425090
Space used (total): 425090
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 162
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 27
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 771
Compacted row maximum size: 315852
Compacted row mean size: 18290

Column Family: User
SSTable count: 1
Space used (live): 5368
Space used (total): 5368
Number of Keys (estimate): 128
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 0
Key cache hit rate: NaN
Row cache capacity: 1000
Row cache size: 0
Row cache hit rate: NaN
Compacted row minimum size: 447
Compacted row maximum size: 535
Compacted row mean size: 535

Column Family: SMapping
SSTable count: 1
Space used (live): 536900
Space used (total): 536900
Number of Keys (estimate): 768
Memtable Columns Count: 9528
Memtable Data Size: 4646940
Memtable Switch Count: 0
Read Count: 763
Read Latency: NaN ms.
Write Count: 763
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 763
Key cache hit rate: NaN
Row cache capacity: 1000
Row cache size: 763
Row cache hit rate: NaN
Compacted row minimum size: 447
Compacted row maximum size: 73457
Compacted row mean size: 713

Column Family: LObject
SSTable count: 2
Space used (live): 6945430
Space used (total): 6945430
Number of Keys (estimate): 10752
Memtable Columns Count: 729638
Memtable Data Size: 400799230
Memtable Switch Count: 2
Read Count: 316756
Read Latency: NaN ms.
Write Count: 147028
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 2
Key cache size: 2
Key cache hit rate: NaN
Row cache capacity: 2
Row cache size: 5304
Row cache hit rate: NaN
Compacted row minimum size: 259
Compacted row maximum size: 182785
Compacted row mean size: 661

Column Family: SMeta
SSTable count: 2
Space used (live): 173059487
Space used (total): 343520951
Number of Keys (estimate): 165760
Memtable Columns 
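
The cfstats dump above can be summarized programmatically to spot which CFs are
driving heap use (here, LObject's ~400MB memtable dwarfs the rest). A hedged
sketch — the parser and the field names it looks for are assumptions based on
the 0.8-era `nodetool cfstats` text format, not an official API:

```python
import re

def parse_cfstats(text):
    """Parse 0.8-era `nodetool cfstats` text into {cf_name: {field: value}}."""
    stats, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Column Family:"):
            current = line.split(":", 1)[1].strip()
            stats[current] = {}
        elif current and ":" in line:
            key, _, val = line.partition(":")
            m = re.match(r"-?\d+(\.\d+)?", val.strip())
            if m:  # keep only numeric fields; skip "NaN ms.", "disabled", etc.
                stats[current][key.strip()] = float(m.group())
    return stats

sample = """
Column Family: LObject
Memtable Data Size: 400799230
Write Count: 147028
Column Family: SMapping
Memtable Data Size: 4646940
Write Count: 763
"""

cfs = parse_cfstats(sample)
# Rank CFs by live memtable bytes -- the biggest contributors to heap pressure.
ranked = sorted(cfs, key=lambda cf: cfs[cf].get("Memtable Data Size", 0),
                reverse=True)
print(ranked)  # ['LObject', 'SMapping']
```

Feeding the full dump through something like this makes it easy to see which of
the 15 CFs actually matter when sizing a single-node heap.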

Re: Solandra error - spaces in search

2011-08-23 Thread Jake Luciani
Thx for the info, I'll try to reproduce.

On Aug 23, 2011, at 9:28 PM, Ashley Martens amart...@ngmoco.com wrote:

  INFO [769787724@qtp-311722089-9825] 2011-08-23 22:07:53,750 SolrCore.java 
 (line 1370) [users] webapp=/solandra path=/select 
 params={fl=*,score&start=0&q=+(+(first_name:hatice^1.2)+(first_name:hatice~0.9^1.0)++)+AND+(+(last_name:ali^3.0)+(last_name:ali~0.9^2.1)++)+&wt=ruby&qt=standard&rows=1}
  status=500 QTime=465 
 ERROR [1745740420@qtp-311722089-9795] 2011-08-23 22:07:53,875 
 SolrException.java (line 151) java.lang.ArrayIndexOutOfBoundsException: 4
 at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:310)
 at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:116)
 at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:179)
 at org.apache.lucene.search.DisjunctionSumScorer.advance(DisjunctionSumScorer.java:229)
 at org.apache.lucene.search.BooleanScorer2.advance(BooleanScorer2.java:320)
 at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:99)
 at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:72)
 at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:33)
 at org.apache.lucene.search.BooleanScorer2$2.init(BooleanScorer2.java:173)
 at org.apache.lucene.search.BooleanScorer2.countingConjunctionSumScorer(BooleanScorer2.java:173)
 at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorerSomeReq(BooleanScorer2.java:234)
 at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorer(BooleanScorer2.java:211)
 at org.apache.lucene.search.BooleanScorer2.init(BooleanScorer2.java:101)
 at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:328)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:527)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:323)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1177)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1065)
 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:171)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:137)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
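
If the failure really is triggered by spaces in search terms (as the subject
suggests), quoting or escaping multi-word values client-side before building
the `q` parameter is one defensive step. A minimal sketch — the `solr_phrase`
helper is an assumption for illustration, not part of Solr's or Solandra's API:

```python
import urllib.parse

def solr_phrase(field, value):
    # Quote multi-word values so the Lucene query parser treats them as a
    # phrase instead of splitting on whitespace mid-clause.
    if " " in value:
        value = '"%s"' % value
    return "%s:%s" % (field, value)

q = "%s AND %s" % (solr_phrase("first_name", "hatice"),
                   solr_phrase("last_name", "ali x"))
# urlencode percent-escapes ':' and '"' and turns spaces into '+',
# so the request line reaching Solandra stays well-formed.
params = urllib.parse.urlencode({"q": q, "wt": "ruby", "rows": 1})
print(params)
```

The field names are taken from the log above; the same idea applies to the
boosted/fuzzy clauses in the failing query.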
 


Memory overhead of vector clocks…. how often are they pruned?

2011-08-23 Thread Kevin Burton
I had a thread going the other day about vector clock memory usage: a vector
clock is a series of (client id, clock) entries plus a timestamp, with the
ability to prune old entries. I'm specifically curious here about how often old
entries are pruned.

If you're storing small columns within Cassandra, say just an integer, the
vector clock overhead could easily consume far more space than the actual data
in your database.

However, if they are pruned, then this shouldn't really be a problem.

How much memory is this wasting?

Thoughts?


Jonathan Ellis jbel...@gmail.com wrote on Aug 19:
 The problem with naive last write wins is that writes don't always
arrive at each replica in the same order.  So no, that's a
non-starter.

Vector clocks are a series of (client id, clock) entries, and usually
a timestamp so you can prune old entries.  Obviously implementations
can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
per client id, a variable number (at least one) of bytes for the
clock, and 8 bytes for the timestamp.

[1]
https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java
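
Jonathan's description can be sketched concretely. Below is a toy vector clock
with the (client id, counter) + timestamp layout and a size estimate using the
Voldemort-style per-entry accounting quoted above; the `prune` policy is a
simplified assumption for illustration, not Voldemort's actual implementation
(see the linked VectorClock.java for the real one):

```python
import time

class VectorClock:
    """Toy vector clock: {client_id: counter} entries plus a last-write
    timestamp, matching the (client id, clock) + timestamp layout above."""

    def __init__(self):
        self.entries = {}     # client_id -> logical counter
        self.timestamp = 0.0  # wall-clock time of the last increment

    def increment(self, client_id, now=None):
        self.entries[client_id] = self.entries.get(client_id, 0) + 1
        self.timestamp = now if now is not None else time.time()

    def descends_from(self, other):
        # self dominates other iff every entry in other is <= ours.
        return all(self.entries.get(cid, 0) >= c
                   for cid, c in other.entries.items())

    def prune(self, max_entries):
        # Illustrative pruning policy: drop the smallest counters once the
        # clock grows too large. Real systems use age/size heuristics.
        while len(self.entries) > max_entries:
            del self.entries[min(self.entries, key=self.entries.get)]

    def approx_bytes(self):
        # Voldemort-style lower bound: 2 bytes per client id, >=1 byte per
        # counter, 8 bytes for the timestamp (ignores object overhead).
        return 8 + sum(2 + 1 for _ in self.entries)

a = VectorClock()
a.increment("client-1")
a.increment("client-1")
a.increment("client-2")
b = VectorClock()
b.increment("client-1")
print(a.descends_from(b), a.approx_bytes())  # True 14
```

Even this lower bound — 14 bytes for two writers — already exceeds a 4-byte
integer column, which is exactly the overhead concern raised above; pruning is
what keeps it bounded.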


-- 

Founder/CEO Spinn3r.com

Location: San Francisco, CA
Skype: burtonator

Skype-in: (415) 871-0687