Re: schema design question

2010-03-09 Thread Matteo Caprari
Thanks Jonathan. Correct if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]) 2) for each user in [users]: insert (user, [item]) Each incoming item is liked by 100 users, so it would be
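The two operations described in this message can be sketched as follows. This is only an illustration using in-memory dictionaries as stand-ins for two column families; the names `item_to_users`, `user_to_items`, and `record_likes` are hypothetical and are not part of any Cassandra client API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
item_to_users = defaultdict(set)  # row key: item, columns: users
user_to_items = defaultdict(set)  # row key: user, columns: items


def record_likes(item, users):
    """Apply both writes for one incoming (item, [users]) row.

    Returns the number of individual inserts performed:
    one for the item row plus one per user.
    """
    # 1) insert (or merge) the row as it is: (item, [users])
    item_to_users[item].update(users)
    # 2) for each user in [users]: insert (user, [item])
    for user in users:
        user_to_items[user].add(item)
    return 1 + len(users)
```

Calling `record_likes("item1", ["alice", "bob"])` performs three inserts under this counting: one item row and two inverted user rows.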

Re: schema design question

2010-03-09 Thread Matteo Caprari
On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: One quad-core node can handle ~14000 inserts per second so you are in good shape. Well, yeah! instead of 'all users that liked N items'? That's true.  So you'd want to use a custom comparator where first 64 bits is the

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne sylv...@yakaz.com wrote:  1) stress.py -t 10 -o read -n 5000 -c 1 -r  2) stress.py -t 10 -o read -n 50 -c 1 -r In case 1) I get around 200 reads/second and that's pretty stable. The disk is spinning like crazy (~25% io_wait), very

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 8:31 AM, Sylvain Lebresne sylv...@yakaz.com wrote: Well, unless I'm mistaken, that's the same in my example, as in both cases I give stress.py the option '-c 1', which tells it to retrieve only one column each time, even in the case where I have 100 columns per row. Oh.

Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Sylvain Lebresne
Hello, I've done some tests and it seems that having more rows with few columns is better than having fewer rows with more columns, at least as far as read performance is concerned. Using stress.py, on a quad core 2.27 GHz with 4 GB RAM and the out of the box cassandra configuration, I

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jesse McConnell
in my experience #2 will work well up to a point where it will trigger a limitation of cassandra (slated to be resolved in 0.7 \o/) where all of the columns under a given key must be able to fit into memory. For things like indexes of data I have opted to shard the keys for really large data sets
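One way to shard keys for a very wide row is sketched below. This is an assumption about the general technique, not the scheme used in the message above; `shard_key`, `all_shard_keys`, and the default of 16 shards are hypothetical.

```python
import zlib


def shard_key(base_key, column_name, num_shards=16):
    """Pick the row key that a given column should be written under."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(column_name.encode("utf-8")) % num_shards
    return f"{base_key}:{shard}"


def all_shard_keys(base_key, num_shards=16):
    """List every row key that must be read to reassemble the logical row."""
    return [f"{base_key}:{i}" for i in range(num_shards)]
```

Writes go to `shard_key(...)`, so no single row has to hold every column; the trade-off is that reading the whole logical row means fetching every key returned by `all_shard_keys(...)`.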

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari matteo.capr...@gmail.com wrote: Thanks Jonathan. Correct if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]) 2) for each user in

another ConcurrentModificationException

2010-03-09 Thread B. Todd Burruss
using cassandra-0.6.0-beta2/ 2010-03-09 09:17:26,827 ERROR [pool-1-thread-675] [Cassandra.java:1166] Internal error processing get java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-09 Thread B. Todd Burruss
our dataset is too big to fit into cache, so we are hitting disk. not a problem for normal operation, but when a node is restored, hinted handoff, load balanced, or if reads/writes simply build up we see a problem. the nodes can't seem to catch up. this seems to be centered around drive seek

Re: another ConcurrentModificationException

2010-03-09 Thread B. Todd Burruss
np, you give me free software, i give you free testing ;) i have some more so i'll just create tix and send them along i just switched to using thunderbird and any new messages i send to the list are being flagged as spam. i have no problems with evolution. anyone have an idea? (i can

new bug tix

2010-03-09 Thread B. Todd Burruss
these are both ConcurrentModificationExceptions https://issues.apache.org/jira/browse/CASSANDRA-864 https://issues.apache.org/jira/browse/CASSANDRA-865 this one is an AssertionError https://issues.apache.org/jira/browse/CASSANDRA-866

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-09 Thread Jesse McConnell
let us know how the SSDs pan out, I am curious about that as well cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Tue, Mar 9, 2010 at 12:08, B. Todd Burruss bburr...@real.com wrote: our dataset is too big to fit into cache, so we are hitting disk.  not a problem for normal

ControlPort no longer in storage-conf.xml in 0.6

2010-03-09 Thread Bill Au
I am checking out the 0.6 release since I need the batch_mutate command. I noticed that ControlPort is no longer in storage-conf.xml for 0.6. Is that not used anymore? Or is that not configurable anymore? If it is still used but not configurable, how do I run multiple instances of Cassandra on

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com wrote: On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: That's true.  So you'd want to use a custom comparator where first 64 bits is the Long and the rest is the userid, for instance. (Long +
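The column-name layout suggested here (first 64 bits a Long, the rest the userid) can be sketched in Python. This only illustrates the byte layout and the ordering it produces; it is not Cassandra's comparator API, and `make_column_name` and `split_column_name` are hypothetical helpers. It assumes the Long is non-negative, so that plain byte-wise ordering matches numeric ordering.

```python
import struct


def make_column_name(count, user_id):
    """Build a column name: 8-byte big-endian Long followed by the userid."""
    # Big-endian and unsigned, so comparing the raw bytes orders by count
    # first (assumes count >= 0) and by userid bytes second.
    return struct.pack(">Q", count) + user_id.encode("utf-8")


def split_column_name(name):
    """Recover (count, user_id) from a packed column name."""
    (count,) = struct.unpack(">Q", name[:8])
    return count, name[8:].decode("utf-8")
```

Sorting names built this way groups columns by the Long and breaks ties by userid, which is the ordering a comparator over this layout would need to reproduce.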

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Brandon Williams
On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne sylv...@yakaz.com wrote: I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n 1000 -c 100 -i 5) If I read, I get roughly the same number of rows whether I read the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100)

cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-09 Thread Omer van der Horst Jansen
The apache-cassandra-0.6.0-beta2-bin.tar.gz download contains both these files in the apache-cassandra-0.6.0-beta2/lib directory: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Given the way the classpath is constructed, it's possible that anyone using this download is
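A quick check for this kind of packaging problem is to look for one artifact that appears in more than one version in a lib directory. The sketch below works on a plain list of file names; `find_duplicate_artifacts` and the `name-version.jar` naming pattern are assumptions made for illustration, not part of the Cassandra distribution.

```python
import re
from collections import defaultdict

# Assumes jars are named "<artifact>-<version>.jar" with the version
# starting at the first hyphen that is followed by a digit.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_artifacts(filenames):
    """Return {artifact: [versions]} for artifacts present more than once."""
    versions = defaultdict(list)
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:
            versions[match.group("name")].append(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Given the two file names mentioned above, the function reports `apache-cassandra` with versions `0.6.0-beta1` and `0.6.0-beta2`. To run it against a real directory, pass in the result of `os.listdir(...)`.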

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Brandon Williams
On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne sylv...@yakaz.com wrote: A row causes a disk seek while columns are contiguous. So if the row isn't in the cache, you're being impaired by the seeks. In general, fatter rows should be more performant than skinny ones. Sure, I understand

IllegalStateException: Queue full

2010-03-09 Thread Todd Burruss
using tip of 0.6 branch with 864.txt patch. i have 4 nodes, one node is overcome with compaction right now. i started with no load then added a tiny bit of load and almost immediately got these errors on the other 3 nodes. 2010-03-09 16:05:43,004 ERROR [RESPONSE-STAGE:982]

Re: IllegalStateException: Queue full

2010-03-09 Thread Jonathan Ellis
v2 of patch attached to #864 (replaces old one) On Tue, Mar 9, 2010 at 6:08 PM, Todd Burruss bburr...@real.com wrote: using tip of 0.6 branch with 864.txt patch.  i have 4 nodes, one node is overcome with compaction right now.  i started with no load then added a tiny bit of load and almost

Re: Hackathon?!?

2010-03-09 Thread Dan Di Spaltro
Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host it here at Cloudkick, unless a cooler startup wants to host it. http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19

Re: Hackathon?!?

2010-03-09 Thread Jonathan Ellis
I can make it. \o/ On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote: Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host it here at Cloudkick, unless a cooler startup wants to host it.

Re: Hackathon?!?

2010-03-09 Thread Jeff Hodges
I'm down. -- Jeff On Tue, Mar 9, 2010 at 6:18 PM, Jonathan Ellis jbel...@gmail.com wrote: I can make it. \o/ On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote: Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax

Re: Hackathon?!?

2010-03-09 Thread Stu Hood
Definitely on board! -Original Message- From: Dan Di Spaltro dan.dispal...@gmail.com Sent: Tuesday, March 9, 2010 8:05pm To: cassandra-user@incubator.apache.org Subject: Re: Hackathon?!? Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as,

Re: atomicity across keys and secondary index support

2010-03-09 Thread Patricio Echagüe
Hey Jonathan, has there been any update on this feature? Thanks a lot Patricio On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis jbel...@gmail.com wrote: that is still very firmly in the category of future work. 2009/12/3 Patricio Echagüe patric...@gmail.com: Hi all, I was reading the

Re: atomicity across keys and secondary index support

2010-03-09 Thread Jonathan Ellis
Atomicity: no. 2ary indexes: CASSANDRA-749 is targeting the 0.8 release 2010/3/9 Patricio Echagüe patric...@gmail.com: Hey Jonathan, has there been any update on this feature? Thanks a lot Patricio On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis jbel...@gmail.com wrote: that is still very

Re: Hackathon?!?

2010-03-09 Thread Dan Di Spaltro
Great, that would probably get us a lot more room. Sweet, so it's settled, we'll do it at Digg WHQ! On Tue, Mar 9, 2010 at 9:13 PM, Chris Goffinet goffi...@digg.com wrote: +1 from Digg if you wanna have it at our place as well, got the OK from the boss. -Chris On Mar 9, 2010, at 6:05 PM,

Re: Hackathon?!?

2010-03-09 Thread Ryan King
I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote: I'm down. -- Jeff On Tue, Mar 9, 2010 at 6:18 PM, Jonathan Ellis jbel...@gmail.com wrote: I can

Re: Hackathon?!?

2010-03-09 Thread Jeff Hodges
Ah, hell. Thought this was the first day. Can't make it. -- Jeff On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote: I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges