Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Thanks for the pointer. Wanted to figure out if this is the real bottleneck as there might be something else contributing to the low speed. Let me explain our setup in more detail: We are using cassandra to store about 700 million images. This includes image metadata and the image (in binary

Re: Questions while evaluating Cassandra

2010-03-05 Thread Eran Kutner
Thank you Jonathan! On Fri, Mar 5, 2010 at 00:03, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Mar 4, 2010 at 2:51 AM, Eran Kutner e...@gigya-inc.com wrote: On Tue, Mar 2, 2010 at 15:44, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner e...@gigya.com

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 2:13 AM, shiv shivaji shivaji...@yahoo.com wrote: 1. Is there a way to estimate the time it would take to compact this work load? I hope the load balancing will be much faster after the compaction. Curious how fast I can get the transfer once compaction is done. 0.6

Re: ConcurrentModificationException

2010-03-05 Thread B. Todd Burruss
yes, 0.6 beta2 i'll open ticket On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote: This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss bburr...@real.com wrote: i'm seeing a lot of these ... any idea? 2010-03-04

Re: ConcurrentModificationException

2010-03-05 Thread B. Todd Burruss
https://issues.apache.org/jira/browse/CASSANDRA-853 On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote: This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss bburr...@real.com wrote: i'm seeing a lot of these ... any

ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Erik Holstad
What are the benefits of using multiple ColumnFamilies compared to using a composite row name? Example: You have messages that you want to index on sent and to. So you can either have ColumnFamilyFrom:userTo:{userFrom-messageid} ColumnFamilyTo:userFrom:{userTo-messageid} or something like

Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:04, Erik Holstad wrote: What are the benefits of using multiple ColumnFamilies compared to using a composite row name? Just for terminology's sake, I'll note that rows have keys, not names. Only columns and supercolumns have names. I'm not the top expert here by any means, but

Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:30, David Strauss wrote: On 2010-03-05 18:04, Erik Holstad wrote: So you can either have ColumnFamilyFrom:userTo:{userFrom-messageid} ColumnFamilyTo:userFrom:{userTo-messageid} or something like ColumnFamily:user_to:{user1_messageId, user2_messageId}

Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Jonathan Ellis
Generally, you want to have different types of data in different CFs so you can tune them separately (key / row caches). Mixing different row types in one CF also makes doing get_slice_range scans difficult. On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad erikhols...@gmail.com wrote: What are the
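
A rough illustration of the range-scan point above, as a toy model in plain Python. It does not use any Cassandra client API. A column family is modelled as a dict of row key to columns, and every name in it (range_scan, the messages_* dicts, the sample users and message ids) is hypothetical. It also assumes row keys are scanned in sorted order, as with an order-preserving partitioner.

```python
# Toy model only: a "column family" is a dict of row key -> {column name: value}.
# Nothing here is a Cassandra API; all names are made up for illustration.

def range_scan(column_family, start_key, end_key):
    """Return (key, columns) pairs with start_key <= key <= end_key, in key order."""
    return [(key, column_family[key])
            for key in sorted(column_family)
            if start_key <= key <= end_key]

# Sample data: message 1 bob->alice, message 2 carol->alice, message 3 alice->bob.

# Layout A: one column family per index, so each holds a single row type.
messages_by_recipient = {
    "alice": {"bob-1": "", "carol-2": ""},
    "bob": {"alice-3": ""},
}
messages_by_sender = {
    "alice": {"bob-3": ""},
    "bob": {"alice-1": ""},
    "carol": {"alice-2": ""},
}

# Layout B: one column family, with the row type folded into the row key.
messages_combined = {
    "alice_to": {"bob-1": "", "carol-2": ""},
    "alice_from": {"bob-3": ""},
    "bob_to": {"alice-3": ""},
    "bob_from": {"alice-1": ""},
    "carol_from": {"alice-2": ""},
}

# Scanning users "a".."c" in layout A yields recipient rows only.
recipient_keys = [key for key, _ in range_scan(messages_by_recipient, "a", "c")]

# The same scan in layout B interleaves both row types, so the caller
# has to filter out the rows it did not ask for.
combined_keys = [key for key, _ in range_scan(messages_combined, "a", "c")]
recipient_only = [key for key in combined_keys if key.endswith("_to")]

print(recipient_keys)
print(combined_keys)
print(recipient_only)
```

In this sketch the separate-CF layout needs no filtering after the scan, while the combined layout returns sender and recipient rows mixed together. Per-CF cache tuning, the other reason given above, is not modelled here.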

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Sorry, how to get compaction progress with 0.6. Is it in nodetool or somewhere else? I tried a few options after nodetool and did not get this info. My vmstats are procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si so bi

Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread shiv shivaji
I started with the ordered partitioner as I was hoping to make use of the map-reduce functionality. However, my data was likely lopped onto 2 key machines with most of it on one (as seen from another thread. There were also machine failures to blame for the uneven distribution). One solution

Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Chris Goffinet
At this time, you have to re-import the data. -Chris On Fri, Mar 5, 2010 at 11:42 AM, shiv shivaji shivaji...@yahoo.com wrote: I started with the ordered partitioner as I was hoping to make use of the map-reduce functionality. However, my data was likely lopped onto 2 key machines with most

Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Stu Hood
But rather than switching, you should definitely try the 'loadbalance' approach first, and see whether OrderPP works out for you. -Original Message- From: Chris Goffinet goffi...@digg.com Sent: Friday, March 5, 2010 1:43pm To: cassandra-user@incubator.apache.org Subject: Re: Dynamically

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 1:36 PM, shiv shivaji shivaji...@yahoo.com wrote: Sorry, how to get compaction progress with 0.6. Is it in nodetool or somewhere else? I tried a few options after nodetool and did not get this info. it's under CompactionManager in jmx. I'm not sure if nodetool exposes

Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread shiv shivaji
Point taken. Was thinking of switching in parallel using a 2nd cassandra instance (perhaps on the same set of machines). This way if loadbalancing is too slow, I can try this version. From: Stu Hood stu.h...@rackspace.com To:

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Ah, will look at the jmx console. Thought it was under nodetool. cont...@cl201 ~/swell/cassandra $ iostat -x Linux 2.6.30-gentoo-r4pb (cl201) 03/05/10 _x86_64_(8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 9.66 0.00 2.18 4.98 0.00 83.18

Re: ConcurrentModificationException

2010-03-05 Thread Jonathan Ellis
Fixed, thanks. On Fri, Mar 5, 2010 at 11:12 AM, B. Todd Burruss bburr...@real.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-853 On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote: This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4,

Unreliable transport layer

2010-03-05 Thread Ashwin Jayaprakash
Hey guys! I have a simple question. I'm a casual observer, not a real Cassandra user yet. So, excuse my ignorance. I see that the Gossip feature uses UDP. I was curious to know if you guys faced issues with unreliable transports in your production clusters? Like faulty switches, dropped packets

Re: Unreliable transport layer

2010-03-05 Thread Jonathan Ellis
In 0.6 gossip is over TCP. On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com wrote: Hey guys! I have a simple question. I'm a casual observer, not a real Cassandra user yet. So, excuse my ignorance. I see that the Gossip feature uses UDP. I was curious to know if

Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-05 Thread Rosenberry, Eric
I am looking for advice from others that are further along in deploying Cassandra in production environments than we are. I want to know what you are finding your bottlenecks to be. I would feel silly purchasing dual processor quad core 2.93ghz Nehalem machines with 192 gigs of RAM just to