Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Thanks for the pointer. I wanted to figure out whether this is the real
bottleneck, since something else might be contributing to the low speed.
Let me explain our setup in more detail:

We are using Cassandra to store about 700 million images. This includes image
metadata and the image itself (in binary format).
I see about 53,831 files in the data folder. There are also about 3,940 files
with names like
imagestore-b-66171-Data.db
imagestore-b-66171-Filter.db

I assume the latter are the compacted files. This set of files is indeed 
growing. 

The problem is that one node in our cluster is 4.5TB full, and we are trying to
load balance away from that node. I moved this node to a larger hard disk and
am trying out the load balancing, but it is really slow.

When I do the load balancing, row mutations are taking place and compaction is
still continuing. The load balancing is happening at the rate of 10 MB per half
hour. I suspect this is due to the compaction (and the anti-compaction).

Now for questions:
1. Is there a way to estimate the time it would take to compact this workload?
I hope the load balancing will be much faster after the compaction. I am curious
how fast I can get the transfer once compaction is done.
2. Any way to make this faster? Is working on the above issue the lowest-hanging
fruit, or is there something else?
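For scale, the transfer rate quoted above can be sanity-checked with a quick
back-of-the-envelope calculation (all figures taken from this message):

```python
# Back-of-the-envelope ETA for draining a 4.5 TB node at the observed
# balancing rate of 10 MB per half hour (figures from this thread).
rate_mb_per_s = 10 / 1800          # 10 MB every 30 minutes
node_load_mb = 4.5e6               # ~4.5 TB, expressed in MB
eta_days = node_load_mb / rate_mb_per_s / 86400
assert round(eta_days) == 9375     # i.e. on the order of 25 years
```

In other words, at the current rate the move would never finish; the rate
itself is the problem, not just the data volume.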

Thanks, Shiv







From: Jonathan Ellis 
To: cassandra-user@incubator.apache.org
Sent: Thu, March 4, 2010 8:31:16 AM
Subject: Re: Anti-compaction Diskspace issue even when latest patch applied

https://issues.apache.org/jira/browse/CASSANDRA-579 should make a big
difference in speed.  If you want to take a stab at it I can point you
in the right direction. :)

On Thu, Mar 4, 2010 at 10:24 AM, shiv shivaji  wrote:
> Yes.
>
> The IP change trick seems to work. Load balancing seems a little slow, but I
> will open a new thread on that if needed.
>
> Thanks, Shiv
>
>
> 
> From: Jonathan Ellis 
> To: cassandra-user@incubator.apache.org
> Sent: Wed, March 3, 2010 9:21:28 AM
> Subject: Re: Anti-compaction Diskspace issue even when latest patch applied
>
> You are proposing manually moving your data from a 5TB disk to a 12TB
> disk, and that is the only change you want to make?  Then just keep
> the IP the same when you restart it after moving, and you won't have
> to do anything else, it will just look like the node was down
> temporarily and is now back up.
>
> On Tue, Mar 2, 2010 at 7:26 PM, shiv shivaji  wrote:
>> Thanks, just realized this after looking at the source code.
>>
>> It seems decommission will not work for me due to disk space issues. I am
>> currently moving all the data on the heavy node (5 TB full) to a 12 TB
>> disk drive. I am planning to remove the old token and reassign it to this
>> node.
>>
>> The docs say to use decommission; however, lack of disk space does not
>> allow me to do this. If I manually move all the data files and then do a
>> removetoken and start the node with a new token, would that work (as was
>> implied in a JIRA)?
>>
>> Shiv
>>
>>
>> 
>> From: Stu Hood 
>> To: cassandra-user@incubator.apache.org
>> Sent: Sun, February 28, 2010 1:53:29 PM
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> `nodetool cleanup` is a very expensive process: it performs a major
>> compaction, and should not be done that frequently.
>>
>> -Original Message-
>> From: "shiv shivaji" 
>> Sent: Sunday, February 28, 2010 3:34pm
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> Seems like the temporary solution was to run a cron job which calls
>> nodetool
>> cleanup every 5 mins or so. This stopped the disk space from going too
>> low.
>>
>> The manual solution you mentioned is likely worthy of consideration as the
>> load balancing is taking a while.
>>
>> I will track the jira issue of anticompaction and diskspace. Thanks for
>> the
>> pointer.
>>
>>
>> Thanks, Shiv
>>
>>
>>
>>
>> 
>> From: Jonathan Ellis 
>> To: cassandra-user@incubator.apache.org
>> Sent: Wed, February 24, 2010 11:34:59 AM
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> as you noticed, "nodeprobe move" first unloads the data, then moves to
>> the new position.  so that won't help you here.
>>
>> If you are using replicationfactor=1, scp the data to the previous
>> node on the ring, then reduce the original node's token so it isn't
>> responsible for so much, and run cleanup.  (you can do this w/ higher
>> RF too, you just have to scp the data more places.)
>>
>> Finally, you could work on
>> https://issues.apache.org/jira/browse/CASSANDRA-579 so it doesn't need
>> to anticompact to disk before moving data.
>>
>> -Jonathan
>>
>> On Wed, Feb 24, 2010 at 12:06 PM, s

Re: Questions while evaluating Cassandra

2010-03-05 Thread Eran Kutner
Thank you Jonathan!


On Fri, Mar 5, 2010 at 00:03, Jonathan Ellis  wrote:
> On Thu, Mar 4, 2010 at 2:51 AM, Eran Kutner  wrote:
>> On Tue, Mar 2, 2010 at 15:44, Jonathan Ellis  wrote:
>>>
>>> On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner  wrote:
>>> > Is the procedure described in the description of ticket CASSANDRA-44 
>>> > really
>>> > the way to do schema changes in the latest release? I'm not sure what's 
>>> > your
>>> > thoughts about this but our experience is that every release of our 
>>> > software
>>> > requires schema changes because we add new column families for indexes.
>>>
>>> Yes, that is how it is for 0.5 and 0.6.  0.7 will add online schema
>>> changes (i.e., fix -44), Gary is working on that now.
>>
>> So just to be clear, that would require a complete cluster restart as
>> well as stopping the client app (to prevent new writes from coming in
>> after doing the flush), right? Do you know how others are handling it
>> on a live system?
>
> If you're just adding new CFs then you don't need to worry about the
> commitlog.  So in production just leave the old ones defined and
> remove the data files from the FS, client doesn't have to care, and
> you can do a rolling restart of your cassandra nodes.
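Jonathan's suggestion of leaving old CF definitions in place while adding new
ones can be sketched as a storage-conf.xml fragment (0.5/0.6-era static
configuration; the keyspace and CF names here are hypothetical):

```xml
<!-- Illustrative storage-conf.xml fragment; names are made up. -->
<Keyspace Name="MyApp">
  <!-- Keep the old definitions so existing nodes restart cleanly. -->
  <ColumnFamily Name="Messages" CompareWith="BytesType"/>
  <!-- New index CF for this release: since no data has ever been
       written to it, there is nothing in the commitlog to worry
       about, and a rolling restart picks it up. -->
  <ColumnFamily Name="MessagesByRecipient" CompareWith="UTF8Type"/>
</Keyspace>
```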
>
>>> DatacenterShardStrategy will put multiple replicas in each DC, for use
>>> with CL.DCQUORUM, that is, a majority of replicas in the same DC as
>>> the coordinator node for the current request.  DCQOURUM is not yet
>>> finished, though; currently it behaves the same as CL.ALL.
>>
>> Is it planned for any specific release?
>
> It's planned as soon as someone wants it badly enough to finish it. :)
>
>>> The short answer is, we maintained backwards compatibility for 0.4 ->
>>> 0.5 -> 0.6, but we are going to break things in 0.7 moving from String
>>> keys to byte[] and possibly other changes.
>>
>> hmmm... My assumption was that although keys are strings they are
>> still compared as bytes when using the OPP right? That would be the
>> difference between the OPP and the COPP, right? Just confirming
>> because otherwise creating composite keys with different data types
>> may prove problematic.
>
> Right.
>
> -Jonathan
>
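Eran's concern about mixed-type composite keys under byte-wise (OPP)
comparison can be illustrated with plain lexicographic sorting (the keys here
are hypothetical):

```python
# Mixed-type composite string keys under byte-wise comparison: numeric
# components sort lexicographically, not numerically.
keys = ["user_10", "user_2", "user_1"]
assert sorted(keys) == ["user_1", "user_10", "user_2"]   # "10" before "2"

# Fixed-width encoding (zero-padding) restores the intended numeric order.
padded = ["user_%05d" % n for n in (10, 2, 1)]
assert sorted(padded) == ["user_00001", "user_00002", "user_00010"]
```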


Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 2:13 AM, shiv shivaji  wrote:
> 1. Is there a way to estimate the time it would take to compact this work
> load? I hope the load balancing will be much faster after the compaction.
> Curious how fast I can get the transfer once compaction is done.

0.6 gives you compaction progress, so you can estimate from that.

> 2. Any way to make this faster? Is working on the above issue the lowest
> hanging fruit or is there something else?

Not adding new data at the same time would probably make it faster,
although you haven't told us where the bottleneck is.
(http://spyced.blogspot.com/2010/01/linux-performance-basics.html)

-Jonathan


Re: ConcurrentModificationException

2010-03-05 Thread B. Todd Burruss
yes, 0.6 beta2

i'll open ticket

On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote:

> This is the 0.6 beta yes?  Looks like a regression, please open a ticket.
> 
> On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss  wrote:
> > i'm seeing a lot of these ... any idea?
> >
> > 2010-03-04 18:53:21,455 ERROR [MEMTABLE-POST-FLUSHER:1] 
> > [DebuggableThreadPoolExecutor.java:94] Error in executor futuretask
> > java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> >at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> >at 
> > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.lang.RuntimeException: java.lang.RuntimeException: 
> > java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >at 
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> >at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >... 2 more
> > Caused by: java.lang.RuntimeException: 
> > java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:357)
> >at 
> > org.apache.cassandra.db.ColumnFamilyStore$2.runMayThrow(ColumnFamilyStore.java:392)
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >... 6 more
> > Caused by: java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> >at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:349)
> >... 8 more
> > Caused by: java.util.ConcurrentModificationException
> >at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:605)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:385)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:71)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:343)
> >at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService.process(CommitLogExecutorService.java:113)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService.access$200(CommitLogExecutorService.java:35)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService$1.runMayThrow(CommitLogExecutorService.java:67)
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >... 1 more
> >




Re: ConcurrentModificationException

2010-03-05 Thread B. Todd Burruss
https://issues.apache.org/jira/browse/CASSANDRA-853

On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote:

> This is the 0.6 beta yes?  Looks like a regression, please open a ticket.
> 
> On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss  wrote:
> > i'm seeing a lot of these ... any idea?
> >
> > 2010-03-04 18:53:21,455 ERROR [MEMTABLE-POST-FLUSHER:1] 
> > [DebuggableThreadPoolExecutor.java:94] Error in executor futuretask
> > java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> >at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> >at 
> > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.lang.RuntimeException: java.lang.RuntimeException: 
> > java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> >at 
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> >at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >... 2 more
> > Caused by: java.lang.RuntimeException: 
> > java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:357)
> >at 
> > org.apache.cassandra.db.ColumnFamilyStore$2.runMayThrow(ColumnFamilyStore.java:392)
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >... 6 more
> > Caused by: java.util.concurrent.ExecutionException: 
> > java.util.ConcurrentModificationException
> >at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> >at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:349)
> >... 8 more
> > Caused by: java.util.ConcurrentModificationException
> >at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:605)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:385)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:71)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:343)
> >at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService.process(CommitLogExecutorService.java:113)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService.access$200(CommitLogExecutorService.java:35)
> >at 
> > org.apache.cassandra.db.commitlog.CommitLogExecutorService$1.runMayThrow(CommitLogExecutorService.java:67)
> >at 
> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >... 1 more
> >




ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Erik Holstad
What are the benefits of using multiple ColumnFamilies compared to using a
composite
row name?

Example: You have messages that you want to index on both sender and recipient.

So you can either have
ColumnFamilyFrom:userTo:{userFrom->messageid}
ColumnFamilyTo:userFrom:{userTo->messageid}

or something like
ColumnFamily:user_to:{user1_messageId, user2_messageId}
ColumnFamily:user_from:{user1_messageId, user2_messageId}

One advantage of separate column families that I can see is if you want to
use different types in each family. But are there others, like storage
space, read/write speeds, etc.?

-- 
Regards Erik


Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:04, Erik Holstad wrote:
> What are the benefits of using multiple ColumnFamilies compared to using
> a composite row name?

Just for terminology's sake, I'll note that rows have keys, not names.
Only columns and supercolumns have names.

I'm not the top expert here by any means, but I think the choice between
{CF-as-direction, key-as-person} and {key-as-person-and-direction} won't
affect performance substantially if the multiple CFs in the first option
are identically configured. All messages with the same source or
destination still share the same row.

What *would* make a huge difference is composite row keys like
from_userA_userB and to_userB_userA where you'd have to pull key ranges
to get all the messages to or from someone. That design would trade
performance for inbox scalability, assuming users distribute their
messages across a wide breadth of other users.
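The composite-row-key design described above can be sketched by emulating an
OPP key-range scan over a prefix with a sorted list (keys and users here are
hypothetical):

```python
# Emulate the access pattern for composite row keys like
# from_<sender>_<recipient>: "all messages from alice" becomes a
# key-range scan, i.e. a prefix match over the sorted key space.
rows = sorted([
    "from_alice_bob",
    "from_alice_carol",
    "from_bob_alice",
    "to_bob_alice",
])
prefix = "from_alice_"
matches = [key for key in rows if key.startswith(prefix)]
assert matches == ["from_alice_bob", "from_alice_carol"]
```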

> Example: You have messages that you want to index on sent and to.
> 
> So you can either have
> ColumnFamilyFrom:userTo:{userFrom->messageid}
> ColumnFamilyTo:userFrom:{userTo->messageid}
> 
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}

You've changed two different things between the examples:

(1) Whether direction is distinguished by the key or by the CF.
(2) Something about the columns, but this isn't clear or necessary to
support the change in CF/key structure.

What is the second change, and why did you make it?

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]



signature.asc
Description: OpenPGP digital signature


Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:30, David Strauss wrote:
> On 2010-03-05 18:04, Erik Holstad wrote:
>> So you can either have
>> ColumnFamilyFrom:userTo:{userFrom->messageid}
>> ColumnFamilyTo:userFrom:{userTo->messageid}
>>
>> or something like
>> ColumnFamily:user_to:{user1_messageId, user2_messageId}
>> ColumnFamily:user_from:{user1_messageId, user2_messageId}
> 
> You've changed two different things between the examples:
> 
> (1) Whether direction is distinguished by the key or by the CF.
> (2) Something about the columns, but this isn't clear or necessary to
> support the change in CF/key structure.

Upon further inspection, the first example appears to use the other
party to a message as the column name. That will only allow one
messageid for any unique (from, to) pair. That seems broken to me.
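The overwrite David describes can be sketched by modeling one row's columns as
a map from column name to value (hypothetical data):

```python
# Model one row of the first schema: row key = recipient, column name =
# sender, column value = messageid. Column names are unique within a
# row, so a second message from the same sender replaces the first.
inbox_row = {}
for sender, message_id in [("alice", "msg-1"), ("alice", "msg-2")]:
    inbox_row[sender] = message_id
assert inbox_row == {"alice": "msg-2"}   # msg-1 is silently lost
```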

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]



signature.asc
Description: OpenPGP digital signature


Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Jonathan Ellis
Generally, you want to have different types of data in different CFs
so you can tune them separately (key / row caches).

Mixing different row types in one CF also makes doing get_range_slice
scans difficult.

On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad  wrote:
> What are the benefits of using multiple ColumnFamilies compared to using a
> composite
> row name?
>
> Example: You have messages that you want to index on sent and to.
>
> So you can either have
> ColumnFamilyFrom:userTo:{userFrom->messageid}
> ColumnFamilyTo:userFrom:{userTo->messageid}
>
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}
>
> One thing that I can see the advantage of using families are if you want to
> use different types in the families. But are there others? Like storage
> space,
> read/write speeds etc.
>
> --
> Regards Erik
>


Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Sorry, how do I get compaction progress with 0.6? Is it in nodetool or somewhere
else? I tried a few nodetool options and did not get this info.

My vmstats are

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 2234752  85440  0 27270300   43   32  2562  3474   19   21 10  2 83  5
 1  0 2234744  91220  0 27281208   100 24893 41788 10330 2482 10  2 81  6
 2  0 2234732 102560  0 27271640   390 25230 21048 10300 2346 10  2 82  6
 1  1 2234720 106660  0 2726819200 24730 34483 10700 2563 10  3 81  6


iostats:
Linux 2.6.30-gentoo-r4pb (cl201) 03/05/10 _x86_64_(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   9.620.002.174.950.00   83.26

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 126.60  7527.18  9843.04 1252414800 1637741619
sdb  96.95  6828.70  9348.09 1136197889 1555388186
sdc  96.82  6830.49  9324.74 1136496543 1551504064
sde  97.04  6830.45  9327.39 1136488926 1551944491
sdd  98.37  6829.08  9349.83 1136260775 1555678239
sdf  96.46  6828.30  9330.57 1136131741 1552473459
md0               1.94        18.59        27.97    3092790    4653501
md22239.84 40960.19 55939.76 6815190175 9307575976

The md2 disk contains the data for cassandra.

Regarding my previous reply, I do not mind working on the issue you mentioned,
but I have to get manager approval; if it best solves the problem, then great!
So far, I am convinced the slowness is related to compaction.

Shiv





From: Jonathan Ellis 
To: cassandra-user@incubator.apache.org
Sent: Fri, March 5, 2010 9:00:19 AM
Subject: Re: Anti-compaction Diskspace issue even when latest patch applied

On Fri, Mar 5, 2010 at 2:13 AM, shiv shivaji  wrote:
> 1. Is there a way to estimate the time it would take to compact this work
> load? I hope the load balancing will be much faster after the compaction.
> Curious how fast I can get the transfer once compaction is done.

0.6 gives you compaction progress, so you can estimate from that.

> 2. Any way to make this faster? Is working on the above issue the lowest
> hanging fruit or is there something else?

Not adding new data at the same time would probably make it faster,
although you haven't told us where the bottleneck is.
(http://spyced.blogspot.com/2010/01/linux-performance-basics.html)

-Jonathan


Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread shiv shivaji
I started with the ordered partitioner as I was hoping to make use of the
map-reduce functionality. However, my data was likely lumped onto 2 machines,
with most of it on one (as seen in another thread; machine failures were also
to blame for the uneven distribution). One solution I am trying is to load
balance. Is there anything else I can try, to convert the partitioner to
random on a live system?

I know this sounds like an odd request, but I am curious about my options. I
did see a post mentioning that one can compute the md5 hash of each key, insert
using that, and keep a mapping table from key to md5 hash. Unfortunately, the
data is already loaded using an ordered partitioner, and I was wondering if
there is a way to switch to random now.
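The md5-mapping idea mentioned above roughly mirrors what RandomPartitioner
does internally; a sketch (illustrative only, not Cassandra's exact code):

```python
import hashlib

def md5_token(key: str) -> int:
    # Interpret the MD5 digest of the key as a non-negative big integer,
    # roughly how RandomPartitioner derives a token from a key.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

# Keys that are adjacent under the order-preserving partitioner end up
# scattered across the token space under the random one.
t1, t2 = md5_token("image-000001"), md5_token("image-000002")
assert t1 != t2
```

This is why OPP hotspots (sequential keys piling onto one node) disappear
under the random partitioner, at the cost of losing range scans over keys.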

Shiv


Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Chris Goffinet
At this time, you have to re-import the data.

-Chris

On Fri, Mar 5, 2010 at 11:42 AM, shiv shivaji  wrote:

> I started with the ordered partitioner as I was hoping to make use of the
> map-reduce functionality. However, my data was likely lopped onto 2 key
> machines with most of it on one (as seen from another thread. There were
> also machine failures to blame for the uneven distribution). One solution
> which I am trying is to load balance. Is there any other thing I can try to
> convert the partitioner to random on a live system?
>
> I know this sounds like an odd request. Curious about my options though. I
> did see a post mentioning that one can compute the md5 hash of each key and
> then insert using that and have a mapping table from key to md5 hash.
> Unfortunately, the data is already loaded using an ordered partitioner and I
> was wondering if there is a way to switch to random now.
>
> Shiv
>



-- 
Chris Goffinet


Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread Stu Hood
But rather than switching, you should definitely try the 'loadbalance' approach 
first, and see whether OrderPP works out for you.

-Original Message-
From: "Chris Goffinet" 
Sent: Friday, March 5, 2010 1:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: Dynamically Switching from Ordered Partitioner to Random?

At this time, you have to re-import the data.

-Chris

On Fri, Mar 5, 2010 at 11:42 AM, shiv shivaji  wrote:

> I started with the ordered partitioner as I was hoping to make use of the
> map-reduce functionality. However, my data was likely lopped onto 2 key
> machines with most of it on one (as seen from another thread. There were
> also machine failures to blame for the uneven distribution). One solution
> which I am trying is to load balance. Is there any other thing I can try to
> convert the partitioner to random on a live system?
>
> I know this sounds like an odd request. Curious about my options though. I
> did see a post mentioning that one can compute the md5 hash of each key and
> then insert using that and have a mapping table from key to md5 hash.
> Unfortunately, the data is already loaded using an ordered partitioner and I
> was wondering if there is a way to switch to random now.
>
> Shiv
>



-- 
Chris Goffinet




Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 1:36 PM, shiv shivaji  wrote:
> Sorry, how to get compaction progress with 0.6. Is it in nodetool or
> somewhere else? I tried a few options after nodetool and did not get this
> info.

it's under CompactionManager in jmx.  I'm not sure if nodetool exposes
this but it's easy to find in JConsole mbeans.

> iostats:

what about iostat -x?

-Jonathan


Re: Dynamically Switching from Ordered Partitioner to Random?

2010-03-05 Thread shiv shivaji
Point taken. I was thinking of switching in parallel using a 2nd Cassandra
instance (perhaps on the same set of machines). That way, if load balancing is
too slow, I can try this version.





From: Stu Hood 
To: cassandra-user@incubator.apache.org
Sent: Fri, March 5, 2010 11:48:28 AM
Subject: Re: Dynamically Switching from Ordered Partitioner to Random?

But rather than switching, you should definitely try the 'loadbalance' approach 
first, and see whether OrderPP works out for you.

-Original Message-
From: "Chris Goffinet" 
Sent: Friday, March 5, 2010 1:43pm
To: cassandra-user@incubator.apache.org
Subject: Re: Dynamically Switching from Ordered Partitioner to Random?

At this time, you have to re-import the data.

-Chris

On Fri, Mar 5, 2010 at 11:42 AM, shiv shivaji  wrote:

> I started with the ordered partitioner as I was hoping to make use of the
> map-reduce functionality. However, my data was likely lopped onto 2 key
> machines with most of it on one (as seen from another thread. There were
> also machine failures to blame for the uneven distribution). One solution
> which I am trying is to load balance. Is there any other thing I can try to
> convert the partitioner to random on a live system?
>
> I know this sounds like an odd request. Curious about my options though. I
> did see a post mentioning that one can compute the md5 hash of each key and
> then insert using that and have a mapping table from key to md5 hash.
> Unfortunately, the data is already loaded using an ordered partitioner and I
> was wondering if there is a way to switch to random now.
>
> Shiv
>



-- 
Chris Goffinet

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread shiv shivaji
Ah, will look at the jmx console. Thought it was under nodetool.

cont...@cl201 ~/swell/cassandra $ iostat -x
Linux 2.6.30-gentoo-r4pb (cl201) 03/05/10 _x86_64_(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   9.660.002.184.980.00   83.18

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              86.37   315.56   53.65   73.00  7534.09  9905.44   137.70     1.47   11.63   2.18  27.66
sdb              24.35   254.96   29.45   67.79  6842.55  9415.54   167.19     0.73    7.50   1.83  17.83
sdc              24.41   254.17   29.44   67.63  6844.34  9389.56   167.23     0.73    7.50   1.83  17.78
sde              24.73   254.26   29.44   67.87  6844.29  9392.18   166.86     0.63    6.50   1.81  17.63
sdd              24.38   254.82   29.44   69.24  6842.92  9417.35   164.76     0.75    7.58   1.83  18.01
sdf              24.35   254.57   29.36   67.37  6842.16  9397.45   167.88     0.69    7.18   1.83  17.68
md0               0.00     0.00    0.60    1.31    18.62    27.21    24.00     0.00    0.00   0.00   0.00
md2               0.00     0.00  322.69 1932.98 41043.23 56337.83    43.17     0.00    0.00   0.00   0.00

Shiv





From: Jonathan Ellis 
To: cassandra-user@incubator.apache.org
Sent: Fri, March 5, 2010 11:52:18 AM
Subject: Re: Anti-compaction Diskspace issue even when latest patch applied

On Fri, Mar 5, 2010 at 1:36 PM, shiv shivaji  wrote:
> Sorry, how to get compaction progress with 0.6. Is it in nodetool or
> somewhere else? I tried a few options after nodetool and did not get this
> info.

it's under CompactionManager in jmx.  I'm not sure if nodetool exposes
this but it's easy to find in JConsole mbeans.

> iostats:

what about iostat -x?

-Jonathan


Re: ConcurrentModificationException

2010-03-05 Thread Jonathan Ellis
Fixed, thanks.

On Fri, Mar 5, 2010 at 11:12 AM, B. Todd Burruss  wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-853
>
> On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote:
>
> This is the 0.6 beta yes?  Looks like a regression, please open a ticket.
>
> On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss  wrote:
>> i'm seeing a lot of these ... any idea?
>>
>> 2010-03-04 18:53:21,455 ERROR [MEMTABLE-POST-FLUSHER:1]
>> [DebuggableThreadPoolExecutor.java:94] Error in executor futuretask
>> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> java.util.ConcurrentModificationException
>>        at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>        at
>> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.RuntimeException: java.lang.RuntimeException:
>> java.util.concurrent.ExecutionException:
>> java.util.ConcurrentModificationException
>>        at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>        at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>        at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>        ... 2 more
>> Caused by: java.lang.RuntimeException:
>> java.util.concurrent.ExecutionException:
>> java.util.ConcurrentModificationException
>>        at
>> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:357)
>>        at
>> org.apache.cassandra.db.ColumnFamilyStore$2.runMayThrow(ColumnFamilyStore.java:392)
>>        at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>        ... 6 more
>> Caused by: java.util.concurrent.ExecutionException:
>> java.util.ConcurrentModificationException
>>        at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:349)
>>        ... 8 more
>> Caused by: java.util.ConcurrentModificationException
>>        at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:605)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:385)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:71)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:343)
>>        at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLogExecutorService.process(CommitLogExecutorService.java:113)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLogExecutorService.access$200(CommitLogExecutorService.java:35)
>>        at
>> org.apache.cassandra.db.commitlog.CommitLogExecutorService$1.runMayThrow(CommitLogExecutorService.java:67)
>>        at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>        ... 1 more
>>
>
>
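For anyone hitting the same trace: the root cause is a ConcurrentModificationException from ArrayDeque's fail-fast iterator in CommitLog.discardCompletedSegmentsInternal, i.e. the segment deque is being structurally modified while it is being iterated. A minimal single-threaded sketch of that failure mode (illustrative only, not Cassandra's actual code):

```java
import java.util.ArrayDeque;
import java.util.ConcurrentModificationException;
import java.util.Deque;

public class CmeDemo {
    // Returns true if iterating while mutating throws, as in the trace above.
    static boolean triggersCme() {
        Deque<String> segments = new ArrayDeque<>();
        segments.add("segment-1");
        segments.add("segment-2");
        try {
            for (String s : segments) {
                // Structural modification while the iterator is live:
                // the next call to next() sees the changed tail and throws.
                segments.add("recycled-" + s);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + triggersCme());
    }
}
```

The fix in this kind of code is usually to snapshot the deque (or confine all mutation to one executor) rather than iterate the live collection.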


Unreliable transport layer

2010-03-05 Thread Ashwin Jayaprakash
Hey guys! I have a simple question. I'm a casual observer, not a real
Cassandra user yet. So, excuse my ignorance.

I see that the gossip feature uses UDP. I was curious whether you have faced
issues with unreliable transports in your production clusters, like faulty
switches or dropped packets during heavy network loads?

If I'm not mistaken, all client reads/writes are point-to-point over
TCP?

Thanks,
Ashwin.


Re: Unreliable transport layer

2010-03-05 Thread Jonathan Ellis
In 0.6 gossip is over TCP.

On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash
 wrote:
> Hey guys! I have a simple question. I'm a casual observer, not a real
> Cassandra user yet. So, excuse my ignorance.
>
> I see that the Gossip feature uses UDP. I was curious to know if you guys
> faced issues with unreliable transports in your production clusters? Like
> faulty switches, dropped packets etc during heavy network loads?
>
> If I'm not mistaken are all client reads/writes doing point-to-point over
> TCP?
>
> Thanks,
> Ashwin.
>
>
>


Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-05 Thread Rosenberry, Eric
I am looking for advice from others that are further along in deploying 
Cassandra in production environments than we are.  I want to know what you are 
finding your bottlenecks to be.  I would feel silly purchasing dual processor 
quad core 2.93ghz Nehalem machines with 192 gigs of RAM just to find out that 
the two local SATA disks kept all that CPU and RAM from being useful (clearly 
that example would be a dumb).

I need to spec out hardware for an "optimal" Cassandra node (though our 
read/write characteristics are not yet fully defined so let's go with an 
"average" configuration).

My main concern is finding the right balance of:

* Available CPU

* RAM amount

* RAM speed (think Nehalem architecture where memory comes in a few 
speeds, though I doubt this is much of a concern as it is mainly dictated by 
which processor you buy and how many slots you populate)

* Total iops available (i.e. number of disks)

* Total disk space available (depending on the ratio of iops/space 
deciding on SAS vs. SATA and various rotational speeds)

My current thinking is 1U boxes with four 3.5 inch disks since that seems to be 
a readily available config.  One big question: should I go with a single-processor 
Nehalem system for those four disks, or would two CPUs be useful?  And how much RAM 
is appropriate to match?  I am making the assumption that Cassandra nodes are going 
to be disk bound, as they must do a random read to answer any given query (i.e. 
indexes in RAM, but all data lives on disk?).

The other big decision is what type of hard disk others are finding provides 
the optimal ratio of iops to available space.  SAS or SATA?  And what 
rotational speed?

Let me throw out here an actual hardware config and feel free to tell me the 
error of my ways:

* A SuperMicro SuperServer 6016T-NTRF configured as follows:

o   2.26 GHz E5520 dual-processor quad-core hyperthreaded Nehalem architecture 
(this proc provides a lot of bang for the buck; faster procs get more expensive 
quickly)

o   Qty 12, 4 gig 1066 MHz DIMMs for a total of 48 gigs RAM (the 4 gig DIMMs 
seem to be the price sweet spot)

o   Dual on board 1 gigabit NIC's (perhaps one for client connections and the 
other for cluster communication?)

o   Dual power supplies (I don't want to lose half my cluster due to a failure 
on one power leg)

o   4x 1TB SATA disks (this is a complete SWAG)

o   No RAID controller (all just single individual disks presented to the OS) - 
Though is there any downside to using a RAID controller with RAID 0 (perhaps 
one single disk for the log for sequential io's, and 3x disks in a stripe for 
the random io's)?

o   The on-board IPMI based OOB controller (so we can kick the boxes remotely 
if need be)

* http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm

I can't help but think the above config has way too much RAM and CPU and not 
enough iops capacity.  My understanding is that Cassandra does not cache much 
in RAM though?
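One note on the "dedicated log disk" idea: Cassandra's storage-conf.xml lets you point the commit log and the data files at different spindles, so the sequential log appends don't compete with random data-file reads. A sketch of the relevant fragment (all paths here are hypothetical, and the element names are from the 0.5/0.6-era config):

```xml
<Storage>
  <!-- Sequential commit-log appends on their own disk -->
  <CommitLogDirectory>/disk1/cassandra/commitlog</CommitLogDirectory>
  <!-- Random-io data files spread across the remaining disks -->
  <DataFileDirectories>
      <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/disk3/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/disk4/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
</Storage>
```

Whether multiple data directories or a RAID 0 stripe of the three data disks performs better is worth benchmarking for your workload.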

Any thoughts are appreciated.  Thanks.

-Eric
___
Eric Rosenberry
Sr. Infrastructure Architect | Chief Bit Plumber


iovation
111 SW Fifth Avenue
Suite 3200
Portland, OR 97204
www.iovation.com
