Re: How to force GC in Cassandra?

2010-03-13 Thread Weijun Li
Thanks Jonathan! I looked into your code and guessed that compaction is the
one that cleans all deleted columns from sstables.

-Weijun

On Fri, Mar 12, 2010 at 12:05 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I think you mean compaction?

 You can use nodeprobe / nodetool for that.

 http://wiki.apache.org/cassandra/NodeProbe

 On Fri, Mar 12, 2010 at 12:40 PM, Weijun Li weiju...@gmail.com wrote:
  Suppose I insert a lot of new items but also delete a lot of new items
  daily, it will be ideal if I can force GC to happen during mid night
 (when
  traffic is low). Is there any way to manually force GC to be executed? In
  this way I can add a cronjob to trigger gc in mid night. I tried nodetool
  and the JMX interface but they don't seem to have that.
 
  -Weijun
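
For reference, the scheduled compaction being asked about can be wired up with the
same tool: a crontab entry along the lines of the one below would run it nightly
(a sketch only; the install path is an assumption, while the JMX port 8080 and the
compact sub-command are the ones used elsewhere in this archive).

0 3 * * * /opt/cassandra/bin/nodeprobe --host localhost --port 8080 compact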
 



How to force GC in Cassandra?

2010-03-12 Thread Weijun Li
Suppose I insert a lot of new items but also delete a lot of items daily; it
would be ideal if I could force GC to happen around midnight (when
traffic is low). Is there any way to manually force GC to be executed? That
way I can add a cronjob to trigger GC at midnight. I tried nodetool
and the JMX interface but they don't seem to have that.

-Weijun


Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
Hi Sylvain,

I applied your patch to 0.5 but it seems that it's not compilable:

1) column.getTtl() is not defined in RowMutation.java:
public static RowMutation getRowMutation(String table, String key,
        Map<String, List<ColumnOrSuperColumn>> cfmap)
{
    RowMutation rm = new RowMutation(table, key.trim());
    for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet())
    {
        String cfName = entry.getKey();
        for (ColumnOrSuperColumn cosc : entry.getValue())
        {
            if (cosc.column == null)
            {
                assert cosc.super_column != null;
                for (org.apache.cassandra.service.Column column : cosc.super_column.columns)
                {
                    rm.add(new QueryPath(cfName, cosc.super_column.name, column.name),
                           column.value, column.timestamp, column.getTtl());
                }
            }
            else
            {
                assert cosc.super_column == null;
                rm.add(new QueryPath(cfName, null, cosc.column.name),
                       cosc.column.value, cosc.column.timestamp, cosc.column.getTtl());
            }
        }
    }
    return rm;
}

2) CassandraServer.java: Column.setTtl() is not defined.
if (column instanceof ExpiringColumn)
{
    thrift_column.setTtl(((ExpiringColumn) column).getTimeToLive());
}

3) CliClient.java: type mismatch for ColumnParent
thriftClient_.insert(tableName, key,
                     new ColumnParent(columnFamily, superColumnName),
                     new Column(columnName, value.getBytes(), System.currentTimeMillis()),
                     ConsistencyLevel.ONE);

It seems that the patch doesn't add getTtl()/setTtl() stuff to Column.java?

Thanks,
-Weijun
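
For context, the full rebuild that Sylvain describes in the quoted reply below
boils down to roughly the following sequence (the final ant invocation is an
assumption; the patch options and the gen-thrift-java target are the ones named
in the thread, and the follow-up message confirms the missing step was
regenerating thrift):

patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch    (and likewise for the other patches)
ant gen-thrift-java                                     (requires the libthrift-r820831 build of thrift)
ant                                                     (rebuild Cassandra against the regenerated interface)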

-Original Message-
 From: Sylvain Lebresne [mailto:sylv...@yakaz.com]
 Sent: Thursday, February 25, 2010 2:23 AM
 To: Weijun Li
 Cc: cassandra-user@incubator.apache.org
 Subject: Re: Strategy to delete/expire keys in cassandra

 Hi,

  Should I just run command (in Cassandra 0.5 source folder?) like:
  patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
  for all of the five patches in your ticket?

 Well, actually I lied. The patches were made for a version a little after
 0.5.
 If you really want to try, I attach a version of those patches that (should)
 work with 0.5 (only the first three patches are there, but the fourth one is
 for tests so it is not necessary per se). Apply them with your patch command.
 Still, to compile that you will have to regenerate the thrift java interface
 (with ant gen-thrift-java), but for that you will have to install the right
 svn revision of thrift (which is libthrift-r820831 for 0.5). And if you
 manage to make it work, you will have to dig into cassandra.thrift, as the
 patches make changes to it.

 In the end, remember that this is not an official patch yet and it *will
 not* make it into Cassandra in its current form. All I can tell you is that I
 need those expiring columns for quite a few of my use cases and I will do
 what I can to get this feature included if and when possible.

  Also what’s your opinion on extending ExpiringColumn to expire a key
  completely? Otherwise it will be difficult to track what are expired
  or old rows in Cassandra.

 I'm not sure how to make full rows (or even full superColumns, for that
 matter) expire. What if you set a row to expire after some time and then add
 new columns before this expiration? Should you update the expiration of the
 row? Which is to say that a row expires when its last column expires, which
 is almost what you get with expiring columns.
 The one thing you may want, though, is that when all the columns of a row
 expire (or, to be precise, get physically deleted), the row itself is
 deleted. Looking at the code, I'm not convinced this happens, and I'm not
 sure why.

 --
 Sylvain
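
To make the mechanism under discussion concrete, here is a minimal, illustrative
sketch of an expiring column in plain Java. It is not the CASSANDRA-699 patch
itself; the class and field names are invented for illustration.

// Illustrative sketch only; not the actual CASSANDRA-699 implementation.
// A column that carries a time-to-live and reports itself as deleted once
// the TTL has elapsed, so compaction can drop it like a tombstone.
public class ExpiringColumnSketch
{
    private final byte[] name;
    private final byte[] value;
    private final long timestamp;       // client-supplied timestamp
    private final int timeToLive;       // seconds
    private final long expirationTime;  // wall-clock millis at which the column expires

    public ExpiringColumnSketch(byte[] name, byte[] value, long timestamp, int timeToLive)
    {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.timeToLive = timeToLive;
        this.expirationTime = System.currentTimeMillis() + timeToLive * 1000L;
    }

    public int getTimeToLive()
    {
        return timeToLive;
    }

    // Once expired, the column behaves like a tombstone: reads skip it and
    // compaction may physically remove it after GCGraceSeconds.
    public boolean isMarkedForDelete()
    {
        return System.currentTimeMillis() >= expirationTime;
    }
}

Under a model like this, a whole row disappears only if compaction also drops rows
whose columns have all been physically removed, which is exactly the behavior
Sylvain says he is not convinced happens yet.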




Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
Never mind. Figured out I forgot to compile thrift :)

Thanks,

-Weijun

On Wed, Mar 10, 2010 at 1:43 PM, Weijun Li weiju...@gmail.com wrote:

 Hi Sylvain,

 I applied your patch to 0.5 but it seems that it's not compilable:

 1) column.getTtl() is not defined in RowMutation.java:
 public static RowMutation getRowMutation(String table, String key,
         Map<String, List<ColumnOrSuperColumn>> cfmap)
 {
     RowMutation rm = new RowMutation(table, key.trim());
     for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet())
     {
         String cfName = entry.getKey();
         for (ColumnOrSuperColumn cosc : entry.getValue())
         {
             if (cosc.column == null)
             {
                 assert cosc.super_column != null;
                 for (org.apache.cassandra.service.Column column : cosc.super_column.columns)
                 {
                     rm.add(new QueryPath(cfName, cosc.super_column.name, column.name),
                            column.value, column.timestamp, column.getTtl());
                 }
             }
             else
             {
                 assert cosc.super_column == null;
                 rm.add(new QueryPath(cfName, null, cosc.column.name),
                        cosc.column.value, cosc.column.timestamp, cosc.column.getTtl());
             }
         }
     }
     return rm;
 }

 2) CassandraServer.java: Column.setTtl() is not defined.
 if (column instanceof ExpiringColumn)
 {
     thrift_column.setTtl(((ExpiringColumn) column).getTimeToLive());
 }

 3) CliClient.java: type mismatch for ColumnParent
 thriftClient_.insert(tableName, key,
                      new ColumnParent(columnFamily, superColumnName),
                      new Column(columnName, value.getBytes(), System.currentTimeMillis()),
                      ConsistencyLevel.ONE);

 It seems that the patch doesn't add getTtl()/setTtl() stuff to Column.java?


 Thanks,
 -Weijun

  -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@yakaz.com]
 Sent: Thursday, February 25, 2010 2:23 AM
 To: Weijun Li
 Cc: cassandra-user@incubator.apache.org
 Subject: Re: Strategy to delete/expire keys in cassandra

 Hi,

  Should I just run command (in Cassandra 0.5 source folder?) like:
  patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
  for all of the five patches in your ticket?

 Well, actually I lied. The patches were made for a version a little after
 0.5.
 If you really want to try, I attach a version of those patches that (should)
 work with 0.5 (only the first three patches are there, but the fourth one is
 for tests so it is not necessary per se). Apply them with your patch command.
 Still, to compile that you will have to regenerate the thrift java interface
 (with ant gen-thrift-java), but for that you will have to install the right
 svn revision of thrift (which is libthrift-r820831 for 0.5). And if you
 manage to make it work, you will have to dig into cassandra.thrift, as the
 patches make changes to it.

 In the end, remember that this is not an official patch yet and it *will
 not* make it into Cassandra in its current form. All I can tell you is that I
 need those expiring columns for quite a few of my use cases and I will do
 what I can to get this feature included if and when possible.

  Also what’s your opinion on extending ExpiringColumn to expire a key
  completely? Otherwise it will be difficult to track what are expired
  or old rows in Cassandra.

 I'm not sure how to make full rows (or even full superColumns, for that
 matter) expire. What if you set a row to expire after some time and then add
 new columns before this expiration? Should you update the expiration of the
 row? Which is to say that a row expires when its last column expires, which
 is almost what you get with expiring columns.
 The one thing you may want, though, is that when all the columns of a row
 expire (or, to be precise, get physically deleted), the row itself is
 deleted. Looking at the code, I'm not convinced this happens, and I'm not
 sure why.

 --
 Sylvain






RE: Strategy to delete/expire keys in cassandra

2010-02-26 Thread Weijun Li
Thanks for the patch Sylvain! I remember during build Cassandra re-generates
the thrift java code (in src/) with a libthrift jar, is this correct?

Here's my use case: 

1) Write/read ratio is close to 1:1
2) High volume of traffic and I want low read latency (e.g., < 40ms). That's
why I'm testing a build with row-level cache and mmap (I think Jonathan is
right that mmap does help with performance).
3) A row should expire if its last modified time is too old so we don't need
to worry about scanning all keys to clean up old items. So yes if you write
to a row the last-modified-time should be updated as well.
4) (nice to have) support for range scan (key iteration) with RP.

So ideally a row should have a last-modified-time field. Or, I can use one
column to record the last modified time (this means each write to a row will
be followed by another one to update the last-modified column, which is kind
of ugly). For the simplest case: suppose each row has just one
ExpiringColumn; will the row be deleted automatically once it has no column
associated with it? Does it make sense for Cassandra to keep a row without
any column?
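
If the separate last-modified column approach were taken, the write path would
look roughly like the sketch below. It mirrors the patched insert() signature
shown in the compile errors earlier in this thread; the keyspace, column family,
and column names are invented for illustration, and the thrift-generated argument
types of that era may differ.

// Illustrative only; assumes the thrift-generated classes
// (Cassandra.Client, ColumnParent, Column, ConsistencyLevel) are imported.
void writeWithLastModified(Cassandra.Client client, String key, byte[] payload)
        throws Exception
{
    long now = System.currentTimeMillis();
    // the data column itself
    client.insert("Keyspace1", key, new ColumnParent("Standard1", null),
                  new Column("data".getBytes(), payload, now), ConsistencyLevel.ONE);
    // a companion column recording when the row was last touched
    client.insert("Keyspace1", key, new ColumnParent("Standard1", null),
                  new Column("last_modified".getBytes(), Long.toString(now).getBytes(), now),
                  ConsistencyLevel.ONE);
}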

Please let me know if the following plan will work or not:

1) Manually apply your patch to the trunk build that I use (which has
row-level cache and mmap). It would be nice if you could say a few words about
the design flow of your ExpiringColumn :-)
2) Find the API entry point for deleting a row, and modify the expiration
handler (supposing you have one) of ExpiringColumn to call the key-delete
method if the key has no other columns (if that doesn't happen for now). How
do you trigger the expiration check for an ExpiringColumn? Upon a hit of the
column? Or does a timer scan all columns for expiration?

Thanks,

-Weijun




-Original Message-
From: Sylvain Lebresne [mailto:sylv...@yakaz.com] 
Sent: Thursday, February 25, 2010 2:23 AM
To: Weijun Li
Cc: cassandra-user@incubator.apache.org
Subject: Re: Strategy to delete/expire keys in cassandra

Hi,

 Should I just run command (in Cassandra 0.5 source folder?) like:
 patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
 for all of the five patches in your ticket?

Well, actually I lied. The patches were made for a version a little after
0.5.
If you really want to try, I attach a version of those patches that (should)
work with 0.5 (only the first three patches are there, but the fourth one is
for tests so it is not necessary per se). Apply them with your patch command.
Still, to compile that you will have to regenerate the thrift java interface
(with ant gen-thrift-java), but for that you will have to install the right
svn revision of thrift (which is libthrift-r820831 for 0.5). And if you
manage to make it work, you will have to dig into cassandra.thrift, as the
patches make changes to it.

In the end, remember that this is not an official patch yet and it *will
not* make it into Cassandra in its current form. All I can tell you is that I
need those expiring columns for quite a few of my use cases and I will do what I
can to get this feature included if and when possible.

 Also what’s your opinion on extending ExpiringColumn to expire a key 
 completely? Otherwise it will be difficult to track what are expired 
 or old rows in Cassandra.

I'm not sure how to make full rows (or even full superColumns, for that
matter) expire. What if you set a row to expire after some time and then add
new columns before this expiration? Should you update the expiration of the
row? Which is to say that a row expires when its last column expires, which
is almost what you get with expiring columns.
The one thing you may want, though, is that when all the columns of a row
expire (or, to be precise, get physically deleted), the row itself is
deleted. Looking at the code, I'm not convinced this happens, and I'm not sure
why.

--
Sylvain




 From: Weijun Li [mailto:weiju...@gmail.com]
 Sent: Tuesday, February 23, 2010 6:18 PM
 To: cassandra-user@incubator.apache.org
 Subject: Re: Strategy to delete/expire keys in cassandra



 Thanks for the answer.  A dumb question: how did you apply the patch 
 file to
 0.5 source? The link you gave doesn't mention that the patch is for 0.5??

 Also, this ExpiringColumn feature doesn't seem to expire key/row, 
 meaning the number of keys will keep grow (even if you drop columns 
 for them) unless you delete them. In your case, how do you manage 
 deleting/expiring keys from Cassandra? Do you keep a list of keys 
 somewhere and go through them once a while?

 Thanks,

 -Weijun

 On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne sylv...@yakaz.com
wrote:

 Hi,

 Maybe the following ticket/patch may be what you are looking for:
 https://issues.apache.org/jira/browse/CASSANDRA-699

 It's flagged for 0.7 but as it breaks the API (and if I understand 
 correctly the release plan) it may not make it in cassandra before 0.8 
 (and the patch will have to change to accommodate the change that will 
 be made to the internals in 0.7).

 Anyway, what I can at least tell you

RE: Strategy to delete/expire keys in cassandra

2010-02-24 Thread Weijun Li
Hi Sylvain, I just noticed that you are the one who implemented the
ExpiringColumn feature. Could you please help with my questions?

 

Should I just run a command (in the Cassandra 0.5 source folder?) like:

 

patch -p1 -i  0001-Add-new-ExpiringColumn-class.patch

 

for all of the five patches in your ticket?

 

Also what's your opinion on extending ExpiringColumn to expire a key
completely? Otherwise it will be difficult to track which rows are expired or
old in Cassandra.

 

Thanks,

-Weijun

 

From: Weijun Li [mailto:weiju...@gmail.com] 
Sent: Tuesday, February 23, 2010 6:18 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Strategy to delete/expire keys in cassandra

 

Thanks for the answer.  A dumb question: how did you apply the patch file to
0.5 source? The link you gave doesn't mention that the patch is for 0.5??

Also, this ExpiringColumn feature doesn't seem to expire key/row, meaning
the number of keys will keep grow (even if you drop columns for them) unless
you delete them. In your case, how do you manage deleting/expiring keys from
Cassandra? Do you keep a list of keys somewhere and go through them once a
while?

Thanks,

-Weijun

On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne sylv...@yakaz.com wrote:

Hi,

Maybe the following ticket/patch may be what you are looking for:
https://issues.apache.org/jira/browse/CASSANDRA-699

It's flagged for 0.7 but as it breaks the API (and if I understand correctly
the release plan) it may not make it in cassandra before 0.8 (and the
patch will have to change to accommodate the change that will be
made to the internals in 0.7).

Anyway, what I can at least tell you is that I'm using the patch against
0.5 in a test cluster without problem so far.


 3)  Once keys are deleted, do you have to wait till next GC to clean
 them from disk or memory (suppose you don't run cleanup manually)? What's
 the strategy for Cassandra to handle deleted items (notify other replica
 nodes, cleanup memory/disk, defrag/rebuild disk files, rebuild bloom
filter
 etc). I'm asking this because if the keys refresh very fast (i.e., high
 volume write/read and expiration is kind of short) how will the data file
 grow and how does this impact the system performance.

Items are deleted only during compaction, and you may actually have to
wait for GCGraceSeconds before deletion. This value is configurable in
storage-conf.xml and is 10 days by default. You can decrease this value,
but because of consistency (and the fact that you have to at least wait for
compaction to occur) you will always have a delay before the actual delete
(all of this is also true for the patch I mention above, by the way). But
once something is deleted, it is simply skipped during compaction, so it's
really cheap.

--
Sylvain

 



Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
It seems that we are mostly talking about writing and reading keys into/from a
Cassandra cluster. I’m wondering how you successfully deal with
deleting/expiring keys in Cassandra. A typical example: you want to delete
keys that haven’t been modified in a certain time period (i.e., old keys). Here
are my thoughts:

 

1)  If you use the order-preserving partitioner, you need to periodically
iterate through all keys to check their last-modified time and decide whether a
key should be deleted. When you have hundreds of millions of keys with high
write/read traffic, it will be very time- and resource-consuming to iterate over
all keys in all clusters.

2)  If you use the random partitioner, you’ll need to keep a list of ALL keys
somewhere, keep it updated over time, and then go through it periodically
to delete expired items. Again, when you have hundreds of millions of keys,
maintaining such a big dynamic key list with expiration times is not
trivial work.

3)  Once keys are deleted, do you have to wait until the next GC to clean them
from disk or memory (suppose you don’t run cleanup manually)? What’s the
strategy for Cassandra to handle deleted items (notify other replica nodes,
clean up memory/disk, defrag/rebuild disk files, rebuild the bloom filter, etc.)?
I’m asking this because if the keys refresh very fast (i.e., high-volume
write/read and expiration is kind of short), how will the data files grow and
how does this impact the system performance?

 

So what’s your opinion on dealing with the above cases to expire keys? I’m trying
to decide whether we can use Cassandra for high-traffic read-only,
write-only, or mixed read/write workloads.

 

Thanks,

 

-Weijun



Re: Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
Thanks for the answer.  A dumb question: how did you apply the patch file to
the 0.5 source? The link you gave doesn't mention that the patch is for 0.5.

Also, this ExpiringColumn feature doesn't seem to expire the key/row, meaning
the number of keys will keep growing (even if you drop their columns) unless
you delete them. In your case, how do you manage deleting/expiring keys from
Cassandra? Do you keep a list of keys somewhere and go through them once in a
while?

Thanks,

-Weijun

On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne sylv...@yakaz.com wrote:

 Hi,

 Maybe the following ticket/patch may be what you are looking for:
 https://issues.apache.org/jira/browse/CASSANDRA-699

 It's flagged for 0.7 but as it breaks the API (and if I understand
 correctly
 the release plan) it may not make it in cassandra before 0.8 (and the
 patch will have to change to accommodate the change that will be
 made to the internals in 0.7).

 Anyway, what I can at least tell you is that I'm using the patch against
 0.5 in a test cluster without problem so far.

  3)  Once keys are deleted, do you have to wait till next GC to clean
  them from disk or memory (suppose you don’t run cleanup manually)? What’s
  the strategy for Cassandra to handle deleted items (notify other replica
  nodes, cleanup memory/disk, defrag/rebuild disk files, rebuild bloom
 filter
  etc). I’m asking this because if the keys refresh very fast (i.e., high
  volume write/read and expiration is kind of short) how will the data file
  grow and how does this impact the system performance.

 Items are deleted only during compaction, and you may actually have to
 wait for GCGraceSeconds before deletion. This value is configurable in
 storage-conf.xml and is 10 days by default. You can decrease this value,
 but because of consistency (and the fact that you have to at least wait for
 compaction to occur) you will always have a delay before the actual delete
 (all of this is also true for the patch I mention above, by the way). But
 once something is deleted, it is simply skipped during compaction, so it's
 really cheap.

 --
 Sylvain



Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Weijun Li
I see. How much is the overhead of Java serialization? Does it slow down the
system a lot? It seems to be a tradeoff between CPU usage and memory.

As for mmap in 0.6, do you mmap the sstable data file even if it is a lot
larger than the available memory (e.g., the data file is over 100GB while
you have only 8GB of RAM)? How efficient is mmap in this case? Is mmap already
checked into the 0.6 branch?

-Weijun

On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis jbel...@gmail.com wrote:

 The whole point of rowcache is to avoid the serialization overhead,
 though.  If we just wanted the serialized form cached, we would let
 the os block cache handle that without adding an extra layer.  (0.6
 uses mmap'd i/o by default on 64bit JVMs so this is very efficient.)

 On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li weiju...@gmail.com wrote:
  The memory overhead issue is not directly related to GC because when JVM
 ran
  out of memory the GC has been very busy for quite a while. In my case JVM
  consumed all of the 6GB when the row cache size hit 1.4mil.
 
  I haven't started test the row cache feature yet. But I think data
  compression is useful to reduce memory consumption because in my
 impression
  disk i/o is always the bottleneck for Cassandra while its CPU usage is
  usually low all the time. In addition to this, compression should also
 help
  to reduce the number of java objects dramatically (correct me if I'm
 wrong),
  --especially in case we need to cache most of the data to achieve decent
  read latency.
 
  If ColumnFamily is serializable it shouldn't be that hard to implement
 the
  compression feature which can be controlled by an option (again :-) in
  storage conf xml.
 
  When I get to that point you can instruct me to implement this feature
 along
  with the row-cache-write-through. Our goal is straightforward: to support
  short read latency in high volume web application with write/read ratio
 to
  be 1:1.
 
  -Weijun
 
  -Original Message-
  From: Jonathan Ellis [mailto:jbel...@gmail.com]
  Sent: Thursday, February 18, 2010 12:04 PM
  To: cassandra-user@incubator.apache.org
  Subject: Re: Testing row cache feature in trunk: write should put record
 in
  cache
 
  Did you force a GC from jconsole to make sure you weren't just
  measuring uncollected garbage?
 
  On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li weiju...@gmail.com wrote:
  OK I'll work on the change later because there's another problem to
 solve:
  the overhead for cache is too big that 1.4mil records (1k each) consumed
  all
  of the 6gb memory of JVM (I guess 4gb are consumed by the row cache).
 I'm
  thinking that ConcurrentHashMap is not a good choice for LRU and the row
  cache needs to store compressed key data to reduce memory usage. I'll do
  more investigation on this and let you know.
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  ... tell you what, if you write the option-processing part in
  DatabaseDescriptor I will do the actual cache part. :)
 
  On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
    https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
   this is pretty low priority for me.
  
   On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com
 wrote:
   Just tried to make quick change to enable it but it didn't work out
  :-(
  
  ColumnFamily cachedRow =
   cfs.getRawCachedRow(mutation.key());
  
   // What I modified
   if( cachedRow == null ) {
   cfs.cacheRow(mutation.key());
   cachedRow = cfs.getRawCachedRow(mutation.key());
   }
  
   if (cachedRow != null)
   cachedRow.addAll(columnFamily);
  
   How can I open a ticket for you to make the change (enable row cache
   write
   through with an option)?
  
   Thanks,
   -Weijun
  
   On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
 
   wrote:
On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
wrote:
Just started to play with the row cache feature in trunk: it
 seems
to
be
working fine so far except that for RowsCached parameter you
 need
to
specify
number of rows rather than a percentage (e.g., 20% doesn't
  work).
   
20% works, but it's 20% of the rows at server startup.  So on a
fresh
start that is zero.
   
Maybe we should just get rid of the % feature...
  
   (Actually, it shouldn't be hard to update this on flush, if you
 want
   to open a ticket.)
  
  
  
 
 
 
 



Unbalanced read latency among nodes in a cluster

2010-02-19 Thread Weijun Li
I set up two Cassandra clusters with 2 nodes each. Both use the random
partitioner. It's strange that in each cluster, one node has a much shorter
read latency than the other one.

This is the info for one of the clusters:

Node A: read count 77302, data file 41GB, read latency 58180, io saturation
100%
Node B: read count 488753, data file 26GB, read latency 5822 , io saturation
35%.

I first started node A, then ran B to join the cluster. Both machines have
exactly the same hardware and OS. The test client randomly picks a node to
write to, and it worked fine for the other cluster.

Address   Status   Load       Range                                      Ring
                               169400792707028208569145873749456918214
10.xxx    Up       38.39 GB   103633195217832666843316719920043079797   |--|
10.xxx    Up       24.22 GB   169400792707028208569145873749456918214   |--|

For both clusters, whichever node took more reads (and has the larger data
file) has the much worse read latency.

What's the algorithm that Cassandra uses to split the token when a new node
joins? What could cause this unbalanced read latency issue? How can I fix
this? How can I make sure all nodes get evenly distributed data and traffic?

-Weijun
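
One back-of-envelope note on balance (general RandomPartitioner arithmetic, not
something stated in the thread): the token space runs from 0 to 2^127, so a
perfectly balanced N-node ring places node i at

    token(i) = i * 2^127 / N,   i = 0 .. N-1

which for two nodes is 0 and 85070591730234615865843651857942052864. A node that
bootstraps without an explicit token instead bisects the range of the most loaded
node (as described in the Rebalance thread below), which is how a ring like the
one above ends up with 38 GB on one node and 24 GB on the other; assigning each
node its computed token explicitly is the usual way to keep both data and traffic
even.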


Re: Testing row cache feature in trunk: write should put record in cache

2010-02-17 Thread Weijun Li
OK I'll work on the change later because there's another problem to solve:
the overhead of the cache is so big that 1.4mil records (1k each) consumed all
of the 6GB memory of the JVM (I guess 4GB are consumed by the row cache). I'm
thinking that ConcurrentHashMap is not a good choice for LRU, and the row
cache needs to store compressed key data to reduce memory usage. I'll do
more investigation on this and let you know.

-Weijun
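
On the ConcurrentHashMap-vs-LRU point, a size-bounded LRU map can be sketched with
the JDK's LinkedHashMap as below. This is illustrative only, not how the Cassandra
row cache is implemented, and a production version would still need external
synchronization.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: an access-ordered LinkedHashMap that evicts the
// least-recently-used entry once 'capacity' is exceeded. Not thread-safe.
public class LruRowCache<K, V> extends LinkedHashMap<K, V>
{
    private final int capacity;

    public LruRowCache(int capacity)
    {
        super(16, 0.75f, true);   // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
    {
        return size() > capacity;
    }
}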

On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
   https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
  this is pretty low priority for me.
 
  On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
  Just tried to make quick change to enable it but it didn't work out :-(
 
 ColumnFamily cachedRow =
 cfs.getRawCachedRow(mutation.key());
 
  // What I modified
  if( cachedRow == null ) {
  cfs.cacheRow(mutation.key());
  cachedRow = cfs.getRawCachedRow(mutation.key());
  }
 
  if (cachedRow != null)
  cachedRow.addAll(columnFamily);
 
  How can I open a ticket for you to make the change (enable row cache
 write
  through with an option)?
 
  Thanks,
  -Weijun
 
  On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
   On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
 wrote:
   Just started to play with the row cache feature in trunk: it seems
 to
   be
   working fine so far except that for RowsCached parameter you need to
   specify
   number of rows rather than a percentage (e.g., 20% doesn't work).
  
   20% works, but it's 20% of the rows at server startup.  So on a fresh
   start that is zero.
  
   Maybe we should just get rid of the % feature...
 
  (Actually, it shouldn't be hard to update this on flush, if you want
  to open a ticket.)
 
 
 



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Dumped 50mil records into my 2-node cluster overnight and made sure that
there are not many data files (around 30 only), per Martin's suggestion. The
size of the data directory is 63GB. Now when I read records from the cluster
the read latency is still ~44ms (there's no write happening during the
read), and iostat shows that the disk (RAID10, 4x 250GB 15k SAS) is
saturated:

Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda         47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
sda1         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
sda2        47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
sda3         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

CPU usage is low.
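
A back-of-envelope reading of the iostat output above (assuming the usual
512-byte sectors): 23933 rsec/s is about 11.7 MB/s spread over roughly 190
reads/s, i.e. ~63 KB per read, with await around 25 ms and %util at 96%, which is
consistent with the disk being the limiting factor for random reads here.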

Does this mean disk i/o is the bottleneck in my case? Will it help if I
increase KCF to cache all of the sstable index?

Also, this is almost a read-only test; in reality, our write/read ratio is
close to 1:1, so I'm guessing read latency will go even higher in that case
because it will be difficult for Cassandra to find a good moment to compact
the data files while they are busy being written.

Thanks,
-Weijun


On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller 
 martin.grabmuel...@eleven.de wrote:

 In my tests I have observed that good read latency depends on keeping
 the number of data files low.  In my current test setup, I have stored
 1.9 TB of data on a single node, which is in 21 data files, and read
 latency is between 10 and 60ms (for small reads, larger read of course
 take more time).  In earlier stages of my test, I had up to 5000
 data files, and read performance was quite bad: my configured 10-second
 RPC timeout was regularly encountered.


 I believe it is known that crossing sstables is O(NlogN) but I'm unable to
 find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
 enlighten me, but in any case I believe
 https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
 it.

 Keeping write volume low enough that compaction can keep up is one
 solution, and throwing hardware at the problem is another, if necessary.
  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
 repeat hits.

 -Brandon



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
One more thought about Martin's suggestion: is it possible to put the data
files into multiple directories that are located on different physical
disks? This should help with the i/o bottleneck issue.

Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?

-Weijun

On Tue, Feb 16, 2010 at 9:50 AM, Weijun Li weiju...@gmail.com wrote:

 Dumped 50mil records into my 2-node cluster overnight, made sure that
 there's not many data files (around 30 only) per Martin's suggestion. The
 size of the data directory is 63GB. Now when I read records from the cluster
 the read latency is still ~44ms, --there's no write happening during the
 read. And iostats shows that the disk (RAID10, 4 250GB 15k SAS) is
 saturated:

 Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
 sda         47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda1         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
 sda2        47.67    67.67  190.33   17.00  23933.33    677.33    118.70      5.24   25.25   4.64  96.17
 sda3         0.00     0.00    0.00    0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00

 CPU usage is low.

 Does this mean disk i/o is the bottleneck for my case? Will it help if I
 increase KCF to cache all sstable index?

 Also, this is the almost a read-only mode test, and in reality, our
 write/read ratio is close to 1:1 so I'm guessing read latency will even go
 higher in that case because there will be difficult for cassandra to find a
 good moment to compact the data files that are being busy written.

 Thanks,
 -Weijun



 On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams dri...@gmail.comwrote:

 On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller 
 martin.grabmuel...@eleven.de wrote:

 In my tests I have observed that good read latency depends on keeping
 the number of data files low.  In my current test setup, I have stored
 1.9 TB of data on a single node, which is in 21 data files, and read
 latency is between 10 and 60ms (for small reads, larger read of course
 take more time).  In earlier stages of my test, I had up to 5000
 data files, and read performance was quite bad: my configured 10-second
 RPC timeout was regularly encountered.


 I believe it is known that crossing sstables is O(NlogN) but I'm unable to
 find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
 enlighten me, but in any case I believe
 https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
 it.

 Keeping write volume low enough that compaction can keep up is one
 solution, and throwing hardware at the problem is another, if necessary.
  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
 repeat hits.

 -Brandon





Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Thanks for the DataFileDirectory trick, I'll give it a try.

Just noticed the impact of the number of data files: node A has 13 data files
with a read latency of 20ms and node B has 27 files with a read latency of 60ms.
After I ran nodeprobe compact on node B, its read latency went up to 150ms.
The read latency of node A became as low as 10ms. Is this normal behavior?
I'm using the random partitioner and the hardware/JVM settings are exactly the
same for these two nodes.

Another problem is that Java heap usage is always around 900MB out of 6GB. Is
there any way to utilize all of the heap space to decrease the read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

 One more thoughts about Martin's suggestion: is it possible to put the
 data files into multiple directories that are located in different physical
 disks? This should help to improve the i/o bottleneck issue.


 Yes, you can already do this, just add more DataFileDirectory directives
 pointed at multiple drives.


 Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


 Row cache and key cache both help tremendously if your read pattern has a
 decent repeat rate.  Completely random io can only be so fast, however.

 -Brandon



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Still have high read latency with 50mil records in the 2-node cluster
(replication factor 2). I restarted both nodes but read latency is still above
60ms and disk i/o saturation is high. Tried compact and repair but they don't
help much. When I reduce the client threads from 15 to 5 it looks a lot better,
but throughput is kind of low. I changed to 16 flushing threads instead of the
default 8; could that cause the disk saturation issue?

For a benchmark with decent throughput and latency, how many client threads do
people use? Can anyone share the storage-conf.xml of a well-tuned high-volume
cluster?

-Weijun

On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood stu.h...@rackspace.com wrote:

  After I ran nodeprobe compact on node B its read latency went up to
 150ms.
 The compaction process can take a while to finish... in 0.5 you need to
 watch the logs to figure out when it has actually finished, and then you
 should start seeing the improvement in read latency.

  Is there any way to utilize all of the heap space to decrease the read
 latency?
 In 0.5 you can adjust the number of keys that are cached by changing the
 'KeysCachedFraction' parameter in your config file. In 0.6 you can
 additionally cache rows. You don't want to use up all of the memory on your
 box for those caches though: you'll want to leave at least 50% for your OS's
 disk cache, which will store the full row content.


 -Original Message-
 From: Weijun Li weiju...@gmail.com
 Sent: Tuesday, February 16, 2010 12:16pm
 To: cassandra-user@incubator.apache.org
 Subject: Re: Cassandra benchmark shows OK throughput but high read latency
 (> 100ms)?

 Thanks for for DataFileDirectory trick and I'll give a try.

 Just noticed the impact of number of data files: node A has 13 data files
 with read latency of 20ms and node B has 27 files with read latency of
 60ms.
 After I ran nodeprobe compact on node B its read latency went up to
 150ms.
 The read latency of node A became as low as 10ms. Is this normal behavior?
 I'm using random partitioner and the hardware/JVM settings are exactly the
 same for these two nodes.

 Another problem is that Java heap usage is always 900mb out of 6GB? Is
 there
 any way to utilize all of the heap space to decrease the read latency?

 -Weijun

 On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com
 wrote:

  On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:
 
  One more thoughts about Martin's suggestion: is it possible to put the
  data files into multiple directories that are located in different
 physical
  disks? This should help to improve the i/o bottleneck issue.
 
 
  Yes, you can already do this, just add more DataFileDirectory
 directives
  pointed at multiple drives.
 
 
  Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
 
 
  Row cache and key cache both help tremendously if your read pattern has a
  decent repeat rate.  Completely random io can only be so fast, however.
 
  -Brandon
 





Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just started to play with the row cache feature in trunk: it seems to be
working fine so far, except that for the RowsCached parameter you need to specify
the number of rows rather than a percentage (e.g., 20% doesn't work). Thanks
for this great feature that improves read latency dramatically, so that disk
i/o is no longer a serious bottleneck.

The problem is: when you write to Cassandra it doesn't seem to put the new
keys in the row cache (it is said to update, rather than invalidate, an entry
that is already in the cache). Is it easy to implement this? What are the
classes that should be touched for it? I'm guessing that
RowMutationVerbHandler should be the one to insert the entry into the row cache?

-Weijun


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Yes, my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o.
I compacted the data into a single 60GB file (it took quite a while to finish
and increased latency as expected) but that doesn't help much either.

If I set KCF to 1 (meaning cache the entire sstable index), how much memory will
it take for 50mil keys? Is the index a straight key-offset map? I guess a key
is 16 bytes and an offset is 8 bytes. Will KCF=1 help to reduce disk i/o?
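
Taking the guessed figures at face value (a 16-byte key plus an 8-byte offset is
an assumption about the index layout, not a confirmed one), the raw entries alone
come to

    50,000,000 keys x (16 + 8) bytes = 1.2 GB

and the in-heap cost is typically several times that once Java object and map
overhead are added, so KCF=1 over 50mil keys would likely claim a large share of
the 6GB heap.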

-Weijun

On Tue, Feb 16, 2010 at 5:18 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Have you tried increasing KeysCachedFraction?

 On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li weiju...@gmail.com wrote:
  Still have high read latency with 50mil records in the 2-node cluster
  (replica 2). I restarted both nodes but read latency is still above 60ms
 and
  disk i/o saturation is high. Tried compact and repair but doesn't help
 much.
  When I reduced the client threads from 15 to 5 it looks a lot better but
  throughput is kind of low. I changed using flushing thread of 16 instead
 the
  defaulted 8, could that cause the disk saturation issue?
 
  For benchmark with decent throughput and latency, how many client threads
 do
  they use? Can anyone share your storage-conf.xml in well-tuned high
 volume
  cluster?
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood stu.h...@rackspace.com
 wrote:
 
   After I ran nodeprobe compact on node B its read latency went up to
   150ms.
  The compaction process can take a while to finish... in 0.5 you need to
  watch the logs to figure out when it has actually finished, and then you
  should start seeing the improvement in read latency.
 
   Is there any way to utilize all of the heap space to decrease the read
   latency?
  In 0.5 you can adjust the number of keys that are cached by changing the
  'KeysCachedFraction' parameter in your config file. In 0.6 you can
  additionally cache rows. You don't want to use up all of the memory on
 your
  box for those caches though: you'll want to leave at least 50% for your
 OS's
  disk cache, which will store the full row content.
 
 
  -Original Message-
  From: Weijun Li weiju...@gmail.com
  Sent: Tuesday, February 16, 2010 12:16pm
  To: cassandra-user@incubator.apache.org
  Subject: Re: Cassandra benchmark shows OK throughput but high read
 latency
  (> 100ms)?
 
  Thanks for for DataFileDirectory trick and I'll give a try.
 
  Just noticed the impact of number of data files: node A has 13 data
 files
  with read latency of 20ms and node B has 27 files with read latency of
  60ms.
  After I ran nodeprobe compact on node B its read latency went up to
  150ms.
  The read latency of node A became as low as 10ms. Is this normal
 behavior?
  I'm using random partitioner and the hardware/JVM settings are exactly
 the
  same for these two nodes.
 
  Another problem is that Java heap usage is always 900mb out of 6GB? Is
  there
  any way to utilize all of the heap space to decrease the read latency?
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com
  wrote:
 
   On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com
 wrote:
  
   One more thoughts about Martin's suggestion: is it possible to put
 the
   data files into multiple directories that are located in different
   physical
   disks? This should help to improve the i/o bottleneck issue.
  
  
   Yes, you can already do this, just add more DataFileDirectory
   directives
   pointed at multiple drives.
  
  
   Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
  
  
   Row cache and key cache both help tremendously if your read pattern
 has
   a
   decent repeat rate.  Completely random io can only be so fast,
 however.
  
   -Brandon
  
 
 
 
 



Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just tried to make a quick change to enable it but it didn't work out :-(

ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

// What I modified
if (cachedRow == null)
{
    cfs.cacheRow(mutation.key());
    cachedRow = cfs.getRawCachedRow(mutation.key());
}

if (cachedRow != null)
    cachedRow.addAll(columnFamily);

How can I open a ticket for you to make the change (enable row cache write
through with an option)?

Thanks,
-Weijun

On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to be
  working fine so far except that for RowsCached parameter you need to
 specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)



RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-15 Thread Weijun Li
It seems that read latency is sensitive to the number of threads (or thrift
clients): after reducing the number of threads to 15, read latency decreased
to ~20ms.

The other problem is: if I keep mixed writes and reads (e.g., 8 write threads
plus 7 read threads) against the 2-node cluster continuously, the read
latency goes up gradually (along with the size of the Cassandra data files),
and at the end it becomes ~40ms (up from ~20ms) even with only 15
threads. During this process the data file grew from 1.6GB to over 3GB even
though I kept writing the same key/values to Cassandra. It seems that Cassandra
keeps appending to sstable data files and will only clean them up during
node cleanup or compaction (please correct me if this is incorrect).
 
Here's my test settings:

JVM xmx: 6GB
KCF: 0.3
Memtable: 512MB.
Number of records: 1 millon (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue to the
increasing read latency issue: JVM memory, GC, CPU usage, tpstats and io
saturation all seem to be clean. One exception is that the wait time in
iostat goes up quickly once in a while but is a small number most of the
time. Another thing I noticed is that the JVM doesn't use more than 1GB of
memory (out of the 6GB I specified for the JVM) even though I set KCF to 0.3
and increased the memtable size to 512MB.
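
For readers trying to reproduce this setup: the two knobs mentioned here live in
storage-conf.xml. The element names below are recalled from the 0.5-era sample
configuration and should be treated as assumptions rather than verified syntax:

<!-- assumed 0.5-era element names; check conf/storage-conf.xml in your tree -->
<KeysCachedFraction>0.3</KeysCachedFraction>
<MemtableSizeInMB>512</MemtableSizeInMB>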

Did I miss anything here? How can I diagnose this kind of increasing read
latency issue? Is there any performance tuning guide available?

Thanks,
-Weijun


-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency
(> 100ms)?

are you i/o bound?  what is your on-disk data set size?  what does
iostats tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li weiju...@gmail.com wrote:
 Hello,



 I saw some Cassandra benchmark reports mentioning read latency that is
less
 than 50ms or even 30ms. But my benchmark with 0.5 doesn't seem to support
 that. Here's my settings:



 Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM

 ReplicationFactor=2 Partitioner=Random

 JVM Xmx: 4GB

 Memory table size: 512MB (haven't figured out how to enable binary
memtable
 so I set both memtable number to 512mb)

 Flushing threads: 2-4

 Payload: ~1000 bytes, 3 columns in one CF.

 Read/write time measure: get startTime right before each Java thrift call,
 transport objects are pre-created upon creation of each thread.



 The result shows that total write throughput is around 2000/sec (for 2
nodes
 in the cluster) which is not bad, and read throughput is just around
 750/sec. However for each thread the average read latency is more than
 100ms. I'm running 100 threads for the testing and each thread randomly
pick
 a node for thrift call. So the read/sec of each thread is just around 7.5,
 meaning duration of each thrift call is 1000/7.5=133ms. Without
replication
 the cluster write throughput is around 3300/s, and read throughput is
around
 1400/s, so the read latency is still around 70ms without replication.



 Is there anything wrong in my benchmark test? How can I achieve a
reasonable
 read latency (< 30ms)?



 Thanks,

 -Weijun







Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-14 Thread Weijun Li
Hello,

 

I saw some Cassandra benchmark reports mentioning read latency that is less 
than 50ms or even 30ms. But my benchmark with 0.5 doesn’t seem to support that. 
Here’s my settings:

 

Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM

ReplicationFactor=2 Partitioner=Random

JVM Xmx: 4GB

Memory table size: 512MB (haven’t figured out how to enable the binary memtable,
so I set both memtable sizes to 512MB)

Flushing threads: 2-4

Payload: ~1000 bytes, 3 columns in one CF.

Read/write time measure: get startTime right before each Java thrift call, 
transport objects are pre-created upon creation of each thread.

 

The result shows that total write throughput is around 2000/sec (for 2 nodes in
the cluster), which is not bad, and read throughput is just around 750/sec.
However, for each thread the average read latency is more than 100ms. I’m
running 100 threads for the test and each thread randomly picks a node for each
thrift call. So the reads/sec of each thread is just around 7.5, meaning the
duration of each thrift call is 1000/7.5 = 133ms. Without replication the cluster
write throughput is around 3300/s and read throughput is around 1400/s, so the
read latency is still around 70ms without replication.
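
One sanity check on these numbers (a general closed-loop queueing identity, not
something from the thread): with a fixed number of outstanding client threads,
per-request latency is roughly threads / throughput. 100 threads at ~750 reads/s
gives 100 / 750 = ~133 ms, matching the per-thread figure above, and 100 / 1400 =
~71 ms matches the no-replication case, so much of the measured latency here is
queueing produced by the client concurrency rather than per-request server work.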

 

Is there anything wrong in my benchmark test? How can I achieve a reasonable 
read latency (< 30ms)?

 

Thanks,

-Weijun

 

 



RackAwareStrategy - add the third datacenter to live cluster with replication factor 3

2010-02-11 Thread Weijun Li
Hello,

I have a testing cluster with: A (dc1), B (dc1), C(dc2), D(dc2). The
replication factor is 2 so I assume each DC will have a complete copy of the
data. Also I'm using PropertyFileEndPointSnitch with rack.properties for the
dc and rack settings.

So, what are the steps to add another datacenter and increase the replication
factor to 3 to ensure that dc3 will also get a complete copy of the data?
Meaning each of these 3 DCs will have a complete copy of the data and they
keep synchronizing with each other as new changes come in. What I'm guessing is:

1) Increase replication factor of A/B/C/D to 3, modify their rack.properties
to include E(dc3) and F(dc3) then restart them one by one. At this point E
and F haven't been started yet.
2) Bootstrap E and F (both from dc3) to join the cluster.

In this case, will Cassandra automatically put the 3rd replica on E and F?

Thanks,
-Weijun

 P.S. Here is what the Cassandra documentation says about DC replication, but I'm
not sure what will happen when you join nodes from the 3rd DC.


   -

    RackAwareStrategy: replica 2 is placed on the first node along the
    ring that belongs in *another* data center than the first; the remaining
    N-2 replicas, if any, are placed on the first nodes along the ring in the
    *same* rack as the first
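
For concreteness, the property file for such a six-node, three-DC layout is
usually just a host-to-location map; the exact syntax expected by the contrib
PropertyFileEndPointSnitch should be checked against its bundled sample, so treat
the format below as an assumption:

# assumed host=datacenter:rack format; verify against the snitch's sample file
10.0.1.1=dc1:rack1
10.0.1.2=dc1:rack1
10.0.2.1=dc2:rack1
10.0.2.2=dc2:rack1
10.0.3.1=dc3:rack1
10.0.3.2=dc3:rack1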


nodeprobe flush not implemented in 0.5?

2010-02-11 Thread Weijun Li
Hello,

I tried to run nodeprobe flush but it displays the usage info without doing
anything. What is the list of supported commands for nodeprobe?

Thanks,
-Weijun


Rebalance after adding new nodes

2010-02-11 Thread Weijun Li
When you add a new node, Cassandra will pick the node that has the most data
and then split its token. In this case the data distribution among all nodes
becomes uneven. What are the right strategy/steps to rebalance the node load
after adding new nodes? Here's one example: I have a cluster of nodes A, B,
C, D. Now I want to add E and F; after adding the nodes, the data
distribution will change from 1/1/1/1 to 1/1/0.5/0.5/0.5/0.5, is this
correct?

Thanks,
-Weijun


nodeprobe freezes when connecting to remote cassandra node

2010-02-09 Thread Weijun Li
Hello, I got one more issue when I was trying to run nodeprobe to connect to a
remote Cassandra node: it froze for a while and then showed the following
error. The jmxremote port 8080 is open, and I tried to change the port but
it doesn't help. This command works properly if I run it on the same machine
as the node (thus localhost).

Thanks,
-Weijun

bin/nodeprobe --host [hostname] --port 8080 ring

Error connecting to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 10.xxx.xxx.xxx;
nested exception is:
java.net.ConnectException: Operation timed out
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
at
sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown
Source)
at
javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2327)
at
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279)
at
javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:153)
    at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:115)
at org.apache.cassandra.tools.NodeProbe.main(NodeProbe.java:514)
Caused by: java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
at java.net.Socket.connect(Socket.java:525)
at java.net.Socket.connect(Socket.java:475)
    at java.net.Socket.<init>(Socket.java:372)
    at java.net.Socket.<init>(Socket.java:186)
at
sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
at
sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
... 10 more
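
A note on this failure mode (general JMX-over-RMI behavior, not something specific
to Cassandra): the client reaches the registry on port 8080 and is then handed an
RMI stub whose address and port are chosen by the server, so a firewall between
the hosts, or a server that advertises an internal or unresolvable address,
produces exactly this kind of timeout. The standard JVM properties usually
involved are, for example:

-Dcom.sun.management.jmxremote.port=8080
-Djava.rmi.server.hostname=<public address of the node>

set wherever the node's JVM options are configured; the second property controls
the address embedded in the stub that is returned to nodeprobe.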