Re: Testing row cache feature in trunk: write should put record in cache

2010-03-10 Thread Jonathan Ellis
Thanks for that, Daniel.

I'm pretty heads down finishing off the last 0.6 issues right now, but
this is on my list to get to.

On Mon, Mar 8, 2010 at 1:25 PM, Daniel Kluesing d...@bluekai.com wrote:
 This is interesting for the use cases I'm looking at Cassandra for, so if 
 that offer still stands I'll take you up on it. I took a crack at it in 
 https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to 
 get my feet wet with the code.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Tuesday, February 16, 2010 9:22 PM
 To: cassandra-user@incubator.apache.org
 Subject: Re: Testing row cache feature in trunk: write should put record in 
 cache

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
 this is pretty low priority for me.

 On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
 Just tried to make quick change to enable it but it didn't work out :-(

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

     // What I modified
     if( cachedRow == null ) {
         cfs.cacheRow(mutation.key());
         cachedRow = cfs.getRawCachedRow(mutation.key());
     }

     if (cachedRow != null)
         cachedRow.addAll(columnFamily);

 How can I open a ticket for you to make the change (enable row cache write
 through with an option)?

 Thanks,
 -Weijun

 On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to
  be
  working fine so far except that for RowsCached parameter you need to
  specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)






RE: Testing row cache feature in trunk: write should put record in cache

2010-03-08 Thread Daniel Kluesing
This is interesting for the use cases I'm looking at Cassandra for, so if that 
offer still stands I'll take you up on it. I took a crack at it in 
https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to get 
my feet wet with the code. 

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Tuesday, February 16, 2010 9:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Testing row cache feature in trunk: write should put record in 
cache

... tell you what, if you write the option-processing part in
DatabaseDescriptor I will do the actual cache part. :)

On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
 this is pretty low priority for me.

 On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
 Just tried to make quick change to enable it but it didn't work out :-(

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

     // What I modified
     if( cachedRow == null ) {
         cfs.cacheRow(mutation.key());
         cachedRow = cfs.getRawCachedRow(mutation.key());
     }

     if (cachedRow != null)
         cachedRow.addAll(columnFamily);

 How can I open a ticket for you to make the change (enable row cache write
 through with an option)?

 Thanks,
 -Weijun

 On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to
  be
  working fine so far except that for RowsCached parameter you need to
  specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)





Re: Testing row cache feature in trunk: write should put record in cache

2010-02-21 Thread Tatu Saloranta
On Sat, Feb 20, 2010 at 12:20 PM, Jonathan Ellis jbel...@gmail.com wrote:
 We don't use native java serialization for anything but the on-disk
 BitSets in our bloom filters (because those are deserialized once at
 startup, so the overhead doesn't matter), btw.

Right, tangential use is pretty immaterial. I misunderstood the comments
to indicate that it would be used for the payload (which is often not a
good idea, for lots of reasons, anyway).

 We're talking about adding compression after
 https://issues.apache.org/jira/browse/CASSANDRA-674.

Great. At that level it could have a significant impact (if I understand
the plan correctly).

-+ Tatu +-


Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Jonathan Ellis
The whole point of rowcache is to avoid the serialization overhead,
though.  If we just wanted the serialized form cached, we would let
the os block cache handle that without adding an extra layer.  (0.6
uses mmap'd i/o by default on 64bit JVMs so this is very efficient.)

On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li weiju...@gmail.com wrote:
 The memory overhead issue is not directly related to GC, because by the time the
 JVM ran out of memory the GC had already been very busy for quite a while. In my
 case the JVM consumed all of its 6GB when the row cache size hit 1.4 million.

 I haven't started to test the row cache feature yet. But I think data
 compression is useful for reducing memory consumption, because in my impression
 disk i/o is always the bottleneck for Cassandra while its CPU usage is
 usually low all the time. In addition, compression should also help
 to reduce the number of Java objects dramatically (correct me if I'm wrong),
 especially if we need to cache most of the data to achieve decent
 read latency.

 If ColumnFamily is serializable it shouldn't be that hard to implement the
 compression feature, controlled by an option (again :-) in
 storage-conf.xml.

 When I get to that point you can instruct me to implement this feature along
 with the row-cache-write-through. Our goal is straightforward: to support
 short read latency in a high-volume web application with a write/read ratio
 of 1:1.

 -Weijun
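
 For illustration, a minimal sketch of the compress-what-you-cache idea in the
 quoted message above, assuming the cached value is Serializable. CachedRow is a
 stand-in class, not Cassandra's ColumnFamily, and each cache hit would pay the
 cost of decompressing and deserializing again:

    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.zip.GZIPOutputStream;

    // Toy illustration: store the cache value as a gzip-compressed byte[]
    // instead of a live object graph, trading CPU per hit for heap space.
    public class CompressedCacheEntryDemo
    {
        static class CachedRow implements Serializable
        {
            final HashMap<String, String> columns = new HashMap<>();
        }

        static byte[] serializeAndCompress(Serializable value) throws Exception
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes)))
            {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws Exception
        {
            CachedRow row = new CachedRow();
            for (int i = 0; i < 100; i++)
                row.columns.put("column-" + i, "value-value-value-" + i);
            System.out.println("compressed size: " + serializeAndCompress(row).length + " bytes");
        }
    }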

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, February 18, 2010 12:04 PM
 To: cassandra-user@incubator.apache.org
 Subject: Re: Testing row cache feature in trunk: write should put record in
 cache

 Did you force a GC from jconsole to make sure you weren't just
 measuring uncollected garbage?

 On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li weiju...@gmail.com wrote:
 OK I'll work on the change later because there's another problem to solve:
 the overhead for cache is too big that 1.4mil records (1k each) consumed
 all
 of the 6gb memory of JVM (I guess 4gb are consumed by the row cache). I'm
 thinking that ConcurrentHashMap is not a good choice for LRU and the row
 cache needs to store compressed key data to reduce memory usage. I'll do
 more investigation on this and let you know.

 -Weijun

 On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
  https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
  this is pretty low priority for me.
 
  On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
  Just tried to make quick change to enable it but it didn't work out
 :-(
 
     ColumnFamily cachedRow =
  cfs.getRawCachedRow(mutation.key());
 
      // What I modified
      if( cachedRow == null ) {
          cfs.cacheRow(mutation.key());
          cachedRow = cfs.getRawCachedRow(mutation.key());
      }
 
      if (cachedRow != null)
          cachedRow.addAll(columnFamily);
 
  How can I open a ticket for you to make the change (enable row cache
  write
  through with an option)?
 
  Thanks,
  -Weijun
 
  On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
   On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
   wrote:
   Just started to play with the row cache feature in trunk: it seems
   to
   be
   working fine so far except that for RowsCached parameter you need
   to
   specify
   number of rows rather than a percentage (e.g., 20% doesn't
 work).
  
   20% works, but it's 20% of the rows at server startup.  So on a
   fresh
   start that is zero.
  
   Maybe we should just get rid of the % feature...
 
  (Actually, it shouldn't be hard to update this on flush, if you want
  to open a ticket.)
 
 
 






Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Weijun Li
I see. How much overhead does Java serialization add? Does it slow down the
system a lot? It seems to be a tradeoff between CPU usage and memory.

As for mmap in 0.6, do you mmap the sstable data file even if it is a lot
larger than the available memory (e.g., the data file is over 100GB while
you have only 8GB of RAM)? How efficient is mmap in that case? Is mmap already
checked into the 0.6 branch?

-Weijun

On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis jbel...@gmail.com wrote:

 The whole point of rowcache is to avoid the serialization overhead,
 though.  If we just wanted the serialized form cached, we would let
 the os block cache handle that without adding an extra layer.  (0.6
 uses mmap'd i/o by default on 64bit JVMs so this is very efficient.)

 On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li weiju...@gmail.com wrote:
  The memory overhead issue is not directly related to GC because when JVM
 ran
  out of memory the GC has been very busy for quite a while. In my case JVM
  consumed all of the 6GB when the row cache size hit 1.4mil.
 
  I haven't started test the row cache feature yet. But I think data
  compression is useful to reduce memory consumption because in my
 impression
  disk i/o is always the bottleneck for Cassandra while its CPU usage is
  usually low all the time. In addition to this, compression should also
 help
  to reduce the number of java objects dramatically (correct me if I'm
 wrong),
  --especially in case we need to cache most of the data to achieve decent
  read latency.
 
  If ColumnFamily is serializable it shouldn't be that hard to implement
 the
  compression feature which can be controlled by an option (again :-) in
  storage conf xml.
 
  When I get to that point you can instruct me to implement this feature
 along
  with the row-cache-write-through. Our goal is straightforward: to support
  short read latency in high volume web application with write/read ratio
 to
  be 1:1.
 
  -Weijun
 
  -Original Message-
  From: Jonathan Ellis [mailto:jbel...@gmail.com]
  Sent: Thursday, February 18, 2010 12:04 PM
  To: cassandra-user@incubator.apache.org
  Subject: Re: Testing row cache feature in trunk: write should put record
 in
  cache
 
  Did you force a GC from jconsole to make sure you weren't just
  measuring uncollected garbage?
 
  On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li weiju...@gmail.com wrote:
  OK I'll work on the change later because there's another problem to
 solve:
  the overhead for cache is too big that 1.4mil records (1k each) consumed
  all
  of the 6gb memory of JVM (I guess 4gb are consumed by the row cache).
 I'm
  thinking that ConcurrentHashMap is not a good choice for LRU and the row
  cache needs to store compressed key data to reduce memory usage. I'll do
  more investigation on this and let you know.
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  ... tell you what, if you write the option-processing part in
  DatabaseDescriptor I will do the actual cache part. :)
 
  On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
    https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
   this is pretty low priority for me.
  
   On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com
 wrote:
   Just tried to make quick change to enable it but it didn't work out
  :-(
  
       ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
   
        // What I modified
        if( cachedRow == null ) {
            cfs.cacheRow(mutation.key());
            cachedRow = cfs.getRawCachedRow(mutation.key());
        }
   
        if (cachedRow != null)
            cachedRow.addAll(columnFamily);
  
   How can I open a ticket for you to make the change (enable row cache
   write
   through with an option)?
  
   Thanks,
   -Weijun
  
   On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
 
   wrote:
On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
wrote:
Just started to play with the row cache feature in trunk: it
 seems
to
be
working fine so far except that for RowsCached parameter you
 need
to
specify
number of rows rather than a percentage (e.g., 20% doesn't
  work).
   
20% works, but it's 20% of the rows at server startup.  So on a
fresh
start that is zero.
   
Maybe we should just get rid of the % feature...
  
   (Actually, it shouldn't be hard to update this on flush, if you
 want
   to open a ticket.)
  
  
  
 
 
 
 



Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Jonathan Ellis
mmap is designed to handle that case, yes. It is already in the 0.6 branch.
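
A minimal sketch of what an mmap'd read looks like at the Java level (the file
path here is hypothetical). FileChannel.map reserves address space rather than
RAM and the OS pages data in on demand, so the mapped file can be far larger
than physical memory; a single MappedByteBuffer is limited to 2GB, so a very
large data file would be mapped in segments:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Map (a segment of) a file read-only and read from it without a
    // read() syscall per access; paging is left to the operating system.
    public class MmapReadDemo
    {
        public static void main(String[] args) throws Exception
        {
            String path = args.length > 0 ? args[0] : "/tmp/example-data.db"; // hypothetical file
            try (RandomAccessFile file = new RandomAccessFile(path, "r");
                 FileChannel channel = file.getChannel())
            {
                long segment = Math.min(channel.size(), Integer.MAX_VALUE);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, segment);
                if (buffer.limit() > 0)
                    System.out.println("first byte: " + buffer.get(0));
            }
        }
    }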

On Fri, Feb 19, 2010 at 2:44 PM, Weijun Li weiju...@gmail.com wrote:
 I see. How much is the overhead of java serialization? Does it slow down the
 system a lot? It seems to be a tradeoff between CPU usage and memory.

 As for mmap of 0.6, do you mmap the sstable data file even it is a lot
 larger than the available memory (e.g., the data file is over 100GB while
 you have only 8GB ram)? How efficient is mmap in this case? Is mmap already
 checked into 0.6 branch?

 -Weijun

 On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis jbel...@gmail.com wrote:

 The whole point of rowcache is to avoid the serialization overhead,
 though.  If we just wanted the serialized form cached, we would let
 the os block cache handle that without adding an extra layer.  (0.6
 uses mmap'd i/o by default on 64bit JVMs so this is very efficient.)

 On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li weiju...@gmail.com wrote:
  The memory overhead issue is not directly related to GC because when JVM
  ran
  out of memory the GC has been very busy for quite a while. In my case
  JVM
  consumed all of the 6GB when the row cache size hit 1.4mil.
 
  I haven't started test the row cache feature yet. But I think data
  compression is useful to reduce memory consumption because in my
  impression
  disk i/o is always the bottleneck for Cassandra while its CPU usage is
  usually low all the time. In addition to this, compression should also
  help
  to reduce the number of java objects dramatically (correct me if I'm
  wrong),
  --especially in case we need to cache most of the data to achieve decent
  read latency.
 
  If ColumnFamily is serializable it shouldn't be that hard to implement
  the
  compression feature which can be controlled by an option (again :-) in
  storage conf xml.
 
  When I get to that point you can instruct me to implement this feature
  along
  with the row-cache-write-through. Our goal is straightforward: to
  support
  short read latency in high volume web application with write/read ratio
  to
  be 1:1.
 
  -Weijun
 
  -Original Message-
  From: Jonathan Ellis [mailto:jbel...@gmail.com]
  Sent: Thursday, February 18, 2010 12:04 PM
  To: cassandra-user@incubator.apache.org
  Subject: Re: Testing row cache feature in trunk: write should put record
  in
  cache
 
  Did you force a GC from jconsole to make sure you weren't just
  measuring uncollected garbage?
 
  On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li weiju...@gmail.com wrote:
  OK I'll work on the change later because there's another problem to
  solve:
  the overhead for cache is too big that 1.4mil records (1k each)
  consumed
  all
  of the 6gb memory of JVM (I guess 4gb are consumed by the row cache).
  I'm
  thinking that ConcurrentHashMap is not a good choice for LRU and the
  row
  cache needs to store compressed key data to reduce memory usage. I'll
  do
  more investigation on this and let you know.
 
  -Weijun
 
  On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  ... tell you what, if you write the option-processing part in
  DatabaseDescriptor I will do the actual cache part. :)
 
  On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
   https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
   this is pretty low priority for me.
  
   On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com
   wrote:
   Just tried to make quick change to enable it but it didn't work out
  :-(
  
      ColumnFamily cachedRow =
   cfs.getRawCachedRow(mutation.key());
  
       // What I modified
       if( cachedRow == null ) {
           cfs.cacheRow(mutation.key());
           cachedRow =
   cfs.getRawCachedRow(mutation.key());
       }
  
       if (cachedRow != null)
           cachedRow.addAll(columnFamily);
  
   How can I open a ticket for you to make the change (enable row
   cache
   write
   through with an option)?
  
   Thanks,
   -Weijun
  
   On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis
   jbel...@gmail.com
   wrote:
On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
wrote:
Just started to play with the row cache feature in trunk: it
seems
to
be
working fine so far except that for RowsCached parameter you
need
to
specify
number of rows rather than a percentage (e.g., 20% doesn't
  work).
   
20% works, but it's 20% of the rows at server startup.  So on a
fresh
start that is zero.
   
Maybe we should just get rid of the % feature...
  
   (Actually, it shouldn't be hard to update this on flush, if you
   want
   to open a ticket.)
  
  
  
 
 
 
 




Re: Testing row cache feature in trunk: write should put record in cache

2010-02-18 Thread Jonathan Ellis
Did you force a GC from jconsole to make sure you weren't just
measuring uncollected garbage?

On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li weiju...@gmail.com wrote:
 OK I'll work on the change later because there's another problem to solve:
 the overhead for cache is too big that 1.4mil records (1k each) consumed all
 of the 6gb memory of JVM (I guess 4gb are consumed by the row cache). I'm
 thinking that ConcurrentHashMap is not a good choice for LRU and the row
 cache needs to store compressed key data to reduce memory usage. I'll do
 more investigation on this and let you know.

 -Weijun

 On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
  https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
  this is pretty low priority for me.
 
  On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
  Just tried to make quick change to enable it but it didn't work out :-(
 
     ColumnFamily cachedRow =
  cfs.getRawCachedRow(mutation.key());
 
      // What I modified
      if( cachedRow == null ) {
          cfs.cacheRow(mutation.key());
          cachedRow = cfs.getRawCachedRow(mutation.key());
      }
 
      if (cachedRow != null)
          cachedRow.addAll(columnFamily);
 
  How can I open a ticket for you to make the change (enable row cache
  write
  through with an option)?
 
  Thanks,
  -Weijun
 
  On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
   On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
   wrote:
   Just started to play with the row cache feature in trunk: it seems
   to
   be
   working fine so far except that for RowsCached parameter you need
   to
   specify
   number of rows rather than a percentage (e.g., 20% doesn't work).
  
   20% works, but it's 20% of the rows at server startup.  So on a
   fresh
   start that is zero.
  
   Maybe we should just get rid of the % feature...
 
  (Actually, it shouldn't be hard to update this on flush, if you want
  to open a ticket.)
 
 
 




Re: Testing row cache feature in trunk: write should put record in cache

2010-02-17 Thread Weijun Li
OK, I'll work on the change later because there's another problem to solve:
the overhead of the cache is so big that 1.4 million records (1KB each) consumed
all of the JVM's 6GB of memory (I guess 4GB are consumed by the row cache). I'm
thinking that ConcurrentHashMap is not a good choice for LRU, and that the row
cache needs to store compressed key data to reduce memory usage. I'll do
more investigation on this and let you know.

-Weijun
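
One simple bounded-LRU alternative to a plain ConcurrentHashMap, shown only as
an illustration of the concern above (it is not what Cassandra's row cache
uses): a LinkedHashMap in access order with removeEldestEntry, wrapped in a
synchronized map so reads are safe even though they reorder entries:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Bounded LRU cache: evicts the least recently used entry once the capacity
    // is exceeded. Trades ConcurrentHashMap's concurrency for eviction support.
    public class BoundedLruCache<K, V>
    {
        private final Map<K, V> map;

        public BoundedLruCache(int capacity)
        {
            this.map = Collections.synchronizedMap(
                new LinkedHashMap<K, V>(16, 0.75f, true)
                {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
                    {
                        return size() > capacity;
                    }
                });
        }

        public V get(K key)         { return map.get(key); }
        public void put(K key, V v) { map.put(key, v); }
        public int size()           { return map.size(); }

        public static void main(String[] args)
        {
            BoundedLruCache<String, byte[]> cache = new BoundedLruCache<>(2);
            cache.put("a", new byte[1024]);
            cache.put("b", new byte[1024]);
            cache.get("a");                   // touch "a" so "b" becomes eldest
            cache.put("c", new byte[1024]);
            System.out.println(cache.size()); // prints 2; "b" was evicted
        }
    }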

On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
   https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
  this is pretty low priority for me.
 
  On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
  Just tried to make quick change to enable it but it didn't work out :-(
 
      ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
  
       // What I modified
       if( cachedRow == null ) {
           cfs.cacheRow(mutation.key());
           cachedRow = cfs.getRawCachedRow(mutation.key());
       }
  
       if (cachedRow != null)
           cachedRow.addAll(columnFamily);
 
  How can I open a ticket for you to make the change (enable row cache
 write
  through with an option)?
 
  Thanks,
  -Weijun
 
  On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
   On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
 wrote:
   Just started to play with the row cache feature in trunk: it seems
 to
   be
   working fine so far except that for RowsCached parameter you need to
   specify
   number of rows rather than a percentage (e.g., 20% doesn't work).
  
   20% works, but it's 20% of the rows at server startup.  So on a fresh
   start that is zero.
  
   Maybe we should just get rid of the % feature...
 
  (Actually, it shouldn't be hard to update this on flush, if you want
  to open a ticket.)
 
 
 



Re: Testing row cache feature in trunk: write should put record in cache

2010-02-17 Thread Jonathan Ellis
Great!

On Wed, Feb 17, 2010 at 1:51 PM, Weijun Li weiju...@gmail.com wrote:
 OK I'll work on the change later because there's another problem to solve:
 the overhead for cache is too big that 1.4mil records (1k each) consumed all
 of the 6gb memory of JVM (I guess 4gb are consumed by the row cache). I'm
 thinking that ConcurrentHashMap is not a good choice for LRU and the row
 cache needs to store compressed key data to reduce memory usage. I'll do
 more investigation on this and let you know.

 -Weijun

 On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
  https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
  this is pretty low priority for me.
 
  On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
  Just tried to make quick change to enable it but it didn't work out :-(
 
     ColumnFamily cachedRow =
  cfs.getRawCachedRow(mutation.key());
 
      // What I modified
      if( cachedRow == null ) {
          cfs.cacheRow(mutation.key());
          cachedRow = cfs.getRawCachedRow(mutation.key());
      }
 
      if (cachedRow != null)
          cachedRow.addAll(columnFamily);
 
  How can I open a ticket for you to make the change (enable row cache
  write
  through with an option)?
 
  Thanks,
  -Weijun
 
  On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
   On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com
   wrote:
   Just started to play with the row cache feature in trunk: it seems
   to
   be
   working fine so far except that for RowsCached parameter you need
   to
   specify
   number of rows rather than a percentage (e.g., 20% doesn't work).
  
   20% works, but it's 20% of the rows at server startup.  So on a
   fresh
   start that is zero.
  
   Maybe we should just get rid of the % feature...
 
  (Actually, it shouldn't be hard to update this on flush, if you want
  to open a ticket.)
 
 
 




Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just started to play with the row cache feature in trunk: it seems to be
working fine so far, except that for the RowsCached parameter you need to specify
the number of rows rather than a percentage (e.g., 20% doesn't work). Thanks
for this great feature; it improves read latency dramatically, so that disk
i/o is no longer a serious bottleneck.

The problem is: when you write to Cassandra, it doesn't seem to put the new
keys in the row cache (it is said to update, rather than invalidate, entries
that are already in the cache). Is it easy to implement this feature? Which
classes would need to be touched for it? I'm guessing that
RowMutationVerbHandler would be the one to insert the entry into the row cache?

-Weijun


Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
 Just started to play with the row cache feature in trunk: it seems to be
 working fine so far except that for RowsCached parameter you need to specify
 number of rows rather than a percentage (e.g., 20% doesn't work).

20% works, but it's 20% of the rows at server startup.  So on a fresh
start that is zero.

Maybe we should just get rid of the % feature...

 The problem is: when you write to Cassandra it doesn't seem to put the new
 keys in row cache (it is said to update instead invalidate if the entry is
 already in cache). Is it easy to implement this feature?

It's deliberately not done.  For many (most?) workloads you don't want
fresh writes blowing away your read cache.  The code is in
Table.apply:

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
    if (cachedRow != null)
        cachedRow.addAll(columnFamily);

I think it would be okay to have a WriteThrough option for what you're
asking, though.

-Jonathan
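
A minimal, self-contained sketch of the two write paths discussed above: the
default behavior, which only updates rows already in the cache, versus a
hypothetical write-through option that also populates the cache on write.
RowCacheDemo, Row and the writeThrough flag are made-up names for illustration,
not Cassandra's Table/ColumnFamilyStore API:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy model of the write path; not Cassandra code.
    public class RowCacheDemo
    {
        static class Row
        {
            final Map<String, String> columns = new ConcurrentHashMap<>();
        }

        private final Map<String, Row> rowCache = new ConcurrentHashMap<>();
        private final boolean writeThrough;

        RowCacheDemo(boolean writeThrough)
        {
            this.writeThrough = writeThrough;
        }

        void apply(String key, String column, String value)
        {
            Row cached = rowCache.get(key);
            if (cached == null && writeThrough)
            {
                // write-through: a cache miss puts a fresh entry in the cache
                cached = rowCache.computeIfAbsent(key, k -> new Row());
            }
            if (cached != null)
            {
                // default behavior: only rows already cached get updated
                cached.columns.put(column, value);
            }
            // ... the mutation would also go to the memtable/commit log here ...
        }

        public static void main(String[] args)
        {
            RowCacheDemo table = new RowCacheDemo(true);
            table.apply("user:1", "name", "weijun");
            System.out.println(table.rowCache.keySet()); // prints [user:1]
        }
    }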


Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
 Just started to play with the row cache feature in trunk: it seems to be
 working fine so far except that for RowsCached parameter you need to specify
 number of rows rather than a percentage (e.g., 20% doesn't work).

 20% works, but it's 20% of the rows at server startup.  So on a fresh
 start that is zero.

 Maybe we should just get rid of the % feature...

(Actually, it shouldn't be hard to update this on flush, if you want
to open a ticket.)


Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just tried to make a quick change to enable it, but it didn't work out :-(

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

    // What I modified
    if (cachedRow == null) {
        cfs.cacheRow(mutation.key());
        cachedRow = cfs.getRawCachedRow(mutation.key());
    }

    if (cachedRow != null)
        cachedRow.addAll(columnFamily);

How can I open a ticket for you to make the change (enable row cache write
through with an option)?

Thanks,
-Weijun

On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to be
  working fine so far except that for RowsCached parameter you need to
 specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)



Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
... tell you what, if you write the option-processing part in
DatabaseDescriptor I will do the actual cache part. :)
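
A rough sketch of the kind of option processing being asked for; this is not
Cassandra's actual DatabaseDescriptor code. It accepts RowsCached as either an
absolute row count or a percentage of the rows present at startup (so 20% of a
fresh node is zero, as noted earlier in the thread), plus a hypothetical
RowsCachedWriteThrough flag for the write-through behavior:

    // Hypothetical option processing for the row cache; the attribute names
    // RowsCached and RowsCachedWriteThrough mirror the thread, the class is made up.
    public class RowCacheOptions
    {
        final int rowCacheCapacity;
        final boolean writeThrough;

        RowCacheOptions(String rowsCached, String rowsCachedWriteThrough, long rowsAtStartup)
        {
            if (rowsCached.endsWith("%"))
            {
                double fraction =
                    Double.parseDouble(rowsCached.substring(0, rowsCached.length() - 1)) / 100.0;
                rowCacheCapacity = (int) (fraction * rowsAtStartup); // zero on a fresh start
            }
            else
            {
                rowCacheCapacity = Integer.parseInt(rowsCached);
            }
            writeThrough = Boolean.parseBoolean(rowsCachedWriteThrough);
        }

        public static void main(String[] args)
        {
            RowCacheOptions fresh = new RowCacheOptions("20%", "true", 0);
            RowCacheOptions warm  = new RowCacheOptions("20%", "true", 1_000_000);
            System.out.println(fresh.rowCacheCapacity + " / " + warm.rowCacheCapacity); // 0 / 200000
        }
    }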

On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
 this is pretty low priority for me.

 On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
 Just tried to make quick change to enable it but it didn't work out :-(

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

     // What I modified
     if( cachedRow == null ) {
         cfs.cacheRow(mutation.key());
         cachedRow = cfs.getRawCachedRow(mutation.key());
     }

     if (cachedRow != null)
         cachedRow.addAll(columnFamily);

 How can I open a ticket for you to make the change (enable row cache write
 through with an option)?

 Thanks,
 -Weijun

 On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to
  be
  working fine so far except that for RowsCached parameter you need to
  specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)