Re: Cassandra row-cache API

2020-12-14 Thread Chidamber Kulkarni
Thanks Jeff, the summary at the end is very insightful - something we also
are observing.

On a related note, we do observe that the "first N clustering" doesn't
exactly behave the way it is documented to. Is it related to this open
ticket CASSANDRA-8646 <https://issues.apache.org/jira/browse/CASSANDRA-8646>?



On Mon, Dec 14, 2020 at 9:53 AM Jeff Jirsa  wrote:

> Sometime around 2.0 or 2.1, it was changed from a "partition cache" to a
> "head of the partition cache", where "head of the partition" means "first N
> clustering".
>
> The reason individual rows are "hard" is the same reason most things with
> Cassandra caching and consistency are "hard" - a clustering / row may not
> change, but it may be deleted by a range delete that deletes it and many
> other clusterings / rows, which makes maintaining correctness of an
> individual row cache not that different from maintenance of the data around
> it, which ends up looking a lot like "keep a part of the partition in
> memory", which is basically what's there now.
>
> That said:
> - The implementation is not great. I haven't looked into specifics but it
> is incredibly rare to find a use case where it's a win, even on very narrow
> partitions (you basically need workloads that are ALMOST immutable), partly
> because:
> - You're still caching the data on one replica of N, and caching the
> converged result usually ends up being a bigger win and easier to
> manage/invalidate. So memcached/redis/etc outside of the result still
> usually ends up better.
>
>
>
>
> On Mon, Dec 14, 2020 at 9:44 AM Chidamber Kulkarni 
> wrote:
>
>> Hello All,
>>
>> Wondering if anyone has tried to modify the row-cache API to use both the
>> partition key and the clustering keys to convert the row-cache, which is
>> really a partition cache today, into a true row-cache? This might help with
>> broader adoption of row-cache for use-cases with large partition sizes.
>> Would appreciate any thoughts from the experts here.
>>
>> thanks,
>> Chidamber
>>
>>


Re: Cassandra row-cache API

2020-12-14 Thread Jeff Jirsa
Sometime around 2.0 or 2.1, it was changed from a "partition cache" to a
"head of the partition cache", where "head of the partition" means "first N
clustering".

The reason individual rows are "hard" is the same reason most things with
Cassandra caching and consistency are "hard" - a clustering / row may not
change, but it may be deleted by a range delete that deletes it and many
other clusterings / rows, which makes maintaining correctness of an
individual row cache not that different from maintenance of the data around
it, which ends up looking a lot like "keep a part of the partition in
memory", which is basically what's there now.

That said:
- The implementation is not great. I haven't looked into specifics but it
is incredibly rare to find a use case where it's a win, even on very narrow
partitions (you basically need workloads that are ALMOST immutable), partly
because:
- You're still caching the data on one replica of N, and caching the
converged result usually ends up being a bigger win and easier to
manage/invalidate. So memcached/redis/etc outside of the result still
usually ends up better.
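
To make the "head of the partition" / "first N clustering" point above concrete, a
minimal sketch; the keyspace, table, and column names are assumed, not taken from
the thread. With rows_per_partition = 100, the row cache holds at most the first
100 rows of each partition in clustering order, and a single range delete can
remove many of those rows at once, which is why per-row invalidation is hard.

CREATE TABLE ks.events (
    device_id bigint,      -- partition key
    ts timestamp,          -- clustering key: defines the "first N" order
    payload text,
    PRIMARY KEY (device_id, ts)
) WITH CLUSTERING ORDER BY (ts ASC)
  AND caching = {'keys': 'ALL', 'rows_per_partition': '100'};

-- one range delete removes an arbitrary number of clustering rows, so a cached
-- individual row cannot be kept correct without tracking the data around it
DELETE FROM ks.events WHERE device_id = 17 AND ts < '2020-01-01';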




On Mon, Dec 14, 2020 at 9:44 AM Chidamber Kulkarni 
wrote:

> Hello All,
>
> Wondering if anyone has tried to modify the row-cache API to use both the
> partition key and the clustering keys to convert the row-cache, which is
> really a partition cache today, into a true row-cache? This might help with
> broader adoption of row-cache for use-cases with large partition sizes.
> Would appreciate any thoughts from the experts here.
>
> thanks,
> Chidamber
>
>


Cassandra row-cache API

2020-12-14 Thread Chidamber Kulkarni
Hello All,

Wondering if anyone has tried to modify the row-cache API to use both the
partition key and the clustering keys to convert the row-cache, which is
really a partition cache today, into a true row-cache? This might help with
broader adoption of row-cache for use-cases with large partition sizes.
Would appreciate any thoughts from the experts here.

thanks,
Chidamber


JMX for row cache churn

2018-08-20 Thread John Sumsion
Is there a JMX property somewhere that I could monitor to see how old the 
oldest row cache item is?


I want to see how much churn there is.


Thanks in advance,

John...


Re: Row cache functionality - Some confusion

2018-03-13 Thread Rahul Singh
It’s pretty clear to me that the only thing that gets put into the cache is 
the top N rows.

https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L523

It may fetch more, but it doesn’t cache it. It may get more if it’s not the full 
partition cache, but there’s no code that inserts into the CacheService except

https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L528



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 8:56 AM -0400, Hannu Kröger <hkro...@gmail.com>, wrote:
>
> > On 12 Mar 2018, at 14:45, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> >
> > I may be wrong, but what I’ve read and used in the past assumes that the 
> > “first” N rows are cached and the clustering key design is how I change 
> > what N rows are put into memory. Looking at the code, it seems that’s the 
> > case.
>
> So we agree that the row cache is storing only N rows from the beginning of 
> the partition. So if only the last row in a partition is read, then it 
> probably doesn’t get cached, assuming there are more than N rows in a 
> partition?
>
> > The language of the comment basically says that it holds in cache what 
> > satisfies the query if and only if it’s the head of the partition, if not 
> > it fetches it and saves it - I don’t interpret it differently from what I 
> > have seen in the documentation.
>
> Hmm, I’m trying to understand this. Does it mean that it stores the results 
> in cache if it is the head and, if not, it will fetch the head and store that 
> (instead of the results for the query)?
>
> Hannu
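
A hedged illustration of the behavior discussed above, using an assumed schema
(not from this thread): only a read that starts at the head of the partition is
eligible to populate the row cache, and only the first rows_per_partition rows
are kept.

-- assumed table for illustration
CREATE TABLE ks.readings (
    sensor_id bigint,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
) WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};

-- starts at the head of the partition: may populate the cache with the first 10 rows
SELECT * FROM ks.readings WHERE sensor_id = 42 LIMIT 5;

-- does not start at the head: per a trace quoted later in this archive, Cassandra
-- reports "not populating cache as query does not query from the start of the partition"
SELECT * FROM ks.readings WHERE sensor_id = 42 AND ts > '2018-03-01';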


Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger

> On 12 Mar 2018, at 14:45, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> 
> I may be wrong, but what I’ve read and used in the past assumes that the 
> “first” N rows are cached and the clustering key design is how I change what 
> N rows are put into memory. Looking at the code, it seems that’s the case. 

So we agree that the row cache is storing only N rows from the beginning of the 
partition. So if only the last row in a partition is read, then it probably 
doesn’t get cached, assuming there are more than N rows in a partition?

> The language of the comment basically says that it holds in cache what 
> satisfies the query if and only if it’s the head of the partition, if not it 
> fetches it and saves it - I don’t interpret it differently from what I have 
> seen in the documentation. 

Hmm, I’m trying to understand this. Does it mean that it stores the results in 
cache if it is the head and, if not, it will fetch the head and store that (instead 
of the results for the query)?

Hannu

Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
I may be wrong, but what I’ve read and used in the past assumes that the 
“first” N rows are cached and the clustering key design is how I change what N 
rows are put into memory. Looking at the code, it seems that’s the case.

The language of the comment basically says that it holds in cache what 
satisfies the query if and only if it’s the head of the partition, if not it 
fetches it and saves it - I don’t interpret it differently from what I have seen 
in the documentation.



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 7:13 AM -0400, Hannu Kröger , wrote:
>
> rows_per_partition


Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Hi,

My goal is to make sure that I understand functionality correctly and that the 
documentation is accurate. 

The question in other words: Is the documentation or the comment in the code 
wrong (or inaccurate)?

Hannu

> On 12 Mar 2018, at 13:00, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> 
> What’s the goal? How big are your partitions, size in MB and in rows?
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger <hkro...@gmail.com>, wrote:
>> Anyone?
>> 
>>> On 4 Mar 2018, at 20:45, Hannu Kröger <hkro...@gmail.com 
>>> <mailto:hkro...@gmail.com>> wrote:
>>> 
>>> Hello,
>>> 
>>> I am trying to verify and understand fully the functionality of row cache 
>>> in Cassandra.
>>> 
>>> I have been using mainly two different sources for information:
>>> https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
>>>  
>>> <https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476>
>>> AND
>>> http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 
>>> <http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options>
>>> 
>>> and based on what I read, the documentation is not correct. 
>>> 
>>> Documentation says like this:
>>> “rows_per_partition: The amount of rows to cache per partition (“row 
>>> cache”). If an integer n is specified, the first n queried rows of a 
>>> partition will be cached. Other possible options are ALL, to cache all rows 
>>> of a queried partition, or NONE to disable row caching.”
>>> 
>>> The problematic part is "the first n queried rows of a partition will be 
>>> cached”. Shouldn’t it be that the first N rows in a partition will be 
>>> cached? Not first N that are queried?
>>> 
>>> If this is the case, I’m more than happy to create a ticket (and maybe even 
>>> create a patch) for the doc update.
>>> 
>>> BR,
>>> Hannu
>>> 
>> 



Re: Row cache functionality - Some confusion

2018-03-12 Thread Rahul Singh
What’s the goal? How big are your partitions, size in MB and in rows?

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger <hkro...@gmail.com>, wrote:
> Anyone?
>
> > On 4 Mar 2018, at 20:45, Hannu Kröger <hkro...@gmail.com> wrote:
> >
> > Hello,
> >
> > I am trying to verify and understand fully the functionality of row cache 
> > in Cassandra.
> >
> > I have been using mainly two different sources for information:
> > https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
> > AND
> > http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options
> >
> > and based on what I read, the documentation is not correct.
> >
> > Documentation says like this:
> > “rows_per_partition: The amount of rows to cache per partition (“row 
> > cache”). If an integer n is specified, the first n queried rows of a 
> > partition will be cached. Other possible options are ALL, to cache all rows 
> > of a queried partition, or NONE to disable row caching.”
> >
> > The problematic part is "the first n queried rows of a partition will be 
> > cached”. Shouldn’t it be that the first N rows in a partition will be 
> > cached? Not first N that are queried?
> >
> > If this is the case, I’m more than happy to create a ticket (and maybe even 
> > create a patch) for the doc update.
> >
> > BR,
> > Hannu
> >
>


Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Anyone?

> On 4 Mar 2018, at 20:45, Hannu Kröger <hkro...@gmail.com> wrote:
> 
> Hello,
> 
> I am trying to verify and understand fully the functionality of row cache in 
> Cassandra.
> 
> I have been using mainly two different sources for information:
> https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
>  
> <https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476>
> AND
> http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 
> <http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options>
> 
> and based on what I read, the documentation is not correct. 
> 
> Documentation says like this:
> “rows_per_partition: The amount of rows to cache per partition (“row cache”). 
> If an integer n is specified, the first n queried rows of a partition will be 
> cached. Other possible options are ALL, to cache all rows of a queried 
> partition, or NONE to disable row caching.”
> 
> The problematic part is "the first n queried rows of a partition will be 
> cached”. Shouldn’t it be that the first N rows in a partition will be cached? 
> Not first N that are queried?
> 
> If this is the case, I’m more than happy to create a ticket (and maybe even 
> create a patch) for the doc update.
> 
> BR,
> Hannu
> 



Row cache functionality - Some confusion

2018-03-04 Thread Hannu Kröger
Hello,

I am trying to verify and understand fully the functionality of row cache in 
Cassandra.

I have been using mainly two different sources for information:
https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
 
<https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476>
AND
http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 
<http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options>

and based on what I read, the documentation is not correct. 

Documentation says like this:
“rows_per_partition: The amount of rows to cache per partition (“row cache”). 
If an integer n is specified, the first n queried rows of a partition will be 
cached. Other possible options are ALL, to cache all rows of a queried 
partition, or NONE to disable row caching.”

The problematic part is "the first n queried rows of a partition will be 
cached”. Shouldn’t it be that the first N rows in a partition will be cached? 
Not first N that are queried?

If this is the case, I’m more than happy to create a ticket (and maybe even 
create a patch) for the doc update.

BR,
Hannu
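
For reference, the option the documentation describes is set per table. A minimal
sketch with assumed keyspace/table names, showing the three forms under discussion
(the open question in this thread is whether the integer form caches the first n
rows of the partition or the first n rows that happen to be queried):

ALTER TABLE ks.events
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};   -- integer n

ALTER TABLE ks.events
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};  -- cache all rows of a partition

ALTER TABLE ks.events
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}; -- disable row caching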



Re: RE: Row Cache hit issue

2017-09-19 Thread Peng Xiao
Thanks All.




-- Original --
From: "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>
Date: Wed, Sep 20, 2017 1:38
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Row Cache hit issue



  
Hi,

additionally, with saved (key) caches, we had some sort of corruption (I think, 
for whatever reason) once. So, if you see something like that upon Cassandra 
startup:

INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading 
saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception 
encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:152)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:276)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)

resulting in Cassandra going OOM, with a “reading saved cache” log entry 
close before the OOM, you may have hit some sort of corruption. Workaround is 
to physically delete the saved cache file and Cassandra will start up just fine.

Regards,
Thomas
 
From: Dikang Gu [mailto:dikan...@gmail.com]
Sent: Mittwoch, 20. September 2017 06:06
To: cassandra <user@cassandra.apache.org>
Subject: Re: Row Cache hit issue

Hi Peng,

C* periodically saves cache to disk, to solve cold start problem. If 
row_cache_save_period=0, it means C* does not save cache to disk. But the cache 
is still working, if it's enabled in table schema, just the cache will be empty 
after restart.

--Dikang.
On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao <2535...@qq.com> wrote:

And we are using C* 2.1.18.


-- Original --
From: "我自己的邮箱" <2535...@qq.com>
Date: Wed, Sep 20, 2017 11:27 AM
To: "user" <user@cassandra.apache.org>
Subject: Row Cache hit issue
 
 
  
Dear All,

The default row_cache_save_period=0, looks Row Cache does not work in this 
situation?
but we can still see the row cache hit.

Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds

Could anyone please explain this?

Thanks,
Peng Xiao


--
Dikang

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

RE: Row Cache hit issue

2017-09-19 Thread Steinmaurer, Thomas
Hi,

additionally, with saved (key) caches, we had some sort of corruption (I think, 
for whatever reason) once. So, if you see something like that upon Cassandra 
startup:

INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading 
saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception 
encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:152)
at 
org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at 
org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at 
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:276)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)

resulting in Cassandra going OOM, with a “reading saved cache” log entry close 
before the OOM, you may have hit some sort of corruption. Workaround is to 
physically delete the saved cache file and Cassandra will start up just fine.

Regards,
Thomas


From: Dikang Gu [mailto:dikan...@gmail.com]
Sent: Mittwoch, 20. September 2017 06:06
To: cassandra <user@cassandra.apache.org>
Subject: Re: Row Cache hit issue

Hi Peng,

C* periodically saves cache to disk, to solve cold start problem. If 
row_cache_save_period=0, it means C* does not save cache to disk. But the cache 
is still working, if it's enabled in table schema, just the cache will be empty 
after restart.

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao 
<2535...@qq.com<mailto:2535...@qq.com>> wrote:
And we are using C* 2.1.18.


-- Original --
From:  "我自己的邮箱";<2535...@qq.com<mailto:2535...@qq.com>>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user"<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>;
Subject:  Row Cache hit issue

Dear All,

The default row_cache_save_period=0, looks Row Cache does not work in this 
situation?
but we can still see the row cache hit.

Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds

Could anyone please explain this?

Thanks,
Peng Xiao



--
Dikang

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Re: Row Cache hit issue

2017-09-19 Thread Dikang Gu
Hi Peng,

C* periodically saves cache to disk, to solve cold start problem. If
row_cache_save_period=0, it means C* does not save cache to disk. But the
cache is still working, if it's enabled in table schema, just the cache
will be empty after restart.

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao <2535...@qq.com> wrote:

> And we are using C* 2.1.18.
>
>
> -- Original --
> *From: * "我自己的邮箱";<2535...@qq.com>;
> *Date: * Wed, Sep 20, 2017 11:27 AM
> *To: * "user"<user@cassandra.apache.org>;
> *Subject: * Row Cache hit issue
>
> Dear All,
>
> The default row_cache_save_period=0, looks Row Cache does not work in this
> situation?
> but we can still see the row cache hit.
>
> Row Cache  : entries 202787, size 100 MB, capacity 100 MB,
> 3095293 hits, 6796801 requests, 0.455 recent hit rate, 0 save period in
> seconds
>
> Could anyone please explain this?
>
> Thanks,
> Peng Xiao
>



-- 
Dikang
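
A minimal sketch of the distinction Dikang describes, with an assumed keyspace and
table name: the row cache itself is enabled per table in the schema and sized by
row_cache_size_in_mb in cassandra.yaml, while row_cache_save_period only controls
whether the cache contents are persisted to disk to survive a restart.

-- enable the row cache for one table (first 100 rows of each partition);
-- with row_cache_save_period = 0 the cache still works, it just starts
-- empty after every restart
ALTER TABLE ks.readings
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};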


Re: Row Cache hit issue

2017-09-19 Thread Peng Xiao
And we are using C* 2.1.18.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user"<user@cassandra.apache.org>;

Subject:  Row Cache hit issue



Dear All,


The default row_cache_save_period=0, looks Row Cache does not work in this 
situation?
but we can still see the row cache hit.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds


Could anyone please explain this?


Thanks,
Peng Xiao

Row Cache hit issue

2017-09-19 Thread Peng Xiao
Dear All,


The default row_cache_save_period=0, looks Row Cache does not work in this 
situation?
but we can still see the row cache hit.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds


Could anyone please explain this?


Thanks,
Peng Xiao

Re: Row cache tuning

2017-03-13 Thread Thomas Julian
Hi Matija,

> Leveraging page cache yields good results and if accounted for can provide you 
> with performance increase on read side

I would like to leverage the page cache to improve read performance. How can 
this be done?

Best Regards,
Julian.











On Mon, 13 Mar 2017 03:42:32 +0530 preetika tyagi preetikaty...@gmail.com wrote:

I see. Thanks, Arvydas!

In terms of eviction policy in the row cache, does a write operation invalidate 
only the row(s) which are going to be modified, or the whole partition? In older 
versions of Cassandra, I believe the whole partition gets invalidated even if 
only one row is modified. Is that still true for the latest release (3.10)? I 
browsed through many online articles and tutorials but cannot find information 
on this.

On Sun, Mar 12, 2017 at 2:25 PM, Arvydas Jonusonis 
arvydas.jonuso...@gmail.com wrote:

You can experiment quite easily without even needing to restart the Cassandra 
service.

The caches (row and key) can be enabled on a table-by-table basis via a schema 
directive. But the cache capacity (which is the one that you referred to in 
your original post, set to 0 in cassandra.yaml) is a global setting and can be 
manipulated via JMX or nodetool (nodetool setcachecapacity).

Arvydas

On Sun, Mar 12, 2017 at 9:46 AM, preetika tyagi preetikaty...@gmail.com 
wrote:

Thanks, Matija! That was insightful.

I don't really have a use case in particular, however, what I'm trying to do is 
to figure out how the Cassandra performance can be leveraged by using different 
caching mechanisms, such as row cache, key cache, partition summary etc. Of 
course, it will also heavily depend on the type of workload but I'm trying to 
gain more understanding of what's available in the Cassandra framework.

Also, I read somewhere that either row cache or key cache can be turned on for 
a specific table, not both. Based on your comment, I guess the combination of 
page cache and key cache is used widely for tuning the performance.

Thanks,
Preetika

On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec matija0...@gmail.com 
wrote:

Hi,

In 99% of use cases Cassandra's row cache is not something you should look 
into. Leveraging page cache yields good results and if accounted for can 
provide you with performance increase on read side.
I'm not a fan of a default row cache implementation and its invalidation 
mechanism on updates so you really need to be careful when and how you use it. 
There isn't much to configuration as there is to your use case. Maybe explain 
what are you trying to solve with row cache and people can get into discussion 
with more context.

Regards,
Matija

On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi preetikaty...@gmail.com 
wrote:

Hi,

I'm new to Cassandra and trying to get a better understanding on how the row 
cache can be tuned to optimize the performance.

I came across this article: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html

And it suggests not to even touch row cache unless read workload is > 95% 
and mostly rely on machine's default cache mechanism which comes with OS.

The default row cache size is 0 in cassandra.yaml file so the row cache won't 
be utilized at all.

Therefore, I'm wondering how exactly I can decide to choose to tweak row cache 
if needed. Are there any good pointers one can provide on this?

Thanks,
Preetika

Re: Row cache tuning

2017-03-12 Thread preetika tyagi
I see. Thanks, Arvydas!

In terms of eviction policy in the row cache, does a write operation
invalidate only the row(s) which are going to be modified, or the whole
partition? In older versions of Cassandra, I believe the whole partition
gets invalidated even if only one row is modified. Is that still true for
the latest release (3.10)? I browsed through many online articles and
tutorials but cannot find information on this.

On Sun, Mar 12, 2017 at 2:25 PM, Arvydas Jonusonis <
arvydas.jonuso...@gmail.com> wrote:

> You can experiment quite easily without even needing to restart the
> Cassandra service.
>
> The caches (row and key) can be enabled on a table-by-table basis via a
> schema directive. But the cache capacity (which is the one that you
> referred to in your original post, set to 0 in cassandra.yaml) is a global
> setting and can be manipulated via JMX or nodetool (nodetool
> setcachecapacity
> <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetCacheCapacity.html>
> ).
>
> Arvydas
>
> On Sun, Mar 12, 2017 at 9:46 AM, preetika tyagi <preetikaty...@gmail.com>
> wrote:
>
>> Thanks, Matija! That was insightful.
>>
>> I don't really have a use case in particular, however, what I'm trying to
>> do is to figure out how the Cassandra performance can be leveraged by using
>> different caching mechanisms, such as row cache, key cache, partition
>> summary etc. Of course, it will also heavily depend on the type of workload
>> but I'm trying to gain more understanding of what's available in the
>> Cassandra framework.
>>
>> Also, I read somewhere that either row cache or key cache can be turned
>> on for a specific table, not both. Based on your comment, I guess the
>> combination of page cache and key cache is used widely for tuning the
>> performance.
>>
>> Thanks,
>> Preetika
>>
>> On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec <matija0...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> In 99% of use cases Cassandra's row cache is not something you should
>>> look into. Leveraging page cache yields good results and if accounted for
>>> can provide you with performance increase on read side.
>>> I'm not a fan of a default row cache implementation and its invalidation
>>> mechanism on updates so you really need to be careful when and how you use
>>> it. There isn't much to configuration as there is to your use case. Maybe
>>> explain what are you trying to solve with row cache and people can get into
>>> discussion with more context.
>>>
>>> Regards,
>>> Matija
>>>
>>> On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi <preetikaty...@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm new to Cassandra and trying to get a better understanding on how
>>>> the row cache can be tuned to optimize the performance.
>>>>
>>>> I came across this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html
>>>>
>>>> And it suggests not to even touch row cache unless read workload is >
>>>> 95% and mostly rely on machine's default cache mechanism which comes with
>>>> OS.
>>>>
>>>> The default row cache size is 0 in cassandra.yaml file so the row cache
>>>> won't be utilized at all.
>>>>
>>>> Therefore, I'm wondering how exactly I can decide to choose to tweak row
>>>> cache if needed. Are there any good pointers one can provide on this?
>>>>
>>>> Thanks,
>>>> Preetika
>>>>
>>>
>>>
>>
>


Re: Row cache tuning

2017-03-12 Thread Arvydas Jonusonis
You can experiment quite easily without even needing to restart the
Cassandra service.

The caches (row and key) can be enabled on a table-by-table basis via a
schema directive. But the cache capacity (which is the one that you
referred to in your original post, set to 0 in cassandra.yaml) is a global
setting and can be manipulated via JMX or nodetool (nodetool
setcachecapacity
<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetCacheCapacity.html>
).

Arvydas
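
As a concrete illustration of the two knobs Arvydas mentions (table names assumed,
values illustrative): the schema directive is per table, while the capacity is
global and can be changed at runtime with nodetool setcachecapacity as linked above.

-- per table: key cache and row cache on for one table...
ALTER TABLE ks.users
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '50'};

-- ...and both off for another
ALTER TABLE ks.audit_log
    WITH caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'};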

On Sun, Mar 12, 2017 at 9:46 AM, preetika tyagi <preetikaty...@gmail.com>
wrote:

> Thanks, Matija! That was insightful.
>
> I don't really have a use case in particular, however, what I'm trying to
> do is to figure out how the Cassandra performance can be leveraged by using
> different caching mechanisms, such as row cache, key cache, partition
> summary etc. Of course, it will also heavily depend on the type of workload
> but I'm trying to gain more understanding of what's available in the
> Cassandra framework.
>
> Also, I read somewhere that either row cache or key cache can be turned on
> for a specific table, not both. Based on your comment, I guess the
> combination of page cache and key cache is used widely for tuning the
> performance.
>
> Thanks,
> Preetika
>
> On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec <matija0...@gmail.com>
> wrote:
>
>> Hi,
>>
>> In 99% of use cases Cassandra's row cache is not something you should
>> look into. Leveraging page cache yields good results and if accounted for
>> can provide you with performance increase on read side.
>> I'm not a fan of a default row cache implementation and its invalidation
>> mechanism on updates so you really need to be careful when and how you use
>> it. There isn't much to configuration as there is to your use case. Maybe
>> explain what are you trying to solve with row cache and people can get into
>> discussion with more context.
>>
>> Regards,
>> Matija
>>
>> On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi <preetikaty...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm new to Cassandra and trying to get a better understanding on how the
>>> row cache can be tuned to optimize the performance.
>>>
>>> I came across this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html
>>>
>>> And it suggests not to even touch row cache unless read workload is >
>>> 95% and mostly rely on machine's default cache mechanism which comes with
>>> OS.
>>>
>>> The default row cache size is 0 in cassandra.yaml file so the row cache
>>> won't be utilized at all.
>>>
>>> Therefore, I'm wondering how exactly I can decide to choose to tweak row
>>> cache if needed. Are there any good pointers one can provide on this?
>>>
>>> Thanks,
>>> Preetika
>>>
>>
>>
>


Re: Row cache tuning

2017-03-12 Thread preetika tyagi
Thanks, Matija! That was insightful.

I don't really have a use case in particular, however, what I'm trying to
do is to figure out how the Cassandra performance can be leveraged by using
different caching mechanisms, such as row cache, key cache, partition
summary etc. Of course, it will also heavily depend on the type of workload
but I'm trying to gain more understanding of what's available in the
Cassandra framework.

Also, I read somewhere that either row cache or key cache can be turned on
for a specific table, not both. Based on your comment, I guess the
combination of page cache and key cache is used widely for tuning the
performance.

Thanks,
Preetika

On Sat, Mar 11, 2017 at 2:01 PM, Matija Gobec <matija0...@gmail.com> wrote:

> Hi,
>
> In 99% of use cases Cassandra's row cache is not something you should look
> into. Leveraging page cache yields good results and if accounted for can
> provide you with performance increase on read side.
> I'm not a fan of a default row cache implementation and its invalidation
> mechanism on updates so you really need to be careful when and how you use
> it. There isn't much to configuration as there is to your use case. Maybe
> explain what are you trying to solve with row cache and people can get into
> discussion with more context.
>
> Regards,
> Matija
>
> On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi <preetikaty...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm new to Cassandra and trying to get a better understanding on how the
>> row cache can be tuned to optimize the performance.
>>
>> I came across this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html
>>
>> And it suggests not to even touch row cache unless read workload is > 95%
>> and mostly rely on machine's default cache mechanism which comes with OS.
>>
>> The default row cache size is 0 in cassandra.yaml file so the row cache
>> won't be utilized at all.
>>
>> Therefore, I'm wondering how exactly I can decide to choose to tweak row
>> cache if needed. Are there any good pointers one can provide on this?
>>
>> Thanks,
>> Preetika
>>
>
>


Re: Row cache tuning

2017-03-11 Thread Matija Gobec
Hi,

In 99% of use cases Cassandra's row cache is not something you should look
into. Leveraging page cache yields good results and if accounted for can
provide you with performance increase on read side.
I'm not a fan of a default row cache implementation and its invalidation
mechanism on updates so you really need to be careful when and how you use
it. There isn't much to configuration as there is to your use case. Maybe
explain what are you trying to solve with row cache and people can get into
discussion with more context.

Regards,
Matija

On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi <preetikaty...@gmail.com>
wrote:

> Hi,
>
> I'm new to Cassandra and trying to get a better understanding on how the
> row cache can be tuned to optimize the performance.
>
> I came across this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html
>
> And it suggests not to even touch row cache unless read workload is > 95%
> and mostly rely on machine's default cache mechanism which comes with OS.
>
> The default row cache size is 0 in cassandra.yaml file so the row cache
> won't be utilized at all.
>
> Therefore, I'm wondering how exactly I can decide to choose to tweak row
> cache if needed. Are there any good pointers one can provide on this?
>
> Thanks,
> Preetika
>


Row cache tuning

2017-03-11 Thread preetika tyagi
Hi,

I'm new to Cassandra and trying to get a better understanding on how the
row cache can be tuned to optimize the performance.

I came across this article:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html

And it suggests not to even touch row cache unless read workload is > 95%
and mostly rely on machine's default cache mechanism which comes with OS.

The default row cache size is 0 in cassandra.yaml file so the row cache
won't be utilized at all.

Therefore, I'm wondering how exactly I can decide to choose to tweak row
cache if needed. Are there any good pointers one can provide on this?

Thanks,
Preetika


Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
That’s true for versions 2.1 and newer. However, it’s possible that the 3.0 engine 
rewrite introduced a bug or two that haven’t yet been found. 

 

 

From: Hannu Kröger <hkro...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 3:52 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

If I remember correctly, the row cache caches only N rows from the beginning of the 
partition, N being some configurable number. 

 

See this link which is suggesting that:

http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

 

Br,

Hannu

 

On 4 Oct 2016, at 1.32, Edward Capriolo <edlinuxg...@gmail.com> wrote:

Since the feature is off by default, the coverage might only be as deep 
as the specific tests that test it.

 

On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

Seems like it’s probably worth opening a jira issue to track it (either to 
confirm it’s a bug, or to be able to better explain if/that it’s working as 
intended – the row cache is probably missing because trace indicates the read 
isn’t cacheable, but I suspect it should be cacheable). 





Do note, though, that setting rows_per_partition to ALL can be very very very 
dangerous if you have very wide rows in any of your tables with row cache 
enabled.

 

 

 

From: Abhinav Solan <abhinav.so...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 1:38 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

It's cassandra 3.0.7,  

I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}, then only 
it works don't know why.

If I set 'rows_per_partition':'1' then it does not work.

 

Also wanted to ask one thing, if I set row_cache_save_period: 60 then this 
cache would be refreshed automatically or it would be lazy, whenever the fetch 
call is made then only it caches it.

 

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 

 

From: Abhinav Solan <abhinav.so...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com> wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

activity

  | timestamp  | source  | source_elapsed

---++-+


Execute CQL3 
query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  0

 Parsing select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read 

Re: Row cache not working

2016-10-03 Thread Hannu Kröger
If I remember correctly, the row cache caches only N rows from the beginning of the 
partition, N being some configurable number. 

See this link which is suggesting that:
http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

Br,
Hannu

> On 4 Oct 2016, at 1.32, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
> Since the feature is off by default, the coverage might only be as deep 
> as the specific tests that test it.
> 
>> On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> 
>> wrote:
>> Seems like it’s probably worth opening a jira issue to track it (either to 
>> confirm it’s a bug, or to be able to better explain if/that it’s working as 
>> intended – the row cache is probably missing because trace indicates the 
>> read isn’t cacheable, but I suspect it should be cacheable).
>> 
>>  
>>  
>>  
>> 
>> 
>> Do note, though, that setting rows_per_partition to ALL can be very very 
>> very dangerous if you have very wide rows in any of your tables with row 
>> cache enabled.
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> From: Abhinav Solan <abhinav.so...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, October 3, 2016 at 1:38 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> It's cassandra 3.0.7, 
>> 
>> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}, then 
>> only it works don't know why.
>> 
>> If I set 'rows_per_partition':'1' then it does not work.
>> 
>>  
>> 
>> Also wanted to ask one thing, if I set row_cache_save_period: 60 then this 
>> cache would be refreshed automatically or it would be lazy, whenever the 
>> fetch call is made then only it caches it.
>> 
>>  
>> 
>> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>> 
>> Which version of Cassandra are you running (I can tell it’s newer than 2.1, 
>> but exact version would be useful)?
>> 
>>  
>> 
>> From: Abhinav Solan <abhinav.so...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, October 3, 2016 at 11:35 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> Hi, can anyone please help me with this
>> 
>>  
>> 
>> Thanks,
>> 
>> Abhinav
>> 
>>  
>> 
>> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com> 
>> wrote:
>> 
>> Hi Everyone,
>> 
>>  
>> 
>> My table looks like this -
>> 
>> CREATE TABLE test.reads (
>> 
>> svc_pt_id bigint,
>> 
>> meas_type_id bigint,
>> 
>> flags bigint,
>> 
>> read_time timestamp,
>> 
>> value double,
>> 
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> 
>> ) WITH bloom_filter_fp_chance = 0.1
>> 
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> 
>> AND comment = ''
>> 
>> AND compaction = {'class': 
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> 
>> AND compression = {'chunk_length_in_kb': '64', 'class': 
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> 
>> AND crc_check_chance = 1.0
>> 
>> AND dclocal_read_repair_chance = 0.1
>> 
>> AND default_time_to_live = 0
>> 
>> AND gc_grace_seconds = 864000
>> 
>> AND max_index_interval = 2048
>> 
>> AND memtable_flush_period_in_ms = 0
>> 
>> AND min_index_interval = 128
>> 
>> AND read_repair_chance = 0.0
>> 
>> AND speculative_retry = '99PERCENTILE';
>> 
>>  
>> 
>> Have set up the C* nodes with
>> 
>> row_cache_size_in_mb: 1024
>> 
>> row_cache_save_period: 14400
>> 
>>  
>> 
>> and I am making this query 
>> 
>> select svc_pt_id, meas_type_id, read_time, value FROM 
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;
>> 
>>  
>> 
>> wit

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might only be as
deep as the specific tests that test it.

On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Seems like it’s probably worth opening a jira issue to track it (either to
> confirm it’s a bug, or to be able to better explain if/that it’s working as
> intended – the row cache is probably missing because trace indicates the
> read isn’t cacheable, but I suspect it should be cacheable).
>
>
>
>
>
>
> Do note, though, that setting rows_per_partition to ALL can be very very
> very dangerous if you have very wide rows in any of your tables with row
> cache enabled.
>
>
>
>
>
>
>
> *From: *Abhinav Solan <abhinav.so...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, October 3, 2016 at 1:38 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Row cache not working
>
>
>
> It's cassandra 3.0.7,
>
> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}, then
> only it works don't know why.
>
> If I set 'rows_per_partition':'1' then it does not work.
>
>
>
> Also wanted to ask one thing, if I set row_cache_save_period: 60 then this
> cache would be refreshed automatically or it would be lazy, whenever the
> fetch call is made then only it caches it.
>
>
>
> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> Which version of Cassandra are you running (I can tell it’s newer than
> 2.1, but exact version would be useful)?
>
>
>
> *From: *Abhinav Solan <abhinav.so...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, October 3, 2016 at 11:35 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Row cache not working
>
>
>
> Hi, can anyone please help me with this
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com>
> wrote:
>
> Hi Everyone,
>
>
>
> My table looks like this -
>
> CREATE TABLE test.reads (
>
> svc_pt_id bigint,
>
> meas_type_id bigint,
>
> flags bigint,
>
> read_time timestamp,
>
> value double,
>
> PRIMARY KEY ((svc_pt_id, meas_type_id))
>
> ) WITH bloom_filter_fp_chance = 0.1
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
> LeveledCompactionStrategy'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
>     AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> Have set up the C* nodes with
>
> row_cache_size_in_mb: 1024
>
> row_cache_save_period: 14400
>
>
>
> and I am making this query
>
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
>
>
> with tracing on every time it says Row cache miss
>
>
>
> activity
>
>| timestamp  | source  | source_elapsed
>
> 
> 
> ---+
> +-+
>
>
>
> Execute CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>  0
>
>  Parsing select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>111
>
>
>Preparing statement
> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>209
>
>
> reading data from /192.168.170.186

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Seems like it’s probably worth opening a jira issue to track it (either to 
confirm it’s a bug, or to be able to better explain if/that it’s working as 
intended – the row cache is probably missing because trace indicates the read 
isn’t cacheable, but I suspect it should be cacheable). 



    

Do note, though, that setting rows_per_partition to ALL can be very very very 
dangerous if you have very wide rows in any of your tables with row cache 
enabled.

 

 

 

From: Abhinav Solan <abhinav.so...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 1:38 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

It's cassandra 3.0.7,  

I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}, then only 
it works don't know why.

If I set 'rows_per_partition':'1' then it does not work.

 

Also wanted to ask one thing, if I set row_cache_save_period: 60 then this 
cache would be refreshed automatically or it would be lazy, whenever the fetch 
call is made then only it caches it.

 

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 

 

From: Abhinav Solan <abhinav.so...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com> wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

activity

  | timestamp  | source  | source_elapsed

---++-+


Execute CQL3 
query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  0

 Parsing select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  
  111


 Preparing statement 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  
  209


  reading data from /192.168.170.186 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |  
  370

 
Sending READ message to /192.168.170.186 
[MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 |  
192.168.199.75 |450

  REQUEST_RESPONSE 
message received from /192.168.170.186 
[MessagingService-Inc

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
It's cassandra 3.0.7.
I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only then
does it work, I don't know why.
If I set 'rows_per_partition': '1' then it does not work.

Also wanted to ask one thing: if I set row_cache_save_period: 60, would this
cache be refreshed automatically, or would it be lazy, caching only when a
fetch call is made?

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Which version of Cassandra are you running (I can tell it’s newer than
> 2.1, but exact version would be useful)?
>
>
>
> *From: *Abhinav Solan <abhinav.so...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, October 3, 2016 at 11:35 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Row cache not working
>
>
>
> Hi, can anyone please help me with this
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com>
> wrote:
>
> Hi Everyone,
>
>
>
> My table looks like this -
>
> CREATE TABLE test.reads (
>
> svc_pt_id bigint,
>
> meas_type_id bigint,
>
> flags bigint,
>
> read_time timestamp,
>
> value double,
>
> PRIMARY KEY ((svc_pt_id, meas_type_id))
>
> ) WITH bloom_filter_fp_chance = 0.1
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>
> AND comment = ''
>
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> Have set up the C* nodes with
>
> row_cache_size_in_mb: 1024
>
> row_cache_save_period: 14400
>
>
>
> and I am making this query
>
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
>
>
> with tracing on every time it says Row cache miss
>
>
>
> activity
>
>| timestamp  | source  | source_elapsed
>
>
> ---++-+
>
>
>
> Execute CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>  0
>
>  Parsing select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>111
>
>
>Preparing statement
> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>209
>
>
> reading data from /192.168.170.186
> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
>370
>
>
>Sending READ message to /192.168.170.186
> [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |450
>
>
> REQUEST_RESPONSE message received from /192.168.170.186
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.170.186=DQMGaQ=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow=TzZ71ThTYrI2Cs7eYc2nhu4gOJpHM6B89KY97yj0Pp4=Rsg4cca5QVAWlI6cS

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 


 

From: Abhinav Solan <abhinav.so...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com> wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

activity

  | timestamp  | source  | source_elapsed

---++-+


Execute CQL3 
query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  0

 Parsing select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  
  111


 Preparing statement 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  
  209


  reading data from /192.168.170.186 
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |  
  370

 
Sending READ message to /192.168.170.186 
[MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 |  
192.168.199.75 |450

  REQUEST_RESPONSE 
message received from /192.168.170.186 
[MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 |  
192.168.199.75 |   2469


   Processing response from /192.168.170.186 
[SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 |  192.168.199.75 |  
 2609


READ message received from /192.168.199.75 
[MessagingService-Incoming-/192.168.199.75] | 2016-09-30 18:15:00.449000 | 
192.168.170.186 | 75


          Row cache miss 
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |  
  218

  Fetching data but not populating 
cache as query does not query from the start of the partition 
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |  
  246

  
Executing single-partition query on cts_svc_pt_latest_int_read 
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |  
  259


Acquiring sstable references 
[SharedPool-Worker-2] | 2016-09-30 18:15:0

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it
would make sense to make a utility for the unit tests that could enable
tracing and then assert that a number of steps in the trace happened.

Something like:

setup()
runQuery("SELECT * FROM X")
Assertion.assertTrace("Preparing statement").then("Row cache
hit").then("Request complete");

This would be a pretty awesome way to verify things without mock/mockito.
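
A very rough sketch of what I mean (the class and method names below are made
up, and it's a client-side take using the DataStax Java driver's QueryTrace
API rather than the server-side test utility being proposed):

import com.datastax.driver.core.*;
import java.util.Iterator;

final class TraceAssert {
    // Runs the statement with tracing enabled and asserts that each expected
    // activity shows up in the trace events, in the given order.
    static void assertTraceInOrder(Session session, String cql, String... steps) {
        Statement stmt = new SimpleStatement(cql).enableTracing();
        ResultSet rs = session.execute(stmt);
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        Iterator<QueryTrace.Event> events = trace.getEvents().iterator();
        for (String step : steps) {
            boolean found = false;
            while (events.hasNext() && !found) {
                found = events.next().getDescription().contains(step);
            }
            if (!found) {
                throw new AssertionError("Missing trace step (in order): " + step);
            }
        }
    }
}

// e.g. TraceAssert.assertTraceInOrder(session, "SELECT * FROM X",
//                                     "Preparing statement", "Row cache hit");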



On Mon, Oct 3, 2016 at 2:35 PM, Abhinav Solan <abhinav.so...@gmail.com>
wrote:

> Hi, can anyone please help me with this
>
> Thanks,
> Abhinav
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com>
> wrote:
>
>> Hi Everyone,
>>
>> My table looks like this -
>> CREATE TABLE test.reads (
>> svc_pt_id bigint,
>> meas_type_id bigint,
>> flags bigint,
>> read_time timestamp,
>> value double,
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> ) WITH bloom_filter_fp_chance = 0.1
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> AND comment = ''
>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
>> LeveledCompactionStrategy'}
>> AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> Have set up the C* nodes with
>> row_cache_size_in_mb: 1024
>> row_cache_save_period: 14400
>>
>> and I am making this query
>> select svc_pt_id, meas_type_id, read_time, value FROM
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
>> 146;
>>
>> with tracing on every time it says Row cache miss
>>
>> activity
>>
>>  | timestamp  | source  | source_elapsed
>> 
>> 
>> ---+
>> +-+
>>
>>
>> Execute CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>>  0
>>  Parsing select svc_pt_id, meas_type_id, read_time, value FROM
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
>> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>>111
>>
>>Preparing statement
>> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>>209
>>
>> reading data from /192.168.170.186
>> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
>>370
>>
>>Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/
>> 192.168.170.186] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
>>  450
>>
>> REQUEST_RESPONSE message received from /192.168.170.186
>> [MessagingService-Incoming-/192.168.170.186] | 2016-09-30
>> 18:15:00.448000 |  192.168.199.75 |   2469
>>
>>  Processing response from /192.168.170.186
>> [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 |  192.168.199.75 |
>>   2609
>>
>>   READ message received from /192.168.199.75 [MessagingService-Incoming-/
>> 192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>> 75
>>
>> Row cache miss
>> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>>218
>>   Fetching data but not
>> populating cache as query does not query from the start of the partition
>> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>>246
>>
>> Executing single-partition query on cts_svc_pt_latest_int_read
>> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>>259
>>
>>   Acquiring sstable references
>> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>>281
>>
>> 

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
Hi, can anyone please help me with this

Thanks,
Abhinav

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan <abhinav.so...@gmail.com>
wrote:

> Hi Everyone,
>
> My table looks like this -
> CREATE TABLE test.reads (
> svc_pt_id bigint,
> meas_type_id bigint,
> flags bigint,
> read_time timestamp,
> value double,
> PRIMARY KEY ((svc_pt_id, meas_type_id))
> ) WITH bloom_filter_fp_chance = 0.1
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
> Have set up the C* nodes with
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
>
> and I am making this query
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
> with tracing on every time it says Row cache miss
>
> activity
>
>| timestamp  | source  | source_elapsed
>
> ---++-+
>
>
> Execute CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>  0
>  Parsing select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>111
>
>Preparing statement
> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
>209
>
> reading data from /192.168.170.186
> [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
>370
>
>Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/
> 192.168.170.186] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
>450
>
> REQUEST_RESPONSE message received from /192.168.170.186
> [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000
> |  192.168.199.75 |   2469
>
>  Processing response from /192.168.170.186
> [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 |  192.168.199.75 |
>       2609
>
>   READ message received from /192.168.199.75 [MessagingService-Incoming-/
> 192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
> 75
>
> Row cache miss
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>218
>   Fetching data but not
> populating cache as query does not query from the start of the partition
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>246
>
> Executing single-partition query on cts_svc_pt_latest_int_read
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
>259
>
>   Acquiring sstable references
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>281
>
>  Merging memtable contents
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>295
>
>Merging data from sstable 8
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>326
>
>Key cache hit for sstable 8
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>351
>
>Merging data from sstable 7
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>439
>
>Key cache hit for sstable 7
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>468
>
>  Read 1 live and 0 tombstone cells
> [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
>615
>
>

Row cache not working

2016-09-30 Thread Abhinav Solan
Hi Everyone,

My table looks like this -
CREATE TABLE test.reads (
svc_pt_id bigint,
meas_type_id bigint,
flags bigint,
read_time timestamp,
value double,
PRIMARY KEY ((svc_pt_id, meas_type_id))
) WITH bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Have set up the C* nodes with
row_cache_size_in_mb: 1024
row_cache_save_period: 14400

and I am making this query
select svc_pt_id, meas_type_id, read_time, value FROM
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
146;

with tracing on every time it says Row cache miss

activity

   | timestamp  | source  | source_elapsed
---++-+

Execute
CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  0
 Parsing select svc_pt_id, meas_type_id, read_time, value FROM
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
   111

 Preparing statement
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 |  192.168.199.75 |
   209

  reading data from /192.168.170.186
[SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
   370

 Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/
192.168.170.186] | 2016-09-30 18:15:00.446001 |  192.168.199.75 |
 450

REQUEST_RESPONSE message received from /192.168.170.186
[MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 |
 192.168.199.75 |   2469

   Processing response from /192.168.170.186
[SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 |  192.168.199.75 |
  2609

READ message received from /192.168.199.75 [MessagingService-Incoming-/
192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
  75

  Row cache miss
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
   218
  Fetching data but not
populating cache as query does not query from the start of the partition
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
   246

  Executing single-partition query on cts_svc_pt_latest_int_read
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 |
   259

Acquiring sstable references
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   281

   Merging memtable contents
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   295

 Merging data from sstable 8
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   326

 Key cache hit for sstable 8
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   351

 Merging data from sstable 7
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   439

 Key cache hit for sstable 7
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   468

   Read 1 live and 0 tombstone cells
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 |
   615

   Enqueuing response to /192.168.199.75
[SharedPool-Worker-2] | 2016-09-30 18:15:00.449002 | 192.168.170.186 |
   766
   Sending
REQUEST_RESPONSE message to /192.168.199.75 [MessagingService-Outgoing-/
192.168.199.75] | 2016-09-30 18:15:00.449002 | 192.168.170.186 |
 897


Request complete | 2016-09-30 18:15:00.44 |  192.168.199.75 |
2888

Can anyone please tell me what I am doing wrong?

Thanks,
Abhinav


Thrift row cache in Cassandra 2.1

2016-03-30 Thread AJ
Hi,

I am having to tune a legacy app to use row caching (the why is unimportant). I 
know Thrift is EOL etc.; however, I have to do it.

I am unable to work out what values to set on the column family now with the 
changes in caching (i.e. rows_per_partition). Previously you would set it to 
all, keys_only, rows_only, or none - is this still the case? The docs seem to 
indicate you can only set keys and rows_per_partition… When I set it to all on 
a CF via cassandra-cli, it says rows_per_partition: 0 when I look at the CQL 
for the same CF.

Just a bit confused - if anyone can clarify, it would be appreciated.
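
For what it's worth, my current understanding of how the old values map onto
the new CQL caching options (happy to be corrected; the table name below is
just a placeholder):

-- ALL        -> caching = {'keys': 'ALL',  'rows_per_partition': 'ALL'}
-- KEYS_ONLY  -> caching = {'keys': 'ALL',  'rows_per_partition': 'NONE'}
-- ROWS_ONLY  -> caching = {'keys': 'NONE', 'rows_per_partition': 'ALL'}
-- NONE       -> caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'}

ALTER TABLE my_legacy_cf
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};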

Thanks,

AJ




row cache hit is costlier for partition with large rows

2015-01-21 Thread nitin padalia
Hi,

With two different families when I do a read, row cache hit is almost
15x costlier with larger rows (1 rows per partition), in
comparison to partition with only 100 rows.

Difference in two column families is one is having 100 rows per
partition another 1 rows per partition. Schema for two tables is:
CREATE TABLE table1_row_cache (
  user_id uuid,
  dept_id uuid,
  location_id text,
  locationmap_id uuid,
  PRIMARY KEY ((user_id, location_id), dept_id)
)

CREATE TABLE table2_row_cache (
  user_id uuid,
  dept_id uuid,
  location_id text,
  locationmap_id uuid,
  PRIMARY KEY ((user_id, dept_id), location_id)
)

Here is the tracing:

Row cache Hit with Column Family table1_row_cache, 100 rows per partition:
 Preparing statement [SharedPool-Worker-2] | 2015-01-20
14:35:47.54 | x.x.x.x |   1023
  Row cache hit [SharedPool-Worker-5] | 2015-01-20
14:35:47.542000 | x.x.x.x |   2426

Row cache Hit with CF table2_row_cache, 1 rows per partition:
Preparing statement [SharedPool-Worker-1] | 2015-01-20 16:02:51.696000
| x.x.x.x |490
 Row cache hit [SharedPool-Worker-2] | 2015-01-20
16:02:51.711000 | x.x.x.x |  15146


If in both cases the data is in memory, why isn't it the same? Can someone
point out what's wrong here?

Nitin Padalia


Re: row cache hit is costlier for partition with large rows

2015-01-21 Thread Sylvain Lebresne
The row cache saves partition data off-heap, which means that every cache
hit requires copying/deserializing the cached partition into the heap, and
the more rows per partition you cache, the longer it will take. Which is why
it's currently not a good idea to cache too many rows per partition (unless
you know what you're doing).
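
If you do want to keep the row cache on such a table, one mitigation is to cap
how much of each partition gets cached, so a hit only has to deserialize a
bounded number of rows. For example (the value is only illustrative):

ALTER TABLE table2_row_cache
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};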

On Wed, Jan 21, 2015 at 1:15 PM, nitin padalia padalia.ni...@gmail.com
wrote:

 Hi,

 With two different families when I do a read, row cache hit is almost
 15x costlier with larger rows (1 rows per partition), in
 comparison to partition with only 100 rows.

 Difference in two column families is one is having 100 rows per
 partition another 1 rows per partition. Schema for two tables is:
 CREATE TABLE table1_row_cache (
   user_id uuid,
   dept_id uuid,
   location_id text,
   locationmap_id uuid,
   PRIMARY KEY ((user_id, location_id), dept_id)
 )

 CREATE TABLE table2_row_cache (
   user_id uuid,
   dept_id uuid,
   location_id text,
   locationmap_id uuid,
   PRIMARY KEY ((user_id, dept_id), location_id)
 )

 Here is the tracing:

 Row cache Hit with Column Family table1_row_cache, 100 rows per partition:
  Preparing statement [SharedPool-Worker-2] | 2015-01-20
 14:35:47.54 | x.x.x.x |   1023
   Row cache hit [SharedPool-Worker-5] | 2015-01-20
 14:35:47.542000 | x.x.x.x |   2426

 Row cache Hit with CF table2_row_cache, 1 rows per partition:
 Preparing statement [SharedPool-Worker-1] | 2015-01-20 16:02:51.696000
 | x.x.x.x |490
  Row cache hit [SharedPool-Worker-2] | 2015-01-20
 16:02:51.711000 | x.x.x.x |  15146


 If for both cases data is in memory why its not same? Can someone
 point me what wrong here?

 Nitin Padalia



Why Cassandra 2.1.2 couldn't populate row cache in between

2015-01-20 Thread nitin padalia
Hi,

If I've enabled the row cache for some column family, and I request a row
which is not from the beginning of the partition, then Cassandra doesn't
populate the row cache.

Why is that so? For older versions I think it was because the cache held the
complete merged partition, so an incomplete partition couldn't reside in the
row cache.

However, in the new version, since we can resize the cache, why not populate
it from somewhere other than the start?

Nitin Padalia


Re: Why Cassandra 2.1.2 couldn't populate row cache in between

2015-01-20 Thread Robert Coli
On Mon, Jan 19, 2015 at 11:57 PM, nitin padalia padalia.ni...@gmail.com
wrote:

 If I've enable row cache for some column family, when I request some
 row which is not from the begining of the partition, then cassandra
 doesn't populate, row cache.

 Why it is so? For older version I think it was because we're saying
 the its caching complete merged partition so, incomplete partition
 can't reside in row cache.

 However in new version since we could resize the cache, so why not we
 populate from other than the start?


https://issues.apache.org/jira/browse/CASSANDRA-5357

Has the details of the new version of the row cache.

=Rob


Re: Quickly loading C* dataset into memory (row cache)

2014-09-15 Thread Robert Coli
On Sat, Sep 13, 2014 at 11:48 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 Apparently Apple is using Cassandra as a massive multi-DC cache, as per
 their announcement during the summit, but probably DSE with in-memory
 enabled option. Would love to hear about similar use cases.


There's caches and there's caches. I submit that, thus far, the usage of
the term cache in this conversation has not been specific enough to
enhance understanding.

I continue to assert, in a very limited scope, that 6GB of row cache in
Cassandra on a system with 7GB of RAM is Doing It Wrong.  :D

=Rob


Re: Quickly loading C* dataset into memory (row cache)

2014-09-14 Thread Paulo Ricardo Motta Gomes
Apparently Apple is using Cassandra as a massive multi-DC cache, as per
their announcement during the summit, but probably DSE with in-memory
enabled option. Would love to hear about similar use cases.

On Fri, Sep 12, 2014 at 12:20 PM, Ken Hancock ken.hanc...@schange.com
wrote:

 +1 for Redis.

 It's really nice, good primitives, and then you can do some really cool
 stuff chaining multiple atomic operations to create larger atomics through
 the lua scripting.

 On Thu, Sep 11, 2014 at 12:26 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Thu, Sep 11, 2014 at 8:30 AM, Danny Chan tofuda...@gmail.com wrote:

 What are you referring to when you say memory store?

 RAM disk? memcached?


 In 2014, probably Redis?

 =Rob





 --
 *Ken Hancock *| System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
 http://www.schange.com/en-US/Company/InvestorRelations.aspx
 Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hanc...@schange.com
  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks[image: LinkedIn]
 http://www.linkedin.com/in/kenhancock

 [image: SeaChange International]
 http://www.schange.com/This e-mail and any attachments may contain
 information which is SeaChange International confidential. The information
 enclosed is intended only for the addressees herein and may not be copied
 or forwarded without permission from SeaChange International.




-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br http://www.chaordic.com.br/*
+55 48 3232.3200


Re: Quickly loading C* dataset into memory (row cache)

2014-09-12 Thread Ken Hancock
+1 for Redis.

It's really nice, good primitives, and then you can do some really cool
stuff chaining multiple atomic operations to create larger atomics through
the lua scripting.

On Thu, Sep 11, 2014 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 11, 2014 at 8:30 AM, Danny Chan tofuda...@gmail.com wrote:

 What are you referring to when you say memory store?

 RAM disk? memcached?


 In 2014, probably Redis?

 =Rob





-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx
Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hanc...@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks[image: LinkedIn]
http://www.linkedin.com/in/kenhancock

[image: SeaChange International]
http://www.schange.com/This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.


Re: Quickly loading C* dataset into memory (row cache)

2014-09-11 Thread Danny Chan
What are you referring to when you say memory store?

RAM disk? memcached?

Thanks,

Danny

On Wed, Sep 10, 2014 at 1:11 AM, DuyHai Doan doanduy...@gmail.com wrote:
 Rob Coli strikes again, you're Doing It Wrong, and he's right :D

 Using Cassandra as an distributed cache is a bad idea, seriously. Putting
 6GB into row cache is another one.


 On Tue, Sep 9, 2014 at 9:21 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Sep 9, 2014 at 12:10 PM, Danny Chan tofuda...@gmail.com wrote:

 Is there a method to quickly load a large dataset into the row cache?
 I use row caching as I want the entire dataset to be in memory.


 You're doing it wrong. Use a memory store.

 =Rob





Re: Quickly loading C* dataset into memory (row cache)

2014-09-11 Thread Robert Coli
On Thu, Sep 11, 2014 at 8:30 AM, Danny Chan tofuda...@gmail.com wrote:

 What are you referring to when you say memory store?

 RAM disk? memcached?


In 2014, probably Redis?

=Rob


Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread Danny Chan
Hello all,

Is there a method to quickly load a large dataset into the row cache?
I use row caching as I want the entire dataset to be in memory.

I'm running a Cassandra-1.2 database server with a dataset of 555
records (6GB size) and a row cache of 6GB. Key caching is disabled and
I am using SerializingCacheProvider. The machine running the Cassandra
server has 7GB memory and 2 CPUs.

I have a YCSB client running on another machine and it runs a readonly
benchmark on the Cassandra server. As the benchmark progresses, the
Cassandra server loads the dataset into the row cache.

However, it takes up to 2 hours to load the entire dataset into the row cache.

Is there any other method to load the entire dataset into row cache
quickly (does not need to use YCSB)?


Any help is appreciated,

Danny


Re: Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread Robert Coli
On Tue, Sep 9, 2014 at 12:10 PM, Danny Chan tofuda...@gmail.com wrote:

 Is there a method to quickly load a large dataset into the row cache?
 I use row caching as I want the entire dataset to be in memory.


You're doing it wrong. Use a memory store.

=Rob


Re: Quickly loading C* dataset into memory (row cache)

2014-09-09 Thread DuyHai Doan
Rob Coli strikes again, you're Doing It Wrong, and he's right :D

Using Cassandra as a distributed cache is a bad idea, seriously. Putting
6GB into the row cache is another one.


On Tue, Sep 9, 2014 at 9:21 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Sep 9, 2014 at 12:10 PM, Danny Chan tofuda...@gmail.com wrote:

 Is there a method to quickly load a large dataset into the row cache?
 I use row caching as I want the entire dataset to be in memory.


 You're doing it wrong. Use a memory store.

 =Rob




Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Kevin Burton
I'm really perplexed on this one..  I think this must be a bug or some
misconfiguration somewhere.

I'm fetching ONE row which is one cell in my table.

The row is in the row cache, sitting in memory.

SELECT sequence from content where bucket=98 AND sequence =
140348149405742;

And look at the trace below… it's taking 1000ms to go to the row cache?
 That seems insane!

Something must be broken somewhere.. Any advice in what I should be looking
into?

The VM has plenty of memory so I don't think it's GC.

 activity                                              | timestamp    | source      | source_elapsed
-------------------------------------------------------+--------------+-------------+----------------
 execute_cql3_query                                    | 19:48:48,559 | 10.24.23.94 |              0
 Parsing SELECT sequence from content where bucket=98
     AND sequence = 140348149405742 LIMIT 1;           | 19:48:48,559 | 10.24.23.94 |             63
 Preparing statement                                   | 19:48:48,559 | 10.24.23.94 |            153
 Row cache hit                                         | 19:48:49,706 | 10.24.23.94 |        1147108
 Read 1 live and 0 tombstoned cells                    | 19:48:49,706 | 10.24.23.94 |        1147236
 Request complete                                      | 19:48:49,706 | 10.24.23.94 |        1147412


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Kevin Burton
You know, one thing I failed to mention is that this is going into a bucket,
and while it's a logical row, the physical row is like 500MB… according to
compaction logs.

Is the ENTIRE physical row going into the cache as one unit?  That's
definitely going to be a problem in this model.  500MB is a big atomic unit.

Also, I assume it's having to do a binary search within the physical row?


On Tue, Jul 1, 2014 at 5:54 PM, Kevin Burton bur...@spinn3r.com wrote:

 I'm really perplexed on this one..  I think this must be a bug or some
 misconfiguration somewhere.

 I'm fetching ONE row which is one cell in my table.

 The row is in the row cache, sitting in memory.

 SELECT sequence from content where bucket=98 AND sequence =
 140348149405742;

 And look at the trace below… it's taking 1000ms to go to the row cache?
  That seems insane!

 Something must be broken somewhere.. Any advice in what I should be
 looking into?

 The VM has plenty of memory so I don't think it's GC.

  activity
 | timestamp| source  | source_elapsed

 --+--+-+

  execute_cql3_query | 19:48:48,559 | 10.24.23.94 |  0
  Parsing SELECT sequence from content where bucket=98 AND sequence =
 140348149405742 LIMIT 1; | 19:48:48,559 | 10.24.23.94 |
 63

 Preparing statement | 19:48:48,559 | 10.24.23.94 |153

   Row cache hit | 19:48:49,706 | 10.24.23.94 |1147108
Read 1
 live and 0 tombstoned cells | 19:48:49,706 | 10.24.23.94 |1147236

Request complete | 19:48:49,706 | 10.24.23.94 |1147412


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Robert Coli
On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton bur...@spinn3r.com wrote:

 you know.. one thing I failed to mention.. .is that this is going into a
 bucket and while it's a logical row, the physical row is like 500MB …
 according to compaction logs.

 is the ENTIRE physical row going into the cache as one unit?  That's
 definitely going to be a problem in this model.  500MB is a big atomic unit.


Yes, the row cache is a row cache. It caches what the storage engine calls
rows, which CQL calls partitions. [1] Rows have to be assembled from all
of their row fragments in Memtables/SSTables.

This is a big part of why the off-heap row cache's behavior of
invalidation on write is so bad for its overall performance. Updating a
single column in your 500MB row invalidates it and forces you to assemble
the entire 500MB row from disk.
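
To make that concrete with a sketch (not the actual schema in question): the
partition key decides the unit that gets cached and invalidated, so

CREATE TABLE content (          -- cached/invalidated unit = the whole bucket,
    bucket   bigint,            -- potentially the full 500MB
    sequence bigint,
    body     blob,
    PRIMARY KEY ((bucket), sequence)
);

CREATE TABLE content_by_seq (   -- cached/invalidated unit = one small row
    bucket   bigint,
    sequence bigint,
    body     blob,
    PRIMARY KEY ((bucket, sequence))
);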

The only valid use case for the current off-heap row cache seems to be :
very small, very uniform in size, very hot, and very rarely modified.

https://issues.apache.org/jira/browse/CASSANDRA-5357

Is the ticket for replacing the row cache and its unexpected
characteristics with something more like an actual query cache.

also.. I assume it's having to do a binary search within the physical row ?


Since the column level bloom filter's removal in 1.2, the only way it can
get to specific columns is via the index.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-6632


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Kevin Burton
so.. caching the *queries* ?

it seems like a better mechanism would be to cache the actual logical row…,
not the physical row.

Query caches just don't work in production. If you re-word your query, or
structure it a different way, you get a miss…

In my experience.. query caches have a 0% hit rate.


On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton bur...@spinn3r.com wrote:

 you know.. one thing I failed to mention.. .is that this is going into a
 bucket and while it's a logical row, the physical row is like 500MB …
 according to compaction logs.

 is the ENTIRE physical row going into the cache as one unit?  That's
 definitely going to be a problem in this model.  500MB is a big atomic unit.


 Yes, the row cache is a row cache. It caches what the storage engine calls
 rows, which CQL calls partitions. [1] Rows have to be assembled from all
 of their row fragments in Memtables/SSTables.

 This is a big part of why the off-heap row cache's behavior of
 invalidation on write is so bad for its overall performance. Updating a
 single column in your 500MB row invalidates it and forces you to assemble
 the entire 500MB row from disk.

 The only valid use case for the current off-heap row cache seems to be :
 very small, very uniform in size, very hot, and very rarely modified.

 https://issues.apache.org/jira/browse/CASSANDRA-5357

 Is the ticket for replacing the row cache and its unexpected
 characteristics with something more like an actual query cache.

 also.. I assume it's having to do a binary search within the physical row
 ?


 Since the column level bloom filter's removal in 1.2, the only way it can
 get to specific columns is via the index.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-6632




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Kevin Burton
A workaround for this is the VFS page cache: basically, disabling
compression and then allowing the VFS page cache to keep your data in
memory.

The only downside is the per-column overhead.  But if you can store
everything in a 'blob' which is optionally compressed, you're generally
going to be ok.
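
Roughly what I have in mind, as a sketch (table and column names are made up,
and this is the 2.0-era syntax for turning compression off):

CREATE TABLE content_blobs (
    bucket   bigint,
    sequence bigint,
    body     blob,    -- optionally compressed by the application itself
    PRIMARY KEY (bucket, sequence)
) WITH compression = {'sstable_compression': ''};   -- let the OS page cache do the work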

Kevin


On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton bur...@spinn3r.com wrote:

 so.. caching the *queries* ?

 it seems like a better mechanism would be to cache the actually logical
 row…, not the physical row.

 Query caches just don't work in production,  If you re-word your query, or
 structure it a different way, you get a miss…

 In my experience.. query caches have a 0% hit rate.


 On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton bur...@spinn3r.com wrote:

 you know.. one thing I failed to mention.. .is that this is going into a
 bucket and while it's a logical row, the physical row is like 500MB …
 according to compaction logs.

 is the ENTIRE physical row going into the cache as one unit?  That's
 definitely going to be a problem in this model.  500MB is a big atomic unit.


 Yes, the row cache is a row cache. It caches what the storage engine
 calls rows, which CQL calls partitions. [1] Rows have to be assembled
 from all of their row fragments in Memtables/SSTables.

 This is a big part of why the off-heap row cache's behavior of
 invalidation on write is so bad for its overall performance. Updating a
 single column in your 500MB row invalidates it and forces you to assemble
 the entire 500MB row from disk.

 The only valid use case for the current off-heap row cache seems to be :
 very small, very uniform in size, very hot, and very rarely modified.

 https://issues.apache.org/jira/browse/CASSANDRA-5357

 Is the ticket for replacing the row cache and its unexpected
 characteristics with something more like an actual query cache.

 also.. I assume it's having to do a binary search within the physical row
 ?


 Since the column level bloom filter's removal in 1.2, the only way it can
 get to specific columns is via the index.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-6632




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Kevin Burton
WOW.. so based on your advice, and a test, I disabled the row cache for the
table.

The query was instantly 20x faster.

so this is definitely an anti-pattern.. I suspect cassandra just tries to
read the entire physical row into memory, and since my physical row is
rather big.. ha.  Well that wasn't very fun :)

BIG win though ;)


On Tue, Jul 1, 2014 at 6:52 PM, Kevin Burton bur...@spinn3r.com wrote:

 A work around for this, is the VFS page cache.. basically, disabling
 compression, and then allowing the VFS page cache to keep your data in
 memory.

 The only downside is the per column overhead.  But if you can store
 everything in a 'blob' which is optionally compressed, you're generally
 going to be ok.

 Kevin


 On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton bur...@spinn3r.com wrote:

 so.. caching the *queries* ?

 it seems like a better mechanism would be to cache the actually logical
 row…, not the physical row.

 Query caches just don't work in production,  If you re-word your query,
 or structure it a different way, you get a miss…

 In my experience.. query caches have a 0% hit rate.


 On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton bur...@spinn3r.com wrote:

 you know.. one thing I failed to mention.. .is that this is going into
 a bucket and while it's a logical row, the physical row is like 500MB …
 according to compaction logs.

 is the ENTIRE physical row going into the cache as one unit?  That's
 definitely going to be a problem in this model.  500MB is a big atomic 
 unit.


 Yes, the row cache is a row cache. It caches what the storage engine
 calls rows, which CQL calls partitions. [1] Rows have to be assembled
 from all of their row fragments in Memtables/SSTables.

 This is a big part of why the off-heap row cache's behavior of
 invalidation on write is so bad for its overall performance. Updating a
 single column in your 500MB row invalidates it and forces you to assemble
 the entire 500MB row from disk.

 The only valid use case for the current off-heap row cache seems to be :
 very small, very uniform in size, very hot, and very rarely modified.

 https://issues.apache.org/jira/browse/CASSANDRA-5357

 Is the ticket for replacing the row cache and its unexpected
 characteristics with something more like an actual query cache.

 also.. I assume it's having to do a binary search within the physical
 row ?


 Since the column level bloom filter's removal in 1.2, the only way it
 can get to specific columns is via the index.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-6632




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?

2014-07-01 Thread Colin
Rowcache is typically turned off because it is only useful in very specific 
situations: the row(s) need to fit in memory.  Also, the access patterns have to 
fit.

If all the rows you're accessing can fit, Rowcache is a great thing. Otherwise, 
not so great.

--
Colin
320-221-9531


 On Jul 1, 2014, at 10:40 PM, Kevin Burton bur...@spinn3r.com wrote:
 
 WOW.. so based on your advice, and a test, I disabled the row cache for the 
 table.
 
 The query was instantly 20x faster.
 
 so this is definitely an anti-pattern.. I suspect cassandra just tries to 
 read in they entire physical row into memory and since my physical row is 
 rather big.. ha.  Well that wasn't very fun :)
 
 BIG win though ;)
 
 
 On Tue, Jul 1, 2014 at 6:52 PM, Kevin Burton bur...@spinn3r.com wrote:
 A work around for this, is the VFS page cache.. basically, disabling 
 compression, and then allowing the VFS page cache to keep your data in 
 memory.
 
 The only downside is the per column overhead.  But if you can store 
 everything in a 'blob' which is optionally compressed, you're generally 
 going to be ok.
 
 Kevin
 
 
 On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton bur...@spinn3r.com wrote:
 so.. caching the *queries* ?
 
 it seems like a better mechanism would be to cache the actually logical 
 row…, not the physical row.  
 
 Query caches just don't work in production,  If you re-word your query, or 
 structure it a different way, you get a miss…
 
 In my experience.. query caches have a 0% hit rate.
 
 
 On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli rc...@eventbrite.com wrote:
 On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton bur...@spinn3r.com wrote:
 you know.. one thing I failed to mention.. .is that this is going into a 
 bucket and while it's a logical row, the physical row is like 500MB … 
 according to compaction logs.
 
 is the ENTIRE physical row going into the cache as one unit?  That's 
 definitely going to be a problem in this model.  500MB is a big atomic 
 unit.
 
 Yes, the row cache is a row cache. It caches what the storage engine calls 
 rows, which CQL calls partitions. [1] Rows have to be assembled from all 
 of their row fragments in Memtables/SSTables.
 
 This is a big part of why the off-heap row cache's behavior of 
 invalidation on write is so bad for its overall performance. Updating a 
 single column in your 500MB row invalidates it and forces you to assemble 
 the entire 500MB row from disk. 
 
 The only valid use case for the current off-heap row cache seems to be : 
 very small, very uniform in size, very hot, and very rarely modified.
 
 https://issues.apache.org/jira/browse/CASSANDRA-5357
 
 Is the ticket for replacing the row cache and its unexpected 
 characteristics with something more like an actual query cache.
 
 also.. I assume it's having to do a binary search within the physical row 
 ? 
 
 Since the column level bloom filter's removal in 1.2, the only way it can 
 get to specific columns is via the index.
 
 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-6632
 
 
 
 -- 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 
 
 
 -- 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 
 
 
 -- 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 


Weird row cache behaviour

2014-04-06 Thread Janne Jalkanen
Heya!

I’ve been observing some strange and worrying behaviour all this week with row 
cache hits taking hundreds of milliseconds.

Cassandra 1.2.15, Datastax CQL driver 1.0.4.
EC2 m1.xlarge instances
RF=3, N=4
vnodes in use
key cache: 200M
row cache: 200M
row_cache_provider: SerializingCacheProvider
Query: PreparedStatement SELECT * from uniques3 WHERE hash=? AND item=? AND 
event=?. All values are  20 bytes. 
All data is written with a TTL of days.

Row is not particularly wide (see cfhistograms in the pastebin).  Row cache hit 
can take hundreds of milliseconds, pretty much screwing performance. My initial 
thought was garbage collection, but I collected traces and GC logs to the 
pastebin below, so while there *is* plenty of GC going on, I don’t think it’s 
the reason.  We also have other column families accessed through Thrift which 
do not exhibit this behaviour at all. There are no abnormal query times for 
cache misses.

http://pastebin.com/ac6PVHhm

Notice also the weird “triple hump” on the cfhistograms - I’m kinda used to 
seeing two humps, one for cache hits and one for disk access, but this one has 
clearly three humps, one at the 200ms area. Also odd is the very large bloom 
filter false-positive ratio, but that might be just our data.

Armed with the traces I formed a hypothesis that perhaps row cache is a bad 
idea, turned it off for this CF, and hey! The average read latencies dropped to 
about 2 milliseconds.  So I’m kinda fine here now, but I would really 
appreciate it if someone could explain to me what is going on, and why would a 
row cache hit ever take up to 450 milliseconds? In our usecase, this CF does 
contain some hot data, and the row cache hit ratio is around 80%, so keeping it 
would be kinda useful.

(The pastebin contains a couple of traces, GC logs from all servers noted in 
the trace, cfstats, cfhistograms and schema.)

/Janne

Re: Row cache for writes

2014-04-01 Thread Tyler Hobbs
On Mon, Mar 31, 2014 at 11:37 AM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I found a lot of documentation about the read path for key and row caches,
 but I haven't found anything in regard to the write path.  My app has the
 need to record a large quantity of very short lived temporal data that will
 expire within seconds and only have a small percentage of the rows accessed
 before they expire.  Ideally, and I have done the math, I would like the
 data to never hit disk and just stay in memory once written until it
 expires.  How might I accomplish this?


It's not perfect, but set a short TTL on the data and set gc_grace_seconds
to 0 for the table.  Tombstones will still be written to disk, but almost
everything will be discarded in its first compaction.  You could also lower
the min compaction threshold for size-tiered compaction to 2 to force
compactions to happen more quickly.


  I am not concerned about data consistency at all on this so if I could
 even avoid the commit log, that would be even better.


You can set durable_writes = false for the keyspace.



 My main concern is that I don't see any evidence that writes end up in the
 cache--that it takes at least one read to get it into the cache.  I also
 realize that, assuming I don't cause SSTable writes due to sheer quantity,
 that the data would be in memory anyway.

 Has anyone done anything similar to this that could provide direction?


Writes invalidate row cache entries, so that's not what you want.
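
Putting the TTL / gc_grace_seconds / compaction / durable_writes suggestions
together, a sketch (keyspace and table names are just placeholders):

CREATE KEYSPACE short_lived
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    AND durable_writes = false;                 -- skip the commit log

CREATE TABLE short_lived.events (
    id      uuid PRIMARY KEY,
    payload blob
) WITH default_time_to_live = 30                -- data expires within seconds
  AND gc_grace_seconds = 0                      -- tombstones can be dropped immediately
  AND compaction = {'class': 'SizeTieredCompactionStrategy',
                    'min_threshold': '2'};      -- compact (and discard) sooner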


-- 
Tyler Hobbs
DataStax http://datastax.com/


Row cache for writes

2014-03-31 Thread Wayne Schroeder
I found a lot of documentation about the read path for key and row caches, but 
I haven't found anything in regard to the write path.  My app has the need to 
record a large quantity of very short lived temporal data that will expire 
within seconds and only have a small percentage of the rows accessed before 
they expire.  Ideally, and I have done the math, I would like the data to never 
hit disk and just stay in memory once written until it expires.  How might I 
accomplish this?  I am not concerned about data consistency at all on this so 
if I could even avoid the commit log, that would be even better.

My main concern is that I don't see any evidence that writes end up in the 
cache—that it takes at least one read to get it into the cache.  I also realize 
that, assuming I don't cause SSTable writes due to sheer quantity, that the 
data would be in memory anyway.

Has anyone done anything similar to this that could provide direction?

Wayne



Re: Row cache for writes

2014-03-31 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:37 AM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I found a lot of documentation about the read path for key and row caches,
 but I haven't found anything in regard to the write path.  My app has the
 need to record a large quantity of very short lived temporal data that will
 expire within seconds and only have a small percentage of the rows accessed
 before they expire.  Ideally, and I have done the math, I would like the
 data to never hit disk and just stay in memory once written until it
 expires.  How might I accomplish this?


http://en.wikipedia.org/wiki/Memcached

=Rob


Re: Row cache for writes

2014-03-31 Thread Wayne Schroeder
Perhaps I should clarify my question.  Is this possible / how might I 
accomplish this with cassandra?

Wayne


On Mar 31, 2014, at 12:58 PM, Robert Coli 
rc...@eventbrite.commailto:rc...@eventbrite.com
 wrote:

On Mon, Mar 31, 2014 at 9:37 AM, Wayne Schroeder 
wschroe...@pinsightmedia.commailto:wschroe...@pinsightmedia.com wrote:
I found a lot of documentation about the read path for key and row caches, but 
I haven't found anything in regard to the write path.  My app has the need to 
record a large quantity of very short lived temporal data that will expire 
within seconds and only have a small percentage of the rows accessed before 
they expire.  Ideally, and I have done the math, I would like the data to never 
hit disk and just stay in memory once written until it expires.  How might I 
accomplish this?

http://en.wikipedia.org/wiki/Memcached

=Rob




Re: Row cache for writes

2014-03-31 Thread Ashok Ghosh
On Mar 31, 2014 12:38 PM, Wayne Schroeder wschroe...@pinsightmedia.com
wrote:

 I found a lot of documentation about the read path for key and row caches,
 but I haven't found anything in regard to the write path.  My app has the
 need to record a large quantity of very short lived temporal data that will
 expire within seconds and only have a small percentage of the rows accessed
 before they expire.  Ideally, and I have done the math, I would like the
 data to never hit disk and just stay in memory once written until it
 expires.  How might I accomplish this?  I am not concerned about data
 consistency at all on this so if I could even avoid the commit log, that
 would be even better.

 My main concern is that I don't see any evidence that writes end up in the
 cache--that it takes at least one read to get it into the cache.  I also
 realize that, assuming I don't cause SSTable writes due to sheer quantity,
 that the data would be in memory anyway.

 Has anyone done anything similar to this that could provide direction?

 Wayne




Re: Row cache vs. OS buffer cache

2014-01-23 Thread Janne Jalkanen

Our experience is that you want to have all your very hot data fit in the row 
cache (assuming you don’t have very large rows), and leave the rest for the OS. 
 Unfortunately, it completely depends on your access patterns and data what is 
the right size for the cache - zero makes sense for a lot of cases.

Try out different sizes, and watch for row cache hit ratio and read latency. 
Ditto for heap sizes, btw - if your nodes are short on RAM, you may get better 
performance by running at lower heap sizes because OS caches will get more 
memory and your gc pauses will be shorter (though more numerous).
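
One way to watch that while testing (exact output varies a bit by version):

nodetool info                             # global row cache size, hits, requests, recent hit rate
nodetool cfhistograms <keyspace> <cf>     # per-CF read latency distribution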

/Janne

On 23 Jan 2014, at 09:13 , Katriel Traum katr...@google.com wrote:

 Hello list,
 
 I was if anyone has any pointers or some advise regarding using row cache vs 
 leaving it up to the OS buffer cache.
 
 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.
 
 Any input appreciated.
 Katriel



Re: Row cache vs. OS buffer cache

2014-01-23 Thread Chris Burroughs

My experience has been that the row cache is much more effective.
However, reasonable row cache sizes are so small relative to RAM that I 
don't see it as a significant trade-off unless it's in a very memory 
constrained environment.  If you want to enable the row cache (a big if) 
you probably want it to be as big as it can be until you have reached 
the point of diminishing returns on the hit rate.


The off-heap cache still has many on-heap objects, so it doesn't 
really change that much conceptually; you will just end up with a 
different number for the size.


On 01/23/2014 02:13 AM, Katriel Traum wrote:

Hello list,

I was if anyone has any pointers or some advise regarding using row cache
vs leaving it up to the OS buffer cache.

I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.

Any input appreciated.
Katriel





Re: Row cache vs. OS buffer cache

2014-01-23 Thread Robert Coli
On Wed, Jan 22, 2014 at 11:13 PM, Katriel Traum katr...@google.com wrote:

 I was if anyone has any pointers or some advise regarding using row cache
 vs leaving it up to the OS buffer cache.

 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.


Many people have had bad experiences with Row Cache, I assert more than
have had a good experience.

https://issues.apache.org/jira/browse/CASSANDRA-5357

Is the 2.1 era re-design of the row cache into something more conceptually
appropriate.

The rule of thumb for row cache is that if your data is :

1) very hot
2) very small
3) very uniform in size

You may win with it. IMO if you meet all of those criteria you should try
A/B the on-heap cache vs. off-heap in 1.1/1.2, especially if your cached
rows are frequently updated.

https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634
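
For reference, the 1.1/1.2-era cassandra.yaml knobs involved in that A/B test
(values below are placeholders, not a recommendation):

row_cache_size_in_mb: 512
row_cache_save_period: 0
# off-heap values, invalidated on write:
row_cache_provider: SerializingCacheProvider
# on-heap alternative to A/B against:
# row_cache_provider: ConcurrentLinkedHashCacheProvider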

=Rob


Re: Row cache vs. OS buffer cache

2014-01-23 Thread Katriel Traum
Thank you everyone for your input.
My dataset is ~100G of size with 1 or 2 read intensive column families. The
cluster has plenty of RAM. I'll start off small with 4G of row cache and
monitor the success rate.

Katriel


On Thu, Jan 23, 2014 at 9:17 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jan 22, 2014 at 11:13 PM, Katriel Traum katr...@google.comwrote:

 I was if anyone has any pointers or some advise regarding using row cache
 vs leaving it up to the OS buffer cache.

 I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.


 Many people have had bad experiences with Row Cache, I assert more than
 have had a good experience.

 https://issues.apache.org/jira/browse/CASSANDRA-5357

 Is the 2.1 era re-design of the row cache into something more conceptually
 appropriate.

 The rule of thumb for row cache is that if your data is :

 1) very hot
 2) very small
 3) very uniform in size

 You may win with it. IMO if you meet all of those criteria you should try
 A/B the on-heap cache vs. off-heap in 1.1/1.2, especially if your cached
 rows are frequently updated.


 https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

 =Rob




Row cache vs. OS buffer cache

2014-01-22 Thread Katriel Traum
Hello list,

I was wondering if anyone has any pointers or some advice regarding using the
row cache vs leaving it up to the OS buffer cache.

I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.

Any input appreciated.
Katriel


Re: row cache

2013-09-07 Thread Edward Capriolo
I have found the row cache to be more trouble than benefit.

The term fool's gold comes to mind.

Using the key cache and leaving more free main memory seems stable and does not
have as many complications.
On Wednesday, September 4, 2013, S C as...@outlook.com wrote:
 Thank you all for your valuable comments and information.

 -SC


 Date: Tue, 3 Sep 2013 12:01:59 -0400
 From: chris.burrou...@gmail.com
 To: user@cassandra.apache.org
 CC: fsareshw...@quantcast.com
 Subject: Re: row cache

 On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
  Yes, that is correct.
 
  The SerializingCacheProvider stores row cache contents off heap. I
believe you
  need JNA enabled for this though. Someone please correct me if I am
wrong here.
 
  The ConcurrentLinkedHashCacheProvider stores row cache contents on the
java heap
  itself.
 

 Naming things is hard. Both caches are in memory and are backed by a
 ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider
 the *values* are stored in off heap buffers. Both must store a half
 dozen or so objects (on heap) per entry
 (org.apache.cassandra.cache.RowCacheKey,

com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
 java.util.concurrent.ConcurrentHashMap$HashEntry, etc). It would
 probably be better to call this a mixed-heap rather than off-heap
 cache. You may find the number of entires you can hold without gc
 problems to be surprising low (relative to say memcached, or physical
 memory on modern hardware).

 Invalidating a column with SerializingCacheProvider invalidates the
 entire row while with ConcurrentLinkedHashCacheProvider it does not.
 SerializingCacheProvider does not require JNA.

 Both also use memory estimation of the size (of the values only) to
 determine the total number of entries retained. Estimating the size of
 the totally on-heap ConcurrentLinkedHashCacheProvider has historically
 been dicey since we switched from sizing in entries, and it has been
 removed in 2.0.0.

 As said elsewhere in this thread the utility of the row cache varies
 from absolutely essential to source of numerous problems depending
 on the specifics of the data model and request distribution.





Re: row cache

2013-09-07 Thread Mohit Anchlia
I agree. We've had a similar experience.

Sent from my iPhone

On Sep 7, 2013, at 6:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I have found row cache to be more trouble than benefit.

 The term fool's gold comes to mind.
 
 Using key cache and leaving more free main memory seems stable and does not 
 have as many complications. 
 On Wednesday, September 4, 2013, S C as...@outlook.com wrote:
  Thank you all for your valuable comments and information.
 
  -SC
 
 
  Date: Tue, 3 Sep 2013 12:01:59 -0400
  From: chris.burrou...@gmail.com
  To: user@cassandra.apache.org
  CC: fsareshw...@quantcast.com
  Subject: Re: row cache
 
  On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
   Yes, that is correct.
  
   The SerializingCacheProvider stores row cache contents off heap. I 
   believe you
   need JNA enabled for this though. Someone please correct me if I am 
   wrong here.
  
   The ConcurrentLinkedHashCacheProvider stores row cache contents on the 
   java heap
   itself.
  
 
  Naming things is hard. Both caches are in memory and are backed by a
  ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider
  the *values* are stored in off heap buffers. Both must store a half
  dozen or so objects (on heap) per entry
  (org.apache.cassandra.cache.RowCacheKey,
  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
  java.util.concurrent.ConcurrentHashMap$HashEntry, etc). It would
  probably be better to call this a mixed-heap rather than off-heap
  cache. You may find the number of entries you can hold without gc
  problems to be surprisingly low (relative to say memcached, or physical
  memory on modern hardware).
 
  Invalidating a column with SerializingCacheProvider invalidates the
  entire row while with ConcurrentLinkedHashCacheProvider it does not.
  SerializingCacheProvider does not require JNA.
 
  Both also use memory estimation of the size (of the values only) to
  determine the total number of entries retained. Estimating the size of
  the totally on-heap ConcurrentLinkedHashCacheProvider has historically
  been dicey since we switched from sizing in entries, and it has been
  removed in 2.0.0.
 
  As said elsewhere in this thread the utility of the row cache varies
  from absolutely essential to source of numerous problems depending
  on the specifics of the data model and request distribution.
 
 
 


RE: row cache

2013-09-04 Thread S C
Thank you all for your valuable comments and information.

-SC


 Date: Tue, 3 Sep 2013 12:01:59 -0400
 From: chris.burrou...@gmail.com
 To: user@cassandra.apache.org
 CC: fsareshw...@quantcast.com
 Subject: Re: row cache
 
 On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
  Yes, that is correct.
 
  The SerializingCacheProvider stores row cache contents off heap. I believe 
  you
  need JNA enabled for this though. Someone please correct me if I am wrong 
  here.
 
  The ConcurrentLinkedHashCacheProvider stores row cache contents on the java 
  heap
  itself.
 
 
 Naming things is hard.  Both caches are in memory and are backed by a 
 ConcurrentLinkedHashMap.  In the case of the SerializingCacheProvider
 the *values* are stored in off heap buffers.  Both must store a half 
 dozen or so objects (on heap) per entry 
 (org.apache.cassandra.cache.RowCacheKey, 
 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue, 
 java.util.concurrent.ConcurrentHashMap$HashEntry, etc).  It would 
 probably be better to call this a mixed-heap rather than off-heap 
 cache.  You may find the number of entries you can hold without gc
 problems to be surprisingly low (relative to say memcached, or physical
 memory on modern hardware).
 
 Invalidating a column with SerializingCacheProvider invalidates the 
 entire row while with ConcurrentLinkedHashCacheProvider it does not. 
 SerializingCacheProvider does not require JNA.
 
 Both also use memory estimation of the size (of the values only) to 
 determine the total number of entries retained.  Estimating the size of 
 the totally on-heap ConcurrentLinkedHashCacheProvider has historically 
 been dicey since we switched from sizing in entries, and it has been 
 removed in 2.0.0.
 
 As said elsewhere in this thread the utility of the row cache varies 
 from absolutely essential to source of numerous problems depending 
 on the specifics of the data model and request distribution.
 
 
  

Re: row cache

2013-09-03 Thread Chris Burroughs

On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:

Yes, that is correct.

The SerializingCacheProvider stores row cache contents off heap. I believe you
need JNA enabled for this though. Someone please correct me if I am wrong here.

The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
itself.



Naming things is hard.  Both caches are in memory and are backed by a 
ConcurrentLinkedHashMap.  In the case of the SerializingCacheProvider
the *values* are stored in off heap buffers.  Both must store a half 
dozen or so objects (on heap) per entry 
(org.apache.cassandra.cache.RowCacheKey, 
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue, 
java.util.concurrent.ConcurrentHashMap$HashEntry, etc).  It would 
probably be better to call this a mixed-heap rather than off-heap 
cache.  You may find the number of entries you can hold without gc
problems to be surprisingly low (relative to say memcached, or physical
memory on modern hardware).


Invalidating a column with SerializingCacheProvider invalidates the 
entire row while with ConcurrentLinkedHashCacheProvider it does not. 
SerializingCacheProvider does not require JNA.


Both also use memory estimation of the size (of the values only) to 
determine the total number of entries retained.  Estimating the size of 
the totally on-heap ConcurrentLinkedHashCacheProvider has historically 
been dicey since we switched from sizing in entries, and it has been 
removed in 2.0.0.


As said elsewhere in this thread the utility of the row cache varies 
from absolutely essential to source of numerous problems depending 
on the specifics of the data model and request distribution.
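
To make the mixed-heap point concrete, a rough back-of-envelope (the per-entry
overhead figure is purely illustrative, not measured):

  assumed on-heap bookkeeping per entry:  ~200 bytes
  cached entries:                         5,000,000
  on-heap overhead:                       5,000,000 x 200 bytes ~= 1 GB

So even with the values serialized off heap, a cache holding millions of entries can
still add on the order of a gigabyte of long-lived on-heap objects, which is where
the GC pressure comes from.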





RE: row cache

2013-09-01 Thread S C
It is my understanding that the row cache is in memory (not on disk). It could
live on heap or in native memory depending on the cache provider. Is that right?

-SC


 Date: Fri, 23 Aug 2013 18:58:07 +0100
 From: b...@dehora.net
 To: user@cassandra.apache.org
 Subject: Re: row cache
 
 I can't emphasise enough testing row caching against your workload for 
 sustained periods and comparing results to just leveraging the 
 filesystem cache and/or ssds. That said. The default off-heap cache can 
 work for structures that don't mutate frequently, and whose rows are not 
 very wide such that the in-and-out-of heap serialization overhead is 
 minimised (I've seen the off-heap cache slow a system down because of 
 serialization costs). The on-heap can do update in place, which is nice 
 for more frequently changing structures, and for larger structures 
 because it dodges the off-heap's serialization overhead. One problem 
 I've experienced with the on-heap cache is the cache working set 
 exceeding allocated space, resulting in GC pressure from sustained 
 thrash/evictions.
 
 Neither cache seems suitable for wide row + slicing usecases, eg time 
 series data or CQL tables whose compound keys create wide rows under the 
 hood.
 
 Bill
 
 
 On 2013/08/23 17:30, Robert Coli wrote:
  On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala
  fsareshw...@quantcast.com mailto:fsareshw...@quantcast.com wrote:
 
  According to the datastax documentation [1], there are two types of
  row cache providers:
 
  ...
 
  The off-heap row cache provider does indeed invalidate rows. We're
  going to look into using the ConcurrentLinkedHashCacheProvider. Time
  to read some source code! :)
 
 
  Thanks for the follow up... I'm used to thinking of the
  ConcurrentLinkedHashCacheProvider as the row cache and forgot that
  SerializingCacheProvider might have different invalidation behavior.
  Invalidating the whole row on write seems highly likely to reduce the
  overall performance of such a row cache. :)
 
  The criteria for use of row cache mentioned up-thread remain relevant.
  In most cases, you probably don't actually want to use the row cache.
  Especially if you're using ConcurrentLinkedHashCacheProvider and
  creating long lived, on heap objects.
 
  =Rob
 
  

Re: row cache

2013-09-01 Thread Faraaz Sareshwala
Yes, that is correct.

The SerializingCacheProvider stores row cache contents off heap. I believe you
need JNA enabled for this though. Someone please correct me if I am wrong here.

The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
itself.

Each cache provider has different characteristics so it's important to read up
on how each works and even try it with your workload to see which one gives you
better performance, if any at all.

Faraaz

On Sun, Sep 01, 2013 at 12:06:20AM -0700, S C wrote:
 It is my understanding that the row cache is in memory (not on disk). It could
 live on heap or in native memory depending on the cache provider. Is that right?
 
 -SC
 
 
  Date: Fri, 23 Aug 2013 18:58:07 +0100
  From: b...@dehora.net
  To: user@cassandra.apache.org
  Subject: Re: row cache
 
  I can't emphasise enough testing row caching against your workload for
  sustained periods and comparing results to just leveraging the
  filesystem cache and/or ssds. That said. The default off-heap cache can
  work for structures that don't mutate frequently, and whose rows are not
  very wide such that the in-and-out-of heap serialization overhead is
  minimised (I've seen the off-heap cache slow a system down because of
  serialization costs). The on-heap can do update in place, which is nice
  for more frequently changing structures, and for larger structures
  because it dodges the off-heap's serialization overhead. One problem
  I've experienced with the on-heap cache is the cache working set
  exceeding allocated space, resulting in GC pressure from sustained
  thrash/evictions.
 
  Neither cache seems suitable for wide row + slicing usecases, eg time
  series data or CQL tables whose compound keys create wide rows under the
  hood.
 
  Bill
 
 
  On 2013/08/23 17:30, Robert Coli wrote:
   On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala
   fsareshw...@quantcast.com mailto:fsareshw...@quantcast.com wrote:
  
   According to the datastax documentation [1], there are two types of
   row cache providers:
  
   ...
  
   The off-heap row cache provider does indeed invalidate rows. We're
   going to look into using the ConcurrentLinkedHashCacheProvider. Time
   to read some source code! :)
  
  
   Thanks for the follow up... I'm used to thinking of the
   ConcurrentLinkedHashCacheProvider as the row cache and forgot that
   SerializingCacheProvider might have different invalidation behavior.
   Invalidating the whole row on write seems highly likely to reduce the
   overall performance of such a row cache. :)
  
   The criteria for use of row cache mentioned up-thread remain relevant.
   In most cases, you probably don't actually want to use the row cache.
   Especially if you're using ConcurrentLinkedHashCacheProvider and
   creating long lived, on heap objects.
  
   =Rob
 


Low Row Cache Request

2013-08-31 Thread Sávio Teles
I'm running one Cassandra node -version 1.2.6- and I *enabled* the *row
cache* with *1GB*.

But looking the Cassandra metrics on JConsole, *Row Cache Requests* are
very *low* after a high number of queries (about 12 requests).

RowCache metrics:

*Capacity: 1GB*
*Entries: 3*
*HitRate: 0.75*
*Hits: 9*
*Requests: 12*
*Size: 191630*

Something wrong?




-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Re: Low Row Cache Request

2013-08-31 Thread Jonathan Haddad
9/12 = .75

It's a rate, not a percentage.

On Sat, Aug 31, 2013 at 2:21 PM, Sávio Teles savio.te...@lupa.inf.ufg.br
wrote:

 I'm running one Cassandra node -version 1.2.6- and I *enabled* the *row
 cache* with *1GB*.
 But looking the Cassandra metrics on JConsole, *Row Cache Requests* are
 very *low* after a high number of queries (about 12 requests).
 RowCache metrics:
 *Capacity: 1GB*
 *Entries: 3*
 *HitRate: 0.75*
 *Hits: 9*
 *Requests: 12*
 *Size: 191630*
 Something wrong?
 -- 
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
 Mestrando em Ciências da Computação - UFG
 Arquiteto de Software
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG

Re: Low Row Cache Request

2013-08-31 Thread Sávio Teles
Yes, it is! I've fixed the problem. I had missed setting the caching property to
'ALL' when creating the column family.
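
For anyone hitting the same thing on 1.2, a sketch of how the caching attribute can
be changed on an existing table (the table name my_cf is just a placeholder). In
cqlsh (CQL3):

  ALTER TABLE my_cf WITH caching = 'all';

or in cassandra-cli:

  update column family my_cf with caching = 'ALL';

Existing rows only start landing in the row cache as they are read after the change.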


2013/8/31 Jonathan Haddad jonathan.had...@gmail.com

 9/12 = .75

 It's a rate, not a percentage.


 On Sat, Aug 31, 2013 at 2:21 PM, Sávio Teles 
 savio.te...@lupa.inf.ufg.brwrote:

I'm running one Cassandra node -version 1.2.6- and I *enabled* the *row
 cache* with *1GB*.

 But looking the Cassandra metrics on JConsole, *Row Cache Requests* are
 very *low* after a high number of queries (about 12 requests).

 RowCache metrics:

  *Capacity: 1GB*
 *Entries: 3*
 *HitRate: 0.75*
 *Hits: 9*
 *Requests: 12*
 *Size: 191630*

 Something wrong?




 --
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
 Mestrando em Ciências da Computação - UFG
 Arquiteto de Software
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG





-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Re: row cache

2013-08-23 Thread Robert Coli
On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala 
fsareshw...@quantcast.com wrote:

 According to the datastax documentation [1], there are two types of row
 cache providers:

...

 The off-heap row cache provider does indeed invalidate rows. We're going
 to look into using the ConcurrentLinkedHashCacheProvider. Time to read
 some source code! :)


Thanks for the follow up... I'm used to thinking of the
ConcurrentLinkedHashCacheProvider as the row cache and forgot that
SerializingCacheProvider
might have different invalidation behavior. Invalidating the whole row on
write seems highly likely to reduce the overall performance of such a row
cache. :)

The criteria for use of row cache mentioned up-thread remain relevant. In
most cases, you probably don't actually want to use the row cache.
Especially if you're using ConcurrentLinkedHashCacheProvider and creating
long lived, on heap objects.

=Rob


Re: row cache

2013-08-23 Thread Bill de hÓra
I can't emphasise enough testing row caching against your workload for 
sustained periods and comparing results to just leveraging the 
filesystem cache and/or ssds. That said. The default off-heap cache can 
work for structures that don't mutate frequently, and whose rows are not 
very wide such that the in-and-out-of heap serialization overhead is 
minimised (I've seen the off-heap cache slow a system down because of 
serialization costs). The on-heap can do update in place, which is nice 
for more frequently changing structures, and for larger structures 
because it dodges the off-heap's serialization overhead. One problem 
I've experienced with the on-heap cache is the cache working set 
exceeding allocated space, resulting in GC pressure from sustained 
thrash/evictions.


Neither cache seems suitable for wide row + slicing usecases, eg time 
series data or CQL tables whose compound keys create wide rows under the 
hood.


Bill


On 2013/08/23 17:30, Robert Coli wrote:

On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala
fsareshw...@quantcast.com mailto:fsareshw...@quantcast.com wrote:

According to the datastax documentation [1], there are two types of
row cache providers:

...

The off-heap row cache provider does indeed invalidate rows. We're
going to look into using the ConcurrentLinkedHashCacheProvider. Time
to read some source code! :)


Thanks for the follow up... I'm used to thinking of the
ConcurrentLinkedHashCacheProvider as the row cache and forgot that
SerializingCacheProvider might have different invalidation behavior.
Invalidating the whole row on write seems highly likely to reduce the
overall performance of such a row cache. :)

The criteria for use of row cache mentioned up-thread remain relevant.
In most cases, you probably don't actually want to use the row cache.
Especially if you're using ConcurrentLinkedHashCacheProvider and
creating long lived, on heap objects.

=Rob




Re: row cache

2013-08-22 Thread Robert Coli
On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala 
fsareshw...@quantcast.com wrote:


- All writes invalidate the entire row (updates thrown out the cached
row)

 This is not correct. Writes are added to the row, if it is in the row
cache. If it's not in the row cache, the row is not added to the cache.

Citation from jbellis on stackoverflow, because I don't have time to find a
better one and the code is not obvious about it :

http://stackoverflow.com/a/12499422

I have yet to go through the source code for the row cache. I do plan to do
 that. Can someone point me to documentation on the row cache internals? All
 I've found online so far is small discussion about it and how to enable it.


There is no such documentation, or at least if it exists I am unaware of it.

In general, the rule of thumb is that the Row Cache should not be used
unless the rows in question are :

1) Very hot in terms of access
2) Uniform in size
3) Small

=Rob
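
One way to sanity-check 2) and 3) before turning the cache on is the row-size
statistics Cassandra already keeps (keyspace and column family names below are
placeholders):

$ nodetool cfstats                          # compacted row minimum/maximum/mean size per CF
$ nodetool cfhistograms my_keyspace my_cf   # row size and column count distributions

If the mean and max compacted row sizes are small and close together, the rows are
at least small and uniform; whether they are hot enough still has to come from
knowledge of the read workload.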


Re: row cache

2013-08-22 Thread Boris Yen
If you are using off-heap memory for the row cache, "all writes invalidate the
entire row" should be correct.

Boris


On Fri, Aug 23, 2013 at 8:32 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala 
 fsareshw...@quantcast.com wrote:


- All writes invalidate the entire row (updates thrown out the cached
row)

 This is not correct. Writes are added to the row, if it is in the row
 cache. If it's not in the row cache, the row is not added to the cache.

 Citation from jbellis on stackoverflow, because I don't have time to find
 a better one and the code is not obvious about it :

 http://stackoverflow.com/a/12499422

 I have yet to go through the source code for the row cache. I do plan to
 do that. Can someone point me to documentation on the row cache internals?
 All I've found online so far is small discussion about it and how to enable
 it.


 There is no such documentation, or at least if it exists I am unaware of
 it.

 In general, the rule of thumb is that the Row Cache should not be used
 unless the rows in question are :

 1) Very hot in terms of access
 2) Uniform in size
 3) Small

 =Rob



Re: row cache

2013-08-22 Thread Faraaz Sareshwala
After a bit of searching, I think I've found the answer I've been looking for. 
I guess I didn't search hard enough before sending out this email. Thank you 
all for the responses.

According to the datastax documentation [1], there are two types of row cache 
providers:

row_cache_provider
(Default: SerializingCacheProvider) Specifies what kind of implementation to 
use for the row cache.
SerializingCacheProvider: Serializes the contents of the row and stores it in 
native memory, that is, off the JVM Heap. Serialized rows take significantly 
less memory than live rows in the JVM, so you can cache more rows in a given 
memory footprint. Storing the cache off-heap means you can use smaller heap 
sizes, which reduces the impact of garbage collection pauses. It is valid to 
specify the fully-qualified class name to a class that 
implements org.apache.cassandra.cache.IRowCacheProvider.
ConcurrentLinkedHashCacheProvider: Rows are cached using the JVM heap, 
providing the same row cache behavior as Cassandra versions prior to 0.8.

The SerializingCacheProvider is 5 to 10 times more memory-efficient than 
ConcurrentLinkedHashCacheProvider for applications that are not blob-intensive. 
However, SerializingCacheProvider may perform worse in update-heavy workload 
situations because it invalidates cached rows on update instead of updating 
them in place as ConcurrentLinkedHashCacheProvider does.


The off-heap row cache provider does indeed invalidate rows. We're going to 
look into using the ConcurrentLinkedHashCacheProvider. Time to read some source 
code! :)

Faraaz

[1] 
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__row_cache_provider
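
For completeness, a sketch of the related cassandra.yaml knobs in the 1.1/1.2 era
(the values are illustrative, not recommendations):

  row_cache_size_in_mb: 512                        # 0 disables the row cache
  row_cache_provider: SerializingCacheProvider     # or ConcurrentLinkedHashCacheProvider (on-heap)
  row_cache_save_period: 0                         # seconds between saving cache keys to disk

The per-column-family caching attribute ('ALL', 'KEYS_ONLY', 'ROWS_ONLY', 'NONE')
still controls which tables actually use it.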




On Thursday, August 22, 2013 at 7:40 PM, Boris Yen wrote:

 If you are using off-heap memory for row cache, all writes invalidate the 
 entire row should be correct.
 
 Boris
 
 
 On Fri, Aug 23, 2013 at 8:32 AM, Robert Coli rc...@eventbrite.com 
 (mailto:rc...@eventbrite.com) wrote:
  On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala 
  fsareshw...@quantcast.com (mailto:fsareshw...@quantcast.com) wrote:
   All writes invalidate the entire row (updates thrown out the cached row)
  This is not correct. Writes are added to the row, if it is in the row 
  cache. If it's not in the row cache, the row is not added to the cache. 
   
  Citation from jbellis on stackoverflow, because I don't have time to find a 
  better one and the code is not obvious about it :
  
  http://stackoverflow.com/a/12499422 
  
   I have yet to go through the source code for the row cache. I do plan to 
   do that. Can someone point me to documentation on the row cache 
   internals? All I've found online so far is small discussion about it and 
   how to enable it. 
  
  There is no such documentation, or at least if it exists I am unaware of it.
  
  In general, the rule of thumb is that the Row Cache should not be used 
  unless the rows in question are : 
  
  1) Very hot in terms of access
  2) Uniform in size
  3) Small
  
  =Rob  



row cache

2013-08-14 Thread Faraaz Sareshwala
At the Cassandra 2013 conference, Axel Liljencrantz from Spotify discussed 
various cassandra gotchas in his talk on How Not to Use Cassandra. One of the 
sections of his talk was on the row cache. If you weren't at the talk, or don't 
remember it, the video is up on youtube [1]. The discussion on the row cache 
starts at about 5:35.

The takeaway from his row cache bit is that the row cache stores the full row:
Cache misses on a single column silently turn into a full row read in order
to cache the full row
All writes invalidate the entire row (updates throw out the cached row)


I'm mostly interested in his second point. Is he saying that a single column 
mutation on a row which happens to be in the row cache results in the row cache 
completely discarding the row and waiting for another read of the row in order 
to bring it back in?

I must have misunderstood what he said because there is no way the row cache 
would be effective at all if that is how it worked. Most likely, it is smart 
and updates both the cache and real storage, or sets a dirty bit and writes 
through on eviction or some other sane eviction policy.

I have yet to go through the source code for the row cache. I do plan to do 
that. Can someone point me to documentation on the row cache internals? All 
I've found online so far is small discussion about it and how to enable it.

Thank you,
Faraaz

[1] http://www.youtube.com/watch?v=0u-EKJBPrj8

Re: Row cache off-heap ?

2013-03-14 Thread Alain RODRIGUEZ
Thanks, I'll let you know when I do so. But any idea about the increase
in heap used if everything seems to be well configured? Should I raise a
ticket, since at least 3 of us are having this issue from what I saw on the
mailing list?


2013/3/14 aaron morton aa...@thelastpickle.com

  No, I didn't. I used the nodetool setcachecapacity and didn't restart
 the node.
 ok.

  I find them hudge, and just happened on the node in which I had enabled
 row cache. I just enabled it on .164 node from 10:45 to 10:48 and the heap
 size doubled from 3.5GB to 7GB (out of 8, which induced memory pressure).
 About GC, all the collections increased a lot compare to the other nodes
 with row caching disabled.

 If the row cache provider is set to serialising, and the node restarted,
 under 1.1X it will use the off-heap cache.

 At start up look for the log line
 Initializing row cache with capacity of {} MBs and provider {


 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 12/03/2013, at 1:44 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

  I am using C*1.1.6.
 
  Did you restart the node after changing the row_cache_size_in_mb ?
 
  No, I didn't. I used the nodetool setcachecapacity and didn't restart
 the node.
 
  The changes in GC activity are not huge and may not be due to cache
 activity
 
  I find them hudge, and just happened on the node in which I had enabled
 row cache. I just enabled it on .164 node from 10:45 to 10:48 and the heap
 size doubled from 3.5GB to 7GB (out of 8, which induced memory pressure).
 About GC, all the collections increased a lot compare to the other nodes
 with row caching disabled.
 
  What is the output from nodetool info?
 
  I can give it to you but, row cache i now disabled.
 
  Token: 85070591730234615865843651857942052864
  Gossip active: true
  Thrift active: true
  Load : 201.61 GB
  Generation No: 1362749056
  Uptime (seconds) : 328675
  Heap Memory (MB) : 5157.58 / 8152.00
  Data Center  : eu-west
  Rack : 1b
  Exceptions   : 24
  Key Cache: size 104857584 (bytes), capacity 104857584 (bytes),
 106814132 hits, 120131310 requests, 0.858 recent hit rate, 14400 save
 period in seconds
  Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0
 requests, NaN recent hit rate, 0 save period in seconds
 
  I think it won't help, but I can't try things now unless we are quire
 sure it will work smooth, we are on heavy load.
 
  Anyway, thanks for trying to help once again.
 
 
 
 
  2013/3/12 aaron morton aa...@thelastpickle.com
  What version are you using?
 
  Sounds like you have configured it correctly. Did you restart the node
 after changing the row_cache_size_in_mb ?
  The changes in GC activity are not huge and may not be due to cache
 activity. Have they continued after you enabled the row cache?
 
  What is the output from nodetool info?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 11/03/2013, at 5:30 AM, Sávio Teles savio.te...@lupa.inf.ufg.br
 wrote:
 
  I have the same problem!
 
  2013/3/11 Alain RODRIGUEZ arodr...@gmail.com
  I can add that I have JNA corectly loaded, from the logs: JNA mlockall
 successful
 
 
  2013/3/11 Alain RODRIGUEZ arodr...@gmail.com
  Any clue on this ?
 
  Row cache well configured could avoid us a lot of disk read, and IO is
 definitely our bottleneck... If someone could explain why the row cache has
 so much impact on my JVM and how to avoid it, it would be appreciated :).
 
 
  2013/3/8 Alain RODRIGUEZ arodr...@gmail.com
  Hi,
 
  We have some issue having a high read throughput. I wanted to alleviate
 things by turning the row cache ON.
 
  I set the row cache to 200 on one node and enable caching 'ALL' on the
 3 most read CF. There is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png
 
  It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store row cache
 off-heap as described above in this file: SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap)
 
  What's wrong ?
 
 
 
 
 
  --
  Atenciosamente,
  Sávio S. Teles de Oliveira
  voice: +55 62 9136 6996
  http://br.linkedin.com/in/savioteles
  Mestrando em Ciências da Computação - UFG
  Arquiteto de Software
  Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
 
 



 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com
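
A quick way to confirm which provider and capacity a node actually picked up at
startup, per the 'Initializing row cache' log line Aaron mentions above (the log
path assumes a default packaged install):

$ grep -i 'Initializing row cache' /var/log/cassandra/system.log

Since that line is only emitted at startup, a provider change in cassandra.yaml will
not show up there (or take effect) until the node is restarted.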




Re: Row cache off-heap ?

2013-03-14 Thread aaron morton
 Should I raise a ticket since we are at least 3 having this issue from what I 
 saw in the mailing list ?
Sure, if you can come up with steps to reproduce the problem. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/03/2013, at 12:46 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Thanks, I'll let you know when I'll do so. But any Idea about the increase of 
 the heap used if all seems to be well configured ? Should I raise a ticket 
 since we are at least 3 having this issue from what I saw in the mailing list 
 ?
 
 
 2013/3/14 aaron morton aa...@thelastpickle.com
  No, I didn't. I used the nodetool setcachecapacity and didn't restart the 
  node.
 ok.
 
  I find them hudge, and just happened on the node in which I had enabled row 
  cache. I just enabled it on .164 node from 10:45 to 10:48 and the heap size 
  doubled from 3.5GB to 7GB (out of 8, which induced memory pressure). About 
  GC, all the collections increased a lot compare to the other nodes with row 
  caching disabled.
 
 If the row cache provider is set to serialising, and the node restarted, 
 under 1.1X it will use the off-heap cache.
 
 At start up look for the log line
 Initializing row cache with capacity of {} MBs and provider {
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/03/2013, at 1:44 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
  I am using C*1.1.6.
 
  Did you restart the node after changing the row_cache_size_in_mb ?
 
  No, I didn't. I used the nodetool setcachecapacity and didn't restart the 
  node.
 
  The changes in GC activity are not huge and may not be due to cache 
  activity
 
  I find them hudge, and just happened on the node in which I had enabled row 
  cache. I just enabled it on .164 node from 10:45 to 10:48 and the heap size 
  doubled from 3.5GB to 7GB (out of 8, which induced memory pressure). About 
  GC, all the collections increased a lot compare to the other nodes with row 
  caching disabled.
 
  What is the output from nodetool info?
 
  I can give it to you but, row cache i now disabled.
 
  Token: 85070591730234615865843651857942052864
  Gossip active: true
  Thrift active: true
  Load : 201.61 GB
  Generation No: 1362749056
  Uptime (seconds) : 328675
  Heap Memory (MB) : 5157.58 / 8152.00
  Data Center  : eu-west
  Rack : 1b
  Exceptions   : 24
  Key Cache: size 104857584 (bytes), capacity 104857584 (bytes), 
  106814132 hits, 120131310 requests, 0.858 recent hit rate, 14400 save 
  period in seconds
  Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, 
  NaN recent hit rate, 0 save period in seconds
 
  I think it won't help, but I can't try things now unless we are quire sure 
  it will work smooth, we are on heavy load.
 
  Anyway, thanks for trying to help once again.
 
 
 
 
  2013/3/12 aaron morton aa...@thelastpickle.com
  What version are you using?
 
  Sounds like you have configured it correctly. Did you restart the node 
  after changing the row_cache_size_in_mb ?
  The changes in GC activity are not huge and may not be due to cache 
  activity. Have they continued after you enabled the row cache?
 
  What is the output from nodetool info?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 11/03/2013, at 5:30 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote:
 
  I have the same problem!
 
  2013/3/11 Alain RODRIGUEZ arodr...@gmail.com
  I can add that I have JNA corectly loaded, from the logs: JNA mlockall 
  successful
 
 
  2013/3/11 Alain RODRIGUEZ arodr...@gmail.com
  Any clue on this ?
 
  Row cache well configured could avoid us a lot of disk read, and IO is 
  definitely our bottleneck... If someone could explain why the row cache 
  has so much impact on my JVM and how to avoid it, it would be appreciated 
  :).
 
 
  2013/3/8 Alain RODRIGUEZ arodr...@gmail.com
  Hi,
 
  We have some issue having a high read throughput. I wanted to alleviate 
  things by turning the row cache ON.
 
  I set the row cache to 200 on one node and enable caching 'ALL' on the 3 
  most read CF. There is the effect this operation had on my JVM: 
  http://img692.imageshack.us/img692/4171/datastaxopscenterr.png
 
  It looks like the row cache was somehow stored in-heap. I looked at my 
  cassandra.yaml and I have the following configuration: row_cache_provider: 
  SerializingCacheProvider (which should be enough to store row cache 
  off-heap as described above in this file: SerializingCacheProvider 
  serialises the contents of the row and stores it in native memory, i.e., 
  off the JVM Heap)
 
  What's wrong ?
 
 
 
 
 
  --
  Atenciosamente,
  Sávio S. Teles de Oliveira
  voice: +55 62 9136 6996
  http://br.linkedin.com

Re: Row cache off-heap ?

2013-03-12 Thread Alain RODRIGUEZ
I am using C*1.1.6.

Did you restart the node after changing the row_cache_size_in_mb ?

No, I didn't. I used the nodetool setcachecapacity and didn't restart the
node.

The changes in GC activity are not huge and may not be due to cache
activity

I find them huge, and they happened only on the node on which I had enabled the row
cache. I just enabled it on the .164 node from 10:45 to 10:48 and the heap size
doubled from 3.5GB to 7GB (out of 8, which induced memory pressure). As for
GC, all the collections increased a lot compared to the other nodes with row
caching disabled.

What is the output from nodetool info?

I can give it to you, but the row cache is now disabled.

Token: 85070591730234615865843651857942052864
Gossip active: true
Thrift active: true
Load : 201.61 GB
Generation No: 1362749056
Uptime (seconds) : 328675
Heap Memory (MB) : 5157.58 / 8152.00
Data Center  : eu-west
Rack : 1b
Exceptions   : 24
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes),
106814132 hits, 120131310 requests, 0.858 recent hit rate, 14400 save
period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds

I think it won't help, but I can't try things now unless we are quite sure
it will work smoothly; we are under heavy load.

Anyway, thanks for trying to help once again.




2013/3/12 aaron morton aa...@thelastpickle.com

 What version are you using?

 Sounds like you have configured it correctly. Did you restart the node
 after changing the row_cache_size_in_mb ?
 The changes in GC activity are not huge and may not be due to cache
 activity. Have they continued after you enabled the row cache?

 What is the output from nodetool info?

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 11/03/2013, at 5:30 AM, Sávio Teles savio.te...@lupa.inf.ufg.br
 wrote:

 I have the same problem!

 2013/3/11 Alain RODRIGUEZ arodr...@gmail.com

 I can add that I have JNA corectly loaded, from the logs: JNA mlockall
 successful


 2013/3/11 Alain RODRIGUEZ arodr...@gmail.com

 Any clue on this ?

 Row cache well configured could avoid us a lot of disk read, and IO
 is definitely our bottleneck... If someone could explain why the row cache
 has so much impact on my JVM and how to avoid it, it would be appreciated
 :).


 2013/3/8 Alain RODRIGUEZ arodr...@gmail.com

 Hi,

 We have some issue having a high read throughput. I wanted to alleviate
 things by turning the row cache ON.

 I set the row cache to 200 on one node and enable caching 'ALL' on the
 3 most read CF. There is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

 It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store row cache
 off-heap as described above in this file: SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap)

 What's wrong ?






 --
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
  Mestrando em Ciências da Computação - UFG
 Arquiteto de Software
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG





Re: Row cache off-heap ?

2013-03-11 Thread Alain RODRIGUEZ
Any clue on this ?

A well-configured row cache could avoid a lot of disk reads for us, and IO
is definitely our bottleneck... If someone could explain why the row cache
has so much impact on my JVM and how to avoid it, it would be appreciated
:).


2013/3/8 Alain RODRIGUEZ arodr...@gmail.com

 Hi,

 We have some issue having a high read throughput. I wanted to alleviate
 things by turning the row cache ON.

 I set the row cache to 200 on one node and enable caching 'ALL' on the 3
 most read CF. There is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

 It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store row cache
 off-heap as described above in this file: SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap)

 What's wrong ?



Re: Row cache off-heap ?

2013-03-11 Thread Alain RODRIGUEZ
I can add that I have JNA correctly loaded, from the logs: JNA mlockall
successful


2013/3/11 Alain RODRIGUEZ arodr...@gmail.com

 Any clue on this ?

 Row cache well configured could avoid us a lot of disk read, and IO
 is definitely our bottleneck... If someone could explain why the row cache
 has so much impact on my JVM and how to avoid it, it would be appreciated
 :).


 2013/3/8 Alain RODRIGUEZ arodr...@gmail.com

 Hi,

 We have some issue having a high read throughput. I wanted to alleviate
 things by turning the row cache ON.

 I set the row cache to 200 on one node and enable caching 'ALL' on the 3
 most read CF. There is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

 It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store row cache
 off-heap as described above in this file: SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap)

 What's wrong ?





Re: Row cache off-heap ?

2013-03-11 Thread Sávio Teles
I have the same problem!

2013/3/11 Alain RODRIGUEZ arodr...@gmail.com

 I can add that I have JNA corectly loaded, from the logs: JNA mlockall
 successful


 2013/3/11 Alain RODRIGUEZ arodr...@gmail.com

 Any clue on this ?

 Row cache well configured could avoid us a lot of disk read, and IO
 is definitely our bottleneck... If someone could explain why the row cache
 has so much impact on my JVM and how to avoid it, it would be appreciated
 :).


 2013/3/8 Alain RODRIGUEZ arodr...@gmail.com

 Hi,

 We have some issue having a high read throughput. I wanted to alleviate
 things by turning the row cache ON.

 I set the row cache to 200 on one node and enable caching 'ALL' on the 3
 most read CF. There is the effect this operation had on my JVM:
 http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

 It looks like the row cache was somehow stored in-heap. I looked at my
 cassandra.yaml and I have the following configuration: row_cache_provider:
 SerializingCacheProvider (which should be enough to store row cache
 off-heap as described above in this file: SerializingCacheProvider
 serialises the contents of the row and stores it in native memory, i.e.,
 off the JVM Heap)

 What's wrong ?






-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Row cache off-heap ?

2013-03-08 Thread Alain RODRIGUEZ
Hi,

We are having some issues with high read throughput. I wanted to alleviate
things by turning the row cache ON.

I set the row cache to 200 on one node and enable caching 'ALL' on the 3
most read CF. There is the effect this operation had on my JVM:
http://img692.imageshack.us/img692/4171/datastaxopscenterr.png

It looks like the row cache was somehow stored in-heap. I looked at my
cassandra.yaml and I have the following configuration: row_cache_provider:
SerializingCacheProvider (which should be enough to store row cache
off-heap as described above in this file: SerializingCacheProvider
serialises the contents of the row and stores it in native memory, i.e.,
off the JVM Heap)

What's wrong ?


Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows

2013-02-05 Thread Alexei Bakanov
I tried to run with tracing, but it says 'Scanned 0 rows and matched 0'.
I found an existing issue for this bug:
https://issues.apache.org/jira/browse/CASSANDRA-4973
I made a dtest reproducing it and attached it to the ticket.

Alexei

On 2 February 2013 23:00, aaron morton aa...@thelastpickle.com wrote:
 Can you run the select in cqlsh and enabling tracing (see the cqlsh online
 help).

 If you can replicate it then place raise a ticket on
 https://issues.apache.org/jira/browse/CASSANDRA and update email thread.

 Thanks

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 1/02/2013, at 9:03 PM, Alexei Bakanov russ...@gmail.com wrote:

 Hello,

 I've found a combination that doesn't work:
 A column family that have a secondary index and caching='ALL' with
 data in two datacenters and I do a restart of the nodes, then my
 secondary index queries start returning 0 rows.
 It happens when amount of data goes over a certain threshold, so I
 suspect that compactions are involved in this as well.
 Taking out one of the ingredients fixes the problem and my queries
 return rows from secondary index.
 I suspect that this guy is struggling with the same thing
 https://issues.apache.org/jira/browse/CASSANDRA-4785

 Here is a sequence of actions that reproduces it with help of CCM:

 $ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner
 testRowCacheDC
 $ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
 $ ccm updateconf 'row_cache_size_in_mb: 200'
 $ cp ~/Downloads/cassandra-topology.properties
 ~/.ccm/testRowCacheDC/node1/conf/  (please find .properties file
 below)
 $ cp ~/Downloads/cassandra-topology.properties
 ~/.ccm/testRowCacheDC/node2/conf/
 $ ccm start
 $ ccm cli
 -create keyspace and column family(please find schema below)
 $ python populate_rowcache.py
 $ ccm stop  (I tried flush first, doesn't help)
 $ ccm start
 $ ccm cli
 Connected to: testRowCacheDC on 127.0.0.1/9160
 Welcome to Cassandra CLI version 1.2.1-SNAPSHOT

 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.

 [default@unknown] use testks;
 Authenticated to keyspace: testks
 [default@testks] get cf1 where 'indexedColumn'='userId_75';

 0 Row Returned.
 Elapsed time: 68 msec(s).

 My cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M
 Thanks for help.

 Best regards,
 Alexei


 -- START cassandra-topology.properties --
 127.0.0.1=DC1:RAC1
 127.0.0.2=DC2:RAC1
 default=DC1:r1
 -- FINISH cassandra-topology.properties --

 -- START cassandra-cli schema ---
 create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;

 use testks;

 create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
{column_name : 'indexedColumn',
validation_class : UTF8Type,
index_name : 'INDEX1',
index_type : 0}]
  and compression_options = {'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};
 ---FINISH cassandra-cli schema ---

 -- START populate_rowcache.py ---
 from pycassa.batch import Mutator

 import pycassa

 pool = pycassa.ConnectionPool('testks', timeout=5)
 cf = pycassa.ColumnFamily(pool, 'cf1')

 for userId in xrange(0, 1000):
print userId
b = Mutator(pool, queue_size=200)
for itemId in xrange(20):
rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
for message_number in xrange(10):
b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
 str(message_number): str(message_number)})
b.send()

 pool.dispose()
 -- FINISH populate_rowcache.py ---




Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows

2013-02-02 Thread aaron morton
Can you run the select in cqlsh with tracing enabled (see the cqlsh online
help)?

If you can replicate it then please raise a ticket on
https://issues.apache.org/jira/browse/CASSANDRA and update this email thread.

Thanks

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/02/2013, at 9:03 PM, Alexei Bakanov russ...@gmail.com wrote:

 Hello,
 
 I've found a combination that doesn't work:
 A column family that have a secondary index and caching='ALL' with
 data in two datacenters and I do a restart of the nodes, then my
 secondary index queries start returning 0 rows.
 It happens when amount of data goes over a certain threshold, so I
 suspect that compactions are involved in this as well.
 Taking out one of the ingredients fixes the problem and my queries
 return rows from secondary index.
 I suspect that this guy is struggling with the same thing
 https://issues.apache.org/jira/browse/CASSANDRA-4785
 
 Here is a sequence of actions that reproduces it with help of CCM:
 
 $ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner
 testRowCacheDC
 $ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
 $ ccm updateconf 'row_cache_size_in_mb: 200'
 $ cp ~/Downloads/cassandra-topology.properties
 ~/.ccm/testRowCacheDC/node1/conf/  (please find .properties file
 below)
 $ cp ~/Downloads/cassandra-topology.properties 
 ~/.ccm/testRowCacheDC/node2/conf/
 $ ccm start
 $ ccm cli
 -create keyspace and column family(please find schema below)
 $ python populate_rowcache.py
 $ ccm stop  (I tried flush first, doesn't help)
 $ ccm start
 $ ccm cli
 Connected to: testRowCacheDC on 127.0.0.1/9160
 Welcome to Cassandra CLI version 1.2.1-SNAPSHOT
 
 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.
 
 [default@unknown] use testks;
 Authenticated to keyspace: testks
 [default@testks] get cf1 where 'indexedColumn'='userId_75';
 
 0 Row Returned.
 Elapsed time: 68 msec(s).
 
 My cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M
 Thanks for help.
 
 Best regards,
 Alexei
 
 
 -- START cassandra-topology.properties --
 127.0.0.1=DC1:RAC1
 127.0.0.2=DC2:RAC1
 default=DC1:r1
 -- FINISH cassandra-topology.properties --
 
 -- START cassandra-cli schema ---
 create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;
 
 use testks;
 
 create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
{column_name : 'indexedColumn',
validation_class : UTF8Type,
index_name : 'INDEX1',
index_type : 0}]
  and compression_options = {'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};
 ---FINISH cassandra-cli schema ---
 
 -- START populate_rowcache.py ---
 from pycassa.batch import Mutator
 
 import pycassa
 
 pool = pycassa.ConnectionPool('testks', timeout=5)
 cf = pycassa.ColumnFamily(pool, 'cf1')
 
 for userId in xrange(0, 1000):
print userId
b = Mutator(pool, queue_size=200)
for itemId in xrange(20):
rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
for message_number in xrange(10):
b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
 str(message_number): str(message_number)})
b.send()
 
 pool.dispose()
 -- FINISH populate_rowcache.py ---



Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows

2013-02-01 Thread Alexei Bakanov
Hello,

I've found a combination that doesn't work:
a column family that has a secondary index and caching='ALL', with
data in two datacenters; when I restart the nodes, my
secondary index queries start returning 0 rows.
It happens when the amount of data goes over a certain threshold, so I
suspect that compactions are involved in this as well.
Taking out any one of the ingredients fixes the problem and my queries
return rows from the secondary index.
I suspect that this guy is struggling with the same thing:
https://issues.apache.org/jira/browse/CASSANDRA-4785

Here is a sequence of actions that reproduces it with help of CCM:

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner
testRowCacheDC
$ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
$ ccm updateconf 'row_cache_size_in_mb: 200'
$ cp ~/Downloads/cassandra-topology.properties
~/.ccm/testRowCacheDC/node1/conf/  (please find .properties file
below)
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node2/conf/
$ ccm start
$ ccm cli
 -create keyspace and column family(please find schema below)
$ python populate_rowcache.py
$ ccm stop  (I tried flush first, doesn't help)
$ ccm start
$ ccm cli
Connected to: testRowCacheDC on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.1-SNAPSHOT

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_75';

0 Row Returned.
Elapsed time: 68 msec(s).

My cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M
Thanks for help.

Best regards,
Alexei


-- START cassandra-topology.properties --
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
-- FINISH cassandra-topology.properties --

-- START cassandra-cli schema ---
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
{column_name : 'indexedColumn',
validation_class : UTF8Type,
index_name : 'INDEX1',
index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
---FINISH cassandra-cli schema ---

-- START populate_rowcache.py ---
from pycassa.batch import Mutator

import pycassa

pool = pycassa.ConnectionPool('testks', timeout=5)
cf = pycassa.ColumnFamily(pool, 'cf1')

for userId in xrange(0, 1000):
print userId
b = Mutator(pool, queue_size=200)
for itemId in xrange(20):
rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
for message_number in xrange(10):
b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
str(message_number): str(message_number)})
b.send()

pool.dispose()
-- FINISH populate_rowcache.py ---


Re: Row cache and counters

2013-01-03 Thread André Cruz
Does anyone see anything wrong in these settings? Anything to account for an 8s
timeout during a counter increment?

Thanks,
André

On 31/12/2012, at 14:35, André Cruz andre.c...@co.sapo.pt wrote:

 On Dec 29, 2012, at 8:53 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 
 Can you post gc settings? Also check logs and see what it says
 
 These are the relevant JVM settings:
 
 -home /usr/lib/jvm/j2re1.6-oracle/bin/../
 -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
 -XX:+UseThreadPriorities 
 -XX:ThreadPriorityPolicy=42 
 -Xms8049M 
 -Xmx8049M 
 -Xmn800M 
 -XX:+HeapDumpOnOutOfMemoryError 
 -Xss196k 
 -XX:+UseParNewGC 
 -XX:+UseConcMarkSweepGC 
 -XX:+CMSParallelRemarkEnabled 
 -XX:SurvivorRatio=8 
 -XX:MaxTenuringThreshold=1 
 -XX:CMSInitiatingOccupancyFraction=75 
 -XX:+UseCMSInitiatingOccupancyOnly
 -Djava.net.preferIPv4Stack=true 
 
 I have 3 servers (32GB RAM), with an RF of 3. I searched all of them for log
 messages related to a period when I had timeouts (19h20-19h30); only one of
 them showed messages for that timeframe, and none seem related to that CF:
 
 INFO [MemoryMeter:1] 2012-12-27 19:20:17,858 Memtable.java (line 213) 
 CFS(Keyspace='Disco', ColumnFamily='RevisionLog') liveRatio is 
 4.318314007200407 (just-counted was 4.318314007200407).  calculation took 
 350ms for 8623 columns
 INFO [MemoryMeter:1] 2012-12-27 19:23:37,148 Memtable.java (line 213) 
 CFS(Keyspace='Disco', ColumnFamily='LinkPathsExist') liveRatio is 
 25.87012987012987 (just-counted was 25.87012987012987).  calculation took 0ms 
 for 10 columns
 INFO [MemoryMeter:1] 2012-12-27 19:28:32,736 Memtable.java (line 213) 
 CFS(Keyspace='Disco', ColumnFamily='BlockMetadata.BlockMetadata_used_idx') 
 liveRatio is 1.7176206177506523 (just-counted was 1.7176206177506523).  
 calculation took 62ms for 12941 columns
 INFO [MemoryMeter:1] 2012-12-27 19:30:12,752 Memtable.java (line 213) 
 CFS(Keyspace='Disco', ColumnFamily='Namespace') liveRatio is 
 20.097473571044617 (just-counted was 20.097473571044617).  calculation took 
 10ms for 288 columns
 INFO [MemoryMeter:1] 2012-12-27 19:30:28,421 Memtable.java (line 213) 
 CFS(Keyspace='Disco', ColumnFamily='NamespaceDir') liveRatio is 
 4.801010311533358 (just-counted was 4.801010311533358).  calculation took 
 96ms for 3138 columns
 
 
 Also post how many writes and reads along with avg row size
 
 All rows have 3-6 counters. As for writes and reads:
 
Column Family: UserQuotas
SSTable count: 3
Space used (live): 2609839
Space used (total): 2609839
Number of Keys (estimate): 22016
Memtable Columns Count: 142705
Memtable Data Size: 768117
Memtable Switch Count: 26
Read Count: 822203
Read Latency: 0.305 ms.
Write Count: 1024277
Write Latency: 0.066 ms.
Pending Tasks: 0
Bloom Filter False Postives: 3
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 42584
Compacted row minimum size: 125
Compacted row maximum size: 770
Compacted row mean size: 298
 
 
 Is there anything wrong with my configuration?
 
 Best regards,
 André Cruz
 
 


Row cache and counters

2012-12-29 Thread André Cruz
Hello.

I recently was having some timeout issues while updating counters and turned on 
row cache for that particular CF. This is its stats:

Column Family: UserQuotas
SSTable count: 3
Space used (live): 2687239
Space used (total): 2687239
Number of Keys (estimate): 22912
Memtable Columns Count: 25766
Memtable Data Size: 180975
Memtable Switch Count: 17
Read Count: 356900
Read Latency: 1.004 ms.
Write Count: 548996
Write Latency: 0.045 ms.
Pending Tasks: 0
Bloom Filter False Postives: 17
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 44232
Compacted row minimum size: 125
Compacted row maximum size: 770
Compacted row mean size: 308

Since it is rather small I was hoping that it would eventually be all cached, 
and the timeouts would go away. I'm updating the counters with a CL of ONE, so 
I thought that the timeout would be caused by the read step and the cache would 
help here. But I still get timeouts, and the cache hit rate is rather low:

Row Cache: size 1436291 (bytes), capacity 524288000 (bytes), 125310 
hits, 442760 requests, 0.247 recent hit rate, 0 save period in seconds

Am I assuming something wrong about the row cache? Isn't it updated when a 
counter update occurs, or is it just invalidated?

Best regards,
André Cruz

Re: Row cache and counters

2012-12-29 Thread rohit bhatia
A counter increment at CL ONE still performs a read before the write, but that
read latency is not counted in the request latency for the write. Your local
node write latency of 45 microseconds is pretty quick. What timeout are you
using, and what write request latency do you see? In our deployment we had
some issues and could trace the timeouts to ParNew GC collections, which
were quite frequent. You might want to take a look there too.
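
One way to confirm or rule that out (a sketch only; the log path is illustrative):
turn on GC logging, typically in conf/cassandra-env.sh, and check whether ParNew/CMS
pause times line up with the timeout bursts. Standard HotSpot options for that are:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/cassandra/gc.log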


On Sat, Dec 29, 2012 at 4:44 PM, André Cruz andre.c...@co.sapo.pt wrote:

 Hello.

 I recently was having some timeout issues while updating counters and
 turned on row cache for that particular CF. This is its stats:

 Column Family: UserQuotas
 SSTable count: 3
 Space used (live): 2687239
 Space used (total): 2687239
 Number of Keys (estimate): 22912
 Memtable Columns Count: 25766
 Memtable Data Size: 180975
 Memtable Switch Count: 17
 Read Count: 356900
 Read Latency: 1.004 ms.
 Write Count: 548996
 Write Latency: 0.045 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 17
 Bloom Filter False Ratio: 0.0
 Bloom Filter Space Used: 44232
 Compacted row minimum size: 125
 Compacted row maximum size: 770
 Compacted row mean size: 308

 Since it is rather small I was hoping that it would eventually be all
 cached, and the timeouts would go away. I'm updating the counters with a CL
 of ONE, so I thought that the timeout would be caused by the read step and
 the cache would help here. But I still get timeouts, and the cache hit rate
 is rather low:

 Row Cache: size 1436291 (bytes), capacity 524288000 (bytes),
 125310 hits, 442760 requests, 0.247 recent hit rate, 0 save period in
 seconds

 Am I assuming something wrong about the row cache? Isn't it updated when a
 counter update occurs or is just invalidated?

 Best regards,
 André Cruz


Re: Row cache and counters

2012-12-29 Thread André Cruz
On 29/12/2012, at 16:59, rohit bhatia rohit2...@gmail.com wrote:

 Reads during a write still occur during a counter increment with CL ONE, but 
 that latency is not counted in the request latency for the write. Your local 
 node write latency of 45 microseconds is pretty quick. what is your timeout 
 and the write request latency you see.

Most of the time the increments are pretty quick, in the millisecond range. I 
have an 8s timeout, and sometimes the timeouts happen in bursts.

 In our deployment we had some issues and we could trace the timeouts to 
 parnew gc collections which were quite frequent. You might just want to take 
 a look there too.

What can we do about that? Which settings did you tune?

Thanks,
André

Re: Row cache and counters

2012-12-29 Thread rohit bhatia
I assume you mean 8 seconds and not 8 ms.
That's pretty huge to be caused by GC. Is there a lot of load on your servers?
You might also want to check for memory contention.

Regarding GC: since it's ParNew, all you can really do is increase the heap and
young gen size, or modify the tenuring threshold. But that can't be the reason for
an 8 second timeout.
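
For reference (illustrative values only, relative to the JVM flags quoted earlier
in this thread), tuning the young generation and tenuring means adjusting options
like:

-Xmn1600M
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=2

i.e. a larger young gen than the 800M above and a higher tenuring threshold than 1,
trading fewer ParNew collections for somewhat longer individual pauses.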


On Sat, Dec 29, 2012 at 11:37 PM, André Cruz andre.c...@co.sapo.pt wrote:

 On 29/12/2012, at 16:59, rohit bhatia rohit2...@gmail.com wrote:

 Reads during a write still occur during a counter increment with CL ONE,
 but that latency is not counted in the request latency for the write. Your
 local node write latency of 45 microseconds is pretty quick. what is your
 timeout and the write request latency you see.


 Most of the time the increments are pretty quick, in the millisecond
 range. I have a 8s timeout and sometimes timeouts happen in bursts.

 In our deployment we had some issues and we could trace the timeouts to
 parnew gc collections which were quite frequent. You might just want to
 take a look there too.


 What can we do about that? Which settings did you tune?

 Thanks,
 André



Re: strange row cache behavior

2012-12-04 Thread aaron morton
 Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

So the cache is pretty much full, there is only 1 MB free. 

There were 2,576 read requests that tried to get a row from the cache. Zero of 
those had a hit. If you have 6 nodes and RF 2, each node has  one third of the 
data in the cluster (from the effective ownership info). So depending on the 
read workload the number of read requests on each node may be different. 

What I think is happening is reads are populating the row cache, then 
subsequent reads are evicting items from the row cache before you get back to 
reading the original rows. So if you read rows 1 to 5, they are put in the 
cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then 
you read rows 1 to 5 again they are not in the cache. 

Try testing with a lower number of hot rows, and/or a bigger row cache. 

But to be honest, with rows in the 10's of MB you will probably only get good 
cache performance with a small set of hot rows. 
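
To put rough numbers on that (a back-of-the-envelope sketch; the ~50 MB average 
row size is only an assumption within the 10's-of-MB range above, and the 5,000-row 
request set is the one mentioned elsewhere in this discussion):

# all figures are illustrative assumptions, not measurements
row_cache_bytes = 1 * 1024 ** 3       # 1 GB row cache per node
avg_row_bytes   = 50 * 1024 ** 2      # assume ~50 MB per row
hot_rows        = 5000                # rows requested per run

rows_that_fit = row_cache_bytes // avg_row_bytes
print(rows_that_fit)                    # ~20 rows fit in the cache at once
print(float(rows_that_fit) / hot_rows)  # ~0.4% of the working set fits

With a cache that small relative to the working set, a repeated scan means almost 
every read evicts a row that will be needed again before it is re-read, so the hit 
rate stays near zero.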

Hope that helps. 



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 5:11 AM, Yiming Sun yiming@gmail.com wrote:

 Does anyone have any comments/suggestions for me regarding this?  Thanks
 
 
 I am trying to understand some strange behavior of cassandra row cache.  We 
 have a 6-node Cassandra cluster in a single data center on 2 racks, and the 
 neighboring nodes on the ring are from alternative racks.  Each node has 1GB 
 row cache, with key cache disabled.   The cluster uses PropertyFileSnitch, 
 and the ColumnFamily I fetch from uses NetworkTopologyStrategy, with 
 replication factor of 2.  My client code uses Hector to fetch a fixed set of 
 rows from cassandra
 
 What I don't quite understand is even after I ran the client code several 
 times, there are always some nodes with 0 row cache hits, despite that the 
 row cache from all nodes are filled and all nodes receive requests.
 
 Which nodes have 0 hits seem to be strongly related to the following:
 
  - the set of row keys to fetch
  - the order of the set of row keys to fetch
  - the list of hosts passed to Hector's CassandraHostConfigurator
  - the order of the list of hosts passed to Hector
 
 Can someone shed some lights on how exactly the row cache works and hopefully 
 also explain the behavior I have been seeing?  I thought if the fixed set of 
 the rows keys are the only thing I am fetching (each row should be on the 
 order of 10's of MBs, no more than 100MB), and each node gets requests, and 
 its row cache is filled, there's gotta be some hits.  Apparent this is not 
 the case.   Thanks.
 
 cluster information:
 
 Address DC  RackStatus State   Load
 Effective-Ownership Token   
   
  141784319550391026443072753096570088105 
 x.x.x.1DC1 r1  Up Normal  587.46 GB   33.33%  
 0   
 x.x.x.2DC1 r2  Up Normal  591.21 GB   33.33%  
 28356863910078205288614550619314017621  
 x.x.x.3DC1 r1  Up Normal  594.97 GB   33.33%  
 56713727820156410577229101238628035242  
 x.x.x.4DC1 r2  Up Normal  587.15 GB   33.33%  
 85070591730234615865843651857942052863  
 x.x.x.5DC1 r1  Up Normal  590.26 GB   33.33%  
 113427455640312821154458202477256070484 
 x.x.x.6DC1 r2  Up Normal  583.21 GB   33.33%  
 141784319550391026443072753096570088105
 
 
 [user@node]$ ./checkinfo.sh   
 *** x.x.x.4
 Token: 85070591730234615865843651857942052863
 Gossip active: true
 Thrift active: true
 Load : 587.15 GB
 Generation No: 1354074048
 Uptime (seconds) : 36957
 Heap Memory (MB) : 2027.29 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0
 
 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, 
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
 
 *** x.x.x.6
 Token: 141784319550391026443072753096570088105
 Gossip active: true
 Thrift active: true
 Load : 583.21 GB
 Generation No: 1354074461
 Uptime (seconds) : 36535
 Heap Memory (MB) : 828.71 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0
 
 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, 
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size

Re: strange row cache behavior

2012-12-04 Thread aaron morton
  Does this mean we should not enable row caches until we are absolutely sure 
 about what's hot (I think there is a reason why row caches are disabled by 
 default) ?
Yes and Yes. 
Row cache takes memory and CPU; unless you know you are getting a benefit from 
it, leave it off. The key cache and OS disk cache will help. If you find latency 
is an issue, then start poking around.
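
For that poking, the numbers quoted elsewhere in this thread come straight from 
nodetool, e.g. (the host is illustrative):

nodetool -h 127.0.0.1 info      # global key cache / row cache size, capacity, hits, requests
nodetool -h 127.0.0.1 cfstats   # per-CF read/write counts and latencies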

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 4:23 AM, Yiming Sun yiming@gmail.com wrote:

 Hi Aaron,
 
 Thank you, and your explanation makes sense.  At the time, I thought having 
 1GB of row cache on each node was plenty enough, because there was an 
 aggregated 6GB cache, but you are right, with each row in 10's of MBs, some 
 of the nodes can go into a constant load and evict cycle and would have 
 negative effects on the performance.  I will try as you suggested to 1.) 
 reduce the requested entry set, and 2.) increase the row cache size and see 
 if they get better hits, and also do 3) by reversing the requested entry list 
 in alternate runs.
 
 Our data space has close to 3 million rows, but we haven't gotten enough 
 usage statistics to know what rows are hot.  Does this mean we should not 
 enable row caches until we are absolutely sure about what's hot (I think 
 there is a reason why row caches are disabled by default) ?  It also seems 
 from my test that OS page cache works much better, but it could be that OS 
 page cache can utilize all the available memory so it is essentially larger 
 -- I guess I will find out by doing 2.) above.
 
 best,
 
 -- Y.
 
 
 
 On Tue, Dec 4, 2012 at 4:47 AM, aaron morton aa...@thelastpickle.com wrote:
  Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 
  hits, 2576 requests, NaN recent hit rate, 0 save period in seconds
 
 So the cache is pretty much full, there is only 1 MB free.
 
 There were 2,576 read requests that tried to get a row from the cache. Zero 
 of those had a hit. If you have 6 nodes and RF 2, each node has  one third of 
 the data in the cluster (from the effective ownership info). So depending on 
 the read workload the number of read requests on each node may be different.
 
 What I think is happening is reads are populating the row cache, then 
 subsequent reads are evicting items from the row cache before you get back to 
 reading the original rows. So if you read rows 1 to 5, they are put in the 
 cache, when you read rows 6 to 10 they are put in and evict rows 1 to 5. Then 
 you read rows 1 to 5 again they are not in the cache.
 
 Try testing with a lower number of hot rows, and/or a bigger row cache.
 
 But to be honest, with rows in the 10's of MB you will probably only get good 
 cache performance with a small set of hot rows.
 
 Hope that helps.
 
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 1/12/2012, at 5:11 AM, Yiming Sun yiming@gmail.com wrote:
 
  Does anyone have any comments/suggestions for me regarding this?  Thanks
 
 
  I am trying to understand some strange behavior of cassandra row cache.  We 
  have a 6-node Cassandra cluster in a single data center on 2 racks, and the 
  neighboring nodes on the ring are from alternative racks.  Each node has 
  1GB row cache, with key cache disabled.   The cluster uses 
  PropertyFileSnitch, and the ColumnFamily I fetch from uses 
  NetworkTopologyStrategy, with replication factor of 2.  My client code uses 
  Hector to fetch a fixed set of rows from cassandra
 
  What I don't quite understand is even after I ran the client code several 
  times, there are always some nodes with 0 row cache hits, despite that the 
  row cache from all nodes are filled and all nodes receive requests.
 
  Which nodes have 0 hits seem to be strongly related to the following:
 
   - the set of row keys to fetch
   - the order of the set of row keys to fetch
   - the list of hosts passed to Hector's CassandraHostConfigurator
   - the order of the list of hosts passed to Hector
 
  Can someone shed some lights on how exactly the row cache works and 
  hopefully also explain the behavior I have been seeing?  I thought if the 
  fixed set of the rows keys are the only thing I am fetching (each row 
  should be on the order of 10's of MBs, no more than 100MB), and each node 
  gets requests, and its row cache is filled, there's gotta be some hits.  
  Apparent this is not the case.   Thanks.
 
  cluster information:
 
  Address DC  RackStatus State   Load
  Effective-Ownership Token
  
 141784319550391026443072753096570088105
  x.x.x.1DC1 r1  Up Normal  587.46 GB   33.33%
0
  x.x.x.2DC1 r2  Up Normal  591.21 GB   33.33

Re: strange row cache behavior

2012-12-04 Thread Yiming Sun
Got it.  Thanks again, Aaron.

-- Y.


On Tue, Dec 4, 2012 at 3:07 PM, aaron morton aa...@thelastpickle.comwrote:

  Does this mean we should not enable row caches until we are absolutely
 sure about what's hot (I think there is a reason why row caches are
 disabled by default) ?

 Yes and Yes.
 Row cache takes memory and CPU, unless you know you are getting a benefit
 from it leave it off. The key cache and os disk cache will help. If you
 find latency is an issue then start poking around.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/12/2012, at 4:23 AM, Yiming Sun yiming@gmail.com wrote:

 Hi Aaron,

 Thank you,and your explanation makes sense.  At the time, I thought having
 1GB of row cache on each node was plenty enough, because there was an
 aggregated 6GB cache, but you are right, with each row in 10's of MBs, some
 of the nodes can go into a constant load and evict cycle and would have
 negative effects on the performance.  I will try as you suggested to 1.)
 reduce the requested entry set, and 2.) increase the row cache size and see
 if they get better hits, and also do 3) by reversing the requested entry
 list in alternate runs.

 Our data space has close to 3 million rows, but we haven't gotten enough
 usage statistics to know what rows are hot.  Does this mean we should not
 enable row caches until we are absolutely sure about what's hot (I think
 there is a reason why row caches are disabled by default) ?  It also seems
 from my test that OS page cache works much better, but it could be that OS
 page cache can utilize all the available memory so it is essentially larger
 -- I guess I will find out by doing 2.) above.

 best,

 -- Y.



 On Tue, Dec 4, 2012 at 4:47 AM, aaron morton aa...@thelastpickle.comwrote:

  Row Cache: size 1072651974 (bytes), capacity 1073741824
 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in
 seconds

 So the cache is pretty much full, there is only 1 MB free.

 There were 2,576 read requests that tried to get a row from the cache.
 Zero of those had a hit. If you have 6 nodes and RF 2, each node has  one
 third of the data in the cluster (from the effective ownership info). So
 depending on the read workload the number of read requests on each node may
 be different.

 What I think is happening is reads are populating the row cache, then
 subsequent reads are evicting items from the row cache before you get back
 to reading the original rows. So if you read rows 1 to 5, they are put in
 the cache, when you read rows 6 to 10 they are put in and evict rows 1 to
 5. Then you read rows 1 to 5 again they are not in the cache.

 Try testing with a lower number of hot rows, and/or a bigger row cache.

 But to be honest, with rows in the 10's of MB you will probably only get
 good cache performance with a small set of hot rows.

 Hope that helps.



 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 1/12/2012, at 5:11 AM, Yiming Sun yiming@gmail.com wrote:

  Does anyone have any comments/suggestions for me regarding this?  Thanks
 
 
  I am trying to understand some strange behavior of cassandra row cache.
  We have a 6-node Cassandra cluster in a single data center on 2 racks, and
 the neighboring nodes on the ring are from alternative racks.  Each node
 has 1GB row cache, with key cache disabled.   The cluster uses
 PropertyFileSnitch, and the ColumnFamily I fetch from uses
 NetworkTopologyStrategy, with replication factor of 2.  My client code uses
 Hector to fetch a fixed set of rows from cassandra
 
  What I don't quite understand is even after I ran the client code
 several times, there are always some nodes with 0 row cache hits, despite
 that the row cache from all nodes are filled and all nodes receive requests.
 
  Which nodes have 0 hits seem to be strongly related to the following:
 
   - the set of row keys to fetch
   - the order of the set of row keys to fetch
   - the list of hosts passed to Hector's CassandraHostConfigurator
   - the order of the list of hosts passed to Hector
 
  Can someone shed some lights on how exactly the row cache works and
 hopefully also explain the behavior I have been seeing?  I thought if the
 fixed set of the rows keys are the only thing I am fetching (each row
 should be on the order of 10's of MBs, no more than 100MB), and each node
 gets requests, and its row cache is filled, there's gotta be some hits.
  Apparent this is not the case.   Thanks.
 
  cluster information:
 
  Address DC  RackStatus State   Load
  Effective-Ownership Token
 
141784319550391026443072753096570088105
  x.x.x.1DC1 r1  Up Normal  587.46 GB
 33.33%  0
  x.x.x.2DC1 r2  Up Normal  591.21 GB
 33.33

Re: strange row cache behavior

2012-11-30 Thread Yiming Sun
Does anyone have any comments/suggestions for me regarding this?  Thanks


I am trying to understand some strange behavior of cassandra row cache.  We
 have a 6-node Cassandra cluster in a single data center on 2 racks, and the
 neighboring nodes on the ring are on alternating racks.  Each node has
 1GB row cache, with key cache disabled.   The cluster uses
 PropertyFileSnitch, and the ColumnFamily I fetch from uses
 NetworkTopologyStrategy, with replication factor of 2.  My client code uses
 Hector to fetch a fixed set of rows from cassandra

 What I don't quite understand is even after I ran the client code several
 times, there are always some nodes with 0 row cache hits, despite that the
 row cache from all nodes are filled and all nodes receive requests.

 Which nodes have 0 hits seem to be strongly related to the following:

  - the set of row keys to fetch
  - the order of the set of row keys to fetch
  - the list of hosts passed to Hector's CassandraHostConfigurator
  - the order of the list of hosts passed to Hector

 Can someone shed some light on how exactly the row cache works and
 hopefully also explain the behavior I have been seeing?  I thought if the
 fixed set of row keys is the only thing I am fetching (each row
 should be on the order of 10's of MBs, no more than 100MB), and each node
 gets requests, and its row cache is filled, there's gotta be some hits.
  Apparently this is not the case.   Thanks.

 cluster information:

 Address DC  RackStatus State   Load
 Effective-Ownership Token

 141784319550391026443072753096570088105
 x.x.x.1DC1 r1  Up Normal  587.46 GB
 33.33%  0
 x.x.x.2DC1 r2  Up Normal  591.21 GB
 33.33%  28356863910078205288614550619314017621
 x.x.x.3DC1 r1  Up Normal  594.97 GB
 33.33%  56713727820156410577229101238628035242
 x.x.x.4DC1 r2  Up Normal  587.15 GB
 33.33%  85070591730234615865843651857942052863
 x.x.x.5DC1 r1  Up Normal  590.26 GB
 33.33%  113427455640312821154458202477256070484
 x.x.x.6DC1 r2  Up Normal  583.21 GB
 33.33%  141784319550391026443072753096570088105


 [user@node]$ ./checkinfo.sh
 *** x.x.x.4
 Token: 85070591730234615865843651857942052863
 Gossip active: true
 Thrift active: true
 Load : 587.15 GB
 Generation No: 1354074048
 Uptime (seconds) : 36957
 Heap Memory (MB) : 2027.29 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0

 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0
 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

 *** x.x.x.6
 Token: 141784319550391026443072753096570088105
 Gossip active: true
 Thrift active: true
 Load : 583.21 GB
 Generation No: 1354074461
 Uptime (seconds) : 36535
 Heap Memory (MB) : 828.71 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0

 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size 1072602906 (bytes), capacity 1073741824 (bytes), 0
 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds



Re: need some help with row cache

2012-11-28 Thread Bryan Talbot
The row cache itself is global and the size is set with
row_cache_size_in_mb.  It must be enabled per CF using the proper
settings.  CQL3 isn't complete yet in C* 1.1 so if the cache settings
aren't shown there, then you'll probably need to use cassandra-cli.

-Bryan
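
Concretely, that means two pieces (a sketch for C* 1.1; the capacity value and
column family name below are examples only). The global capacity goes in
cassandra.yaml:

row_cache_size_in_mb: 512

and the per-CF flag is set from cassandra-cli, e.g.:

update column family YourCF with caching = 'rows_only';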


On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote:
 Use cassandra-cli.


 Thanks.
 -Wei

 Sent from my Samsung smartphone on ATT


  Original message 
 Subject: Re: need some help with row cache
 From: Yiming Sun yiming@gmail.com
 To: user@cassandra.apache.org
 CC:


 Also, what command can I use to see the caching setting?  DESC TABLE
 cf doesn't list caching at all.  Thanks.

 -- Y.


 On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote:

 Hi Bryan,

 Thank you very much for this information.  So in other words, the settings
 such as row_cache_size_in_mb in YAML alone are not enough, and I must also
 specify the caching attribute on a per column family basis?

 -- Y.


 On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com
 wrote:

 On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote:
  Hello,
 
  but it is not clear to me where this setting belongs to, because even
  in the
  v1.1.6 conf/cassandra.yaml,  there is no such property, and apparently
  adding this property to the yaml causes a fatal configuration error
  upon
  server startup,
 

 It's a per column family setting that can be applied using the CLI or
 CQL.

 With CQL3 it would be

 ALTER TABLE cf WITH caching = 'rows_only';

 to enable the row cache but no key cache for that CF.

 -Bryan





Re: need some help with row cache

2012-11-28 Thread Yiming Sun
Thanks guys.  However, after I ran the client code several times (same set
of 5000 entries), 2 of the 6 nodes still show 0 hits on the row cache, despite
each node having 1GB of row cache capacity and the caches being full.   Since I
always request the same entries over and over again, shouldn't there be
some hits?


[user@node]$ ./checkinfo.sh
Token: 85070591730234615865843651857942052863
Gossip active: true
Thrift active: true
Load : 587.15 GB
Generation No: 1354074048
Uptime (seconds) : 36957
Heap Memory (MB) : 2027.29 / 3948.00
Data Center  : DC1
Rack : r2
Exceptions   : 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0
hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

Token: 141784319550391026443072753096570088105
Gossip active: true
Thrift active: true
Load : 583.21 GB
Generation No: 1354074461
Uptime (seconds) : 36535
Heap Memory (MB) : 828.71 / 3948.00
Data Center  : DC1
Rack : r2
Exceptions   : 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072602906 (bytes), capacity 1073741824 (bytes), 0
hits, 3194 requests, NaN recent hit rate, 0 save period in seconds


On Wed, Nov 28, 2012 at 4:26 AM, Bryan Talbot btal...@aeriagames.comwrote:

 The row cache itself is global and the size is set with
 row_cache_size_in_mb.  It must be enabled per CF using the proper
 settings.  CQL3 isn't complete yet in C* 1.1 so if the cache settings
 aren't shown there, then you'll probably need to use cassandra-cli.

 -Bryan


 On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote:
  Use cassandracli.
 
 
  Thanks.
  -Wei
 
  Sent from my Samsung smartphone on ATT
 
 
   Original message 
  Subject: Re: need some help with row cache
  From: Yiming Sun yiming@gmail.com
  To: user@cassandra.apache.org
  CC:
 
 
  Also, what command can I used to see the caching setting?  DESC TABLE
  cf doesn't list caching at all.  Thanks.
 
  -- Y.
 
 
  On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com
 wrote:
 
  Hi Bryan,
 
  Thank you very much for this information.  So in other words, the
 settings
  such as row_cache_size_in_mb in YAML alone are not enough, and I must
 also
  specify the caching attribute on a per column family basis?
 
  -- Y.
 
 
  On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com
  wrote:
 
  On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com
 wrote:
   Hello,
  
   but it is not clear to me where this setting belongs to, because even
   in the
   v1.1.6 conf/cassandra.yaml,  there is no such property, and
 apparently
   adding this property to the yaml causes a fatal configuration error
   upon
   server startup,
  
 
  It's a per column family setting that can be applied using the CLI or
  CQL.
 
  With CQL3 it would be
 
  ALTER TABLE cf WITH caching = 'rows_only';
 
  to enable the row cache but no key cache for that CF.
 
  -Bryan
 
 
 



Re: need some help with row cache

2012-11-28 Thread Yiming Sun
Does replica placement play a role in row cache hits?

I happen to notice that the 3 nodes on rack 2 are the ones with no recent
hit rates, even when I specify only one node from rack2 as the host to
Hector.

The cluster uses PropertyFileSnitch, and the nodes are alternating between
rac1 and rac2 in a single Data Center clockwise on the ring.  This
particular column family uses NetworkTopologyStrategy, with replication
factor of 2.   So the idea is it can place the replica on the next node in
the ring without having to walk all the way around.   But it seems cache hits
tend to only happen on rack 1?


Address DC  RackStatus State   Load
Effective-Ownership Token

141784319550391026443072753096570088105
x.x.x.1DC1 r1  Up Normal  587.46 GB
33.33%  0
x.x.x.2DC1 r2  Up Normal  591.21 GB
33.33%  28356863910078205288614550619314017621
x.x.x.3DC1 r1  Up Normal  594.97 GB
33.33%  56713727820156410577229101238628035242
x.x.x.4DC1 r2  Up Normal  587.15 GB
33.33%  85070591730234615865843651857942052863
x.x.x.5DC1 r1  Up Normal  590.26 GB
33.33%  113427455640312821154458202477256070484
x.x.x.6DC1 r2  Up Normal  583.21 GB
33.33%  141784319550391026443072753096570088105
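
(A follow-up thought, purely as a way to check: nodetool can report which replicas
own a given key, so one could see how the requested keys are actually placed. The
keyspace, column family and key below are placeholders:

nodetool -h x.x.x.1 getendpoints <keyspace> <column_family> <row_key>

which can help correlate the requested keys with the nodes that are, or are not,
seeing cache hits.)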


On Wed, Nov 28, 2012 at 9:09 AM, Yiming Sun yiming@gmail.com wrote:

 Thanks guys.  However, after I ran the client code several times (same set
 of 5000 entries),  still 2 of the 6 nodes show 0 hits on row cache, despite
 each node has 1GB capacity for row cache and the caches are full.   Since I
 always request the same entries over and over again, shouldn't there be
 some hits?


 [user@node]$ ./checkinfo.sh
 Token: 85070591730234615865843651857942052863
 Gossip active: true
 Thrift active: true
 Load : 587.15 GB
 Generation No: 1354074048
 Uptime (seconds) : 36957
 Heap Memory (MB) : 2027.29 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0

 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0
 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

 Token: 141784319550391026443072753096570088105
 Gossip active: true
 Thrift active: true
 Load : 583.21 GB
 Generation No: 1354074461
 Uptime (seconds) : 36535
 Heap Memory (MB) : 828.71 / 3948.00
 Data Center  : DC1
 Rack : r2
 Exceptions   : 0

 Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
 NaN recent hit rate, 14400 save period in seconds
 Row Cache: size 1072602906 (bytes), capacity 1073741824 (bytes), 0
 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds


 On Wed, Nov 28, 2012 at 4:26 AM, Bryan Talbot btal...@aeriagames.comwrote:

 The row cache itself is global and the size is set with
 row_cache_size_in_mb.  It must be enabled per CF using the proper
 settings.  CQL3 isn't complete yet in C* 1.1 so if the cache settings
 aren't shown there, then you'll probably need to use cassandra-cli.

 -Bryan


 On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote:
  Use cassandracli.
 
 
  Thanks.
  -Wei
 
  Sent from my Samsung smartphone on ATT
 
 
   Original message 
  Subject: Re: need some help with row cache
  From: Yiming Sun yiming@gmail.com
  To: user@cassandra.apache.org
  CC:
 
 
  Also, what command can I used to see the caching setting?  DESC TABLE
  cf doesn't list caching at all.  Thanks.
 
  -- Y.
 
 
  On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com
 wrote:
 
  Hi Bryan,
 
  Thank you very much for this information.  So in other words, the
 settings
  such as row_cache_size_in_mb in YAML alone are not enough, and I must
 also
  specify the caching attribute on a per column family basis?
 
  -- Y.
 
 
  On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com
 
  wrote:
 
  On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com
 wrote:
   Hello,
  
   but it is not clear to me where this setting belongs to, because
 even
   in the
   v1.1.6 conf/cassandra.yaml,  there is no such property, and
 apparently
   adding this property to the yaml causes a fatal configuration error
   upon
   server startup,
  
 
  It's a per column family setting that can be applied using the CLI or
  CQL.
 
  With CQL3 it would be
 
  ALTER TABLE cf WITH caching = 'rows_only';
 
  to enable the row cache but no key cache for that CF.
 
  -Bryan
 
 
 





Re: need some help with row cache

2012-11-27 Thread Bryan Talbot
On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote:
 Hello,

 but it is not clear to me where this setting belongs to, because even in the
 v1.1.6 conf/cassandra.yaml,  there is no such property, and apparently
 adding this property to the yaml causes a fatal configuration error upon
 server startup,


It's a per column family setting that can be applied using the CLI or CQL.

With CQL3 it would be

ALTER TABLE cf WITH caching = 'rows_only';

to enable the row cache but no key cache for that CF.

-Bryan

