[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-09-19 Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140476#comment-14140476 ]

Marcus Eriksson commented on CASSANDRA-5357:


[~rcoli] yes, we still invalidate on writes
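Here is a minimal Java sketch of the invalidate-on-write behavior confirmed here. A write-through cache would instead merge new rows into the cached entry. Everything in the sketch is hypothetical: `PartitionHeadCache`, `readHead` and `write` are illustrative names, and a `TreeMap` stands in for memtables and sstables. It is not Cassandra's row cache code, and it ignores the read/write race that a real implementation has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: caches the first rowsToCache rows of each partition,
// populated on read and invalidated (not updated) on write.
final class PartitionHeadCache {
    private final int rowsToCache;
    // Stand-in for the stored data: partition key -> (clustering key -> value).
    private final Map<String, NavigableMap<Integer, String>> storage = new ConcurrentHashMap<>();
    // Cached partition heads: partition key -> first rowsToCache row values.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    PartitionHeadCache(int rowsToCache) {
        this.rowsToCache = rowsToCache;
    }

    // Write path: apply the write, then drop any cached head for that partition.
    void write(String partitionKey, int clustering, String value) {
        storage.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(clustering, value);
        cache.remove(partitionKey);
    }

    // Read path: serve from the cache if present, otherwise read the head and cache it.
    List<String> readHead(String partitionKey) {
        return cache.computeIfAbsent(partitionKey, this::loadHead);
    }

    boolean isCached(String partitionKey) {
        return cache.containsKey(partitionKey);
    }

    // Reads at most rowsToCache rows, in clustering order, from the stand-in storage.
    private List<String> loadHead(String partitionKey) {
        List<String> head = new ArrayList<>();
        NavigableMap<Integer, String> rows = storage.getOrDefault(partitionKey, new TreeMap<>());
        for (String value : rows.values()) {
            if (head.size() == rowsToCache)
                break;
            head.add(value);
        }
        return head;
    }
}
```

The appeal of invalidation is that a write never needs to know how much of the partition is cached. Dropping the entry is always correct, at the cost of a cold read afterwards.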

 Query cache / partition head cache
 --

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1 beta1

 Attachments: 0001-Cache-a-configurable-amount-of-columns.patch


 I think that most people expect the row cache to act like a query cache, 
 because that's a reasonable model.  Caching the entire partition is, in 
 retrospect, not really reasonable, so it's not surprising that it catches 
 people off guard, especially given the confusion we've inflicted on ourselves 
 as to what constitutes a row.
 I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-09-17 Robert Coli (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138291#comment-14138291 ]

Robert Coli commented on CASSANDRA-5357:


[~krummas]: To be 100% clear, this keeps the invalidate-on-write behavior 
of the previous row cache?

(Your previous answer to [~d-_-b] implies so, but people reading this ticket 
might still be unclear.)


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-02-21 Frederick Haebin Na (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908141#comment-13908141 ]

Frederick Haebin Na commented on CASSANDRA-5357:


I have a question. Is this row cache (or partition cache) write-through or not?
If a column is added, would it be added to the cache?


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-02-21 Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908142#comment-13908142 ]

Marcus Eriksson commented on CASSANDRA-5357:


[~haebin] no, not yet


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-02-05 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892159#comment-13892159 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

I've pushed an additional commit on top of the branch above at 
https://github.com/pcmanus/cassandra/commits/5357-2 that adds a bunch of 
comments (there are a few subtleties going on here :)), addresses a few minor 
nits, and:
* refactors getThroughCache() a bit so we can maximize the cases where we can 
use a cached partition (typically, if a slice query's bounds are fully 
included in the cached CF, we know we're good).
* improves the handling of expiring columns a bit: when comparing the number of 
rows in a cached CF to rowsPerPartitionToCache, we should indeed not expire 
columns, as this could lead us to think we're caching the whole partition when 
we aren't; but when checking whether the cached CF has enough rows for the 
query filter, we must expire with the query TTL or we might end up returning 
fewer rows than we should (that last part wasn't done by the previous patch).
* makes sure we test for !reversed in isHeadFilter().
* reverts the changes to CFS.getRawCachedRow and instead moves the decision of 
whether the cached row can be used to RowIteratorFactory. It didn't feel 
particularly logical to do that in getRawCachedRow, given that nothing in the 
method's name suggests it, and this makes it easier to maximize cache usage for 
range queries.
* doesn't increment metric.rowCacheHit when the cached value can't be used due 
to the filter; it only increments metric.rowCacheHitOutOfRange, as that feels 
more natural to me.
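The coverage rule from the first two bullets might look roughly like the following. This is a hedged sketch and not the code in the 5357-2 branch. Clustering keys are modeled as ints, `CachedRow.expiresAt` stands in for a cell's TTL expiry, `queryTime` stands in for the point in time the query uses to expire columns, and `HeadCoverage.canServe` is a hypothetical name.

```java
import java.util.List;

// Hypothetical model of one cached row: its clustering key and expiration time.
final class CachedRow {
    final int clustering;
    final long expiresAt; // Long.MAX_VALUE means "never expires"

    CachedRow(int clustering, long expiresAt) {
        this.clustering = clustering;
        this.expiresAt = expiresAt;
    }
}

final class HeadCoverage {
    // rows: the cached head of a partition, sorted by clustering key.
    // start/end: slice bounds, where null means unbounded.
    static boolean canServe(List<CachedRow> rows, int rowsToCache,
                            Integer start, Integer end, boolean reversed,
                            int limit, long queryTime) {
        // Capacity check on the RAW count (nothing expired): fewer rows than
        // the capacity means the whole partition is cached, so any slice works.
        if (rows.size() < rowsToCache)
            return true;
        // Only a head is cached: a reversed slice would need the partition's tail.
        if (reversed)
            return false;
        int lastCached = rows.get(rows.size() - 1).clustering;
        // The slice ends inside the cached prefix, so every row it can select is cached.
        if (end != null && end <= lastCached)
            return true;
        // Otherwise we need enough rows still LIVE at queryTime to fill the limit.
        int live = 0;
        for (CachedRow r : rows) {
            boolean inSlice = (start == null || r.clustering >= start)
                           && (end == null || r.clustering <= end);
            if (inSlice && r.expiresAt > queryTime)
                live++;
        }
        return live >= limit;
    }
}
```

The two counts differ on purpose. The capacity check uses the raw number of cached rows, while the limit check only counts rows that are still live at `queryTime`.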

Provided that additional commit looks reasonable, I believe I'm good with this.

bq. I'll do the row cache -> partition cache renaming in a separate ticket

Right. Though now that I think about it, "row cache" is not all that horrible: 
it does cache rows, it just caches only some number of them per partition :). 
Maybe it's not worth the pain of a rename. Anyway, I'm good leaving that 
discussion for another time/ticket.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-29 Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885215#comment-13885215 ]

Marcus Eriksson commented on CASSANDRA-5357:


Pushed a fixed version to 
https://github.com/krummas/cassandra/commits/marcuse/5357 with thrift renaming 
and updates after the review in separate commits. I guess the 
replicate_on_write stuff in the thrift-generated code is due to 714c423.

I'll do the row cache -> partition cache renaming in a separate ticket.



[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-28 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884030#comment-13884030 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

A bunch of remarks on the attached patch:
* I don't think we can count CQL rows as the patch does. Namely, we can't 
assume that the number of cells equals the number of CQL rows times the number 
of declared CQL columns, since many rows may not have all columns set. For 
instance, the cacheFilter in getThroughCache might finish in the middle of a 
CQL row (since it blindly asks for cells, not CQL rows), which would be bad. 
Also, for the same reason, cacheFilter might fetch fewer CQL rows than requested 
by the user when we use it to query data. Lastly, the number of declared CQL 
columns can change at runtime, which would break the cache (for that last one, 
we could invalidate the cache when columns are added/removed, but it's better 
if we don't have to).
So anyway, we really should always count CQL rows, which implies using 
SliceQueryFilter.lastCounted() and ColumnCounter. This does mean that unless we 
actually store the row count with each cached CF object, we might have to 
re-count every time we have a cache hit, but this is probably ok for now (we 
can optimize later).
* In CFS.getThroughCache, on a cache hit, I believe the condition to return the 
cached value should be something like:
{noformat}
if (wholePartitionCached || (filter.isHeadFilter() && filter.filter.getCount()
    * (metadata.regularColumns().size() + 1) <= cachedCf.getColumnCount()))
{noformat}
i.e. unless the whole partition is cached we should not return the cached value 
for anything other than a head filter.
* In CFS.getThroughCache, on a cache miss and in the case where 
{{filter.filter.getCount() > 
metadata.getRowsPerPartitionToCache().rowsToCache}}, we shouldn't cache the 
result if the filter 'finish' is not empty and the query has yielded fewer rows 
than the cache capacity.
* From a naming point of view, I'll note that rows_per_partition_to_cache is 
probably confusing on the thrift side. Maybe for thrift it's worth naming it 
something like columns_per_rows_to_cache. It won't be entirely correct 
technically, because internally we'll store a number of CQL rows per partition, 
but in most cases this will be the same thing anyway, so it's maybe fine.
* For SliceQueryFilter.isHeadFilter, we want to exclude queries with multiple 
ColumnSlices for now, and while the patch's implementation does that, it looks 
more like a side effect than the actual intent. I think something like
{noformat}
public boolean isHeadFilter()
{
    return slices.length == 1 && slices[0].start.isEmpty();
}
{noformat}
would make the intent clearer.
* In RowIteratorFactory, we could keep using the cache if we cache ALL. Or 
really, even whenever the cached data is large enough to satisfy the filter, 
like we do in getThroughCache.
* In ColumnFamilyMetrics, any reason why only rowCacheHitOutOfRange is removed 
on release()?
* Nit: seems like shouldCache() in NamesQueryFilter is not used.
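The following small Java sketch illustrates the first remark, which is why a cell count is not a reliable proxy for CQL rows. Cells are modeled as hypothetical (clustering key, column name) pairs. `countCqlRows` counts changes of clustering key in a sorted cell stream, which is roughly what a CQL row counter must do. `naiveRowEstimate` divides the cell count by the number of declared regular columns plus one. That mirrors the `regularColumns().size() + 1` factor in the condition quoted in the second remark, where the `+ 1` presumably accounts for the CQL row marker. Neither method is Cassandra's `ColumnCounter` API.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical cell model: a CQL row is identified by its clustering key,
// and an empty column name plays the role of the row marker.
final class Cell {
    final String clustering;
    final String column;

    Cell(String clustering, String column) {
        this.clustering = clustering;
        this.column = column;
    }
}

final class CqlRowCounting {
    // Counts CQL rows by counting clustering-key changes in a sorted cell stream.
    static int countCqlRows(List<Cell> sortedCells) {
        int rows = 0;
        String previous = null;
        for (Cell c : sortedCells) {
            if (!Objects.equals(c.clustering, previous)) {
                rows++;
                previous = c.clustering;
            }
        }
        return rows;
    }

    // The approximation questioned in the review: it only holds if every
    // row has every declared column set.
    static int naiveRowEstimate(int cellCount, int declaredRegularColumns) {
        return cellCount / (declaredRegularColumns + 1);
    }
}
```

Consider a table with two regular columns whose three rows have, respectively, both columns, neither column and one column set. The rows hold six cells in total, so the estimate reports 2 rows while the clustering-based count finds 3.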

As a side note, maybe this is a good time to rename the 'row cache' to 
'partition cache' (I'm thinking options in cassandra.yaml, metrics names, ...)? 
Could be done in a separate issue, though.
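The cache-miss rule in the third remark of the review above can be written as a small predicate. This is a hedged sketch with hypothetical names. It assumes the caller has already established that this is a head query (empty start bound) whose result would otherwise populate the cache.

```java
// Hypothetical helper, not Cassandra code.
final class CacheOnMiss {
    /**
     * @param finishEmpty  true if the user's slice has no end ('finish') bound
     * @param rowsReturned CQL rows the user's query actually returned
     * @param rowsToCache  per-partition cache capacity
     */
    static boolean mayCacheResult(boolean finishEmpty, int rowsReturned, int rowsToCache) {
        // With an end bound, a result shorter than the capacity may only mean the
        // slice stopped early, so caching it could later be mistaken for the
        // whole partition. Without an end bound, a short result really is the
        // whole partition; a result at least as long as the capacity covers
        // the head either way.
        return finishEmpty || rowsReturned >= rowsToCache;
    }
}
```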




[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-15 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871885#comment-13871885 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. The case you describe of wanting to cache a full table is not dependent 
on rows per partition but on cache size = number of partitions cached

But even if you don't want to cache a full table, you still at least need to 
make sure that for each partition, all rows are cached. You still need "rows per 
partition = n" where n > the max number of rows per partition in that table, 
and all I'm saying is that "rows per partition = all" is a bit more 
user-friendly. It's true you also need to make sure your cache is big enough if 
you want to cache the table in full, but that doesn't invalidate the first part 
(unless I'm missing something).

bq. We're talking about static CFs aka partition key == primary key, right? 
Then there is one row per partition, so there is no need for a special rows 
per partition = all setting.

I guess I'm saying 2 things:
# I think that what users sometimes really want is to cache full partitions. 
That's the basic intention. So what's the harm in adding an "all" alias that 
expresses that intention better, for user-friendliness' sake, provided adding it 
doesn't require noticeable complexity? And given "all" can just be an alias for 
Integer.MAX_VALUE, it doesn't add complexity, so ...
# It's somewhat of a detail, but I don't think that, technically, "rows per 
partition = 1" will work equivalently to the current row cache behavior for 
static tables in practice, at least not always. More precisely, suppose you get 
the query "select * from foo where pk=3", that pk=3 is a cache hit, and that 
rows_per_partition=1 on that table. Then you can only serve the read from the 
cache hit if you know *for sure* that this is a static table, i.e. that there 
cannot be more rows in that partition that haven't been cached due to the 
per-partition limit. And, at least for thrift, we never really know for sure 
whether a table is a static one. I do note that rows_per_partition=2 would 
work, because if your cache hit has 1 row and you know you cache the first 2 
rows of the partition, then you can infer that all rows of the partition are 
cached without any more info; but at that point, I think it's a lot simpler to 
have an "all" alias than to have to explain those implementation details.

Not saying it's a big deal, just that I think it's user-friendly and has no 
real downside that I can see.
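The second point rests on a simple inference. A cached head proves that it holds the entire partition only when it contains strictly fewer rows than the per-partition limit. Here is a minimal Java sketch, with a hypothetical class and method name:

```java
// Hypothetical helper illustrating the rows_per_partition=1 vs 2 argument.
final class PartitionCompleteness {
    // A head cache stores at most rowsPerPartition rows. Holding fewer than that
    // means the partition ran out before the limit, so every row is cached.
    // Holding exactly the limit proves nothing: more rows may exist uncached.
    static boolean provesWholePartition(int cachedRows, int rowsPerPartition) {
        return cachedRows < rowsPerPartition;
    }
}
```

With `rows_per_partition=1`, a single-row static partition gives `provesWholePartition(1, 1) == false`, which is why 1 is not enough. With 2, `provesWholePartition(1, 2)` is true. Mapping an "all" alias to `Integer.MAX_VALUE` makes the check true for any realistically sized partition.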



[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-14 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870521#comment-13870521 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. If you have a single row per partition, how much of the table you cache is 
purely a function of cache size.

If that was related to my remark above, I don't think I understood that 
sentence, sorry.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-14 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870962#comment-13870962 ]

Jonathan Ellis commented on CASSANDRA-5357:
---

We're talking about static CFs aka partition key == primary key, right?

Then there is one row per partition, so there is no need for a special rows 
per partition = all setting.  The case you describe of wanting to cache a 
full table is not dependent on rows per partition but on cache size = number 
of partitions cached.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-13 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869423#comment-13869423 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. If the newly cached data does not include all cells requested by user, we 
do another read. We cannot know if the requested cells will be included in the 
first N cells.

Haven't checked the patch, but what I would imagine we'd want/have to do here 
is distinguish between 2 types of queries:
# the head-of-partition type, i.e. (non-reversed) slices where the start bound 
is empty (and that have just one slice). For those, we'd have 2 sub-categories:
  ## those whose limit is <= N (where N is the number of cached 
rows per partition). For those, we can safely answer the query from a cache 
hit, and on a cache miss we can query the first N rows, cache them and return.
  ## those whose limit is > N. In that case, we can check the cache and see if 
what's cached covers the whole query (i.e. despite the bigger limit, the slice 
end is before the last cached entry). But if it doesn't, we'd have to do a read 
(we can start that read where the cache ends, though), but we wouldn't cache 
the results of that 2nd read since they don't fall into the first N rows.
# the other types of queries. Those can't be served from cache in general. 
That being said, we can still check the cache and see if by any chance we can 
guarantee it's enough, i.e. if the last possibly queried item sorts before the 
last cached entry. But if it's a miss or if we can't guarantee the query is 
fully covered by the cache, I think we should just ignore caching and 
read-and-return the user query without trying to cache anything. On a cache 
miss in particular, I'm not really convinced that it's worth reading the first 
N rows of the partition when we have a very good chance they won't cover our 
query anyway. Of course, in the longer run, maybe we can add heuristics for 
when querying the first N rows is almost sure to cover the query (for 
instance, if the mean number of cells per partition for the table is < N), but 
I'd rather leave that for later. Overall, I don't think caching should mean we 
may have to do 2 reads to answer a query; that feels wrong (and makes it easy 
to bite people in a way they don't expect).
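The proposed classification can be sketched as a small decision function. It illustrates the proposal, not the patch. `Plan`, its constants and `QueryClassifier` are hypothetical names, and `n` is the configured number of cached rows per partition. Each constant's comment summarizes the handling described in the numbered list above.

```java
// Hypothetical names throughout; mirrors the proposal, not Cassandra code.
enum Plan {
    SERVE_OR_POPULATE,      // head query, limit <= N: answer from a hit, or read the first N rows and cache them
    SERVE_OR_READ_REST,     // head query, limit > N: use the cache if it covers the slice, else read on from where the cache ends, without caching that read
    SERVE_OR_READ_UNCACHED  // any other query: use the cache only if provably covered, else a plain read with no caching
}

final class QueryClassifier {
    static Plan classify(int sliceCount, boolean startEmpty, boolean reversed, int limit, int n) {
        // Head of partition: a single, non-reversed slice with an empty start bound.
        boolean headOfPartition = sliceCount == 1 && startEmpty && !reversed;
        if (!headOfPartition)
            return Plan.SERVE_OR_READ_UNCACHED;
        return limit <= n ? Plan.SERVE_OR_POPULATE : Plan.SERVE_OR_READ_REST;
    }
}
```

Only the first plan ever populates the cache. That matches the goal of never needing two reads plus a cache fill to answer a single query.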

I'll finish by saying that for the sake of shipping sooner rather than later, 
I'd be absolutely fine with a simpler first version that would only ever 
consider the cache for "head-of-partition with limit < N" queries and ignore it 
completely for all other cases. After that, we can incrementally cover more 
cases in follow-up patches.

In other words, if we only cache the first N rows per partition, it's perfectly 
ok imo to say that, initially, the cache is only used when you query the first 
M < N rows of a partition.

Btw, it would make me really happy to preserve the current cache behavior as a 
"rows_per_partition: all" option (I doubt it'll be much code). I still think 
that for static CFs it's the right option, and I'm sure a few users have built 
legitimate uses of our existing cache-everything cache that won't be easily 
covered without it.



[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-13 Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869462#comment-13869462 ]

Marcus Eriksson commented on CASSANDRA-5357:


OK, I'll refactor a bit then; stand by for a new patch.

My use case was "give me the last hour of data" slices for time series data 
etc., not limit queries.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-13 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869539#comment-13869539 ]

Jonathan Ellis commented on CASSANDRA-5357:
---

bq.  I still think that for static CFs [all rows in a partition is] the right 
option

Isn't "all rows" just one row for a static CF, or did you mean something else?


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-13 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869575#comment-13869575 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. Isn't "all rows" just one row for a static CF, or did you mean something 
else?

I guess I'm talking about user convenience. I'm fine making that "all" option 
just syntactic sugar for Integer.MAX_VALUE. However, caching only 1 row wouldn't 
work for all queries (you'd need to cache at least 2 rows, so that if you get a 
"get me all the partition" query, you can say "well, I'm caching the first 2 
rows and I have only 1, so I have the full partition in cache"). But wanting to 
cache a full table (be it static or not, btw) is not entirely uncommon, so it 
just feels cleaner to have an 'all' option rather than requiring users to pick a 
random big number (it does mean that rows_per_partition can't be just an 
integer, but since we plan on having an 'auto' mode in the future anyway...). 
This also gives us a good value to default to when upgrading from 2.0 with the 
row cache enabled on the table.
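Here is a hedged sketch of the option as described: it accepts either a positive integer or 'all', with 'all' as syntactic sugar for `Integer.MAX_VALUE`. The class name, the case-insensitive spelling and the rejection of zero are assumptions made for illustration. The future 'auto' mode is left out.

```java
import java.util.Locale;

// Hypothetical model of a rows_per_partition setting; not Cassandra's API.
final class RowsPerPartition {
    static final int ALL = Integer.MAX_VALUE; // 'all' means "no practical limit"

    final int rowsToCache;

    private RowsPerPartition(int rowsToCache) {
        this.rowsToCache = rowsToCache;
    }

    static RowsPerPartition parse(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("all"))
            return new RowsPerPartition(ALL);
        int n;
        try {
            n = Integer.parseInt(v);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("rows_per_partition must be 'all' or a positive integer: " + value);
        }
        if (n <= 0) // assumption: disabling the cache is expressed some other way
            throw new IllegalArgumentException("rows_per_partition must be positive: " + value);
        return new RowsPerPartition(n);
    }

    boolean cachesWholePartitions() {
        return rowsToCache == ALL;
    }
}
```

Because 'all' is just a very large integer, the cache code itself never needs a special case. Only parsing and display have to know about the alias.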


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-13 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869667#comment-13869667 ]

Jonathan Ellis commented on CASSANDRA-5357:
---

If you have a single row per partition, how much of the table you cache is 
purely a function of cache size.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-08 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865208#comment-13865208 ]

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. Personally I'd lean towards (2)

Personally, I'd lean towards both :). That is, I'd add a new 
rows_per_partition_to_cache table option that would be either a user-set fixed 
value or "auto" (the default, presumably), which we would determine 
automatically. Of course, in the interest of shipping sooner, the auto option 
could be added later on. But while I'm all for having smart automatic defaults 
that most users never have to change, it seems to me that for something as 
important as caching, there will always be cases where the user knows better 
than whatever heuristic we come up with.

bq. I think this also means we should go back to a separate cache per CF with 
its own size limit – if we have 1000 queries/s against CF X's cache, then we 
shouldn't throw those away when a query against CF Y comes in where we expect 
only 10/s

It seems to me that this reasoning applies equally well to the current row 
cache. Is there something specific to this ticket that makes you say that, or 
are you just saying that making the caches global was possibly a mistake we'd 
want to reconsider? For what it's worth, when we made the caches global, that 
kind of objection was raised, and the answer was that you could disable caching 
for CF Y to avoid that, and that if that was not enough we'd add optional 
per-CF quotas on top of the global one later on. Overall, I do think we really 
should maintain a global limit on how much is cached, though I don't disagree 
that some finer per-CF quotas could be desirable.


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865383#comment-13865383
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. It seems to me that this reasoning applies equally well to the current row 
cache.

It would, but the current row cache is dangerous enough that I don't want to 
spend effort making it smarter. :)

bq. I do think we really should maintain a global limit on how much is cached

Agreed 100%; I'm just saying we should apportion that to the CFs based on our 
metrics.
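
A minimal sketch of apportioning one global limit across CFs by observed load. It assumes that per-CF read rates (reads/s) are available from the metrics. CacheApportioner and apportion are hypothetical names, not Cassandra's API, and splitting in proportion to read rate is only one possible policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Cassandra code): split one global cache budget
// across column families in proportion to their observed read rates.
final class CacheApportioner {
    private CacheApportioner() {}

    /**
     * @param globalBudgetBytes total bytes the cache may use across all CFs
     * @param readsPerSecond    observed reads/s per CF (assumed to come from metrics)
     * @return per-CF byte quotas, each rounded down so that the quotas do not add
     *         up to more than the budget (up to floating-point rounding)
     */
    static Map<String, Long> apportion(long globalBudgetBytes, Map<String, Double> readsPerSecond) {
        double totalRate = 0;
        for (double rate : readsPerSecond.values()) {
            totalRate += Math.max(0, rate);
        }
        Map<String, Long> quotas = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : readsPerSecond.entrySet()) {
            double rate = Math.max(0, e.getValue());
            long quota = totalRate == 0 ? 0 : (long) Math.floor(globalBudgetBytes * rate / totalRate);
            quotas.put(e.getKey(), quota);
        }
        return quotas;
    }
}
```

Take a 1,010,000-byte budget with CF X at 1000 reads/s and CF Y at 10 reads/s. X gets 1,000,000 bytes and Y gets 10,000, so a burst of queries against Y can only evict Y's own entries.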

In any case, I'm getting ahead of what we should focus on for 2.1.0 here.

 Query cache / partition head cache
 --

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865409#comment-13865409
 ] 

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. In any case, I'm getting ahead of what we should focus on for 2.1.0 here.

Absolutely :)

 Query cache / partition head cache
 --

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-08 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865495#comment-13865495
 ] 

Marcus Eriksson commented on CASSANDRA-5357:


So, trying to distill what we actually want here:

* Make it possible to cache parts of the partitions (CASSANDRA-1956) (head, 
tail, and perhaps all, for users who use it today)
* Don't evict entire partitions on writes (CASSANDRA-2864)
* Add heuristics to guess the number of columns to be cached (maybe push to 
later versions; configurable sizes could be fine for now)

Anything else? [~jbellis] [~slebresne]

 Query cache / partition head cache
 --

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1


[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

2014-01-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865515#comment-13865515
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

Yup, good summary.  I'm getting Time To Ship Religion now, so for our purposes 
here I vote for head cache only and Sylvain's rows_per_partition_to_cache table 
option.  Let's reopen 2864 and leave fancy heuristics as a follow-up.

 Query cache / partition head cache
 --

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1


[jira] [Commented] (CASSANDRA-5357) Query cache

2014-01-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864543#comment-13864543
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. I think we could also do some intelligent sizing of the cache per-CF with 
the metrics we keep, that would be relatively static (so impervious to churn).

I'm not sure what I was thinking here.  (Maybe that we'd only need one cached 
partition per CF which is nonsense.)  We do need LRU or similar behavior at a 
high level, just like we do with the row cache today.

The question is, how much of each partition do we cache?  I think it's a lot 
simpler if we decide we'll cache the same amount for each partition in a CF, 
and not try to be clever and extend a cached partition when we query for more 
later.

So how much do we cache?  We can either

# Make the user configure it, which requires creating new CQL syntax, or
# Determine it automatically

Personally I'd lean towards (2):
# Track an EstimatedHistogram of LIMITs in qualifying queries
# Set the cells-to-cache per CF so that we maximize the queries we can satisfy 
for a given cache size
# I think this also means we should go back to a separate cache per CF with its 
own size limit -- if we have 1000 queries/s against CF X's cache, then we 
shouldn't throw those away when a query against CF Y comes in where we expect 
only 10/s
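
Steps 1 and 2 above can be sketched roughly as follows, with two stated simplifications. First, an exact TreeMap of LIMIT counts stands in for EstimatedHistogram. Second, the sketch does not solve the full "maximize satisfied queries for a given cache size" problem. It uses a percentile heuristic instead: pick the smallest head size that fully satisfies a target fraction of the recorded queries. LimitHistogram and cellsToCache are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for EstimatedHistogram: exact counts of the LIMIT
// values seen in qualifying queries for one CF.
final class LimitHistogram {
    private final TreeMap<Integer, Long> counts = new TreeMap<>();
    private long total;

    void record(int limit) {
        counts.merge(limit, 1L, Long::sum);
        total++;
    }

    /**
     * Percentile heuristic: the smallest head size N such that at least
     * targetFraction of recorded queries have LIMIT <= N (0 if nothing recorded).
     */
    int cellsToCache(double targetFraction) {
        if (total == 0) {
            return 0;
        }
        long needed = (long) Math.ceil(targetFraction * total);
        long seen = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= needed) {
                return e.getKey();
            }
        }
        return counts.lastKey(); // targetFraction > 1: cache up to the largest LIMIT seen
    }
}
```

For example, suppose 90 of 100 recorded queries use LIMIT 10, 9 use LIMIT 100 and one uses LIMIT 5000. Then cellsToCache(0.95) returns 100, so the single LIMIT 5000 query does not inflate every cached partition.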

In the interest of shipping sooner rather than later, though, I'll take whatever we can 
reasonably do for 2.1.0 and push the rest out to improve later.  If we just 
have a single "cache this many cells" parameter in cassandra.yaml, that's still 
better than people OOMing themselves with the classic row cache.

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
 Fix For: 2.1


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-12-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846150#comment-13846150
 ] 

Sylvain Lebresne commented on CASSANDRA-5357:
-

bq. That's a very interesting idea, and a good fit with existing best practices

Isn't that pretty much exactly the initial idea for CASSANDRA-1956 (except 
maybe that the filter would be hard-coded to the head of the row), to which 
you argued that a query cache was more generic and handled secondary indexes 
in particular? (Note that I'm not against the idea; it had my preference initially, 
if only for simplicity's sake. I'm just trying to make sure I understand the 
thought process on this.)

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-12-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846344#comment-13846344
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

Yeah, but we've already given up on 2i since that turns out to be a mess. :)

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-12-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845899#comment-13845899
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. only cache the head of the row and its bounds

That's a very interesting idea, and a good fit with existing best practices 
(slicing from the start of the partition is fastest).

I think we could also do some intelligent sizing of the cache per-CF with the 
metrics we keep, that would be relatively static (so impervious to churn).  
With the query cache the best we can do is LRU.

WDYT [~pmcfadin] [~tupshin]?

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-11-26 Thread Rick Branson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832901#comment-13832901
 ] 

Rick Branson commented on CASSANDRA-5357:
-

Perhaps an anecdote from a production system might help find a simple yet 
useful improvement to the row cache. Facebook's TAO distributed storage system 
supports a data model called assocs, which are basically just graph edges, and 
nodes assigned to a given assoc ID hold a write-through cache of the state. The 
assoc storage can roughly be considered a more use-case-specific CF. For large 
assocs with many thousands of edges, TAO only maintains the tail of the assoc 
in memory, as those tend to be the most interesting portions of data. More 
details are discussed in the linked paper [1].

Perhaps instead of a total overhaul, what's really needed is to evolve the row 
cache by modifying it to cache only the head of the row and its bounds. In 
contrast to the complexity of trying to match queries and mutations to a set of 
serialized query filter objects, the cache only needs to maintain at most one 
interval for each row. This would provide a very simple write-through story. 
After reviewing our production wide-row use cases, they seem to fall into two 
camps. The first, and the most read-performance-sensitive, is vastly skewed towards 
reads on the head of the row (90% of the time) with a fixed limit. The second 
is randomly distributed slice queries, which would not seem to provide a very 
good cache hit rate either way.

[1] https://www.usenix.org/conference/atc13/technical-sessions/papers/bronson
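
A minimal sketch of the head-only, write-through entry described above. It assumes string clustering keys compared lexically and ignores deletes, TTLs and concurrency. HeadCacheEntry, applyWrite and readHead are hypothetical names. Each entry keeps a single interval, the first `capacity` cells, plus a flag recording whether that interval is the whole partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: one cached interval per partition (its head), kept
// up to date by applying writes that fall inside the interval.
final class HeadCacheEntry {
    private final int capacity;
    private final TreeMap<String, String> head; // clustering key -> value
    private boolean wholePartition;             // true if nothing exists past the head

    HeadCacheEntry(int capacity, TreeMap<String, String> firstCells, boolean wholePartition) {
        if (firstCells.size() > capacity) {
            throw new IllegalArgumentException("more cells than the configured head size");
        }
        this.capacity = capacity;
        this.head = new TreeMap<>(firstCells);
        this.wholePartition = wholePartition;
    }

    /** Write-through: apply the write only if it lands inside the cached interval. */
    void applyWrite(String clustering, String value) {
        boolean insideBounds = wholePartition
                || (!head.isEmpty() && clustering.compareTo(head.lastKey()) <= 0);
        if (!insideBounds) {
            return; // past the cached head: only the on-disk data changes
        }
        head.put(clustering, value);
        if (head.size() > capacity) {
            head.pollLastEntry();   // keep only the first `capacity` cells
            wholePartition = false; // the dropped cell now exists only past the head
        }
    }

    /** First `limit` values in clustering order, or null if the head cannot answer. */
    List<String> readHead(int limit) {
        if (limit > head.size() && !wholePartition) {
            return null; // cache miss: the query needs cells beyond the cached head
        }
        List<String> out = new ArrayList<>();
        for (String v : head.values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(v);
        }
        return out;
    }
}
```

Because the entry only ever describes a prefix, a write past it can be ignored. A write inside it is applied in place, and the cell pushed off the end is dropped. There is no per-query filter bookkeeping.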

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-10-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804333#comment-13804333
 ] 

Sylvain Lebresne commented on CASSANDRA-5357:
-

I haven't really followed the progress here, so I don't know how relevant this 
is, but while we switch to a query cache, we should probably have a look at the 
ideas from CASSANDRA-2864. Some of them possibly apply here too.

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-24 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777088#comment-13777088
 ] 

Vijay commented on CASSANDRA-5357:
--

{quote}
So the cost is quite high vs having live filters
{quote}
Some synthetic tests show very low overhead for the filter deserialization: 
http://pastebin.com/VNREA8fG.

IMHO the existence check might not be that bad, since (and this is an assumption) 
99% of the queries will have the same query filters on them. For queries that 
are distinct and present in the cache (having survived the LRU), I think it is fair to 
take a hit rather than keeping the filters live in the JVM heap. 
Filters may be big in some cases (like named filters, or filters with long 
string names), and even in the optimal case of empty strings we still need a minimum 
of 2 ByteBuffers (BB), a count, and the data structures in memory. Hence compact 
off-heap storage might be good.

One other option, which we were discussing a little earlier, is to optimize the 
filters in the cache by merging similar and overlapping queries into an optimal 
cache filter entry; that would help with the above.

{quote}
I'm not concerned about that so much as, do we keep within our total memory 
budget? 
{quote}
Aha, got it. So we need an additional cache parameter that says how 
much memory is available in the JVM for the cached keys... I will add it in the 
next revision.

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775953#comment-13775953
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. It is required because we need to know the query which populated the cache

Sure, but why does that imply we need to *serialize* the filters?  I'm saying 
just shove the ColumnFamily payload off-heap but leave the rest live.

That might also simplify the Sentinel business.
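
A sketch of the split being suggested. Strings stand in for IFilter and for the serialized ColumnFamily. The filter set stays as live on-heap objects, and only the payload goes into a direct (off-heap) ByteBuffer, so a miss is decided without deserializing anything. OffHeapPayloadEntry is a hypothetical name, and the caller is assumed to pass the already-merged payload to store().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: filters stay as live on-heap objects; only the
// serialized partition payload lives in a direct (off-heap) buffer.
final class OffHeapPayloadEntry {
    private final Set<String> filters = ConcurrentHashMap.newKeySet();
    private volatile ByteBuffer payload;

    /** Stores the merged payload off-heap, then records the filter that produced it. */
    void store(String filter, String mergedPayload) {
        byte[] bytes = mergedPayload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip();
        payload = buf;
        filters.add(filter);
    }

    /** A miss is a set lookup with no deserialization; only a hit decodes the payload. */
    String lookup(String filter) {
        if (!filters.contains(filter)) {
            return null;
        }
        ByteBuffer view = payload.duplicate(); // own position, shared off-heap contents
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```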

bq. If the slice with count as 250 is stored we might not need to store the 
slice with count of 50 with same range, we can also merge overlapping slices 
etc.

Pushing that to a separate ticket is fine.

bq. Can we [handle respecting memory limits] in a separate ticket?

I think that's pretty core functionality; it seems like we should do that here. 
 That said, I'm not sure I understand exactly how the problem happens here.

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-23 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775967#comment-13775967
 ] 

Vijay commented on CASSANDRA-5357:
--

{quote}
I'm saying just shove the ColumnFamily payload off-heap but leave the rest 
live.
{quote}
Sure, but that can cause more memory pressure in the JVM. IMHO (cost vs. benefit), 
it's not that bad to deserialize the filters, at least in the stress tests I did.

{quote}
I'm not sure I understand exactly how the problem happens here.
{quote}
The problem is when the whole row (let's say multiple MBs) of a column family is 
cached. Instead of deserializing the whole column family at once, we can 
deserialize it while filtering in CFS.filterColumnFamily; hence the QC should 
return an iterator instead of a CF... Makes sense?
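
A minimal sketch of returning an iterator instead of a fully built CF. It assumes a simple hypothetical [length][name][length][value] record layout rather than Cassandra's real serialization format. Cells are decoded one per next() call, so a filter that stops after a few cells never deserializes the rest of a multi-MB partition.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical record layout: [int len][name bytes][int len][value bytes] per cell.
// next() decodes exactly one cell, so consumers that stop early skip the rest.
final class LazyCellIterator implements Iterator<Map.Entry<String, String>> {
    private final ByteBuffer buf;
    int decoded; // cells deserialized so far (exposed only for illustration)

    LazyCellIterator(ByteBuffer serialized) {
        this.buf = serialized.duplicate(); // do not disturb the cached buffer's position
    }

    static ByteBuffer serialize(List<Map.Entry<String, String>> cells) {
        int size = 0;
        for (Map.Entry<String, String> c : cells) {
            size += 8 + utf8(c.getKey()).length + utf8(c.getValue()).length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (Map.Entry<String, String> c : cells) {
            writeString(out, c.getKey());
            writeString(out, c.getValue());
        }
        out.flip();
        return out;
    }

    @Override
    public boolean hasNext() {
        return buf.hasRemaining();
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String name = readString(buf);
        String value = readString(buf);
        decoded++;
        return new AbstractMap.SimpleImmutableEntry<>(name, value);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static void writeString(ByteBuffer out, String s) {
        byte[] b = utf8(s);
        out.putInt(b.length);
        out.put(b);
    }

    private static String readString(ByteBuffer in) {
        byte[] b = new byte[in.getInt()];
        in.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }
}
```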

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775972#comment-13775972
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. it's not that bad to deserialize the filters, at least in the stress tests I 
did

I think you may be testing the wrong thing.  Specifically, you have to 
deserialize the filters (and the CF, as a unit!) even on a *miss*.  So the cost 
is quite high vs having live filters.

bq. instead of de-serializing the whole column family at once we can 
de-serialize it during filter in CFS.filterColumnFamily

Okay, I get that.  I'm not concerned about that so much as, do we keep within 
our total memory budget?  If we have a 2GB cache and your query/queries make us 
use 1GB of that on a single CF object, that is painful but acceptable.  But if 
we disregard our budget and collect a 3GB CF, that's unacceptable.
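
One way to stay within the budget while materializing a partition for the cache, sketched under stated assumptions:

* cells arrive from an iterator;
* a caller-supplied function estimates each cell's size in bytes;
* the fill is abandoned (returning null, meaning "do not cache") as soon as the running total would exceed what remains of the budget.

BudgetedCollector is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: collect cells for caching only while they fit in the
// remaining budget; give up as soon as the budget would be exceeded.
final class BudgetedCollector {
    private BudgetedCollector() {}

    static <T> List<T> collectWithinBudget(Iterator<T> cells, ToLongFunction<T> sizeOf, long remainingBudget) {
        List<T> collected = new ArrayList<>();
        long used = 0;
        while (cells.hasNext()) {
            T cell = cells.next();
            used += sizeOf.applyAsLong(cell);
            if (used > remainingBudget) {
                return null; // would exceed the budget: abandon this cache fill
            }
            collected.add(cell);
        }
        return collected;
    }
}
```

With a 2GB cache that has 2GB free, a 1GB partition is collected. A 3GB one is abandoned once it passes 2GB instead of being fully built first.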



 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-22 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774291#comment-13774291
 ] 

Vijay commented on CASSANDRA-5357:
--

Hi Jonathan, I have pushed a version with the sentinel (I might have made it a little 
hacky, but it works): 
https://github.com/Vijay2win/cassandra/commits/query_cache_v2.

{quote}
Serializing the entire QueryCacheValue for each lookup is going to kill 
performance on hot partitions.
{quote}
It is required because we need to know the query that populated the cache. For 
example, a named query for columns A and Z can be followed by a slice query from 
A to Z, and we might not return the right response, since B to Y is not in the 
cache.

In a separate ticket, if that's OK, we can also optimize the above case (and more) 
for the queries stored in the cache. Example: if the slice with count 250 is stored, 
why also store the slice with count 50 over the same range? We can also merge 
overlapping slices, etc.
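
A simplified, hypothetical model of when a cached filter can answer a new query. It covers the two cases above:

* a names filter for {A, Z} cannot answer a slice from A to Z;
* a slice with count 250 can answer a slice with the same start, an end no further out, and count 50.

Column names are plain strings compared lexically. Real comparators, reversed slices and slices that returned fewer cells than their count are ignored.

```java
import java.util.Set;

// Hypothetical, simplified filter model. covers() is conservative: it returns
// true only when the cells cached for `cached` are guaranteed to contain the
// full answer to `requested`.
final class FilterCoverage {
    private FilterCoverage() {}

    interface Filter {}

    static final class NamesFilter implements Filter {
        final Set<String> names;
        NamesFilter(Set<String> names) { this.names = names; }
    }

    static final class SliceFilter implements Filter {
        final String start;
        final String end;  // inclusive
        final int count;   // the query's LIMIT
        SliceFilter(String start, String end, int count) {
            this.start = start;
            this.end = end;
            this.count = count;
        }
    }

    static boolean covers(Filter cached, Filter requested) {
        if (cached instanceof NamesFilter && requested instanceof NamesFilter) {
            // Named columns: covered if every requested name was part of the cached query.
            return ((NamesFilter) cached).names.containsAll(((NamesFilter) requested).names);
        }
        if (cached instanceof SliceFilter && requested instanceof SliceFilter) {
            SliceFilter c = (SliceFilter) cached;
            SliceFilter r = (SliceFilter) requested;
            // Same start, end no further, LIMIT no larger: the answer is contained
            // in what was cached (e.g. count 50 inside a stored count 250).
            return c.start.equals(r.start) && r.end.compareTo(c.end) <= 0 && r.count <= c.count;
        }
        // Names {A, Z} say nothing about B..Y, so they never answer a slice;
        // answering names from a slice is also refused here, conservatively.
        return false;
    }
}
```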

{quote}
if there's room, that's fine, but exceeding the configured memory budget is Bad
{quote}
Can we do that in a separate ticket? I believe we can achieve this by 
implementing an Iterator, similar to SSTableIterator, that streams the 
columns rather than constructing the ColumnFamily all at once.

Thanks!

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773111#comment-13773111
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

Getting back to this now that 2.0.1 is done.  Sounds reasonable in general.  
Comments:

# I don't think dropping RowCacheSentinel is valid, unfortunately.  Otherwise 
we still have the same problem as CASSANDRA-3862.  (Write can invalidate the 
row, just before cache adds the pre-write value to it, so stale data will be 
cached indefinitely.)
# Serializing the entire QueryCacheValue for each lookup is going to kill 
performance on hot partitions.  (Since you have to deserialize a large chunk of 
filters to just do the existence check.)  Suggest that serializing just the CF 
data is going to work better.
# {{TODO do something here}} looks important :)
# There could be any number of queries but the data will not be repeated 
within them.  Clever.
# There is a property which user can enable to cache the whole row. Not 
really a fan but I guess existing row cache users will demand it. :-|
# we might be pulling the whole data into memory -- if there's room, that's 
fine, but exceeding the configured memory budget is Bad.  As long as we don't 
do that I'm fine.
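
The RowCacheSentinel technique from point 1 can be sketched as follows. The sketch assumes a ConcurrentMap-based cache and a non-null value from the disk read. A reader that misses installs a unique sentinel before reading, and swaps it for the value only if the sentinel is still present. A write that arrives in between removes the sentinel, so the pre-write value is never cached. SentinelCache is a hypothetical name, not Cassandra's class.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch of the sentinel technique for cache population.
final class SentinelCache<K, V> {
    /** Unique placeholder; compared by identity (Object.equals). */
    private static final class Sentinel {}

    private final ConcurrentMap<K, Object> map = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    V read(K key, Supplier<V> readFromDisk) {
        Object cached = map.get(key);
        if (cached != null && !(cached instanceof Sentinel)) {
            return (V) cached; // cache hit
        }
        Sentinel sentinel = new Sentinel();
        boolean installed = map.putIfAbsent(key, sentinel) == null;
        V value;
        try {
            value = readFromDisk.get(); // assumed non-null
        } catch (RuntimeException e) {
            if (installed) {
                map.remove(key, sentinel); // do not leave our sentinel behind
            }
            throw e;
        }
        if (installed) {
            // Only succeeds if no write removed our sentinel while we were reading.
            map.replace(key, sentinel, value);
        }
        return value;
    }

    /** Called on every write: drop the cached value or any in-flight sentinel. */
    void invalidate(K key) {
        map.remove(key);
    }

    boolean isCached(K key) {
        Object o = map.get(key);
        return o != null && !(o instanceof Sentinel);
    }
}
```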

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-09-20 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773131#comment-13773131
 ] 

Nate McCall commented on CASSANDRA-5357:


bq. There is a property which user can enable to cache the whole row. Not 
really a fan but I guess existing row cache users will demand it. :-|

It depends on how this is exposed to the APIs. If I want to cache the whole row, 
I'll run my (potentially paged) slice with 'cache_me=true' or what have you, 
particularly given the idea of sharing the same data across different queries. 

In general, having to pre-heat caches with queries *would not* be a new thing 
to developers. 

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-08-07 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733151#comment-13733151
 ] 

Vijay commented on CASSANDRA-5357:
--

Hi Jonathan, the idea in the current implementation is as follows:

The QueryCache<QueryFilter, CF> is implemented on top of SerializedCache. It 
stores the Map's key as a RowCacheKey<RowKey, CFID> (same as the earlier RowCache), 
and the Map's value is a composite value, QueryCacheValue<[Query, ...], 
ColumnFamily>.

For every new query that enters the system, we generate the RowCacheKey from the 
QueryFilter and get the QueryCacheValue to check whether the IFilter exists. If it 
does, we return the CF; otherwise we get the QueryCacheValue (or create a new one if 
none exists), add the IFilter to the QCV and merge the results with the existing 
ColumnFamily (also in the QCV), which is in turn serialized.
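
A heap-only sketch of the lookup-and-merge flow just described, under these assumptions:

* strings stand in for IFilter;
* a sorted name-to-value map stands in for the ColumnFamily;
* a caller-supplied function stands in for the disk read.

The patch serializes QueryCacheValue; this sketch does not. It also ignores the write race that RowCacheSentinel guards against. Callers would still filter the returned merged cells down to their own query.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one entry per row key holding every filter that has
// populated it plus a single merged set of cells shared by all of them.
final class QueryCacheSketch {
    static final class QueryCacheValue {
        final Set<String> filters = new HashSet<>();
        final TreeMap<String, String> cells = new TreeMap<>(); // merged ColumnFamily stand-in
    }

    private final Map<String, QueryCacheValue> byRowKey = new ConcurrentHashMap<>();

    /** Returns a copy of the merged cells, reading and merging only for unseen filters. */
    Map<String, String> get(String rowKey, String filter, Function<String, Map<String, String>> readFromDisk) {
        QueryCacheValue qcv = byRowKey.computeIfAbsent(rowKey, k -> new QueryCacheValue());
        synchronized (qcv) {
            if (!qcv.filters.contains(filter)) {
                qcv.cells.putAll(readFromDisk.apply(filter)); // merge; shared cells stored once
                qcv.filters.add(filter);
            }
            return new TreeMap<>(qcv.cells);
        }
    }

    /** Invalidation is per row key: every cached filter for the row goes at once. */
    void invalidate(String rowKey) {
        byRowKey.remove(rowKey);
    }

    int cachedFilterCount(String rowKey) {
        QueryCacheValue qcv = byRowKey.get(rowKey);
        return qcv == null ? 0 : qcv.filters.size();
    }
}
```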

Advantages: 
1) Queries can overlap; there could be any number of queries, but the data will 
not be repeated within them.
2) When we want to invalidate, we just invalidate the RowKey and all 
the cached QueryCacheValues go away (this avoids another Map for bookkeeping and is 
hence a little more memory efficient).
3) There is a property which the user can enable to cache the whole row no matter 
what the query is (but the current patch adds the overhead of deserializing an identity 
filter, which can be fixed).

Of course there are disadvantages: 
1) The LRU algorithm is no longer really accurate. When a single query is hot, we 
have no way of invalidating the other queries on the same row, since they all 
share the same hit rate (which is no worse than what we have 
currently).
2) With multiple types of queries on the same row (which is kind of an edge case), 
we might be pulling the whole data into memory (this can be mitigated by 
incrementally loading it or holding an index in the filter, neither of which exists 
in the current patch).

There could be more that I overlooked...

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-08-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731581#comment-13731581
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

Can you give a high level summary of your approach?

 Query cache
 ---

 Key: CASSANDRA-5357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Vijay

[jira] [Commented] (CASSANDRA-5357) Query cache

2013-08-05 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729343#comment-13729343
 ] 

Vijay commented on CASSANDRA-5357:
--

Hi Jonathan, 

I pushed a basic version of the query cache to 
https://github.com/Vijay2win/cassandra/commits/query_cache . I am not sure whether we 
still need RowCacheSentinel, but the attached patch removes it. The attached patch also 
has an option query_cache: true (if set to false, the whole row will always be 
cached). It would be nice to have a fully off-heap Map/Cache (including the keys), 
but I am thinking of addressing that in a separate github project/patch (though 
IMHO, CHM may have contention in the segments for big caches).

Let me know what you think about the patch; it might need some more cleanup.


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-06-14 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683536#comment-13683536
 ] 

Vijay commented on CASSANDRA-5357:
--

Hi Jonathan, the plan is to make the map's Entry<k,v> off-heap (with the FM in the 
heap) after promoting the query filter. 
I might end up removing the CLHM and implementing a concurrent HM to reduce the 
overhead... 

Let me know if it is fine.


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-14 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631383#comment-13631383
 ] 

Vijay commented on CASSANDRA-5357:
--

Maybe we can merge the idea mentioned by [~tjake] with Option (2) above: 
I can see a configurable way to write the query cache to 
tmpfs/shm/SSDs as files.


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631459#comment-13631459
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

What purpose would that serve beyond the existing cache saving period?


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-14 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631461#comment-13631461
 ] 

Vijay commented on CASSANDRA-5357:
--

None... but the idea is that we would compact/drop the rows (based on the LRU 
algorithm). 

It is much the same reason memcache manages its memory that way: to avoid fragmentation.
The good thing is that we already have a better way to manage compaction.

At this point I don't know if it is overkill to complicate things.
But if there is enough support I can give it a shot :)
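The "drop rows based on LRU" policy can be illustrated in memory with java.util.LinkedHashMap in access order. This is only a sketch of the eviction policy, not of the file-based compaction on tmpfs/SSD being discussed; LruRowCache is a hypothetical name, and unlike the CLHM mentioned in this thread it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU row cache: an access-ordered LinkedHashMap that evicts its
// least-recently-used entry once it holds more than maxEntries entries.
final class LruRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRowCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; the eldest (head) entry is the least recently used.
        return size() > maxEntries;
    }
}
```

With a limit of 2 entries, reading "a" before inserting a third key makes "b" the least recently used entry, so "b" is the one evicted.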


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631472#comment-13631472
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

I'm definitely in the let's get a query cache working first, then we can start 
thinking about overcomplicating it camp. :)


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-12 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630062#comment-13630062
 ] 

T Jake Luciani commented on CASSANDRA-5357:
---

One idea we have been playing with is to create a row cache that writes to SSD. 
 This is similar to how https://github.com/facebook/flashcache/ works.

The idea being that your cold rows are on spinning disk and your hot rows are 
cached on SSD (uncompressed).  

I think this would work really well and would be a good use for keeping the old 
row cache around.  


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-11 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629695#comment-13629695
 ] 

Vijay commented on CASSANDRA-5357:
--

{quote}
I don't follow – how can you have both O(1) [Map key is the row key] and also 
promote QF into the key?
{quote}
Well, I am talking about the best case, where the number of queries per row is 
fairly limited.

{quote}
We're talking about K=Map Key, right? How do you see QF increasing by row size?
{quote}
K (Map Key) is a list of queries: (RowKey + [start,end,columns[]]). 
NOTE: we will base hash/equals only on the rowKey, not on the entire Map Key; 
once we reach the map key, we can verify whether the particular query exists in the 
key by a linear scan. 

The value is the set of columns that match all of the key's queries.

The main reason for complicating the above is to avoid 2 maps (the POC code in 
#1956 does that): one to map <RowKey, Query> and the other to hold the actual 
<Query, CF>, to help invalidate or even update the cache when there is an update.
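Below is a minimal Java sketch of the single-map layout described above. It rests on several assumptions: row keys, column names and values are plain Strings, a query is an inclusive column slice with no count limit, and the names SingleMapQueryCache, Slice and RowEntry are hypothetical. To keep it simple, the per-row query list lives in the map value instead of inside the key object. The effect is the same, since hashing is still on the row key alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical single-map query cache: exactly one entry per row key.
final class SingleMapQueryCache {
    // A slice query on one row: inclusive column range [start, end], no count limit.
    record Slice(String start, String end) {}

    // Per-row entry: the queries seen for this row plus the union of the columns they returned.
    static final class RowEntry {
        final List<Slice> queries = new ArrayList<>();
        final TreeMap<String, String> columns = new TreeMap<>();
    }

    // Hashing and equality use the row key only, so invalidation is a single remove().
    private final Map<String, RowEntry> cache = new HashMap<>();

    /** Returns the cached columns for this slice, or null on a miss. */
    NavigableMap<String, String> get(String rowKey, Slice slice) {
        RowEntry entry = cache.get(rowKey);
        // Linear scan over this row's queries: O(m) for m cached queries on the row.
        if (entry == null || !entry.queries.contains(slice)) {
            return null;
        }
        return entry.columns.subMap(slice.start(), true, slice.end(), true);
    }

    /** Records a query result; its columns are merged into the row's column set. */
    void put(String rowKey, Slice slice, Map<String, String> result) {
        RowEntry entry = cache.computeIfAbsent(rowKey, k -> new RowEntry());
        if (!entry.queries.contains(slice)) {
            entry.queries.add(slice);
        }
        entry.columns.putAll(result);
    }

    /** Write path: drop everything cached for the row in O(1). */
    void invalidate(String rowKey) {
        cache.remove(rowKey);
    }
}
```

Answering a slice from the merged column set is only safe here because every write invalidates the whole row. A query with a column count limit would need more bookkeeping than this sketch does.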




[jira] [Commented] (CASSANDRA-5357) Query cache

2013-04-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626735#comment-13626735
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

bq. What i had in mind was to do something like Map<RowKey, 
[QueryFilter,ColumnFamily]> so invalidation is O(1). To further improve the 
performance on a query (And deserializing the whole [QueryFilter,ColumnFamily] 
we can have all the QueryFilters as a part of the RowKey (Kind of like promoted 
index's).

I don't follow -- how can you have both O(1) [Map key is the row key] and also 
promote QF into the key?

bq. If we do this then the K (For a fat row) becomes big enough to cause more 
heap issue. Hence we could move that to off-heap along with CF.

We're talking about K=Map Key, right?  How do you see QF increasing by row size?




[jira] [Commented] (CASSANDRA-5357) Query cache

2013-03-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612735#comment-13612735
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

I think you lost me where you jumped into implementation strategy.  How about 
starting with, what should we cache, how do we invalidate it, and where do we 
hook into the execution engine to do those things?


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-03-25 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613504#comment-13613504
 ] 

Vijay commented on CASSANDRA-5357:
--

Oops, I thought I was continuing the discussion from the other ticket... My 
bad.

What I had in mind was to do something like Map<RowKey, 
[QueryFilter,ColumnFamily]> so invalidation is O(1).
To further improve the performance of a query (and of deserializing the whole 
[QueryFilter,ColumnFamily]), we can have all the QueryFilters as part of the 
RowKey (kind of like promoted indexes).

If we do this, then the K (for a fat row) becomes big enough to cause more heap 
issues. 
Hence we could move it off-heap along with the CF.

A query to a CF will be constant time (or O(m), where m is the number of queries 
within a row). Space complexity is also constant, since the key is an off-heap 
reference and a hash.

There is one downside to this approach: if a row is hot, all of its queries will 
stay in memory longer (unless we reimplement a cache like CLHM)... 
If we think this is a big enough problem, then we would need an alternative 
approach with 2 Maps, one for the <QF, CF> mapping and the other for the <RK, QF> 
mapping, which might not be as space efficient. Let me know.
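For comparison, here is a sketch of the two-map alternative mentioned above. One map goes from query to result (<QF, CF>) and the other from row key to that row's queries (<RK, QF>). The String result type standing in for a ColumnFamily, and all class names, are hypothetical. This illustrates the layout and is not the patch's code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical two-map query cache: results keyed by the full query, plus a row -> queries index.
final class TwoMapQueryCache {
    // A query is identified by its row key plus a filter description (hypothetical string form).
    record Query(String rowKey, String filter) {}

    private final Map<Query, String> results = new HashMap<>();          // <QF, CF>
    private final Map<String, Set<Query>> queriesByRow = new HashMap<>(); // <RK, QF>

    /** O(1) lookup on the full query; null on a miss. */
    String get(Query q) {
        return results.get(q);
    }

    void put(Query q, String result) {
        results.put(q, result);
        queriesByRow.computeIfAbsent(q.rowKey(), k -> new HashSet<>()).add(q);
    }

    /** Write path: remove every cached query for the row; cost is O(queries cached for that row). */
    void invalidate(String rowKey) {
        Set<Query> queries = queriesByRow.remove(rowKey);
        if (queries != null) {
            for (Query q : queries) {
                results.remove(q);
            }
        }
    }

    int size() {
        return results.size();
    }
}
```

Each query's row key is referenced from both maps, which is the extra space cost noted above. Invalidation also stops being a single remove and costs one removal per cached query on the row.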


[jira] [Commented] (CASSANDRA-5357) Query cache

2013-03-22 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611325#comment-13611325
 ] 

Vijay commented on CASSANDRA-5357:
--

Hi Jonathan, what do you think about the following?

Option 1) 
A wild idea (just throwing it out there): off-heap in-memory SSTables (in an 
SSTable-like format), where we manage compaction etc. the same way we deal with the 
files. The difference is that there will be many more things to clean up during 
compaction (like tombstones, which expire immediately) :)

Option 2) 
Make the query cache keys Memory objects and move the keys off-heap (storing only 
the hash value as the key), so the keys are deserialized only when hashes 
collide. We can start with invalidation and go from there.
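Option 2 can be sketched in Java under a few assumptions. A byte[] stands in for an off-heap Memory object, keys are UTF-8 Strings, and HashKeyedCache and its methods are hypothetical. The map is keyed only by the hash, and the serialized keys in a bucket are examined only when a lookup lands on that hash. This sketch compares the serialized bytes directly rather than deserializing them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache whose map key is only a hash; full keys stay serialized
// (a byte[] standing in for off-heap memory) and are compared only within a bucket.
final class HashKeyedCache {
    private record Slot(byte[] serializedKey, String value) {}

    private final Map<Integer, List<Slot>> buckets = new HashMap<>();

    private static byte[] serialize(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the cached value, or null on a miss. */
    String get(String key) {
        byte[] k = serialize(key);
        List<Slot> bucket = buckets.get(Arrays.hashCode(k));
        if (bucket == null) {
            return null;
        }
        // Only keys sharing this hash are compared, byte-wise, to resolve collisions.
        for (Slot slot : bucket) {
            if (Arrays.equals(slot.serializedKey(), k)) {
                return slot.value();
            }
        }
        return null;
    }

    void put(String key, String value) {
        byte[] k = serialize(key);
        List<Slot> bucket = buckets.computeIfAbsent(Arrays.hashCode(k), h -> new ArrayList<>());
        bucket.removeIf(slot -> Arrays.equals(slot.serializedKey(), k)); // replace an existing key
        bucket.add(new Slot(k, value));
    }
}
```

The keys "Aa" and "BB" have the same Arrays.hashCode value (3073), so they share a bucket and are told apart only by the byte comparison.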




[jira] [Commented] (CASSANDRA-5357) Query cache

2013-03-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605173#comment-13605173
 ] 

Jonathan Ellis commented on CASSANDRA-5357:
---

I'm 80% convinced that a truly generic query cache is not feasible while also 
maintaining good performance.  Specifically, on the write path we either need 
to update cached entries or invalidate them; for sequential scans that would 
mean storing scan results as an interval of rows, which is probably reasonable, 
but for index scans we'd need to store cached results under the indexed row 
value, and on update to an indexed column we'd need to read-before-write to 
find the current value, to invalidate them.

So I'm okay with saying that we only cache results for single-partition 
queries, which makes cache invalidation on update simple.  It also makes it 
more difficult, but not impossible, to cause yourself memory problems from 
caching huge resultsets.
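Here is a sketch of restricting the cache to single-partition queries. It assumes a hypothetical Read record whose partition key is null for range or index scans. Only single-partition reads consult the cache. Because a write names its partition, invalidation is a plain remove with no read-before-write. All names and types are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical read path that caches only single-partition reads.
final class PartitionScopedCache {
    // Hypothetical read description: partitionKey is null for range or index scans.
    record Read(String partitionKey, String filter) {
        boolean isSinglePartition() {
            return partitionKey != null;
        }
    }

    // partition key -> (filter -> cached result)
    private final Map<String, Map<String, String>> byPartition = new HashMap<>();

    String read(Read r, Supplier<String> executeUncached) {
        if (!r.isSinglePartition()) {
            return executeUncached.get(); // range and index scans bypass the cache
        }
        Map<String, String> perFilter = byPartition.computeIfAbsent(r.partitionKey(), k -> new HashMap<>());
        return perFilter.computeIfAbsent(r.filter(), f -> executeUncached.get());
    }

    /** Write path: the mutation carries its partition key, so no read-before-write is needed. */
    void onWrite(String partitionKey) {
        byPartition.remove(partitionKey);
    }
}
```

A single write to a partition drops every cached filter for that partition at once.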
