subject:"\[jira\] \[Commented\] \(CASSANDRA\-1956\) Convert row cache to row\+filter cache"

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2013-09-20 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773105#comment-13773105
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

This is basically a clunkier implementation of CASSANDRA-5357, right?  Should 
we close it as duplicate?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 2.1

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256949#comment-13256949
 ] 

Vijay commented on CASSANDRA-1956:
--

Hi Jonathan, When user requests X1,Y1 we cache the block from 
(Start-X1-Position -Y1-Position) + N with block size being configurable. 
(it should similar to the page cache and if there is a write on the block the 
whole block is marked dirty and next fetch will go to the FS). there is a 
configurable block size when set high enough will cache the whole row (like the 
existing cache). The logic around it is kind of what the patch has

We can also use column indexes (if needed) and caching data within it, but the 
simple may be is to start from the column requested and cache blocks (else 
updating them cache without invalidating the whole row is going to be hard).

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256954#comment-13256954
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

How is that different from the query cache I waved my hands about?


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256965#comment-13256965
 ] 

Vijay commented on CASSANDRA-1956:
--

:) Quite similar but a different version with less memory foot print, and 
efficient updates.

1) For example if there are 10 columns which are queried the key will have 
those names and in the CF return object;
2) If we have 2 kinds of queries with over laps (slice and column names) then 
we will be caching twice or sometimes more in a pure query cache;
3) If we have an update to a column out of 10 columns we have to search to 
see if they are available in them or invalidate the whole row. in this way we 
can update the block and be done with it.

This also allows us to incrementally deserialize some parts of the row when the 
whole row is cached. *


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256975#comment-13256975
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Does this support caching head/tail queries?  Or do X and Y have to be existing 
column values?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256983#comment-13256983
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Also, it sounds like this always invalidates on update.  Would it be possible 
to preserve the current row cache behavior?  I.e., update-in-place if a 
non-copying cache implementation.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-04-18 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256992#comment-13256992
 ] 

Vijay commented on CASSANDRA-1956:
--

 Does this support caching head/tail queries? Or do X and Y have to be 
 existing column values?
No X and Y doesn't need to existing, they are just markers in the RowCacheKey 
(for example if the query has x* - y* we will have that in the RCK instead of 
xeon - yum)... It does support head and tail queries.

  it sounds like this always invalidates on update. Would it be possible to 
 preserve the current row cache behavior?
Yeah the prototype does the update on write, but the problem is that when there 
are a lot of updates block size will increase then initially cached, at some 
point we need to split/re-partition it...

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-19 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211491#comment-13211491
 ] 

Vijay commented on CASSANDRA-1956:
--

Just wanted to make sure to clarify: The proposal is not about head or the tail 
row's cache, but to have a block cache where blocks of the wide rows are cached 
(with cnamex to cnamey)... here head and tail is just pure optimization on 
where to start the block cache from (where to start the block from from the 
tail or from the head).

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-13 Thread B. Todd Burruss (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207370#comment-13207370
 ] 

B. Todd Burruss commented on CASSANDRA-1956:


i have very wide rows (140k columns) that i randomly query on, usually about 
100 columns at a time.  wide rows do not work well with the 
SerializingCacheProvider because of the constant copying of data. 
ConcurrentLinkedHashMap performs very well, but eats memory because of all the 
ByteBuffers.

I'm trying to understand if this will help my case.  head and tail caching will 
help folks with time series data, but not me.  possibly the handful of named 
columns caching will help, but there will be overlap in my queries so columns 
will exist in multiple cache entries, balooning the cache.

what i was hoping for was a scheme to segment the wider row into smaller 
segments so not as much copying is performed in the SerializingCacheProvider.


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205503#comment-13205503
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. What about secondary indexes? I guess you'd like to be able to cache hot 
columns in secondary indexes CF

It's a good question. Secondary indexes CF are accessed from beginning to end, 
using paging potentially. Which means it's unclear to me if we can do much 
better than caching the whole secondary index rows. But again, I'm good 
discussing that, but what I'm mainly saying is that we can't choose the best 
solution unless we clearly identify the problems we are trying to solve.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205519#comment-13205519
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Right now we have a cache that is only useful for accelerating queries against 
rows that fit easily into memory, and even then it is inefficient if you only 
care about part of the row (either name-based or slice-based).

The filter approach allows us to make slice-based queries more efficient 
(somewhat clumsily) but doesn't really address the inefficiency for name-based 
queries.  As Lior points out, it also doesn't help with 2I queries, while with 
a true query cache we could do write-through updates on 2I queries as well 
(select * from users where birth_date = 1980).  (This is a fairly 
straightforward jump to make for queries within a single composite-PK row, and 
admittedly more complex when spanning multiple physical rows gets involved.)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Sylvain Lebresne (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205528#comment-13205528
]

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. The filter approach allows us to make slice-based queries more efficient
(somewhat clumsily)

What is so clumsy?

bq. but doesn't really address the inefficiency for name-based queries

Depends on what we're talking. The filter approach would allow to set a
name-based filter. But ok, that is less convenient. But the query cache is not
perfect either. If you do different name-based query, we will end up caching
the same data multiple times. We may be able to optimize this, but then it
becomes fairly complicated.

bq. while with a true query cache we could do write-through updates on 2I
queries as well

I'm not sure I understand, could you clarify your idea?

Don't get me wrong, I'm not totally closed to the idea of query cache or
something alike, but I do want to make sure we don't jump on it without a good
reasoning behind, because I do fear a query cache will come with a bunch of
complication (and while you may have good reasoning, I personally don't yet see
clearly that it's the best choice, so I'll need some convincing). The query
cache also has the risk of caching multiple time the same thing. Take a CF on
which you do some paging: provided the row receives a few update, we'll end up
re-caching the same things multiple times (unless we're really smart about it
but I'm pretty sure it's not a simple problem). I'm not sure how much of a
problem that'll be in practice but ...

Then there is also the fact that the way you model in C* is usually with one CF
per kind of query. So it does feel like keeping each query separately shouldn't
be necessary. But that's not a technical argument.

Convert row cache to row+filter cache
-

Key: CASSANDRA-1956
URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
Fix For: 1.2

Attachments: 0001-1956-cache-updates-v0.patch,
0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch,
0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch,
0002-add-query-cache.patch

Changing the row cache to a row+filter cache would make it much more useful.
We currently have to warn against using the row cache with wide rows, where
the read pattern is typically a peek at the head, but this usecase would be
perfect supported by a cache that stored only columns matching the filter.
Possible implementations:
* (copout) Cache a single filter per row, and leave the cache key as is
* Cache a list of filters per row, leaving the cache key as is: this is
likely to have some gotchas for weird usage patterns, and it requires the
list overheard
* Change the cache key to rowkey+filterid: basically ideal, but you need a
secondary index to lookup cache entries by rowkey so that you can keep them
in sync with the memtable
* others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205579#comment-13205579
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

bq. What is so clumsy?

First, that you need to explicitly configure it as part of the schema.  Second, 
because it inherently only allows one type of query to be cached.  One CF per 
query is the wrong rule of thumb -- One CF per type of resultset is my 
preferred one.  So for instance, you couldn't cache both oldest entries and 
newest, from the same row in a CF.

bq. while with a true query cache we could do write-through updates on 2I 
queries as well

What I mean by this is that {{select * from users where birth_date = 1980}} is 
a query that people could reasonably want to cache, that we can't fit into your 
3 categories of full row, head/tail, handful of named columns.  

At a more sophisticated stage from that, a true query cache could update that 
cached resultsets whenever someone updates the birth_date value to or from 
1980, so the query stays fast without having to be recalculated.  (We already 
have the perfect place in the code for this where index maintenance happens in 
Table.apply.)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205619#comment-13205619
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. So for instance, you couldn't cache both oldest entries and newest, from 
the same row in a CF.

Well, as I said earlier, that would just require to have a handful of query 
per-CF rather than just one. Granted, this may blur a little bit the actual 
complexity difference with respect to a query cache, but it's still likely 
simpler and with less overhead.

I think that for a good part what bothers me with a pure query cache is that I 
think picking what to cache is difficult to do automatically, and looking each 
query in isolation (which is what a query cache does) is not necessarily the 
right thing. The typical example is when the right thing to do is to cache the 
whole row while you'll never query the full row (but maybe on part of the code 
query the firstname and lastname, another query the email, another the phone). 
We've mentioned the idea of having the 'cache the full row' as a special case 
but that doesn't sound very convenient. And it makes me wonder if we won't have 
the same problem for other situation, where the query cache actually play 
against you because it just don't see the big picture. While for the user 
usually know that big picture. In any case, what I meant by my previous comment 
is that pining a full row into the cache is imho something we should keep 
(without forcing the user to always query the full row to get that), and that's 
not handle by a true query cache.

{quote}
What I mean by this is that select * from users where birth_date = 1980 is a 
query that people could reasonably want to cache, that we can't fit into your 3 
categories of full row, head/tail, handful of named columns.

At a more sophisticated stage from that, a true query cache could update that 
cached resultsets whenever someone updates the birth_date value to or from 
1980, so the query stays fast without having to be recalculated. (We already 
have the perfect place in the code for this where index maintenance happens in 
Table.apply.)
{quote}

That's a good point. I agree that in that case a query cache is what we want, 
because the query spans multiple CF (or in other words the query doesn't map 
directly to what's on disk but compute the result). But I'm still not sold on 
the query cache on direct queries of rows, because of the reasons above. 

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-10 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205673#comment-13205673
 ] 

Vijay commented on CASSANDRA-1956:
--

Alright, What i was trying to do here is to get the feedback from everyone on 
all the use cases and try to fit it into the one cache,

I did some fair amount of research to see if there is any better option and 
there wasn't one, the closest concept which i got to was something like a block 
cache or Linux Page cache When there is updates to those blocks we can find 
those and update those.

1) The problem shows up only when you have a wide row, which means most 
probably the user is doing a range queries
2) If the user has a wide row then most probably he has a large number of 
writes into the row, but if we invalidate the row cache for every updates then 
it might not be useful and also the first read will have to read multiple SST's.
3) Lets say user has a 100 columns to query and he queries in this case 
(specially with composite type columns where the column names can be larger 
than the value), then we can possibly run into memory pressure.
4) Having whole row in memory is absolutely required case and we are supporting 
it (setting min and max number of columns in a block will help it).
5) the above solution can work seamlessly well for narrow rows when the block 
size is reasonably big.

Head and Tail is basically a optimization for the Reverse/Forward queries which 
is supported if you have 1 M rows and your block size is 500 and your count is 
100 and you are reading from reverse.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-02-09 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204371#comment-13204371
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq.  it should also solve the problems which we are discussing in this ticket

What are those?

I'd like us to be a little scientific on that issue. What is it we are trying 
to do in the first place? My take on that (and please feel free to correct me 
if I'm missing something) is that the kind of caching that I can really see 
useful in practice are:
# Caching a row entirely; that's what we do and I think we agree we should keep 
that feature because sometimes that's what you want.
# Caching the head or the tail of a row for wide rows.
# I could also imagine cases where you want to only pin a few columns (by name) 
into the cache without keeping the row entirely.

And well, that's it. I try to think of other type of (not far fetched 
hypothetical) workload where caching could be a notable win but are not handled 
by the 3 cases above and I don't really find one. Now I apparently am stupid 
and miss 90% of situations since:

bq. but I see a true query cache as being better than the row cache in 90% of 
situations

because the 3 cases above are perfectly handled by the idea of just adding a 
filter per-cf to our current row cache (which btw could easily be extended to 
2-3 filters per-cf if that proves necessary). So please let's share those cases 
that are not above and that we want to handle as part of this ticket.

But if what's above does sum up the problem we want to solve, then I continue 
to think that simply adding a per-cf filter alongside our current row cache is 
the best solution:
* there is *no* memory overhead.
* all 3 caching use case above are handled without any drawback that I can 
think of.
* it's an incremental change of the existing, not a completely new thing, thus 
lowering then risk of introducing new bugs. Typically, I can easily see how 
CASSANDRA-3862 will translate to that solution; but I suspect thing may get 
more complicated for say a query cache.

The only criticism that I've seen so far on that solution is the question of 
the user configuration of the cache, while for the query cache there wouldn't 
be a configuration (which remains to be proven btw if we want to support the 
'stick a row entirely in cache always' case). If someone consider that 
auto-configuration should be an absolute priority then let's discuss that, 
because I disagree with that (to sum up, I think any auto-configuration of 
caches will have drawbacks so I think users should be able to override the 
default and so I think it's more sane to start with a cache that user can make 
do what they want and then evaluate how to make that configuration mostly 
automatic, which I think can be done).

So before considering other solutions, I'd like to understand first more 
clearly why we're discarding that per-cf filter idea. Because currently it 
seems to strike a pretty nice balance of fixing what seems to be the problem 
versus the added complexity.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-27 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194981#comment-13194981
 ] 

Vijay commented on CASSANDRA-1956:
--

So for this ticket:

1) Expose rowCache API's so we can extend easier.
2) Reduce the Query cache memory foot print.
3) reject rows  x (Configurable)
4) Writes should not invalidates the cache (Configurable but if not invalidate 
take some hit on the write performance).

Reasonable? Anything missing?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-27 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194980#comment-13194980
 ] 

Vijay commented on CASSANDRA-1956:
--

So for this ticket:

1) Expose rowCache API's so we can extend easier.
2) Reduce the Query cache memory foot print.
3) reject rows  x (Configurable)
4) Writes should not invalidates the cache (Configurable but if not invalidate 
take some hit on the write performance).

Reasonable? Anything missing?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-08 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182364#comment-13182364
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

bq. I really don't think specifying it in the schema is such a big deal

Maybe, but I see a true query cache as being better than the row cache in 90% 
of situations.  It's only an accident of implementation that we did the row 
cache first.  So it feels weird to me to argue to keep it instead of moving to 
a more flexible model.  (One that could accommodate 2ary index queries as well, 
for instance.)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-07 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182148#comment-13182148
 ] 

Vijay commented on CASSANDRA-1956:
--

How about a query cache which will cache all query's by default
1) Users can set an upper bound on the size per row (cache rejection handle)
2) Users can also say cache everything startWith=a to endWith=x if the 
query falls within. (The first time we see see for a row) We will populate the 
cache with the predefined query and subsequent queries which will fetch results 
within these limits will served from the cache and the rest will go to the disk.
3) the existing row cache is nothing but a configuration with startWith= and 
endWith= (everything in a row).

Makes sense?


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-06 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181200#comment-13181200
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. That, and I want to cache a specific set of known-ahead-of-time columns 
[maybe the entire row], which is what today's row cache is mostly used for.

That is trivially handled by the filter-per-cf approach I'm advocating, 
contrarily to the query cache solution. 

bq. I think it's a huge, huge win for a design to be able to handle both of 
these, without requiring it to be specified in the schema.

Again, I really don't think specifying it in the schema is such a big deal in 
that case (I insist on the in that case, I'm *not* pretending hand-tuning is 
never a big deal), nor does it feel a hard one to get right.

Now don't get me wrong, I agree that self-tuning is great, but only if we know 
how to do it correctly. Typically, and to refer to some ideas above, I think 
that if users have to think about what query they should do to have good 
caching (like using select * when really they want select x, y but want to keep 
the full row in cache, or to be careful that if they use too many different 
queries for a given row it won't play well with the cache), then 1) it's still 
hand-tuning and 2) one that is imo far less convenient/intuitive.

Basically what I'm saying is that with a query cache, I see a number of 
unknowns, of added difficulties (what about the space taken by all those filter 
per query? how do we make sure to cache the full row when it's the right thing 
to do without any user intervention? etc...) and of cases where it will be less 
efficient that the filter-per-cf alternative unless the user is super careful 
(will that be a problem in real life ? maybe not, but maybe). On the other 
side, adding a simple per-cf filter is a nice simple increment over what we 
have and we stay in known territory while solving the problem we want to solve.

Besides, if specifying a filter with the schema is that much of a problem, 
maybe we can do that choice automatically. We have stats on the rows avg and 
max size, and we can easily start gathering some simple stats on queries, at 
least enough to be able to say if it's the head or tail that we need to keep in 
cache for wide rows. Though honestly, even if we do that, my preference would 
largely go to still allow the user to override whatever automatic choice we 
came up with if they wish so.



 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-05 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180297#comment-13180297
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

Thinking about this a bit more, I'm not sure I'm convinced by a query cache. Or 
rather, I think that a cache + filter defined with the schema could be much 
simpler and imo likely good enough.

More precisely, with a query cache, when you update a (cached) row, you have to 
check every queries for that row to see if it should be updated. I'm afraid 
there will be cases where this will be inefficient and this will put the burden 
on user to make sure they don't make query that hit those inefficiencies. I'm 
also really not fan of having of putting the burden on user to query full row 
if they want it cached fully for equivalent reasons.

It seems to me that what we want to handle here is exactly the 'cache head or 
tail of row' problem. If so, it seems to me that simply adding a per-cf 
(optional) filter to the cache has the following advantages:
- It handles the head/tail use case, as well as the current cache all row case.
- You don't have to care about the problems I mention above
- There is no in-memory overhead of filters problem. We just keep one filter 
per cf. We can allow more than one filter per-cf in the future *if* that proves 
useful, which I'm not even too sure it will.
- There is no upgrade headache at all, no new cache that use will have to 
switch to, nothing we'd have to deprecate. No new mental model for the user of 
how things are cached, just a imho very natural new option of being able to 
select what part of the row is cached.
- No question of having two solutions. The current cache will just be the case 
were there is no filter configured (or the filter is the identity filter, 
whether optimize the no filter case or use the identity filter is really a 
implementation detail).

Now the only downside I could see to that compared to a query cache is the fact 
that you have to define the filter with the schema. I really see this as almost 
anecdotal. Doesn't seem very complicated (and certainly more simple than having 
to change your query to make sure what you want is cached) to write something 
along the line of:
{noformat}
CREATE TABLE timeline (
userid uuid,
timestamp time,
action text,
PRIMARY KEY (userid, timestamp)
) WITH COMPACT STORAGE AND CACHING FIRST 100;
{noformat}
(note the use of our tentative syntax of CASSANDRA-2474 for wide rows) or even
{noformat}
CREATE TABLE users (
userid uuid PRIMARY KEY,
firstname text,
lastname text,
age int,
email text,
picture binary,
) WITH CACHING (firstname, lastname, email);
{noformat}
if one is so inclined to do that (because he don't want to cache profile 
pictures for instance).


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-05 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180375#comment-13180375
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. The only inefficiency as noted above are deletions which are hard to handle 
without invalidation and now that we have them ttl columns.

Yes, though I'll note that a query cache have the exact same problem so it's 
more a problem we have to deal with than anything else. And I'm good by 
starting with just invalidating for deletion and TTL at first. On a second 
iteration, I think we could improve the TTL case by keeping for each cached row 
(at least those associated to a non-identity slice filter) a 
firstColumnToExpireTimestamp. We would keep that to the smallest 
localExpirationTime in the cached row. Gets and puts would just invalidate the 
cache if now = firstColumnToExpireTimestamp. But again, we don't have to 
concern ourselves with that initially.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-05 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180912#comment-13180912
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

bq. It seems to me that what we want to handle here is exactly the 'cache head 
or tail of row' problem.

That, and I want to cache a specific set of known-ahead-of-time columns [maybe 
the entire row], which is what today's row cache is mostly used for.

I think it's a huge, huge win for a design to be able to handle both of these, 
*without* requiring it to be specified in the schema.  We've been moving away 
from hand-tuning, towards self-tuning, for a very good reason: when you require 
humans to do the right thing to be efficient, you're going to be inefficient an 
awful lot of the time. :)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179619#comment-13179619
 ] 

Vijay commented on CASSANDRA-1956:
--

If i understand it correctly, we need a configuration which will tell cache to 
cache filters only if the returned row is more than x size?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179627#comment-13179627
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

I don't know, is it really worth keeping the old cache entire rows behavior 
around if we have something more sophisticated?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179629#comment-13179629
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

... to answer my own question, I want to keep an entire CF in memory is a 
fairly common request.  So maybe the answer is, we support that more directly, 
as well as the query cache.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Sylvain Lebresne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179689#comment-13179689
 ] 

Sylvain Lebresne commented on CASSANDRA-1956:
-

bq. ... to answer my own question, I want to keep an entire CF in memory is a 
fairly common request. So maybe the answer is, we support that more directly, 
as well as the query cache.

Thinking out loud, but with secondary indexes, we use the trick of expanding 
the filter to a full row filter if the maxRowSize for the cfs is smaller that 
some value (the columnIndexSize more specially). Given that keeping entire CF 
in memory really make sense only for static (narrow) CFs, we could just expand 
filters for those CFs automatically and just have a query cache as far as the 
cache implementation is concerned.

As a side note, I share Daniel's opinion (at least I believe that's what he 
meant earlier) that a serializing query cache that invalidate on update will be 
very useful for wide rows. At least I don't see many use cases where it would. 
However I see a cache that would not invalidate on update but keep the cached 
data matching the filter be much more useful, even if we start by an on-heap 
cache. And once we've agreed that we'll evacuate the delete problem by 
invalidating, I don't think it's too hard to do. 

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Daniel Doubleday (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179765#comment-13179765
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

bq. As a side note, I share Daniel's opinion (at least I believe that's what he 
meant earlier) that a serializing query cache that invalidate on update will be 
very useful for wide rows.

If there's a 'not' missing here than yes :-)

bq. However I see a cache that would not invalidate on update but keep the 
cached data matching the filter be much more useful

Just as an implementation idea that could make this easier: If the cache data 
would be merged with memtables upon read you could merge/write back cache data 
upon memtable flush which avoids synchronization headaches and might be more 
efficient (we did something similar in CASSANDRA-2864)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179787#comment-13179787
 ] 

Pavel Yaskevich commented on CASSANDRA-1956:


That is why I propose to combine current technique and filter-data and use 
first for small rows and latter for wide ones without on-update invalidation. 
And I agree with Daniel and Sylvain that serializing query cache that 
invalidate on update won't be very useful in most cases.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179800#comment-13179800
 ] 

Vijay commented on CASSANDRA-1956:
--

Cool i will do the following: 
* Make a configurable cache type serializing or inheap.
* Make x size to be configurable either full row or partial.
* Add a ranking of Queries 
Example: IdentifyFilter will have the highest rank and any query can go fetch 
the data from them, in other words compare the query and see if the returned 
data will be a subset of the cache query and if yes then return those
Use the same ranking if it is not an invalidating the cache to update the rows 
(which can be tricky but i can give it a shot).

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179876#comment-13179876
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

bq. That is why I propose to combine current technique and filter-data and use 
first for small rows and latter for wide ones

I'd rather avoid the complexity of keeping both implementations around.  If the 
rows are small enough that keeping the whole thing in memory is the right 
tradeoff, then users can optimize that themselves by using select * instead 
of select x and select y (i.e., the former would result in just one cache 
entry for the row).  I suspect it won't matter a great deal in real work 
scenarios anyway.

How about this?

- Query cache replaces row cache, with on/off heap implementations based on 
existing SC/CLHC.  Use CLHM weight feature to rank by query result size.
- Cache key becomes (row key, query filter)
- When applying an update to row X, check query cache for filters on X.  Update 
cached CF with the new data for on-heap, invalidate for off-.
- New ticket for pin CF in memory feature

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179883#comment-13179883
 ] 

Pavel Yaskevich commented on CASSANDRA-1956:


I don't say that we should keep both implementations, I suggest we combine them 
:) I'm not a big fan of keeping filter for small rows because it creates 
superfluous overhead.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179905#comment-13179905
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

You could be right.  I'd still prefer to do have everything go through a single 
code path first, and then we can evaluate optimization afterwards.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Daniel Doubleday (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179909#comment-13179909
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

bq. users can optimize that themselves by using select * instead of select 
x and select y

I guess it's not totally atypical right now to model data in a way that it fits 
the current caching scheme. I.e. we have UserData rows with around 10 columns 
and around 50 - 100k row size. All of them are read during one session at a 
different time. With the proposed new caching scheme we have the choice to 
either create 10x cache misses and a lot more objects to gc. Or load the entire 
row everywhere.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179981#comment-13179981
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

So... this proposal would be *worse* than the status quo for you?  I thought 
this was your idea! :)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Daniel Doubleday (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180093#comment-13180093
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

Nice try :-)

I was Mr I dont believe in magic from the very beginning. I thought that there 
are some common cases like tail and head caching or exclusion of columns that 
could be implemented. But never mind - maybe the proposed cache is all 
greatness. All I'm trying to say is that it's pretty easy to end up in 
propagation failure hell here or change something else that blows things up for 
use cases that are not foreseen.

So to prevent a rude awakening for some users it might be cool to provide some 
means (config or whatever) that works similar as the current version. Or at 
least schedule this for a release which allows for a downgrade.

Just saying ... 

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180133#comment-13180133
]

Jonathan Ellis commented on CASSANDRA-1956:
---

bq. I thought that there are some common cases like tail and head caching or
exclusion of columns that could be implemented.

Right. That's what we want to get out of this -- well, the tail/head part,
since query cache can only cache what you do ask for, but it can't exclude what
you don't. (Although if you never query the excluded column... close enough,
right?)

bq. So to prevent a rude awakening for some users it might be cool to provide
some means (config or whatever) that works similar as the current version.

Good point. Damn it. :)

Convert row cache to row+filter cache
-

Attachments: 0001-1956-cache-updates-v0.patch,
0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch,
0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180134#comment-13180134
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Maybe we should just leave the existing row cache in for 1.2 and deprecate it 
in favor of query cache, and remove it in 1.3.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Daniel Doubleday (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180152#comment-13180152
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

I think that would be great ...

bq. Although if you never query the excluded column... close enough, right?

Well maybe you could incorporate hinting as mysql or oracle does in thrift and 
cql as in

{noformat}
SELECT * FROM WhatEver IGNORE CACHE (*) WHERE KEY = 'ComesMyWay'
{noformat}

or

{noformat}
SELECT * FROM WhatEver /*+ nocache(*) */ WHERE KEY = 'ComesMyWay'
{noformat}

That would also prevent cache pollution when you need to run jobs

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-04 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180156#comment-13180156
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

I suppose, but best practice is still going to be to run that kind of job on 
separate replicas, so that feels pretty low priority to me.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-03 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179146#comment-13179146
 ] 

Pavel Yaskevich commented on CASSANDRA-1956:


What I propose is - let's make a row cache more configurable according to the 
way we handle big rows, we can allow cache to take a per-cf filter class 
which would have settings for max row size to cache (and probably some other 
options in the future) so we can feed it a ColumnFamily from getTopColumns and 
let it decide if we should just keep that ColumnFamily in cache as is, because 
it's a thin row, or (if we are querying different parts of the big row) keep 
QueryFilter and ColumnFamily.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-03 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179149#comment-13179149
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

What if we turned it into a real query cache?  That way we don't have to 
predefine filters in the schema, we just use the IFilter objects from the 
queries involved.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2012-01-03 Thread Pavel Yaskevich (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179155#comment-13179155
 ] 

Pavel Yaskevich commented on CASSANDRA-1956:


That would work, actually I think that we can slightly modify IFilter to return 
number of columns involved or use resulting ColumnFamily.serializedSize() to 
check for threshold, I just want to make sure that cache has minimal memory 
overhead associated with keeping filters for thin rows.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.2

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2011-12-04 Thread Daniel Doubleday (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162540#comment-13162540
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

Looking at CASSANDRA-3143 I understand that the consensus seems to be that 
users should be protected from missconfigurations and that a sane 'works most 
of the time' solution is the way to go.

Still I want to note why I do believe that some solution along the lines of 
your patch is a good idea.
Obviously I can only speak for us and our usecases but our app can only work 
with the relevant data set in mem.

Being able to to optimize caching using the knowledge about data access 
patterns could significantly improve effeciency.

Right now we still need first level caching for a lot of use cases which makes 
life not necessarily easier.

With the patch custom cache providers could (on a per cf basis) cache partial 
rows, optimize memory format or even integrate other caching solutions. We 
could also skip caching entirely for certain queries. This would also be great 
for maintenance jobs which would otherwise lead to cache thrashing.

I understand that nobody wants to scare off adopters but all of this would not 
really change anything for people who want to go with the sane standard. And of 
course it is also clear that such extension points cannot be guaranteed to be 
stable. 

Yada yada yada ... 

Thanks for the effort btw!

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2011-11-05 Thread Vijay (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144902#comment-13144902
 ] 

Vijay commented on CASSANDRA-1956:
--

Sure, If no one else has a concern i can do the refractor as a part of this 
patch...

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2011-11-02 Thread Daniel Doubleday (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142264#comment-13142264
]

Daniel Doubleday commented on CASSANDRA-1956:
-

As I wrote earlier I'm a little sceptical that a query cache like this will be
useful in many cases but since there is something going on here and Jonathan
asked for a wish list:

Would you guys consider making the row cache a little more pluggable? This
would allow us to maintain custom implementation more easily. Also I think that
the core code could benefit as well moving some ifs out of CFS.

Instead of implementing the control flow in CFS and using the cache as a map
you could introduce an RowCache instance that would act more like a service
layer like:

{noformat}
interface RowCache {

// returns filtered rows - ready to serve. reads the row from cfs if
necessary.
ColumnFamily getRow(CFS store, QueryFilter filter, int gcBefore);

// notify the cache of a mutation. it can update or invalidate
void apply(CFS store, DK key, CF cf);
}
{noformat}

This way CFS would need no knowledge wether a cache is able to update or only
invalidate. And when it invalidates wether it has to invalidate the row or just
portions of it. Also there would be no expectation about the internal caching
format. The row cache could do whatever it likes.

In CFS there would be only the cache reference. No distinction between old row
cache, query cache, off-heap-cache, my-awesome-very-specialized-cache would be
necessary.

Convert row cache to row+filter cache
-

Attachments: 0001-1956-cache-updates-v0.patch,
0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2011-10-26 Thread Chris Burroughs (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136053#comment-13136053
 ] 

Chris Burroughs commented on CASSANDRA-1956:


Doesn't the QueryCache's ledger also need to have entries removed when entries 
are evicted from the primary cache?

Would it be reasonable to have a tunning knob so only rows with more than n 
columns are filter cached to avoid paying the penalty of storing the row key 
twice?

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

2011-10-26 Thread Jonathan Ellis (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136055#comment-13136055
 ] 

Jonathan Ellis commented on CASSANDRA-1956:
---

Do we store the full key twice, or just two references to the same key object?  
The latter would be negligible IMO.  (And asking How many total columns are in 
this row? isn't free, either.)

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-1956-cache-updates-v0.patch, 
 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

2011-01-28 Thread Daniel Doubleday (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988023#action_12988023
 ] 

Daniel Doubleday commented on CASSANDRA-1956:
-

Now it's my ahh  :-) I think I understood your idea.

Instead of configuring cache rules you want to cache every filtered request 
like mysql query cache?

I dropped the idea because I thought it would be either very restricted to 
certain query patterns or very complicated to keep in sync with memtables 
and/or decide whether a query can be served by the cache. Also it might be hard 
to avoid that the cache is being polluted (analogous to the page cache eviction 
problem during compaction). It might force the developer to spread the data 
over multiple CFs according to access pattern which increases memory needs 
(more memtables, more rows).

But yes - if you can come up with an automagical cache management that just 
works that would be obviously nicer!

PS: If you wan to have a look at the patch: apply to 0.7 r1064192


 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Daniel Doubleday
 Fix For: 0.7.2

 Attachments: 0001-row-cache-filter.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

2011-01-27 Thread Stu Hood (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987739#action_12987739
]

Stu Hood commented on CASSANDRA-1956:
-

These are cached. If one of them gets deleted it would not be able to return
a valid response.
Ahh, sorry: quite right. Invalidation sounds like the best option there.

I'll try and review this more closely in the next week, but I'm not sure I like
the filter as a configuration option, as opposed to any of the ideas in the
summary.

Convert row cache to row+filter cache
-

Key: CASSANDRA-1956
URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Daniel Doubleday
Fix For: 0.7.2

Attachments: 0001-row-cache-filter.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1956) Convert row cache to row+filter cache

2011-01-26 Thread Stu Hood (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987360#action_12987360
 ] 

Stu Hood commented on CASSANDRA-1956:
-

Thanks for the patch Daniel! We actually have existing 'filter' implementations 
(in {{org.apache.cassandra.db.filter}}) that I think would make the most sense 
for use aside cache entries.

 What about just invalidating (removing from the cache) the row on delete and 
 letting it get rebuild on the next read?
Also, regarding the tombstones in cache problem: I believe it came up in IRC 
the other day. The solution that seemed closest to our existing methods was to 
keep the tombstones in cache, but to add a thread that periodically walked the 
cache to perform GC (with our existing GC timeout) like we would during 
compaction.

 Convert row cache to row+filter cache
 -

 Key: CASSANDRA-1956
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Daniel Doubleday
 Fix For: 0.7.2

 Attachments: 0001-row-cache-filter.patch


 Changing the row cache to a row+filter cache would make it much more useful. 
 We currently have to warn against using the row cache with wide rows, where 
 the read pattern is typically a peek at the head, but this usecase would be 
 perfect supported by a cache that stored only columns matching the filter.
 Possible implementations:
 * (copout) Cache a single filter per row, and leave the cache key as is
 * Cache a list of filters per row, leaving the cache key as is: this is 
 likely to have some gotchas for weird usage patterns, and it requires the 
 list overheard
 * Change the cache key to rowkey+filterid: basically ideal, but you need a 
 secondary index to lookup cache entries by rowkey so that you can keep them 
 in sync with the memtable
 * others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

52 matches

Mail list logo