[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2013-04-29 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644580#comment-13644580 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

What is the difference between the token and the key?

The FST could theoretically cache (due to its compression efficiency) every
key/token (?) in RAM, removing the need for the key cache and providing an
extremely fine-grained pointer to the underlying row value data.

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor
 Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
 CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar


 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2013-04-29 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644583#comment-13644583 ]

Jonathan Ellis commented on CASSANDRA-4324:
---

My point is that keys routed to a given node will be effectively a random 
subset of the keys in use (under RP/M3P), so unlike the tokens I don't think 
it's clear at all that they will be well-compressible by FST. 
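One rough, hypothetical way to check this empirically is to sort a sample of keys (or tokens) in unsigned byte order and measure how many leading bytes neighbouring entries share, since shared prefixes are part of what an FST folds together (it also shares suffixes, which this ignores). The sketch below uses made-up class and method names and computes that average for MD5-hashed keys; `averageNeighbourPrefix` could equally be pointed at a node's raw keys or tokens. It uses `Arrays.compareUnsigned`, so it assumes Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for estimating prefix sharing among sorted keys.
final class PrefixSharing {
    // MD5 digest of a key; an example of hashed, effectively random, key bytes.
    static byte[] md5(byte[] key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    // Number of leading bytes two sequences have in common.
    static int sharedPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Sorts a copy of the keys in unsigned byte order and averages the shared
    // prefix length of each pair of neighbouring keys.
    static double averageNeighbourPrefix(List<byte[]> keys) {
        if (keys.size() < 2) {
            return 0.0;
        }
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        long total = 0;
        for (int i = 1; i < sorted.size(); i++) {
            total += sharedPrefix(sorted.get(i - 1), sorted.get(i));
        }
        return (double) total / (sorted.size() - 1);
    }

    // n hashed keys derived from "key0", "key1", ...
    static List<byte[]> hashedKeys(int n) {
        List<byte[]> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(md5(("key" + i).getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }
}
```

For uniformly random 16-byte digests the average grows only roughly logarithmically with the number of keys (about log base 256 of n), so most of each digest stays unshared; whether a node's real key set does better is exactly the open question.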


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-11-02 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489823#comment-13489823 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Lucene 4.0 has been released, so maybe it's time to revisit this issue.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-08-14 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434727#comment-13434727 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Jonathan, do you mean there is no need for 'array index' lookups into the 
'index' keys?


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-08-14 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434782#comment-13434782 ]

Jonathan Ellis commented on CASSANDRA-4324:
---

Not from what I saw skimming the uses.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-08-13 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433526#comment-13433526 ]

Jonathan Ellis commented on CASSANDRA-4324:
---

bq. Looks like only SSTableReader calls IndexSummary#getKeys

Yes, although some uses, like the one in {{getKeySamples(Range)}}, are going to 
be tricky without an actual list of keys to work with.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-08-13 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433731#comment-13433731 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

{quote}some uses, like the one in getKeySamples(Range), are going to be tricky 
without an actual list of keys to work with{quote}

The FST has a range iteration method that returns the keys themselves; wouldn't
that suffice?
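A minimal sketch of that idea, assuming the sampled keys are kept in unsigned byte order: a `TreeMap` stands in here for an ordered structure such as the FST (this does not use Lucene's API), and `keySamples` is a hypothetical replacement for walking a list of DecoratedKeys. Cassandra ranges are start-exclusive and end-inclusive and may wrap around the ring; wrapping ranges are left out of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical names; a TreeMap stands in for an ordered key structure such as an FST.
final class RangeSamples {
    // Sampled key -> index-file position, ordered by unsigned byte comparison (Java 9+).
    static NavigableMap<byte[], Long> newSummary() {
        return new TreeMap<byte[], Long>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    // Returns the sampled keys k with left < k <= right by walking the keys in order,
    // rather than relying on a materialized list of DecoratedKeys.
    static List<byte[]> keySamples(NavigableMap<byte[], Long> summary, byte[] left, byte[] right) {
        if (Arrays.compareUnsigned(left, right) >= 0) {
            // A wrapping range (left >= right) would need two sub-range walks.
            throw new IllegalArgumentException("wrapping ranges are not handled in this sketch");
        }
        return new ArrayList<>(summary.subMap(left, false, right, true).keySet());
    }
}
```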


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-08-13 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433740#comment-13433740 ]

Jonathan Ellis commented on CASSANDRA-4324:
---

Yes, I don't think we need random access anywhere but the actual index lookups.
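For that one random-access path, a lookup over sampled keys amounts to finding the greatest sample that sorts at or before the search key and starting the index scan from its position. A minimal sketch with hypothetical names, assuming the samples are held as parallel sorted arrays; this is not Cassandra's actual IndexSummary code.

```java
import java.util.Arrays;

// Hypothetical floor lookup over sampled keys held in parallel sorted arrays.
final class SummaryLookup {
    // Returns the index-file position of the greatest sampled key <= target, or -1 if the
    // target sorts before every sample. keys must be sorted in unsigned byte order (Java 9+).
    static long floorPosition(byte[][] keys, long[] positions, byte[] target) {
        int lo = 0;
        int hi = keys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(keys[mid], target) <= 0) {
                found = mid;  // candidate floor; keep looking to the right for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : positions[found];
    }
}
```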


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-17 Yuki Morishita (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416801#comment-13416801 ]

Yuki Morishita commented on CASSANDRA-4324:
---

bq. Interesting, what is the information?

Specifically, I mean that IndexSummary#getKeys() returns a list of DecoratedKeys
and is called from many places. The FST does not carry DecoratedKeys around, so
you may need to find a way to work around that.
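If the FST stored only raw key bytes, one possible workaround is to rebuild the token half of each DecoratedKey on demand. A sketch under an explicit assumption: the token is modeled on RandomPartitioner, i.e. the absolute value of the key's MD5 digest read as a signed big-endian integer. `SimpleDecoratedKey` is a hypothetical stand-in, not Cassandra's class.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a DecoratedKey: a partitioner token paired with the raw key bytes.
final class SimpleDecoratedKey {
    final BigInteger token;
    final byte[] key;

    SimpleDecoratedKey(BigInteger token, byte[] key) {
        this.token = token;
        this.key = key.clone(); // defensive copy so callers cannot mutate the stored key
    }

    // Assumed RandomPartitioner-style token: |MD5(key)| read as a signed big-endian integer.
    static BigInteger md5Token(byte[] key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key);
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    // If only raw key bytes were kept (e.g. in an FST), the token could be recomputed here.
    static SimpleDecoratedKey decorate(byte[] key) {
        return new SimpleDecoratedKey(md5Token(key), key);
    }
}
```

Re-hashing trades CPU on the read path for memory; whether that trade is acceptable would need measuring.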


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-17 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416859#comment-13416859 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Looks like only SSTableReader calls IndexSummary#getKeys()?  As you mentioned,
we need to abstract the key and key range access into the IndexSummary
class; that should obviate the need to call getKeys, unless I'm missing
something?


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-17 Yuki Morishita (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416372#comment-13416372 ]

Yuki Morishita commented on CASSANDRA-4324:
---

Jason,

I used YourKit to profile memory usage for your test (slightly modified to
call IndexSummary#complete), and it shows

IndexSummary: 21,597,040 bytes (~20 MB)
FST: 3,576,248 bytes (~3.4 MB)

for storing 10,000 keys in each, so it's pretty impressive. If we can deliver
this, it will be a huge win.
(Note that on disk, IndexSummary only writes the key portion of each DecoratedKey,
so it may be smaller than the FST.)
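Per key, the reported figures work out as follows (plain arithmetic on the numbers above, nothing re-measured):

```latex
\frac{21{,}597{,}040\ \text{bytes}}{10{,}000\ \text{keys}} \approx 2{,}160\ \text{bytes/key},\qquad
\frac{3{,}576{,}248\ \text{bytes}}{10{,}000\ \text{keys}} \approx 358\ \text{bytes/key},\qquad
1 - \frac{3{,}576{,}248}{21{,}597{,}040} \approx 0.834
```

That is, the FST used roughly 83% less heap in this test, about a 6x reduction.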

My remaining concerns are as follows:

* The planned 1.2 release saves IndexSummary to disk (CASSANDRA-2392), so I think it
is better to leave the current implementation and add an FST version of IndexSummary,
so that both can be read and written.
* DecoratedKeys stored inside the current IndexSummary are actually accessed from
various places, and the FST version will lack that information, so you may need to
figure out an alternative way to preserve current functionality.
* If you want to use Lucene 4.0, we should release this feature after the 4.0
release.
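One way to keep both formats readable and writable, as the first point suggests, is to prefix the serialized summary with a format tag and dispatch on it when loading. A minimal sketch; the tag values, layouts, and class names are hypothetical, and the FST payload is treated as an opaque blob because producing a real one requires Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical tagged summary format so that two layouts can coexist on disk.
final class VersionedSummary {
    static final byte FORMAT_SAMPLED_KEYS = 1;
    static final byte FORMAT_FST_BLOB = 2;

    // Decoded sampled-key layout: keys and their index-file positions.
    static final class Sampled {
        final byte[][] keys;
        final long[] positions;

        Sampled(byte[][] keys, long[] positions) {
            this.keys = keys;
            this.positions = positions;
        }
    }

    // Layout: tag, count, then (key length, key bytes, position) per sample.
    static byte[] writeSampled(byte[][] keys, long[] positions) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_SAMPLED_KEYS);
            out.writeInt(keys.length);
            for (int i = 0; i < keys.length; i++) {
                out.writeInt(keys[i].length);
                out.write(keys[i]);
                out.writeLong(positions[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Layout: tag, blob length, blob bytes (the blob would be a serialized FST).
    static byte[] writeFstBlob(byte[] blob) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_FST_BLOB);
            out.writeInt(blob.length);
            out.write(blob);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Readers check the tag first and dispatch to the matching decoder.
    static byte formatOf(byte[] serialized) {
        if (serialized.length == 0) {
            throw new IllegalArgumentException("empty summary");
        }
        byte tag = serialized[0];
        if (tag != FORMAT_SAMPLED_KEYS && tag != FORMAT_FST_BLOB) {
            throw new IllegalArgumentException("unknown summary format: " + tag);
        }
        return tag;
    }

    static Sampled readSampled(byte[] serialized) {
        if (formatOf(serialized) != FORMAT_SAMPLED_KEYS) {
            throw new IllegalArgumentException("not a sampled-key summary");
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            in.readByte(); // skip the format tag already checked above
            int n = in.readInt();
            byte[][] keys = new byte[n][];
            long[] positions = new long[n];
            for (int i = 0; i < n; i++) {
                keys[i] = new byte[in.readInt()];
                in.readFully(keys[i]);
                positions[i] = in.readLong();
            }
            return new Sampled(keys, positions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```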

bq. Also the last results are for 100,000 keys rather than 1 mil.

IndexSummary only holds one key for every index_interval keys (default 128), so I
don't think you need to test with 1 million.
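Assuming the first key is sampled and then every index_interval-th key after it, the number of summary entries is the ceiling of N / 128, for example:

```latex
\left\lceil \frac{1{,}000{,}000}{128} \right\rceil = 7{,}813,\qquad
\left\lceil \frac{100{,}000}{128} \right\rceil = 782,\qquad
\left\lceil \frac{10{,}000}{128} \right\rceil = 79
```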


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-17 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416388#comment-13416388 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Nice, that's a huge win!  Is that for the MD5-encoded keys?

{quote}DecoratedKeys stored inside current IndexSummary are actually accessed 
from various places, and FST version will lack those information, you may need 
to figure out the alternative way to preserve current functionality{quote}

Interesting, what is the information?  Why are there two keys stored in 
DecoratedKey?

The FST supports range-like scans; however, I am not exactly sure how that part
works.  We probably want to restructure the API to make it abstract over both
implementations?


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-17 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416632#comment-13416632 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

It should also be mentioned that using the FST will be a big win for garbage
collection: it's basically a single byte[].  The IndexSummary currently uses a
lot of object pointers, which are more costly for the collector than a single byte[].
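To illustrate only the layout difference (this is not an FST and does no compression), here is a hypothetical flattened key store: all key bytes live in one byte[] and an int[] holds the start offsets, so the collector sees two arrays instead of separate objects per key.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical flattened layout: concatenated key bytes plus an offsets array.
final class PackedKeys {
    private final byte[] data;    // all key bytes, back to back
    private final int[] offsets;  // offsets[i] = start of key i; offsets[size()] = data.length

    PackedKeys(List<byte[]> keys) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        offsets = new int[keys.size() + 1];
        for (int i = 0; i < keys.size(); i++) {
            offsets[i] = buf.size();
            byte[] k = keys.get(i);
            buf.write(k, 0, k.length);
        }
        offsets[keys.size()] = buf.size();
        data = buf.toByteArray();
    }

    int size() {
        return offsets.length - 1;
    }

    // Copies key i out of the shared array.
    byte[] key(int i) {
        return Arrays.copyOfRange(data, offsets[i], offsets[i + 1]);
    }
}
```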


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-14 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414558#comment-13414558 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Can you verify the mem size calculation for the IndexSummary is correct?

I realized the keys should be sorted.  Where is the code that sorts them?
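Lucene's FST builder expects its inputs to be added in sorted order, so keys would need sorting in the byte order the FST uses before building. A minimal sketch, assuming unsigned lexicographic byte order and Java 9+ `Arrays.compareUnsigned`; the class and method names are hypothetical, and rejecting duplicates is a conservative choice here rather than a documented requirement. Since SSTables are ordered by token, whatever byte encoding is fed to the FST would also need to sort in that same order for lookups to line up with the index file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for preparing sorted FST input.
final class SortedInput {
    // Returns a sorted copy of the keys in unsigned lexicographic byte order.
    static List<byte[]> sortForBuilder(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return sorted;
    }

    // Throws unless every key sorts strictly after the previous one.
    static void checkStrictlyIncreasing(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (Arrays.compareUnsigned(keys.get(i - 1), keys.get(i)) >= 0) {
                throw new IllegalArgumentException("key " + i + " is out of order or a duplicate");
            }
        }
    }
}
```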


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-14 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414561#comment-13414561 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Also, the last results are for 100,000 keys rather than 1 million.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-05 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407191#comment-13407191 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

Lucene 4.0 will be available shortly and contains FST code that is very stable.
We should build against it now, on the assumption that the final 4.0 release will be out soon.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406487#comment-13406487 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

The benchmark idea is interesting; however, it will not take into account the
fact that the FST will be able to store more keys and use less RAM.  With
greater key granularity, won't a seek to a given value be faster?  Is there an
existing benchmark framework that will, for example, generate the keys?

In general, the big win with the FST is that the amount of RAM consumed should be far
less.  That is fairly easy to measure by generating N keys and comparing the
RAM usage, which with the existing IndexSummary will include object pointers.

This article describes the improvements seen when using the FST on Wikipedia data:
up to 52% less RAM used, and 22% faster.  However, we need to perform our own
benchmarks, because an MD5 key is different from a dictionary of words.

http://blog.mikemccandless.com/2011/01/finite-state-transducers-part-2.html



[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406496#comment-13406496 ]

Sylvain Lebresne commented on CASSANDRA-4324:
-

I guess we mostly want to make sure this patch is not clearly slower than with
the current index (and if it's faster then that's even better). Using 
tools/stress (with enough keys inserted) should be a good start.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406499#comment-13406499 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

The range portion of the patch is not complete in the initial patch
posted, so I'm not sure tools/stress will work.  However, if tools/stress has a
way to generate keys, that would be useful.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406591#comment-13406591 ]

Jonathan Ellis commented on CASSANDRA-4324:
---

bq. In general the big win with the FST is the amount of RAM consumed should be 
far less. That is fairly easy to measure by generating N keys and comparing the 
RAM usage

I think that's what Yuki meant by micro benchmark [of] memory.


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406603#comment-13406603 ]

Jason Rutherglen commented on CASSANDRA-4324:
-

An easy approach is to generate an ascending set of keys and positions, apply
the MD5 hash, add them to both data structures, and compare the RAM usage.
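A minimal sketch of the data-generation half of that approach, with hypothetical names: ascending integers are MD5-hashed, the digests are sorted in unsigned byte order (Java 9+), and positions are assigned after sorting so they increase in the sorted order, as an index file's positions would. The heap helper is deliberately crude; GC timing makes single readings noisy, so a profiler or repeated runs would give more trustworthy numbers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical benchmark helper; names and the position stride are illustrative only.
final class BenchmarkKeys {
    // One generated entry: an MD5-hashed key and an index-file-style position.
    static final class Entry {
        final byte[] key;
        final long position;

        Entry(byte[] key, long position) {
            this.key = key;
            this.position = position;
        }
    }

    // Hashes the ascending integers 0..n-1, sorts the digests, then assigns
    // positions i * bytesPerEntry in that sorted order.
    static List<Entry> generate(int n, long bytesPerEntry) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
        List<byte[]> digests = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // digest(byte[]) hashes the input and resets the MessageDigest for the next call
            digests.add(md5.digest(Integer.toString(i).getBytes(StandardCharsets.UTF_8)));
        }
        digests.sort((a, b) -> Arrays.compareUnsigned(a, b));
        List<Entry> entries = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            entries.add(new Entry(digests.get(i), i * bytesPerEntry));
        }
        return entries;
    }

    // Crude heap-in-use reading after requesting a GC; treat single readings as noisy.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

Reading `usedHeapBytes()` before and after loading the same entries into each candidate structure gives a rough per-structure delta.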


[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406604#comment-13406604
 ] 

Jason Rutherglen commented on CASSANDRA-4324:
-

Also, what is the best way to include the Lucene libraries?




[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-04 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406607#comment-13406607
 ] 

Yuki Morishita commented on CASSANDRA-4324:
---

Manually put the jar file inside the lib directory (as well as adding the Maven 
dependency in build.xml, as you did).
I noticed you added 4.0-SNAPSHOT to build.xml, but I think 4.0 is still in 
alpha; does lucene-core 3.6 work for this?




[jira] [Commented] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-03 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406240#comment-13406240
 ] 

Yuki Morishita commented on CASSANDRA-4324:
---

Jason,

Thanks for the patch.
The current IndexSummary holds a list of DecoratedKeys and a list of positions, 
but searches are also done against KeyBound. Both DecoratedKey and KeyBound are 
subclasses of RowPosition and are compared by their Tokens, so I think you 
have to construct the FST over Tokens.
As for the implementation, it would be better to keep all Lucene FST-related 
classes inside IndexSummary and not expose them directly to SSTableReader etc.
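Here is a minimal sketch of that suggestion. It assumes long tokens (as with Murmur3Partitioner) and a plain sorted-array layout. `TokenIndexSummary` and `floorPosition` are hypothetical names, not Cassandra's API. Lookups take only a token, so a DecoratedKey and a KeyBound would both be reduced to their token first. The arrays stay private, so they could later be replaced by a Lucene FST without touching callers such as SSTableReader.

```java
import java.util.Arrays;

// Hypothetical sketch; not Cassandra's IndexSummary. Ignores token collisions
// between distinct keys.
final class TokenIndexSummary {
    // Implementation detail: sampled tokens in ascending order and the index-file
    // position of each sampled entry. Never exposed, so the layout can change freely.
    private final long[] tokens;
    private final long[] positions;

    TokenIndexSummary(long[] sortedTokens, long[] positions) {
        if (sortedTokens.length != positions.length) {
            throw new IllegalArgumentException("tokens and positions differ in length");
        }
        for (int i = 1; i < sortedTokens.length; i++) {
            if (sortedTokens[i - 1] > sortedTokens[i]) {
                throw new IllegalArgumentException("tokens must be sorted");
            }
        }
        this.tokens = Arrays.copyOf(sortedTokens, sortedTokens.length);
        this.positions = Arrays.copyOf(positions, positions.length);
    }

    // Position of the last sampled entry whose token is <= the given token,
    // or -1 if the token sorts before every sampled entry.
    long floorPosition(long token) {
        int lo = 0;
        int hi = tokens.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (tokens[mid] <= token) {
                found = mid;   // candidate; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : positions[found];
    }

    int size() {
        return tokens.length;
    }
}
```

For example, with sampled tokens {-50, 10, 200} at positions {0, 128, 512}, `floorPosition(9)` returns 0, so a scan would start at the first sampled entry.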

Also, can you provide a micro-benchmark (memory, CPU time...) of IndexSummary 
comparing the current implementation and the FST?
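Below is a rough harness for the CPU-time half of such a micro-benchmark. It assumes long tokens and uses `Arrays.binarySearch` over a sorted `long[]` as a stand-in for the current lookup. An FST-backed lookup would be passed in as another `LongUnaryOperator`. `LookupBench` is a hypothetical name. Hand-rolled `System.nanoTime` timing and `Runtime` heap deltas are noisy, so a dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.LongUnaryOperator;

// Hypothetical benchmark harness; indicative numbers only.
final class LookupBench {

    // Written after each measurement so the JIT cannot treat the lookups as dead code.
    static volatile long blackhole;

    // Runs every query once and returns a checksum of the results.
    static long runOnce(LongUnaryOperator lookup, long[] queries) {
        long checksum = 0;
        for (long q : queries) {
            checksum += lookup.applyAsLong(q);
        }
        return checksum;
    }

    // Average nanoseconds per lookup, measured after some warm-up rounds.
    static double nanosPerLookup(LongUnaryOperator lookup, long[] queries, int warmups, int rounds) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += runOnce(lookup, queries);
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += runOnce(lookup, queries);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / ((long) rounds * queries.length);
    }

    // Approximate heap in use; only meaningful as a before/after delta, and noisy.
    static long usedHeapBytes() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long before = usedHeapBytes();
        long[] tokens = rnd.longs(1_000_000).sorted().toArray(); // stand-in for sampled tokens
        long after = usedHeapBytes();
        long[] queries = rnd.longs(100_000).toArray();

        // Baseline lookup: binary search over a sorted long[] (negative insertion point on a miss).
        LongUnaryOperator baseline = q -> Arrays.binarySearch(tokens, q);

        System.out.println("approx. heap for tokens: " + (after - before) + " bytes");
        System.out.printf("ns per lookup: %.1f%n", nanosPerLookup(baseline, queries, 5, 20));
    }
}
```

Passing both implementations the same `queries` array keeps the comparison fair. The checksum returned by `runOnce` also doubles as a quick check that both give identical answers.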
