[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-08-18 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756105#comment-17756105
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

An additional repeat run for {{StorageAttachedIndexDDLTest}} looks green, and 
all other failures are existing/unrelated. Moving to commit...

> Reduce size of per-SSTable index components
> ---
>
> Key: CASSANDRA-18673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18673
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Mike Adamson
>Assignee: Mike Adamson
>Priority: Urgent
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The current per-SSTable index components are large because the primary keys 
> that are stored in them include the token as part of the byte comparable. The 
> byte comparable puts the token first meaning that we get very little prefix 
> compression from either the trie or the sorted terms store. 
> We can fix this by removing the token from the primary key serialization. 
> This would allow us to get the prefix compression from the trie and the 
> sorted terms store.
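
To illustrate the description above, here is a minimal sketch of why a leading token defeats prefix compression. It is not Cassandra code. The helper names (fake_token, common_prefix_len, shared_prefix_bytes), the "sensor-NNNNNN" keys, and the use of MD5 as a stand-in for the partitioner hash are all assumptions made for illustration. Keys are sorted bytewise in both layouts, roughly as a trie would see them.

```python
import hashlib


def fake_token(partition_key: bytes) -> bytes:
    # Assumption: 8 pseudo-random bytes stand in for the real partitioner
    # token. This is NOT Murmur3, only something with the same "random" shape.
    return hashlib.md5(partition_key).digest()[:8]


def common_prefix_len(a: bytes, b: bytes) -> int:
    # Number of leading bytes that a and b have in common.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_prefix_bytes(keys) -> int:
    # Total bytes each key shares with its predecessor in bytewise order.
    # A prefix-compressing store can avoid re-writing roughly these bytes.
    ordered = sorted(keys)
    return sum(common_prefix_len(p, c) for p, c in zip(ordered, ordered[1:]))


# Hypothetical partition keys with a long common prefix.
partition_keys = [f"sensor-{i:06d}".encode() for i in range(10_000)]

with_token = [fake_token(pk) + pk for pk in partition_keys]  # token first
without_token = partition_keys                               # token removed

print("shared bytes, token first  :", shared_prefix_bytes(with_token))
print("shared bytes, token removed:", shared_prefix_bytes(without_token))
```

With the token in front, neighbouring keys diverge within the first few bytes, so the long shared "sensor-" prefix never gets a chance to compress. Without it, neighbours should share several times more bytes.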



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-08-10 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752874#comment-17752874
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

+1

I'm guessing the 5.0 and trunk patches will be identical, since we just created 
{{cassandra-5.0}}...




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-08-09 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752608#comment-17752608
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

Finished w/ my pass at review, and left my comments in the PR.




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-08-01 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749857#comment-17749857
 ] 

Mike Adamson commented on CASSANDRA-18673:
--

[~maedhroz] I have attached a new PR to this ticket. This patch does the 
following:
 * Removes the primary key trie on-disk component
 * Adds a partition sizes on-disk component
 * Adds a partitionedSeekToTerm to SortedTermsReader.Cursor
 * Creates separate SkinnyRowAwarePrimaryKeyMap and WideRowAwarePrimaryKeyMap 
components

 




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-25 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747173#comment-17747173
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

For anyone watching, there are still some issues w/ how we handle/compress more 
complex primary keys. Once we've addressed those, this will move back into 
review...




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-20 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745342#comment-17745342
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

Made a first pass at this and left some comments. Overall, things are looking 
pretty good and CI is clean...




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-20 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745247#comment-17745247
 ] 

Caleb Rackliffe commented on CASSANDRA-18673:
-

[~mike_tr_adamson] Reviewing now, but want to make sure we don't forget to 
throw up a Phase 2 Jira for removing the sorted terms entirely in favor of a 
{{row ID -> trie node ID}} map + collecting the PK from the trie itself...
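
For readers following along, a rough sketch of what a row ID -> trie node ID map could look like. This is only one guess at the Phase 2 idea, not a design from this ticket. ToyTrie and everything in it are hypothetical and in-memory, with no relation to the real on-disk trie format.

```python
class ToyTrie:
    """Hypothetical in-memory byte trie where every node knows its parent."""

    def __init__(self):
        self.parent = [-1]    # node 0 is the root
        self.label = [None]   # byte on the edge leading into each node
        self.children = [{}]  # per-node map: byte -> child node ID

    def insert(self, key: bytes) -> int:
        # Walk or extend the trie; return the ID of the node ending the key.
        node = 0
        for b in key:
            nxt = self.children[node].get(b)
            if nxt is None:
                nxt = len(self.parent)
                self.parent.append(node)
                self.label.append(b)
                self.children.append({})
                self.children[node][b] = nxt
            node = nxt
        return node

    def key_of(self, node: int) -> bytes:
        # Collect the key by walking parent links back up to the root.
        out = bytearray()
        while node != 0:
            out.append(self.label[node])
            node = self.parent[node]
        out.reverse()
        return bytes(out)


trie = ToyTrie()
# Keys in row ID order, which need not be bytewise order.
keys_in_row_order = [b"sensor-02", b"sensor-01", b"sensor-10"]
# The map replaces a separate store of full keys: row ID -> trie node ID.
row_to_node = [trie.insert(k) for k in keys_in_row_order]

for row_id, node in enumerate(row_to_node):
    print(row_id, node, trie.key_of(node))
```

The point of the sketch is that each key is stored once, in the trie, and the per-row cost shrinks to a single node ID.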




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-20 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745152#comment-17745152
 ] 

Mike Adamson commented on CASSANDRA-18673:
--

I have completed some performance runs against this branch and the current CEP 
branch. This loaded 1B rows with the following schema:
{noformat}
create table if not exists TEMPLATE(keyspace,test).TEMPLATE(table,sai) (
  id bigint,
  time timestamp,
  value int,
  lc int,
  tag text,
  PRIMARY KEY (id)
);
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (time) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (value) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (lc) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS ON TEMPLATE(keyspace:test).TEMPLATE(table:sai) (tag) USING 'StorageAttachedIndex';
{noformat}
Data was loaded into the time, value & tag columns.
||Branch||SSTable Size GB||Per-SSTable Index Components GB||Tag Index GB||Time Index GB||Value Index GB||SAI Total GB||
|CEP|48|70|2|7|7|87|
|CASSANDRA-18673|48|13|2|7|7|29|
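
As a quick check on the table above, the relative savings can be worked out as below. The sizes are copied from the two table rows. The names cep, patched and reduction_pct are just labels chosen for this sketch.

```python
# Sizes in GB, copied from the table above.
cep = {"per_sstable_components": 70, "sai_total": 87}
patched = {"per_sstable_components": 13, "sai_total": 29}


def reduction_pct(before: float, after: float) -> float:
    # Percentage by which `after` is smaller than `before`.
    return 100.0 * (before - after) / before


for name in cep:
    pct = reduction_pct(cep[name], patched[name])
    print(f"{name}: {cep[name]} GB -> {patched[name]} GB ({pct:.0f}% smaller)")
```

That works out to roughly 81% less space for the per-SSTable components and roughly 67% less for SAI overall, with the per-column indexes unchanged.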




[jira] [Commented] (CASSANDRA-18673) Reduce size of per-SSTable index components

2023-07-19 Thread Mike Adamson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744687#comment-17744687
 ] 

Mike Adamson commented on CASSANDRA-18673:
--

This patch introduces the following changes:
* The token is no longer included in the primary key data stored in the sorted 
terms and primary key trie. This allows the sorted terms and the primary key 
trie to correctly prefix compress the primary keys. This was not possible with 
the token at the start of the stored data. 
* To cater for the primary keys no longer being in lexicographic order, the 
primary key trie is now segmented to allow the keys to be sorted in memory 
first.
* The NamedMemoryLimiter has been renamed the SegmentMemoryLimiter and 
simplified in its usage. This allows it to more easily be used by the 
SegmentBuilder for per-column indexes and by the primary key trie.
* The LongArray can now search for rowIds by token making it bidirectional.
* The primary key trie is only written for wide tables. If the table has no 
clustering then the rowId can be read from the token LongArray making the trie 
redundant.
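
A minimal sketch of the bidirectional lookup described in the last two points, assuming the tokens are held in row ID order and that this order is also ascending token order. TokenArray and its methods are hypothetical names, not the actual LongArray API, and a plain Python list stands in for the on-disk structure.

```python
from bisect import bisect_left


class TokenArray:
    """Hypothetical stand-in: tokens[row_id] is the token of that row."""

    def __init__(self, tokens):
        tokens = list(tokens)
        # Assumption: rows are written in token order, so the array is sorted.
        if any(a > b for a, b in zip(tokens, tokens[1:])):
            raise ValueError("tokens must be in ascending order")
        self._tokens = tokens

    def token_for(self, row_id: int) -> int:
        # Forward direction: row ID -> token is a plain array read.
        return self._tokens[row_id]

    def first_row_id_at_or_after(self, token: int) -> int:
        # Reverse direction: binary search for the first row whose token is
        # >= the target. Returns len(tokens) if there is no such row.
        return bisect_left(self._tokens, token)

    def row_id_for(self, token: int) -> int:
        # Exact match, or -1 if the token is absent. For a table with no
        # clustering columns there is one row per partition, so (ignoring
        # token collisions) this identifies the row without any trie.
        idx = bisect_left(self._tokens, token)
        if idx < len(self._tokens) and self._tokens[idx] == token:
            return idx
        return -1


tokens = TokenArray([-900, -15, 42, 42, 7000])
print(tokens.token_for(2), tokens.row_id_for(42), tokens.row_id_for(0))
```

For wide tables several rows share one token, so a token search can only find the start of the partition. That is consistent with the trie still being written in that case.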
