yifan-c commented on code in PR #4622:
URL: https://github.com/apache/cassandra/pull/4622#discussion_r2818627638
##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryDetailsTabularData.java:
##########
@@ -278,15 +289,11 @@ private void validate()
if (table == null)
throw new IllegalArgumentException("Table not specified.");
if (tableId == null)
- throw new IllegalArgumentException("Table id not specified");
+ throw new IllegalArgumentException("Table id not specified.");
if (dictId <= 0)
throw new IllegalArgumentException("Provided dictionary id must be positive but it is '" + dictId + "'.");
if (dict == null || dict.length == 0)
throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
- if (dict.length > FileUtils.ONE_MIB)
Review Comment:
Thanks for the background! It helps me understand.
Dictionaries are attached to every SSTable, and the size limit was added with
that in mind. Dictionaries are typically 64~100 KiB. That said, the underlying
zstd trainer does allow training larger dictionaries. The question is: do we
want to train dictionaries larger than 1 MiB? The added dictionary size might
outweigh the compression gains (64 KiB vs. 1 MiB dictionaries).
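To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes exactly one dictionary copy is stored per SSTable, and the class name, method names, and the SSTable count of 1,000 are all hypothetical, chosen only for illustration. It is not code from this PR.

```java
// Illustrative sketch only. The class, its methods, and the SSTable count
// below are hypothetical and do not exist in Cassandra.
final class DictionaryOverheadSketch
{
    static final long ONE_KIB = 1024L;
    static final long ONE_MIB = 1024L * ONE_KIB;

    // Total bytes spent on dictionary copies, assuming one copy per SSTable.
    static long totalOverheadBytes(long sstableCount, long dictionarySizeBytes)
    {
        if (sstableCount < 0 || dictionarySizeBytes <= 0)
            throw new IllegalArgumentException("Count must be non-negative and size must be positive.");
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(sstableCount, dictionarySizeBytes);
    }

    // Extra bytes the larger dictionary must save in each SSTable
    // just to match the smaller dictionary's total footprint.
    static long breakEvenSavingsPerSSTable(long smallDictBytes, long largeDictBytes)
    {
        if (largeDictBytes < smallDictBytes)
            throw new IllegalArgumentException("Large dictionary must not be smaller than the small one.");
        return largeDictBytes - smallDictBytes;
    }

    public static void main(String[] args)
    {
        long sstables = 1_000;           // assumed SSTable count, for illustration only
        long small = 64 * ONE_KIB;       // low end of the typical range mentioned above
        long large = ONE_MIB;            // the limit under discussion

        System.out.printf("64 KiB dictionaries: %,d bytes total%n", totalOverheadBytes(sstables, small));
        System.out.printf("1 MiB dictionaries:  %,d bytes total%n", totalOverheadBytes(sstables, large));
        System.out.printf("Break-even savings per SSTable: %,d bytes%n", breakEvenSavingsPerSSTable(small, large));
    }
}
```

Under these assumptions, a 1 MiB dictionary has to compress each SSTable about 960 KiB better than a 64 KiB one before it pays for itself, which is a high bar for small SSTables.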
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]