[ https://issues.apache.org/jira/browse/CASSANDRA-11911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343977#comment-15343977 ]
Benjamin Lerer edited comment on CASSANDRA-11911 at 6/22/16 9:16 AM:
---------------------------------------------------------------------

Just 2 small nits:
* I think that:
{code}
if (value == null)
{
    rawValues.add(null);
}
else
{
    rawValues.add(value == UNSET_VALUE ? UNSET_VALUE : typeCodecs.get(i).serialize(value, ProtocolVersion.NEWEST_SUPPORTED));
}
{code}
should be replaced by something like:
{code}
rawValues.add(serialize(value, typeCodecs.get(i)));
[...]
private ByteBuffer serialize(Object value, TypeCodec codec)
{
    // null and UNSET_VALUE are passed through as-is; the cast is safe because
    // UNSET_VALUE is itself a ByteBuffer and null can be cast to any type.
    if (value == null || value == UNSET_VALUE)
        return (ByteBuffer) value;

    return codec.serialize(value, ProtocolVersion.NEWEST_SUPPORTED);
}
{code}
as it simplifies the logic and removes duplicate code.
* It might be worth checking the error messages in the unit tests to make sure they are the expected ones.

> CQLSSTableWriter should allow for unset fields
> ----------------------------------------------
>
>                 Key: CASSANDRA-11911
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11911
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 3.0.6
>            Reporter: Matt Kopit
>            Assignee: Alex Petrov
>              Labels: lhf
>
> If you are using CQLSSTableWriter to bulk load data into sstables, the only way to handle fields without values is to set them to NULL, which generates a tombstoned field in the resulting sstable. For a large dataset this can result in a large number of tombstones.
> CQLSSTableWriter is currently instantiated with a single INSERT statement, so modifying the insert statement to specify different fields on a per-row basis is not an option.
> Here are three potential solutions to this problem:
> 1. Change the default handling of NULLs so that those fields are treated as UNSET and are never written to the sstable.
> 2. Create a configuration option for CQLSSTableWriter that governs whether NULLs should be ignored.
> 3. Introduce a new constant that represents an UNSET value and can be used in place of NULL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
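To make the suggested `serialize` refactor and option 3 concrete, here is a minimal, self-contained sketch of the null/unset pass-through. It deliberately avoids Cassandra and driver classes: `Codec`, `TEXT`, `INT`, `toRawValues`, and the `UNSET_VALUE` sentinel below are hypothetical stand-ins for the driver's `TypeCodec` and the writer's unset marker, and the `ProtocolVersion` argument is omitted. It assumes, as the review snippet implies, that the unset marker is a `ByteBuffer` compared by reference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnsetAwareSerializer {
    // Hypothetical stand-in for the driver's TypeCodec; only serialization is modeled.
    public interface Codec<T> {
        ByteBuffer serialize(T value);
    }

    // Hypothetical unset sentinel. It is compared by identity (==), never by
    // content, so a genuinely empty value is not mistaken for "unset".
    public static final ByteBuffer UNSET_VALUE = ByteBuffer.allocate(0).asReadOnlyBuffer();

    // Hypothetical codecs for a text column and an int column.
    public static final Codec<String> TEXT =
            s -> ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    public static final Codec<Integer> INT =
            i -> ByteBuffer.allocate(4).putInt(0, i);

    // Single helper: null (an explicit tombstone) and UNSET_VALUE pass through
    // untouched; every other value is serialized with the column's codec.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static ByteBuffer serialize(Object value, Codec codec) {
        if (value == null || value == UNSET_VALUE)
            return (ByteBuffer) value;
        return codec.serialize(value);
    }

    // Builds the raw values for one row, mirroring the rawValues loop in the review.
    public static List<ByteBuffer> toRawValues(List<Object> values, List<Codec<?>> codecs) {
        List<ByteBuffer> rawValues = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++)
            rawValues.add(serialize(values.get(i), codecs.get(i)));
        return rawValues;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("alice", null, UNSET_VALUE, 42);
        List<Codec<?>> codecs = Arrays.asList(TEXT, TEXT, INT, INT);
        List<ByteBuffer> raw = toRawValues(row, codecs);
        System.out.println(raw.get(1) == null);        // null stays null: a tombstone would be written
        System.out.println(raw.get(2) == UNSET_VALUE); // the sentinel survives: the column can be skipped
        System.out.println(raw.get(3).getInt(0));      // the int column was serialized by its codec
    }
}
```

Because the sentinel is checked by reference before any codec is consulted, a caller can distinguish "write a tombstone" (null) from "leave this column out" (the unset marker) within a single INSERT statement, which is the distinction option 3 asks for.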