joao-r-reis commented on code in PR #1952: URL: https://github.com/apache/cassandra-java-driver/pull/1952#discussion_r1916983031
########## core/src/main/java/com/datastax/oss/driver/api/core/type/DataTypes.java: ########## @@ -69,12 +66,21 @@ public static DataType custom(@NonNull String className) { /* Vector support is currently implemented as a custom type but is also parameterized */ if (className.startsWith(DefaultVectorType.VECTOR_CLASS_NAME)) { - List<String> params = - paramSplitter.splitToList( - className.substring( - DefaultVectorType.VECTOR_CLASS_NAME.length() + 1, className.length() - 1)); - DataType subType = classNameParser.parse(params.get(0), AttachmentPoint.NONE); - int dimensions = Integer.parseInt(params.get(1)); + String paramsString = Review Comment: Why did you decide to stop using `paramSplitter`? ########## core/src/main/java/com/datastax/oss/driver/api/core/data/CqlVector.java: ########## @@ -87,15 +87,52 @@ public static <V extends Number> CqlVector<V> newInstance(List<V> list) { * @param subtypeCodec * @return a new CqlVector built from the String representation */ - public static <V extends Number> CqlVector<V> from( - @NonNull String str, @NonNull TypeCodec<V> subtypeCodec) { + public static <V> CqlVector<V> from(@NonNull String str, @NonNull TypeCodec<V> subtypeCodec) { Preconditions.checkArgument(str != null, "Cannot create CqlVector from null string"); Preconditions.checkArgument(!str.isEmpty(), "Cannot create CqlVector from empty string"); - ArrayList<V> vals = - Streams.stream(Splitter.on(", ").split(str.substring(1, str.length() - 1))) - .map(subtypeCodec::parse) - .collect(Collectors.toCollection(ArrayList::new)); - return new CqlVector(vals); + if (str == null || str.isEmpty() || str.equalsIgnoreCase("NULL")) return null; Review Comment: Isn't null and empty already checked by `Preconditions.checkArgument`? ########## core/src/main/java/com/datastax/oss/driver/api/core/data/CqlVector.java: ########## @@ -196,7 +233,8 @@ public int hashCode() { @Override public String toString() { - return Iterables.toString(this.list); + TypeCodec<T> subcodec = CodecRegistry.DEFAULT.codecFor(list.get(0)); Review Comment: We could store the subtype codec as a field of the vector but then it will only be available if the vector was created with `from()` so it's a bit weird. Using the default codec could lead to unexpected behavior if the user has custom codecs so I don't think it's a good solution either. The C# `CqlVector` class has methods to easily convert between native arrays and `CqlVector` objects so that the user can use existing code that is compatible with arrays like debugging, printing, etc. There's no way to convert a C# `CqlVector` object to `string` directly but there's also no way to create a `CqlVector` object from a string representation. I think this was a fine idea when `CqlVector` was only for `float` values but it becomes very complex to support when we extend the subtype to any cql supported type including collections, udts, etc. Personally I'd try to find a way to keep supporting this for subtypes that were already supported before this change but just throw an exception if the subtype would require access to the `TypeCodec` to make it work. Alternatively if we don't really mind breaking this string conversion support for existing apps that use the current `CqlVector` class then that would also work but I don't think that's a good idea. ########## core/src/main/java/com/datastax/oss/driver/api/core/data/CqlVector.java: ########## @@ -87,15 +87,52 @@ public static <V extends Number> CqlVector<V> newInstance(List<V> list) { * @param subtypeCodec * @return a new CqlVector built from the String representation */ - public static <V extends Number> CqlVector<V> from( - @NonNull String str, @NonNull TypeCodec<V> subtypeCodec) { + public static <V> CqlVector<V> from(@NonNull String str, @NonNull TypeCodec<V> subtypeCodec) { Preconditions.checkArgument(str != null, "Cannot create CqlVector from null string"); Preconditions.checkArgument(!str.isEmpty(), "Cannot create CqlVector from empty string"); - ArrayList<V> vals = - Streams.stream(Splitter.on(", ").split(str.substring(1, str.length() - 1))) - .map(subtypeCodec::parse) - .collect(Collectors.toCollection(ArrayList::new)); - return new CqlVector(vals); + if (str == null || str.isEmpty() || str.equalsIgnoreCase("NULL")) return null; + + int idx = ParseUtils.skipSpaces(str, 0); + if (str.charAt(idx++) != '[') + throw new IllegalArgumentException( + String.format( + "Cannot parse vector value from \"%s\", at character %d expecting '[' but got '%c'", + str, idx, str.charAt(idx))); + + idx = ParseUtils.skipSpaces(str, idx); + + if (str.charAt(idx) == ']') { + return null; Review Comment: Shouldn't it be an empty vector instead of null? ########## core/src/main/java/com/datastax/oss/driver/internal/core/type/codec/VectorCodec.java: ########## @@ -156,4 +116,195 @@ public CqlVector<SubtypeT> parse(@Nullable String value) { ? null : CqlVector.from(value, this.subtypeCodec); } + + private static class FixedLength<SubtypeT> implements VectorCodecProxy<SubtypeT> { + private final VectorType cqlType; + private final TypeCodec<SubtypeT> subtypeCodec; + + private FixedLength(VectorType cqlType, TypeCodec<SubtypeT> subtypeCodec) { + this.cqlType = cqlType; + this.subtypeCodec = subtypeCodec; + } + + @Override + public ByteBuffer encode( + @Nullable CqlVector<SubtypeT> value, @NonNull ProtocolVersion protocolVersion) { + if (value == null || cqlType.getDimensions() <= 0) { + return null; + } + ByteBuffer[] valueBuffs = new ByteBuffer[cqlType.getDimensions()]; + Iterator<SubtypeT> values = value.iterator(); + int allValueBuffsSize = 0; + for (int i = 0; i < cqlType.getDimensions(); ++i) { + ByteBuffer valueBuff; + SubtypeT valueObj; + + try { + valueObj = values.next(); + } catch (NoSuchElementException nsee) { + throw new IllegalArgumentException( + String.format( + "Not enough elements; must provide elements for %d dimensions", + cqlType.getDimensions())); + } + + try { + valueBuff = this.subtypeCodec.encode(valueObj, protocolVersion); + } catch (ClassCastException e) { + throw new IllegalArgumentException("Invalid type for element: " + valueObj.getClass()); + } + if (valueBuff == null) { + throw new NullPointerException("Vector elements cannot encode to CQL NULL"); + } + allValueBuffsSize += valueBuff.limit(); + valueBuff.rewind(); + valueBuffs[i] = valueBuff; + } + // if too many elements, throw + if (values.hasNext()) { + throw new IllegalArgumentException( + String.format( + "Too many elements; must provide elements for %d dimensions", + cqlType.getDimensions())); + } + /* Since we already did an early return for <= 0 dimensions above */ + assert valueBuffs.length > 0; + ByteBuffer rv = ByteBuffer.allocate(allValueBuffsSize); + for (int i = 0; i < cqlType.getDimensions(); ++i) { + rv.put(valueBuffs[i]); + } + rv.flip(); + return rv; + } + + @Override + public CqlVector<SubtypeT> decode( + @Nullable ByteBuffer bytes, @NonNull ProtocolVersion protocolVersion) { + if (bytes == null || bytes.remaining() == 0) { + return null; + } + + int elementSize = subtypeCodec.serializedSize().get(); + if (bytes.remaining() != cqlType.getDimensions() * elementSize) { + throw new IllegalArgumentException( + String.format( + "Expected elements of uniform size, observed %d elements with total bytes %d", + cqlType.getDimensions(), bytes.remaining())); + } + + ByteBuffer slice = bytes.slice(); + List<SubtypeT> rv = new ArrayList<SubtypeT>(cqlType.getDimensions()); + for (int i = 0; i < cqlType.getDimensions(); ++i) { + // Set the limit for the current element + int originalPosition = slice.position(); + slice.limit(originalPosition + elementSize); + rv.add(this.subtypeCodec.decode(slice, protocolVersion)); + // Move to the start of the next element + slice.position(originalPosition + elementSize); + // Reset the limit to the end of the buffer + slice.limit(slice.capacity()); + } + + return CqlVector.newInstance(rv); + } + } + + private static class VariableLength<SubtypeT> implements VectorCodecProxy<SubtypeT> { + private final VectorType cqlType; + private final TypeCodec<SubtypeT> subtypeCodec; + + private VariableLength(VectorType cqlType, TypeCodec<SubtypeT> subtypeCodec) { + this.cqlType = cqlType; + this.subtypeCodec = subtypeCodec; + } + + @Override + public ByteBuffer encode( + @Nullable CqlVector<SubtypeT> value, @NonNull ProtocolVersion protocolVersion) { + if (value == null || cqlType.getDimensions() <= 0) { + return null; + } + ByteBuffer[] valueBuffs = new ByteBuffer[cqlType.getDimensions()]; + Iterator<SubtypeT> values = value.iterator(); + int allValueBuffsSize = 0; + for (int i = 0; i < cqlType.getDimensions(); ++i) { + ByteBuffer valueBuff; + SubtypeT valueObj; + + try { + valueObj = values.next(); + } catch (NoSuchElementException nsee) { + throw new IllegalArgumentException( + String.format( + "Not enough elements; must provide elements for %d dimensions", + cqlType.getDimensions())); + } + + try { + valueBuff = this.subtypeCodec.encode(valueObj, protocolVersion); + } catch (ClassCastException e) { + throw new IllegalArgumentException("Invalid type for element: " + valueObj.getClass()); + } + if (valueBuff == null) { + throw new NullPointerException("Vector elements cannot encode to CQL NULL"); + } + int elementSize = valueBuff.limit(); + allValueBuffsSize += elementSize + VIntCoding.computeVIntSize(elementSize); + valueBuff.rewind(); + valueBuffs[i] = valueBuff; + } + + // if too many elements, throw + if (values.hasNext()) { + throw new IllegalArgumentException( + String.format( + "Too many elements; must provide elements for %d dimensions", + cqlType.getDimensions())); + } + + /* Since we already did an early return for <= 0 dimensions above */ + assert valueBuffs.length > 0; + ByteBuffer rv = ByteBuffer.allocate(allValueBuffsSize); + for (int i = 0; i < cqlType.getDimensions(); ++i) { + VIntCoding.writeUnsignedVInt32(valueBuffs[i].remaining(), rv); + rv.put(valueBuffs[i]); + } + rv.flip(); + return rv; + } + + @Override + public CqlVector<SubtypeT> decode( + @Nullable ByteBuffer bytes, @NonNull ProtocolVersion protocolVersion) { + if (bytes == null || bytes.remaining() == 0) { + return null; + } + + ByteBuffer input = bytes.duplicate(); + List<SubtypeT> rv = new ArrayList<SubtypeT>(cqlType.getDimensions()); + for (int i = 0; i < cqlType.getDimensions(); ++i) { + int size = VIntCoding.getUnsignedVInt32(input, input.position()); + input.position(input.position() + VIntCoding.computeUnsignedVIntSize(size)); + + ByteBuffer value; + if (size < 0) { + value = null; + } else { + value = input.duplicate(); + value.limit(value.position() + size); + input.position(input.position() + size); + } + rv.add(subtypeCodec.decode(value, protocolVersion)); + } + // if too many elements, throw + if (input.hasRemaining()) { + throw new IllegalArgumentException( + String.format( + "Too many elements; must provide elements for %d dimensions", + cqlType.getDimensions())); + } + + return CqlVector.newInstance(rv); + } Review Comment: I feel like there's a lot of code duplication between the 2 "proxy" codecs, I'd suggest either making it a single proxy codec (or just implementing it in the vector codec itself) or creating some methods (possibly an abstract class?) that can be reused by both proxy codecs. The only difference between both of them is that on one codec you have to read/write the length before the actual data to get the element size and on the other one you already have that element size so you can just skip the read/write of that size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org