Adar Dembo has posted comments on this change. Change subject: KUDU-1065: [java client] Flexible Partition Pruning ......................................................................
Patch Set 1: (32 comments) http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java: Line 311: * Not for public consumption: use either predicates or primary key bounds instead. If Impala is no longer using these two, should we just remove them? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: Line 142: /** The partition pruner */ Not a useful comment, just drop it. PS1, Line 260: if (!table.getPartitionSchema().isSimpleRangePartitioning() && : (startPrimaryKey.length != 0 || : endPrimaryKey.length != 0) && : PARTITION_PRUNE_WARN.getAndSet(false)) { : LOG.warn("Starting full table scan. " + : "In the future this scan may be automatically optimized with partition pruning."); Is this still relevant? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java: Line 348 This is still a parameter though. Line 31: import java.io.ByteArrayOutputStream; Not used anymore? Line 45: private final ByteVec buf = ByteVec.create(); Not really seeing the point of this class member given that the first thing every non-static method does is clear() it. If the goal is to avoid allocation new ByteVecs with each call, maybe make it possible for those methods to pass in their own ByteVec? Line 63: public static int getHashBucket(PartialRow row, HashBucketSchema hashSchema) { Javadoc? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java: Line 328: List <LocatedTablet> newTablets = table.getTabletsLocations( Nit: List<LocatedTablet> http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java: PS1, Line 292: { break; } Nit: why the braces if the break is on the same line as the condition? PS1, Line 360: Step 1: In this function there's only Step 1. Line 398: private static byte[] pushPredicatesIntoUpperBoundRangeKey(Schema schema, I wonder if these push methods would be clearer if combined with a bool (or enum) to control whether they're upper or lower bounds. There's a lot of commonality. Might even be interesting to handle both in a loop of size 2 (and return a pair); then you could reuse the product of idsToIndexes. If you go down that route, I'd look into combining with pushPredicatesIntoHashBucket() too, that way there's one logical method that returns everything that's been pushed. Line 433: throw new IllegalArgumentException("unexpected predicate type can not be pushed into key"); Nit: can you add the predicate as you did in pushPredicatesIntoLowerBoundRangeKey()? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java: PS1, Line 60: /** : * Creates a new vector. : * @return the new vector. : */ : public static ByteVec create() { : return new ByteVec(DEFAULT_CAPACITY); : } : : /** : * Creates a new vector with the specified capacity. : * @param capacity the initial capacity of the vector : * @return a new vector with the specified capacity : */ : public static ByteVec withCapacity(int capacity) { : return new ByteVec(capacity); : } : : /** : * Wrap an existing array with a vector. : * The array should not be modified after this call. : * @param data the initial data for the vector : * @return a vector wrapping the data : */ : public static ByteVec wrap(byte[] data) { : return new ByteVec(data); : } You find these more useful than exposing the constructors directly? You could add a no-arg variant that uses DEFAULT_CAPACITY. Line 115: public void reserve(int additional) { Seems like the first two checks can be removed; they are duplicated in reserveExact() anyway. Line 116: if (additional < 0) throw new IllegalArgumentException("negative additional"); Nit: how about Preconditions.checkArgument(additional > 0) instead? Elsewhere too, use Preconditions; it's a little more succinct and clear. PS1, Line 130: len Shouldn't this be 'additional'? PS1, Line 131: > Shouldn't this be >= ? Line 177: if (index >= len) throw new IndexOutOfBoundsException(); May want to include a helpful message in the exception. PS1, Line 178: data[index] = value; Won't this do the bounds check for you? PS1, Line 181: /** : * Appends the bytes from another byte array to this vec. : * @param values the values to append : * @param offset the offset to append from : * @param len the number of bytes to append : */ Nit: reword a bit to make it clear that we're appending some subset of the bytes from 'values', and that 'offset'/'len' are the base and bounds of that subset. Right now it's not clear that 'offset' refers to 'values' (and not to the offset in this.data). Line 194: * Appends the bytes from another byte array to this vec. Nit: likewise, here you should say something like "Appends all the bytes..." PS1, Line 216: if (index >= len) throw new IndexOutOfBoundsException(); : return data[index]; Nit: add a helpful message, or let Java do the bounds check for you. PS1, Line 229: * The vector should not be concurrently modified while the iterator is in use. Will a CME be thrown if someone modifies during iteration? Line 232: public Iterator iterator() { How about having ByteVec inherit Iterable<Byte> and then have the nested Iterator class implement Iterator<Byte>? Then you could use ByteVec in enhanced for-loops. Line 281: public int hashCode() { Maybe use Guava's Objects.hashCode(...) to simplify? PS1, Line 290: ByteVec clone = new ByteVec(0); : clone.data = Arrays.copyOf(data, data.length); : clone.len = len; : return clone; How about: ByteVec clone = ByteVec.withCapacity(data.length); clone.append(this); return clone; http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java: Copyright Line 17: @Test Can you reorder the contents of this class so that the "helper" methods precede the tests? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java File java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java: Line 47: private void checkByteVec(List<Byte> vals) { How about testing wrap or withCapacity? PS1, Line 55: assertEquals(vals, vec.asList()); This is going to rely on the correctness of ByteVec.equals(), but presumably that may be buggy, so you may want a variant that checks byte by byte too. Of course, that could be buggy as well, so you can't win everywhere. PS1, Line 66: copy.truncate(vals.size() / 2); For symmetry with the assertEquals() you just called, perhaps assert that vals is NOT equals to copy.asList()? http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/resources/flags File java/kudu-client/src/test/resources/flags: Line 1 Sneaky sneaky. At least mention it in your commit description? -- To view, visit http://gerrit.cloudera.org:8080/4299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes