Adar Dembo has posted comments on this change.

Change subject: KUDU-1065: [java client] Flexible Partition Pruning
......................................................................


Patch Set 1:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

Line 311:    * Not for public consumption: use either predicates or primary key 
bounds instead.
If Impala is no longer using these two, should we just remove them?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 142:   /** The partition pruner */
Not a useful comment, just drop it.


PS1, Line 260:     if (!table.getPartitionSchema().isSimpleRangePartitioning() 
&&
             :         (startPrimaryKey.length != 0 ||
             :          endPrimaryKey.length != 0) &&
             :         PARTITION_PRUNE_WARN.getAndSet(false)) {
             :       LOG.warn("Starting full table scan. " +
             :                "In the future this scan may be automatically 
optimized with partition pruning.");
Is this still relevant?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

Line 348
This is still a parameter though.


Line 31: import java.io.ByteArrayOutputStream;
Not used anymore?


Line 45:   private final ByteVec buf = ByteVec.create();
Not really seeing the point of this class member given that the first thing 
every non-static method does is clear() it.

If the goal is to avoid allocation new ByteVecs with each call, maybe make it 
possible for those methods to pass in their own ByteVec?


Line 63:   public static int getHashBucket(PartialRow row, HashBucketSchema 
hashSchema) {
Javadoc?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

Line 328:           List <LocatedTablet> newTablets = table.getTabletsLocations(
Nit: List<LocatedTablet>


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

PS1, Line 292: { break; }
Nit: why the braces if the break is on the same line as the condition?


PS1, Line 360: Step 1:
In this function there's only Step 1.


Line 398:   private static byte[] pushPredicatesIntoUpperBoundRangeKey(Schema 
schema,
I wonder if these push methods would be clearer if combined with a bool (or 
enum) to control whether they're upper or lower bounds. There's a lot of 
commonality.

Might even be interesting to handle both in a loop of size 2 (and return a 
pair); then you could reuse the product of idsToIndexes. If you go down that 
route, I'd look into combining with pushPredicatesIntoHashBucket() too, that 
way there's one logical method that returns everything that's been pushed.


Line 433:         throw new IllegalArgumentException("unexpected predicate type 
can not be pushed into key");
Nit: can you add the predicate as you did in 
pushPredicatesIntoLowerBoundRangeKey()?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java
File java/kudu-client/src/main/java/org/apache/kudu/util/ByteVec.java:

PS1, Line 60:   /**
            :    * Creates a new vector.
            :    * @return the new vector.
            :    */
            :   public static ByteVec create() {
            :     return new ByteVec(DEFAULT_CAPACITY);
            :   }
            : 
            :   /**
            :    * Creates a new vector with the specified capacity.
            :    * @param capacity the initial capacity of the vector
            :    * @return a new vector with the specified capacity
            :    */
            :   public static ByteVec withCapacity(int capacity) {
            :     return new ByteVec(capacity);
            :   }
            : 
            :   /**
            :    * Wrap an existing array with a vector.
            :    * The array should not be modified after this call.
            :    * @param data the initial data for the vector
            :    * @return a vector wrapping the data
            :    */
            :   public static ByteVec wrap(byte[] data) {
            :     return new ByteVec(data);
            :   }
You find these more useful than exposing the constructors directly? You could 
add a no-arg variant that uses DEFAULT_CAPACITY.


Line 115:   public void reserve(int additional) {
Seems like the first two checks can be removed; they are duplicated in 
reserveExact() anyway.


Line 116:     if (additional < 0) throw new IllegalArgumentException("negative 
additional");
Nit: how about Preconditions.checkArgument(additional > 0) instead? Elsewhere 
too, use Preconditions; it's a little more succinct and clear.


PS1, Line 130: len
Shouldn't this be 'additional'?


PS1, Line 131:  >
Shouldn't this be >= ?


Line 177:     if (index >= len) throw new IndexOutOfBoundsException();
May want to include a helpful message in the exception.


PS1, Line 178: data[index] = value;
Won't this do the bounds check for you?


PS1, Line 181:   /**
             :    * Appends the bytes from another byte array to this vec.
             :    * @param values the values to append
             :    * @param offset the offset to append from
             :    * @param len the number of bytes to append
             :    */
Nit: reword a bit to make it clear that we're appending some subset of the 
bytes from 'values', and that 'offset'/'len' are the base and bounds of that 
subset. Right now it's not clear that 'offset' refers to 'values' (and not to 
the offset in this.data).


Line 194:    * Appends the bytes from another byte array to this vec.
Nit: likewise, here you should say something like "Appends all the bytes..."


PS1, Line 216:     if (index >= len) throw new IndexOutOfBoundsException();
             :     return data[index];
Nit: add a helpful message, or let Java do the bounds check for you.


PS1, Line 229:    * The vector should not be concurrently modified while the 
iterator is in use.
Will a CME be thrown if someone modifies during iteration?


Line 232:   public Iterator iterator() {
How about having ByteVec inherit Iterable<Byte> and then have the nested 
Iterator class implement Iterator<Byte>? Then you could use ByteVec in enhanced 
for-loops.


Line 281:   public int hashCode() {
Maybe use Guava's Objects.hashCode(...) to simplify?


PS1, Line 290:     ByteVec clone = new ByteVec(0);
             :     clone.data = Arrays.copyOf(data, data.length);
             :     clone.len = len;
             :     return clone;
How about:

  ByteVec clone = ByteVec.withCapacity(data.length);
  clone.append(this);
  return clone;


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

Copyright


Line 17:   @Test
Can you reorder the contents of this class so that the "helper" methods precede 
the tests?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java
File java/kudu-client/src/test/java/org/apache/kudu/util/TestByteVec.java:

Line 47:   private void checkByteVec(List<Byte> vals) {
How about testing wrap or withCapacity?


PS1, Line 55: assertEquals(vals, vec.asList());
This is going to rely on the correctness of ByteVec.equals(), but presumably 
that may be buggy, so you may want a variant that checks byte by byte too. Of 
course, that could be buggy as well, so you can't win everywhere.


PS1, Line 66: copy.truncate(vals.size() / 2);
For symmetry with the assertEquals() you just called, perhaps assert that vals 
is NOT equals to copy.asList()?


http://gerrit.cloudera.org:8080/#/c/4299/1/java/kudu-client/src/test/resources/flags
File java/kudu-client/src/test/resources/flags:

Line 1
Sneaky sneaky. At least mention it in your commit description?


-- 
To view, visit http://gerrit.cloudera.org:8080/4299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib27b54841d87cf854175ab8cdfa8798b337d71f9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to