Dan Burkert has posted comments on this change.

Change subject: KUDU-1309: [java client] support tables with non-covering 
partition-key ranges
......................................................................


Patch Set 2:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java:

PS2, Line 139: This map
             :    * is always the first to be updated, because that's the map 
from
             :    * which all the lookups are done in the fast-path of the 
requests
             :    * that need to locate a tablet. The second map to be updated 
is
             :    * tablet2client, because it comes second in the fast-path
             :    * of every requests that need to locate a tablet. The third 
map
             :    * is only used to handle TabletServer disconnections 
gracefully.
> Nit: reading this again (I reviewed it eons ago), I think this part of the 
I moved the non covered range cache out from under this comment since it really 
doesn't fit (it's not the same data indexed differently).  I agree that this 
comment is not great, but it should be fixed with a proper meta cache.


Line 155:    * Currently once a non-covered range is added to the cache, it is 
never
> Nit: prefix this part of the comment with "TODO:" since it is expected to c
Done


Line 576:   Deferred<AsyncKuduScanner.Response> openScanner(final 
AsyncKuduScanner scanner) {
> True, this can be removed.
Done


Line 685:       // Otherwise fall through to below where a GetTableLocations 
lookup will occur
> Nit: end with a period.
Done


Line 1148:       Pair<byte[], byte[]> nonCoveredRange = 
getNonCoveredRange(table.getTableId(), key);
> Nit: if you want, you can call getTableId() once outside of the loop, store
Done


PS2, Line 1355:     NonCoveredRangeCache nonCoveredRanges = 
nonCoveredRangeCaches.get(tableId);
              :     if (nonCoveredRanges == null) {
              :       nonCoveredRanges = new NonCoveredRangeCache();
              :       NonCoveredRangeCache oldCache = 
nonCoveredRangeCaches.putIfAbsent(tableId, nonCoveredRanges);
              :       if (oldCache != null) {
              :         nonCoveredRanges = oldCache;
              :       }
              :     }
> How about just:
putIfAbsent returns null if the entry was absent.


Line 1378:       nonCoveredRanges.addNonCoveredRange(EMPTY_ARRAY, 
firstStartKey);
> Don't we need all of these calls to addNonCoveredRange() to be atomic? With
It may contain stale entries, but that needs to be handled once add/drop 
partition is introduced anyway.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/kududb/client/AsyncKuduScanner.java:

Line 474:             invalidate();
> Nit: invalidate() is called in both branches, can we pull it out?
Done


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/AsyncKuduSession.java
File java/kudu-client/src/main/java/org/kududb/client/AsyncKuduSession.java:

Line 127:   private final List<OperationResponse> failedLookups = new 
ArrayList<>();
> I think you can use a CopyOnWriteArrayList here and omit the use of a lock 
I think in this case (where we are incrementally building up the list), a mutex 
is the best answer.   Doing the checked put shuffle with an 
AtomicReference<List> is going to be hard and involve spinning + rebuilding the 
list per spin.  COWAL isn't even lock free: 
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/CopyOnWriteArrayList.java#433


Line 303:       Callback<Deferred<List<OperationResponse>>, 
ArrayList<BatchResponse>> {
> Nit: not related to your change, but why is ArrayList the second type and n
Done


Line 306:       batchResponses = batchResponses == null ? new 
ArrayList<BatchResponse>() : batchResponses;
> Is a null batchResponses actually possible? How?
I'm not sure, the code already handled the null case.  When I comment out this 
line the tests still pass.  JD, do you know what the deal is?


PS2, Line 308: // flushTablet() can return null when a tablet we wanted to 
flush was already flushed. Those
             :       // nulls along with BatchResponses are then put in a list 
by Deferred.group(). We first need
             :       // to filter the nulls out.
> OMG.
Agreed.  Refactored.


Line 320:       List<OperationResponse> responsesList = new ArrayList<>(size);
> If we're going to go through the trouble of precomputing the size, we ought
I thought about this, but the only way to figure out how many failed lookups 
there are is to take out the lock.  Since there really shouldn't be failed 
lookups I decided not to slow down the normal path.  I'm not too concerned 
about making the case where there are slow lookups be slow.


Line 397:     if (tablet == null) {
> Why don't we perform this check to AUTO_FLUSH_SYNC on L377-384? Seems usefu
The first step of sendRpcToTablet (again, despite the name), is looking up the 
tablet to send it to based on the Rpc's partition key.  So the check is done 
before an RPC is made, but it's done in sendRpcToTablet.

All that being said, I'm not sure why we have to do it here.  Or for that 
matter, why AsyncKuduSession is tracking the lookups in flight at all.  Seems 
to me all that should happen for a batched apply is that a new 
GetTableLocations should be kicked off if the tablet isn't known for the RPC, 
and otherwise just add it to the batch.  On flush, it can just be sent through 
sentRpcToTablet.  If the probe has returned in time then it will be faster, 
otherwise it goes through the lookup process again.


PS2, Line 399: new RowError(Status.NotFound("Write to a non-covered range 
partition"),
             :                                          operation, null);
> Could you add a two-arg constructor for this case?
Done


PS2, Line 401: new OperationResponse(0, null, 0, operation, rowError);
> Likewise, perhaps add a new constructor for this?
OperationResponse already relies on setting null params to the point where it's 
entwined in the control flow of Operation::deserialize


PS2, Line 416:           if (lookupsDone != null && 
operationsInLookup.isEmpty()) {
             :             lookupsDoneCopy = lookupsDone;
             :             lookupsDone = null;
             :           }
> Copied from addToBuffer(), right? Can it be decomposed into a helper with a
refactored into oblivion


Line 422:         operation.callback(response);
> Why do we do this?
this completes the individual operation response returned from #apply.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java
File java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java:

Line 57:    * The table creator takes ownership of the rows. If either row is 
empty, then
> Is this copied from the C++ client? What does 'ownership' mean here? In any
Done


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/KuduRpc.java
File java/kudu-client/src/main/java/org/kududb/client/KuduRpc.java:

Line 77:     return null;
> Why return null instead of throwing some sort of "unsupported operation" ex
It's not really an exceptional circumstance, every master RPC falls in this 
category, for instance.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/NonCoveredRangeCache.java
File java/kudu-client/src/main/java/org/kududb/client/NonCoveredRangeCache.java:

Line 39: final class NonCoveredRangeCache {
> Nit: why 'final'? The class is package-private, so it seems like the only c
Done


Line 40:   public static final Logger LOG = 
LoggerFactory.getLogger(NonCoveredRangeCache.class);
> Nit: can be private
Done


Line 59:     if (COMPARATOR.compare(partitionKey, range.getKey()) >= 0 &&
> Isn't this first comparison implied by the floorEntry() call?
Done


Line 61:       return new Pair<>(range.getKey(), range.getValue());
> Why can't we return the range as-is? Why do we need to make a copy?
Done


Line 77:     if (nonCoveredRanges.put(startPartitionKey, endPartitionKey) == 
null) {
> May want to comment on the concurrent behavior (i.e. multiple callers in di
Done


Line 84:   public String toString() {
> Looks like this will return an inconsistent string representation if nonCov
if concurrent updates happen to the entries the iteration is undefined, so I'm 
not sure there is much to be done.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/main/java/org/kududb/client/NonCoveredRangeException.java
File 
java/kudu-client/src/main/java/org/kududb/client/NonCoveredRangeException.java:

Line 26: @InterfaceStability.Unstable
> Why not evolving?
evolving is at a higher level of stability than unstable.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java
File java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java:

PS2, Line 243:    * * [  0,  50)
             :    * * [ 50, 100)
             :    * * [200, 300)
> Nit: what does the extra '*' mean?
Done


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java
File 
java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java:

Line 104:     KuduScanner s = 
syncClient.newScannerBuilder(table).scanRequestTimeout(9999999).build();
> That's an odd choice of timeout. Why? And if you're looking for a large num
woops, left over from debugging.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java:

Line 457:       if (key == 0) session.flush();
> Why flush after the very first operation? What's the significance of that?
not sure; removed.


http://gerrit.cloudera.org:8080/#/c/3388/2/java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java
File java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java:

Line 35:     tableName = TestKuduClient.class.getName() + "-" + 
System.currentTimeMillis();
> FWIW, even though this is guaranteed to run with every test, if a test fini
Done


Line 221:     createOptions.setNumReplicas(1);
> Why do the new tests use only one replica, but the old ones don't?
Replicas aren't important for these particular tests.  Additionally, they 
create more tablets per table (the other tables have a single tablet).


-- 
To view, visit http://gerrit.cloudera.org:8080/3388
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id65a1f8a95bb16fc0360a17021a391afd3c9d03f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

Reply via email to