Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11111 )

Change subject: KUDU-2525: KuduTableInputFormat may halt before exhausting scan
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11111/1/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java
File 
java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java:

http://gerrit.cloudera.org:8080/#/c/11111/1/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java@66
PS1, Line 66:  * This input format generates one split per tablet and the only 
location for each split is that
> Could you try to synthesize this into a unit test? That'll help solidify th
Empty scan batches can happen in at least two ways:

* It's possible for the client to set the scan bytes limit to 0 when creating a 
new scanner, and then use something bigger for subsequent continue scan calls. 
In that case the new scanner call returns an empty row batch (see 
https://github.com/apache/kudu/blob/master/src/kudu/tserver/tablet_service.cc?utf8=%E2%9C%93#L1867).
As far as I know you can't actually configure a Java scanner to do this, 
though.

* If a scan is filtering many rows it may not be able to find a single matching 
row before the internal 500ms timeout expires.  Unfortunately this is one of 
the few completely hardcoded timeouts in Kudu, so we can't easily toggle it for 
a unit test: 
https://github.com/apache/kudu/blob/master/src/kudu/tserver/tablet_service.cc?utf8=%E2%9C%93#L1941

Neither of these is easy to reproduce in a unit test.  My best idea would be 
to add a test-only flag to the tserver to 'inject' empty batches randomly.

That being said, because the Spark KuduRDD has effectively the exact same loop 
(https://github.com/apache/kudu/blob/master/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala#L124),
 I'd also be OK just landing this as-is without a unit test.
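
For illustration, here is a rough sketch of why an empty batch must not be 
treated as end-of-scan. It is not the KuduTableInputFormat or KuduRDD code: 
BatchScanner and FakeScanner are hypothetical stand-ins invented for this 
example (only the method names hasMoreRows/nextRows are borrowed from the Java 
client), and the "buggy" reader is an assumed shape of the problem, not a copy 
of the original. The fake replays a fixed batch sequence on the client side 
rather than injecting empty batches in the tserver.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class EmptyBatchScanSketch {

  /** Hypothetical stand-in for a scanner; NOT the Kudu client API. */
  interface BatchScanner {
    boolean hasMoreRows();        // true while more batches may still arrive
    Iterator<String> nextRows();  // next batch, which may legitimately be empty
  }

  /** Hypothetical test double that replays a fixed sequence of batches. */
  static final class FakeScanner implements BatchScanner {
    private final Deque<List<String>> batches = new ArrayDeque<>();

    FakeScanner(List<List<String>> batches) {
      this.batches.addAll(batches);
    }

    @Override public boolean hasMoreRows() {
      return !batches.isEmpty();
    }

    @Override public Iterator<String> nextRows() {
      return batches.removeFirst().iterator();
    }
  }

  /** Assumed buggy shape: the first empty batch is mistaken for end of scan. */
  static List<String> readStoppingOnEmptyBatch(BatchScanner scanner) {
    List<String> rows = new ArrayList<>();
    while (scanner.hasMoreRows()) {
      Iterator<String> batch = scanner.nextRows();
      if (!batch.hasNext()) {
        break;  // halts early even though the scanner has more rows
      }
      while (batch.hasNext()) {
        rows.add(batch.next());
      }
    }
    return rows;
  }

  /** Only hasMoreRows() decides when to stop; empty batches are skipped. */
  static List<String> readUntilExhausted(BatchScanner scanner) {
    List<String> rows = new ArrayList<>();
    while (scanner.hasMoreRows()) {
      Iterator<String> batch = scanner.nextRows();
      while (batch.hasNext()) {
        rows.add(batch.next());
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    // One empty batch sits between two non-empty ones.
    List<List<String>> batches = Arrays.asList(
        Arrays.asList("a"),
        Collections.<String>emptyList(),
        Arrays.asList("b", "c"));
    System.out.println(readStoppingOnEmptyBatch(new FakeScanner(batches)));
    System.out.println(readUntilExhausted(new FakeScanner(batches)));
  }
}
```

A unit test along these lines would only exercise the reader loop against a 
fake; it would not cover the tserver paths that produce empty batches.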



--
To view, visit http://gerrit.cloudera.org:8080/11111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbfdd2efbd281e4d849917664b33e183e180bafd
Gerrit-Change-Number: 11111
Gerrit-PatchSet: 2
Gerrit-Owner: zhangqianqiong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: zhangqianqiong <[email protected]>
Gerrit-Comment-Date: Fri, 03 Aug 2018 18:09:32 +0000
Gerrit-HasComments: Yes
