Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11111 )
Change subject: KUDU-2525: KuduTableInputFormat may halt before exhausting scan
......................................................................

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11111/1/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java
File java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java:

http://gerrit.cloudera.org:8080/#/c/11111/1/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java@66
PS1, Line 66:  * This input format generates one split per tablet and the only location for each split is that
> Could you try to synthesize this into a unit test? That'll help solidify th
Empty scan batches can happen in at least two ways:

* It's possible for the client to set the scan bytes limit to 0 when creating a new scanner, and then use something bigger for subsequent continue scan calls. In that case the response to the new scanner call will be an empty row batch (see https://github.com/apache/kudu/blob/master/src/kudu/tserver/tablet_service.cc?utf8=%E2%9C%93#L1867). As far as I know you can't actually configure a Java scanner to do this, though.

* If a scan is filtering many rows, it may not be able to find a single matching row before the internal 500ms timeout expires. Unfortunately this is one of the few completely hardcoded timeouts in Kudu, so we can't easily toggle it for a unit test: https://github.com/apache/kudu/blob/master/src/kudu/tserver/tablet_service.cc?utf8=%E2%9C%93#L1941

Neither of these is easy to reproduce in a unit test. My best idea would be to add a test-only flag to the tserver to 'inject' empty batches randomly. That being said, because the Spark KuduRDD has effectively the exact same loop (https://github.com/apache/kudu/blob/master/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala#L124), I'd also be OK just landing this as-is without a unit test.
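
To make the failure mode concrete, here is a minimal, self-contained sketch of why an empty batch must not be treated as end-of-scan. Everything in it is an assumption made for illustration: `BatchScanner` and `FakeScanner` are hypothetical stand-ins that only mimic a `hasMoreRows()` / `nextRows()` style scanner (they are not the Kudu client API), and the two drain methods are simplified, not the actual KuduTableInputFormat or KuduRDD code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class EmptyBatchScanDemo {

  /** Hypothetical stand-in for a batch-oriented scanner; NOT the Kudu client API. */
  public interface BatchScanner {
    boolean hasMoreRows();
    Iterator<String> nextRows();
  }

  /** Hypothetical fake that replays a fixed sequence of batches, some of them empty. */
  public static final class FakeScanner implements BatchScanner {
    private final Deque<List<String>> batches = new ArrayDeque<>();

    public FakeScanner(List<List<String>> batches) {
      this.batches.addAll(batches);
    }

    @Override
    public boolean hasMoreRows() {
      return !batches.isEmpty();
    }

    @Override
    public Iterator<String> nextRows() {
      return batches.removeFirst().iterator();
    }
  }

  /** Buggy pattern: stops at the first empty batch even though more batches remain. */
  public static List<String> drainStoppingOnEmptyBatch(BatchScanner scanner) {
    List<String> rows = new ArrayList<>();
    while (scanner.hasMoreRows()) {
      Iterator<String> batch = scanner.nextRows();
      if (!batch.hasNext()) {
        break; // wrong: an empty batch does not mean the scan is exhausted
      }
      batch.forEachRemaining(rows::add);
    }
    return rows;
  }

  /** Intended pattern: only hasMoreRows() decides when the scan is finished. */
  public static List<String> drainFully(BatchScanner scanner) {
    List<String> rows = new ArrayList<>();
    while (scanner.hasMoreRows()) {
      // An empty batch simply contributes no rows; keep asking for the next one.
      scanner.nextRows().forEachRemaining(rows::add);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Empty batches at the start and in the middle, mirroring the two cases above.
    List<List<String>> batches = Arrays.asList(
        Collections.<String>emptyList(),
        Arrays.asList("a", "b"),
        Collections.<String>emptyList(),
        Arrays.asList("c"));

    System.out.println(drainStoppingOnEmptyBatch(new FakeScanner(batches))); // prints []
    System.out.println(drainFully(new FakeScanner(batches)));                // prints [a, b, c]
  }
}
```

A fake like this only exercises the client-side loop. It does not reproduce either of the server-side conditions described above, so it would complement rather than replace a test-only tserver flag.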
-- 
To view, visit http://gerrit.cloudera.org:8080/11111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbfdd2efbd281e4d849917664b33e183e180bafd
Gerrit-Change-Number: 11111
Gerrit-PatchSet: 2
Gerrit-Owner: zhangqianqiong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: zhangqianqiong <[email protected]>
Gerrit-Comment-Date: Fri, 03 Aug 2018 18:09:32 +0000
Gerrit-HasComments: Yes
