Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: spark: continue scanning after encountering empty batch
......................................................................

spark: continue scanning after encountering empty batch

The Spark connector would previously stop scanning after the first empty
batch returned by a tablet server. The tablet server will not return an
empty batch when there are rows remaining in the tablet unless the scan
hits an internal timeout of 500ms[1]. This can only realistically happen
on large scans with highly selective predicates on data not in the block
cache. As a result, this behavior only occurs with very large tables on
slow tablet servers, which makes it very hard to test. No unit tests are
included with this patch, but the fix has been verified on a real cluster
exhibiting the issue.

[1] https://github.com/apache/kudu/blob/2ed179a7a188b4748a43a829940764ab5dddbc1c/src/kudu/tserver/tablet_service.cc#L1670

Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Reviewed-on: http://gerrit.cloudera.org:8080/5531
Tested-by: Kudu Jenkins
Reviewed-by: Chris George <[email protected]>
Reviewed-by: Todd Lipcon <[email protected]>
(cherry picked from commit cd02f9d409f4d7b06863b92d5d9d325bb05b8d55)
Reviewed-on: http://gerrit.cloudera.org:8080/5541
Reviewed-by: Jean-Daniel Cryans <[email protected]>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/5541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
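
For illustration only, the failure mode described in the commit message can be sketched as a row iterator over batches. This is not the code from KuduRDD.scala and does not use the Kudu client API: BatchSource, BatchedRowIterator and ListBatchSource are hypothetical names made up for this sketch. The assumption is that a scanner exposes "are there more batches" and "give me the next batch", and that a batch may be empty even though later batches still hold rows.

```scala
// Hypothetical stand-in for a batch-oriented scanner (NOT the Kudu client API).
trait BatchSource[T] {
  def hasMoreBatches: Boolean
  def nextBatch(): Iterator[T]
}

// Flattens batches into a single row iterator and skips over empty batches.
class BatchedRowIterator[T](source: BatchSource[T]) extends Iterator[T] {
  private var current: Iterator[T] = Iterator.empty

  override def hasNext: Boolean = {
    // Loop rather than fetching a single batch: one empty batch must not
    // end the scan while the source still reports more batches.
    while (!current.hasNext && source.hasMoreBatches) {
      current = source.nextBatch()
    }
    current.hasNext
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("scan exhausted")
    current.next()
  }
}

// In-memory source used only to exercise the iterator above.
class ListBatchSource[T](batches: List[List[T]]) extends BatchSource[T] {
  private var remaining = batches

  override def hasMoreBatches: Boolean = remaining.nonEmpty

  override def nextBatch(): Iterator[T] = {
    val head = remaining.head
    remaining = remaining.tail
    head.iterator
  }
}
```

With batches List(List(1, 2), Nil, List(3)), an iterator that fetched only one batch per hasNext call would stop at the empty middle batch and silently drop row 3. The loop above keeps fetching until it finds a non-empty batch or the source is exhausted.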
