Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5541

Change subject: spark: continue scanning after encountering empty batch
......................................................................

spark: continue scanning after encountering empty batch

The Spark connector would previously stop scanning after the first empty
batch returned by a tablet server. The tablet server will not return an
empty batch when there are rows remaining in the tablet unless the scan
hits an internal timeout of 500ms[1]. This can only realistically happen
on large scans with highly selective predicates on data not in the block
cache. As a result, this behavior only occurs with very large tables on
slow tablet servers, which makes it very hard to test. No unit tests
are included with this patch, but the fix has been verified on a real
cluster exhibiting the issue.

[1] https://github.com/apache/kudu/blob/2ed179a7a188b4748a43a829940764ab5dddbc1c/src/kudu/tserver/tablet_service.cc#L1670
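
The behavior described above can be sketched with a small, self-contained
Java model. This is an illustration under stated assumptions, not the code
changed in KuduRDD.scala: FakeScanner, RowIterator, and scanAll are
hypothetical names, and the scanner simply replays a fixed list of batches.
It shows a row iterator that keeps requesting batches while the scanner
reports more rows, rather than treating the first empty batch as the end of
the scan.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class EmptyBatchScan {

    /** Hypothetical stand-in for a scanner that hands back rows in batches. */
    static final class FakeScanner {
        private final Deque<List<String>> batches;

        FakeScanner(List<List<String>> batches) {
            this.batches = new ArrayDeque<>(batches);
        }

        /** True while at least one batch (possibly empty) is still pending. */
        boolean hasMoreRows() {
            return !batches.isEmpty();
        }

        /** Returns the next batch; it may contain zero rows. */
        Iterator<String> nextRows() {
            return batches.removeFirst().iterator();
        }
    }

    /** Flattens batches into rows and skips over empty batches. */
    static final class RowIterator implements Iterator<String> {
        private final FakeScanner scanner;
        private Iterator<String> current = Collections.emptyIterator();

        RowIterator(FakeScanner scanner) {
            this.scanner = scanner;
        }

        @Override
        public boolean hasNext() {
            // A loop rather than a single check: an empty batch does not
            // end the scan as long as the scanner reports more rows.
            while (!current.hasNext() && scanner.hasMoreRows()) {
                current = scanner.nextRows();
            }
            return current.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Drains every row from the given batches, in order. */
    public static List<String> scanAll(List<List<String>> batches) {
        List<String> rows = new ArrayList<>();
        RowIterator it = new RowIterator(new FakeScanner(batches));
        while (it.hasNext()) {
            rows.add(it.next());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b"),
            Collections.<String>emptyList(), // empty batch in mid-scan
            Arrays.asList("c"));
        System.out.println(scanAll(batches)); // prints [a, b, c]
    }
}
```

If the loop in hasNext() were replaced by a single check, the model would
stop after "b", which is the kind of truncated result the message above
describes.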

Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Reviewed-on: http://gerrit.cloudera.org:8080/5531
Tested-by: Kudu Jenkins
Reviewed-by: Chris George <[email protected]>
Reviewed-by: Todd Lipcon <[email protected]>
(cherry picked from commit cd02f9d409f4d7b06863b92d5d9d325bb05b8d55)
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/5541/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fdb7836a27940cab674100da0ef0ea5e050bbdd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
