Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14107 )

Change subject: KUDU-2921: Exposing the table statistics to spark relation.
......................................................................


Patch Set 9: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@281
PS8, Line 281: overestimate this size than underestimate
> The estimation of relation size is the more accurate the better, if we don'
I see, thanks for the explanation! I agree, having some kind of measurement for
row size seems desirable in that case.

It seems like the missing piece might be per-column statistics (per-column
min/max, histograms, etc.), which could provide accurate estimates of, or
upper bounds on, row sizes, and might let us give estimates that account for
projections and predicates. There's definitely room for improvement, but I
think this patch is still an improvement over nothing :)
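
To make the per-column idea concrete, here is a rough sketch of how an upper
bound could be derived if such statistics existed. Everything in it is an
assumption for illustration: ColumnStats, maxValueBytes and upperBoundBytes
are hypothetical names, and neither this patch nor Kudu exposes per-column
statistics like these today.

```scala
// Hypothetical per-column statistic: the largest encoded value size, in bytes.
// Kudu does not expose this; it stands in for what min/max or histogram
// statistics might let us derive.
final case class ColumnStats(name: String, maxValueBytes: Long)

object SizeEstimate {
  // Upper bound on the bytes a scan could return: the row count multiplied by
  // the sum of the maximum value sizes of the projected columns only.
  // BigInt is used so that large tables cannot overflow a Long.
  def upperBoundBytes(
      rowCount: Long,
      stats: Seq[ColumnStats],
      projection: Set[String]): BigInt = {
    val maxRowBytes = stats
      .filter(s => projection.contains(s.name))
      .map(s => BigInt(s.maxValueBytes))
      .sum
    BigInt(rowCount) * maxRowBytes
  }
}
```

With a narrow projection the bound shrinks accordingly, which is what a
single table-level size cannot express. Predicate selectivity is left out of
this sketch; it would need histograms or similar.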


http://gerrit.cloudera.org:8080/#/c/14107/9/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/9/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@71
PS9, Line 71:   val SIZE_FACTOR = "kudu.sizeFactor"
Not used?



--
To view, visit http://gerrit.cloudera.org:8080/14107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175
Gerrit-Change-Number: 14107
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Mon, 09 Sep 2019 18:38:12 +0000
Gerrit-HasComments: Yes
