Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14107 )
Change subject: KUDU-2921: Exposing the table statistics to spark relation.
......................................................................


Patch Set 9: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@281
PS8, Line 281: overestimate this size than underestimate
> The estimation of relation size is the more accurate the better, if we don'
I see, thanks for the explanation! I agree, having some kind of measurement for row size seems desirable in that case. Seems like the missing piece might be per-column statistics, i.e. per column min/max, histograms, etc, which would be able to provide accurate estimates or upper bounds on the row sizes, and may allow us to give estimates with projections and predicates.

Definitely room for improvements, but I think this patch is still an improvement over nothing :)


http://gerrit.cloudera.org:8080/#/c/14107/9/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/9/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@71
PS9, Line 71:   val SIZE_FACTOR = "kudu.sizeFactor"
Not used?

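
To make the per-column statistics idea from the first comment concrete, here is a rough sketch of how an upper bound on the relation size could be derived for a given projection. Everything here is hypothetical: `ColumnStats`, `maxValueBytes`, and `RowSizeEstimate` are illustrative names, not part of Kudu or of this patch, and the sketch assumes such statistics were available from somewhere.

```scala
// Hypothetical per-column statistic: the largest encoded value size seen
// for a column. Nothing in this thread says Kudu exposes this today.
final case class ColumnStats(name: String, maxValueBytes: Long)

object RowSizeEstimate {
  // Upper bound on the size of one row, counting only projected columns.
  def maxRowBytes(stats: Seq[ColumnStats], projection: Set[String]): Long =
    stats.filter(c => projection.contains(c.name)).map(_.maxValueBytes).sum

  // Upper bound on the relation size: row count times the widest
  // possible projected row. Predicates are ignored in this sketch.
  def maxRelationBytes(
      rowCount: Long,
      stats: Seq[ColumnStats],
      projection: Set[String]): Long =
    rowCount * maxRowBytes(stats, projection)
}
```

For example, with columns `id` (8 bytes), `name` (64 bytes), and `payload` (1024 bytes), projecting `id` and `name` over 1000 rows gives a bound of 72000 bytes rather than the 1096000 bytes a full-row estimate would give. Since this is an upper bound, it errs on the side of overestimating, which matches the preference discussed at PS8, Line 281.
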
--
To view, visit http://gerrit.cloudera.org:8080/14107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175
Gerrit-Change-Number: 14107
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Mon, 09 Sep 2019 18:38:12 +0000
Gerrit-HasComments: Yes
