Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14107 )
Change subject: KUDU-2921: Exposing the table statistics to spark relation. ...................................................................... Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala: http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@33 PS6, Line 33: import org.slf4j.{Logger, LoggerFactory} Nit: Single import per line http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@288 PS6, Line 288: override def sizeInBytes: Long = estimatedSize Is there a jira tracking the improvement for factoring predicates and projections into the estimate? That would be useful to have marked here as a TODO. http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35 PS6, Line 35: * server to ensure that scanners do not time out Please move the documentation for sizeFactor here. http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@49 PS6, Line 49: sizeFactor: Float = defaultSizeFactor, I am not 100% sure how this configuration is really useful/helpful. Given this is public API and it's not totally clear how a user would calculate and set this value I think we could just use bytes for now and open a follow up jira to improve the statistics to include the real size, or even estimate based on row size and row count. If there is good guidance on how to configure this value, we should include it in the documentation above. http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala: http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@32 PS6, Line 32: import org.apache.kudu.client.{CreateTableOptions, KuduTableStatistics} Nit: One import per line is preferred. Here and below. http://gerrit.cloudera.org:8080/#/c/14107/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14107/6/src/kudu/master/catalog_manager.cc@271 PS6, Line 271: DEFINE_bool(mock_table_metrics_for_testing, false, Do these flags need to be marked as hidden? -- To view, visit http://gerrit.cloudera.org:8080/14107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175 Gerrit-Change-Number: 14107 Gerrit-PatchSet: 6 Gerrit-Owner: ZhangYao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: ZhangYao <[email protected]> Gerrit-Comment-Date: Thu, 05 Sep 2019 18:21:42 +0000 Gerrit-HasComments: Yes
