Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14107 )

Change subject: KUDU-2921: Exposing the table statistics to spark relation.
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@33
PS6, Line 33: import org.slf4j.{Logger, LoggerFactory}
Nit: Single import per line


http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@288
PS6, Line 288:   override def sizeInBytes: Long = estimatedSize
Is there a jira tracking the improvement for factoring predicates and 
projections into the estimate? That would be useful to have marked here as a 
TODO.


http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35
PS6, Line 35:  *                          server to ensure that scanners do not 
time out
Please move the documentation for sizeFactor here.


http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@49
PS6, Line 49:     sizeFactor: Float = defaultSizeFactor,
I am not 100% sure how this configuration is really useful/helpful. Given this 
is public API and it's not totally clear how a user would calculate and set 
this value I think we could just use bytes for now and open a follow up jira to 
improve the statistics to include the real size, or even estimate based on row 
size and row count.

If there is good guidance on how to configure this value, we  should include it 
in the documentation above.


http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/14107/6/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@32
PS6, Line 32: import org.apache.kudu.client.{CreateTableOptions, 
KuduTableStatistics}
Nit: One import per line is preferred. Here and below.


http://gerrit.cloudera.org:8080/#/c/14107/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14107/6/src/kudu/master/catalog_manager.cc@271
PS6, Line 271: DEFINE_bool(mock_table_metrics_for_testing, false,
Do these flags need to be marked as hidden?



--
To view, visit http://gerrit.cloudera.org:8080/14107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175
Gerrit-Change-Number: 14107
Gerrit-PatchSet: 6
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Thu, 05 Sep 2019 18:21:42 +0000
Gerrit-HasComments: Yes

Reply via email to