ZhangYao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14107 )

Change subject: KUDU-2921: Exposing the table statistics to spark relation.
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java:

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java@45
PS8, Line 45:             Master.GetTableStatisticsRequestPB.newBuilder();
> nit: to be consistent with the rest of the line-wraps, multi-line code shou
Done


http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java:

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java@785
PS8, Line 785:           "--update_tablet_stats_interval_ms=" + 200,
> nit: Why not just "--update_tablet_states_interval_ms=200"? Same below?
Done


http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@281
PS8, Line 281: overestimate this size than underestimate
> It's worth noting that the on-disk size is encoded and compressed. Is that
The estimation of relation size is the more accurate the better, if we don't 
have accurate value, overestimate is better that underestimate.
Because we don't take projection and predicates into consideration of size 
estimation and the size we records contains meta and log so I suppose it is 
overestimate.
I thinks use (row count * row size) is more accurate, but we don't know the 
average row size and if the schema contains String/Binary we don't know it's 
size.


http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc@1790
PS8, Line 1790: resp.on_disk_size(), 0);
              :   ASSERT_EQ(resp.live_row_count(), 0);
> nit: the expected value should be on the left-hand-side of the comma, other
Done


http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc@1799
PS8, Line 1799: resp.on_disk_size(), FLAGS_on_disk_size_for_testing);
              :   ASSERT_EQ(resp.live_row_count(), 
FLAGS_live_row_count_for_testing
> Same here.
Done


http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master_service.cc@422
PS8, Line 422:
> nit: got too many spaces here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175
Gerrit-Change-Number: 14107
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Sun, 08 Sep 2019 15:56:51 +0000
Gerrit-HasComments: Yes

Reply via email to