ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/14107 )
Change subject: KUDU-2921: Exposing the table statistics to spark relation. ...................................................................... Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java File java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java: http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java@45 PS8, Line 45: Master.GetTableStatisticsRequestPB.newBuilder(); > nit: to be consistent with the rest of the line-wraps, multi-line code shou Done http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java: http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java@785 PS8, Line 785: "--update_tablet_stats_interval_ms=" + 200, > nit: Why not just "--update_tablet_states_interval_ms=200"? Same below? Done http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala: http://gerrit.cloudera.org:8080/#/c/14107/8/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@281 PS8, Line 281: overestimate this size than underestimate > It's worth noting that the on-disk size is encoded and compressed. Is that The estimation of relation size is the more accurate the better, if we don't have accurate value, overestimate is better that underestimate. Because we don't take projection and predicates into consideration of size estimation and the size we records contains meta and log so I suppose it is overestimate. I thinks use (row count * row size) is more accurate, but we don't know the average row size and if the schema contains String/Binary we don't know it's size. http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc@1790 PS8, Line 1790: resp.on_disk_size(), 0); : ASSERT_EQ(resp.live_row_count(), 0); > nit: the expected value should be on the left-hand-side of the comma, other Done http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master-test.cc@1799 PS8, Line 1799: resp.on_disk_size(), FLAGS_on_disk_size_for_testing); : ASSERT_EQ(resp.live_row_count(), FLAGS_live_row_count_for_testing > Same here. Done http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/14107/8/src/kudu/master/master_service.cc@422 PS8, Line 422: > nit: got too many spaces here. Done -- To view, visit http://gerrit.cloudera.org:8080/14107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175 Gerrit-Change-Number: 14107 Gerrit-PatchSet: 8 Gerrit-Owner: ZhangYao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: ZhangYao <[email protected]> Gerrit-Comment-Date: Sun, 08 Sep 2019 15:56:51 +0000 Gerrit-HasComments: Yes
