Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14107 )

Change subject: KUDU-2921: Exposing the table statistics to spark relation.
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@816
PS1, Line 816:   public Deferred<GetTableStatisticsResponse> 
getTableStatistics(String name) {
Are you planning to expose this in the c++ client too?


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsRequest.java@11
PS1, Line 11: public class GetTableStatisticsRequest extends 
KuduRpc<GetTableStatisticsResponse> {
This can be package private.


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsResponse.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsResponse.java:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/GetTableStatisticsResponse.java@8
PS1, Line 8: public class GetTableStatisticsResponse extends KuduRpcResponse {
If you see my other comment on defining a KuduTableStatistics object, then this 
could be package private.


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduClient.java@227
PS1, Line 227:   public GetTableStatisticsResponse getTableStatistics(String 
name) throws KuduException {
Does it make sense for this method to be on the KuduTable object? The api would 
then be something like `table.getStatistics`.

I think the return object here should be a `KuduTableStatistics` object. It 
will look similar to the GetTableStatisticsResponse object, but decouples the 
RPC concerns from the domain objects. This pattern can be seen with 
`KuduClient.openTable` which returns KuduTable instead of GetKuduSchemaResponse.


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java@811
PS1, Line 811:     Thread.sleep(500 * 9);
Is there a more deterministic way to ensure this happens? Using sleep often 
leads to flaky tests.


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTable.java@814
PS1, Line 814:     statistics = table.getTableStatistics();
Can you test some edge cases? Maybe call this with 0 rows, and insert rows 
again and check that the value has increased?


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@263
PS1, Line 263:   private val sizeFactor: Float = 1.0F
What's the purpose of the size factor here?


http://gerrit.cloudera.org:8080/#/c/14107/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@267
PS1, Line 267:   override def sizeInBytes: Long = (math.max(onDiskSize, 0) * 
sizeFactor).toLong
Shouldn't this size take the projection and predicates into consideration? I 
suppose this would be the worst case size of the relation as is.

Is there a way we can test that spark is receiving and using this value 
correctly here?


http://gerrit.cloudera.org:8080/#/c/14107/1/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14107/1/src/kudu/master/catalog_manager.h@638
PS1, Line 638:   Status GetTableStatistics(const GetTableStatisticsRequestPB* 
req,
Can you add a test for this?


http://gerrit.cloudera.org:8080/#/c/14107/1/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/14107/1/src/kudu/master/master_service.cc@415
PS1, Line 415: void MasterServiceImpl::GetTableStatistics(const 
GetTableStatisticsRequestPB* req,
Can you add a test for this? Maybe in master-test.cc



--
To view, visit http://gerrit.cloudera.org:8080/14107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7742a76708f989b0ccc8ba417f3390013e260175
Gerrit-Change-Number: 14107
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 20 Aug 2019 15:55:33 +0000
Gerrit-HasComments: Yes

Reply via email to