Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG@15
PS2, Line 15: The combination of these APIs
            : should be significantly more efficient than issuing a get for every
            : single table in the HMS and doing Kudu-side filtering.
> Sounds great. Is there anything you can measure to confirm the efficiency g
I don't have numbers yet, because I haven't started stress-testing.  In 
theory, though, the patch changes the number of RPCs sent to the HMS:

before: RetrieveTables executes 1 request + 1 request per database in the HMS 
+ 1 request per table in the HMS, including fetching all Hive table objects.  
This can be a large amount of data, since Parquet tables can have thousands of 
partitions, each of which has a non-negligible amount of data associated.

after: GetKuduTables executes 1 request + 2 requests per database.  Only Kudu 
table objects are retrieved, which don't have partitions.
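
To make the difference concrete, here is a minimal sketch of the two request 
counts as plain arithmetic.  The function names and parameters are 
hypothetical and exist only for this illustration; they are not part of the 
patch or of the HMS client.

```cpp
#include <cstdint>

// Hypothetical helper: request count before the change, per the breakdown
// above: 1 request + 1 per database + 1 per table in the HMS.
constexpr std::uint64_t RequestsBefore(std::uint64_t num_databases,
                                       std::uint64_t num_tables) {
  return 1 + num_databases + num_tables;
}

// Hypothetical helper: request count after the change, per the breakdown
// above: 1 request + 2 per database, independent of the table count.
constexpr std::uint64_t RequestsAfter(std::uint64_t num_databases) {
  return 1 + 2 * num_databases;
}
```

For a made-up HMS with 10 databases and 10,000 tables, RequestsBefore(10, 
10000) is 10011 and RequestsAfter(10) is 21.  These are request counts only; 
they say nothing about measured latency or payload size.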


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc@429
PS2, Line 429: hive::Table
> Nit: auto
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc@219
PS2, Line 219: const auto &
> Nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h
File src/kudu/hms/hms_client.h:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h@179
PS2, Line 179:   // Retrieves HMS table metadata for many tables.
> Nit: how about "for all tables listed in 'table_names'", to be more precise
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@a378
PS2, Line 378:
             :
> Nit: What do you have against passing smart pointers by cref?
I find the double indirection confusing while reading code, and there's 
arguably a runtime cost.  Mostly it's just confusing, though.  const T& 
and T* are unambiguous in that the first means only const methods may be 
called, while the second typically means non-const methods will be called.  
const unique_ptr<T>& completely breaks this pattern.  Additionally, we 
sometimes use crefs to smart pointers to represent optional values, but that's 
not the case here.
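
A small sketch of the three signatures, to show why the cref-to-smart-pointer 
form breaks the convention.  The Counter type and the function names are 
hypothetical, made up for this illustration, and do not come from 
tool_action_hms.cc.

```cpp
#include <memory>

// Hypothetical type used only for this illustration.
struct Counter {
  int value = 0;
  void Increment() { ++value; }      // non-const method
  int Get() const { return value; }  // const method
};

// const T&: the callee can only call const methods on the argument.
int Read(const Counter& c) { return c.Get(); }

// T*: by convention, signals that the callee may mutate the argument.
void Bump(Counter* c) { c->Increment(); }

// const unique_ptr<T>&: the const applies to the smart pointer itself, not
// to the pointee, so this compiles and mutates the Counter despite looking
// like a read-only parameter.  It also adds a second level of indirection.
void BumpThroughCref(const std::unique_ptr<Counter>& c) { c->Increment(); }
```

A caller holding a std::unique_ptr<Counter> named c would pass *c to Read and 
c.get() to Bump, which keeps the const-ness of the access visible at the call 
site.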


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@299
PS2, Line 299: isSynced
> Nit: since you're already in the area, this should be IsSynced.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 13 Jul 2018 00:27:48 +0000
Gerrit-HasComments: Yes
