Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................

Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG@15
PS2, Line 15: The combination of these APIs
: should be significantly more efficient than issuing a get for every
: single table in the HMS and doing Kudu-side filtering.
> Sounds great. Is there anything you can measure to confirm the efficiency g

I don't have numbers yet, because I haven't started stress-testing. There are theoretical changes in the number of RPCs sent to the HMS with this patch, though:

before: RetrieveTables executes 1 request + 1 request per database in the HMS + 1 request per table in the HMS, including fetching all Hive table objects. This can be a large amount of data, since Parquet tables can have thousands of partitions, each of which has non-negligible data associated with it.

after: GetKuduTables executes 1 request + 2 requests per database. Only Kudu table objects are retrieved, which don't have partitions.


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc@429
PS2, Line 429: hive::Table
> Nit: auto

Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc@219
PS2, Line 219: const auto &
> Nit: const auto&

Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h
File src/kudu/hms/hms_client.h:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h@179
PS2, Line 179: // Retrieves HMS table metadata for many tables.
> Nit: how about "for all tables listed in 'table_names'", to be more precise

Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@a378
PS2, Line 378: :
> Nit: What do you have against passing smart pointers by cref?

I find the double indirection confusing while reading code, and there's arguably a runtime cost. Mostly that it's just confusing, though. const T& and T* are unambiguous in that the first means only const methods may be called, while the second typically means non-const methods will be called. const unique_ptr<T>& completely breaks this pattern. Additionally, we sometimes use crefs to smart pointers to represent optional values, but that's not the case here.


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@299
PS2, Line 299: isSynced
> Nit: since you're already in the area, this should be IsSynced.

Done


--
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 13 Jul 2018 00:27:48 +0000
Gerrit-HasComments: Yes
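
For reference, the point about passing smart pointers by cref in the tool_action_hms.cc comment can be illustrated with a minimal sketch. The names below (Counter, ReadOnly, Mutate, LooksReadOnly, Demo) are hypothetical and are not taken from the Kudu patch; they only show how the three parameter styles differ in what the callee may do.

```cpp
#include <memory>

// Hypothetical type for illustration only; not part of the Kudu code base.
class Counter {
 public:
  void Increment() { ++value_; }        // non-const: mutates the object
  int value() const { return value_; }  // const: read-only access

 private:
  int value_ = 0;
};

// const T&: only const methods may be called on the argument.
int ReadOnly(const Counter& c) {
  // c.Increment();  // would not compile: Increment() is non-const
  return c.value();
}

// T*: by convention, signals that the callee may call non-const methods.
void Mutate(Counter* c) {
  c->Increment();
}

// const unique_ptr<T>&: the pointer itself is const, but the pointee is not,
// so the signature looks read-only while the callee can still mutate.
void LooksReadOnly(const std::unique_ptr<Counter>& c) {
  c->Increment();  // compiles: constness does not propagate to the pointee
}

// Exercises all three styles on one object and returns the final count.
int Demo() {
  auto counter = std::make_unique<Counter>();
  Mutate(counter.get());      // one increment through the raw pointer
  LooksReadOnly(counter);     // a second increment through the cref
  return ReadOnly(*counter);  // read back through const T&
}
```

In this sketch the raw-pointer and const-reference signatures tell the reader what the callee can do, whereas the const unique_ptr<T>& signature does not, which is the ambiguity described in the comment.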