Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11798 )
Change subject: [docs] add Hive Metastore integration ...................................................................... Patch Set 5: (24 comments) http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc File docs/hive_metastore.adoc: http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@38 PS5, Line 38: : ## Databases and Table Names Are there column name restrictions that are worth mentioning? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@42 PS5, Line 42: nit: not sure if jekyll will get rid of this, but there's an extra space here http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@45 PS5, Line 45: tables must belong to a database nit: since there's still no such thing as a database in Kudu, maybe "the table name must indicate the table's membership of a Hive database"? Not sure how far we want to go in emphasizing that databases are a Hive thing and not a Kudu thing. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@46 PS5, Line 46: identifiers are nit: maybe call out, "...identifiers (i.e. the table name and database name) are..."? I know we use it in our codebase, but I'm not sure "table name identifier" is common parlance, and might be construed to just mean the table name and not the database name. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@59 PS5, Line 59: application which nit: "applications that" http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@58 PS5, Line 58: By including databases as an implicit part : of the Kudu table name, existing application which use Kudu tables can operate : on non-HMS-integrated and HMS-integrated table names with minimal or no changes. Maybe as a note, do you think it's worth calling out the previous convention with Impala? I.e. that old tables were commonly prefixed with "impala::" and that they should be renamed? I guess that sort of thing should come about after the Impala integration lands. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@61 PS5, Line 61: : In order to create or drop Hive databases, administrators or users must use : existing Hive tools such as the Beeline Shell or Impala. I was a little confused about why this sentence was here, so maybe reword it slightly: "Kudu provides no additional tooling to create or drop Hive databases. Administrators or users should use existing Hive tools such as the Beeline Shell or Impala to do so." Also is it worth mentioning the effects of dropping a Hive database that contains Kudu tables? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@68 PS5, Line 68: table name identifier constraints nit: maybe just "naming constraints"? Maybe the section should be called "Naming Constraints" too? Not sure if the term "table name identifier" has caught on elsewhere. WDYT? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@70 PS5, Line 70: underscore nit: underscores http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@73 PS5, Line 73: table name identifiers Is this referring to both table names and database names? The configuration name makes me think just table names, but I'm not sure. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@78 PS5, Line 78: Kudu will prevent tables from being created whose names only differ : by case, and table lookup when opening, altering, or dropping tables is : case-insensitive, for both the Hive database and table name portion of the Kudu : table name. nit reword a bit: "Kudu will prevent tables from being created when a table already exists in the Hive Metastore that only differs by case. Additionally, operations that open, alter, or drop tables are case-insensitive for both the database and table name portion of the Kudu table name when the integration is enabled." http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@83 PS5, Line 83: Enabling the Hive Metastore Integration : There should be a step here that renames existing Kudu tables. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@86 PS5, Line 86: On the Command Line But this isn't necessarily on the command line? I think this is alluding to the expectation that Kudu vendors will provide an automated process for this (e.g. going through Cloudera Manager), and this is the process to enable the integration manually. I think we should just omit this title until that is the case, at which point we can say "If using Cloudera Manager, don't do any of this stuff". http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@101 PS5, Line 101: <property> : <name>hive.metastore.disallow.incompatible.col.type.changes</name> : <value>false</value> : </property> This isn't either of the listed updates to the XML. What is this doing? Can you add that in the bullet above? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@108 PS5, Line 108: HIVE_AUX_JARS_PATH configuration nit: backticks and call it an environment variable? i.e. "`HIVE_AUX_JARS_PATH` environment variable"? Also, in other docs, I think we've provided sample bash lines; that might be helpful here. Is there a step missing where the user downloads `kudu-hive.jar` from somewhere? Or builds it locally? What is the recommendation for getting the JAR? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@116 PS5, Line 116: HMS Thrift URI(s) For parity, "match hive.metastore.uris"? At least from here: https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_metastore_configure.html#topic_18_4 http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@121 PS5, Line 121: : ## Upgrading Existing Tables I think these sections should be reordered: Current order: 1. Databases and Table Names 2. Enabling the Hive Metastore Integration 3. Upgrading Existing Tables 4. Administrative Tools I'd propose: 1. Databases and Table Names 2. Upgrading Existing Tables 3. Administrative Tools 4. Enabling the Hive Metastore Integration I think this order makes more sense because it reflects the order of understanding I'd expect an administrator to want to go through: 1. understand conceptually what changes will have to be made 2. understand physically what changes need to be made 3. understand how to make those changes 4. turn on the thing that depended on those changes http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@125 PS5, Line 125: ensure the tables are : appropriately added to the Hive Metastore catalog Is this implying that administrators need to add the tables themselves, or is renaming sufficient? If the latter, maybe preface this by saying, "Upon starting up, Kudu will automatically synchronize its catalog with the Hive Metastore. As such, it is important that, before enabling the integration, Kudu tables are renamed to abide by the Hive naming conventions." If you do go with my below suggestion of moving "Administrative tools" higher up, I think this preface should be moved into that section as a preface to "Administrative tools" section, since it describes the preconditions we should have before enabling the integration and how administrative tools can help. http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@136 PS5, Line 136: Table names that only differ by case will prevent the master(s) to start. nit reword a bit: "Kudu tables with names that conflict only by case with a table entry in the Hive Metastore will prevent Kudu masters from starting up when the integration is enabled." http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@140 PS5, Line 140: Administrators nit: maybe "administrative" instead? also capitalize "tools" to match other sections, i.e. "Administrative Tools" http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@151 PS5, Line 151: nit: extra space http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@150 PS5, Line 150: The tool will : make suggestions on how to fix any inconsistencies that are found. I'm guessing this tool must be run while Kudu is online, right? And so the first step in the upgrade process should be to run this tool with the integration disabled? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@158 PS5, Line 158: automatically-fixable issues. nit: perhaps give an example? "for instance, by renaming a Kudu table"? http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@159 PS5, Line 159: automatic fix fails How can a fix fail? Will a failed fix actually change the state of a cluster? Maybe an example here would help here too? Or does this mean to say, "when no automatic fix is available"? -- To view, visit http://gerrit.cloudera.org:8080/11798 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12939c8f2245450ad46898c2050451b090c7ea01 Gerrit-Change-Number: 11798 Gerrit-PatchSet: 5 Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alex Rodoni <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 10 Dec 2018 23:03:59 +0000 Gerrit-HasComments: Yes
