Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11798 )

Change subject: [docs] add Hive Metastore integration
......................................................................


Patch Set 5:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc
File docs/hive_metastore.adoc:

http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@38
PS5, Line 38:
            : ## Databases and Table Names
Are there column name restrictions that are worth mentioning?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@42
PS5, Line 42:
nit: not sure if jekyll will get rid of this, but there's an extra space here


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@45
PS5, Line 45: tables must belong to a database
nit: since there's still no such thing as a database in Kudu, maybe "the table 
name must indicate the table's membership of a Hive database"? Not sure how far 
we want to go in emphasizing that databases are a Hive thing and not a Kudu 
thing.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@46
PS5, Line 46: identifiers are
nit: maybe call out, "...identifiers (i.e. the table name and database name) 
are..."? I know we use it in our codebase, but I'm not sure "table name 
identifier" is common parlance, and might be construed to just mean the table 
name and not the database name.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@59
PS5, Line 59: application which
nit: "applications that"


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@58
PS5, Line 58: By including databases as an implicit part
            : of the Kudu table name, existing application which use Kudu 
tables can operate
            : on non-HMS-integrated and HMS-integrated table names with minimal 
or no changes.
Maybe as a note, do you think it's worth calling out the previous convention 
with Impala? I.e. that old tables were commonly prefixed with "impala::" and 
that they should be renamed?

I guess that sort of thing should come about after the Impala integration lands.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@61
PS5, Line 61:
            : In order to create or drop Hive databases, administrators or 
users must use
            : existing Hive tools such as the Beeline Shell or Impala.
I was a little confused about why this sentence was here, so maybe reword it 
slightly:

"Kudu provides no additional tooling to create or drop Hive databases. 
Administrators or users should use existing Hive tools such as the Beeline 
Shell or Impala to do so."

Also is it worth mentioning the effects of dropping a Hive database that 
contains Kudu tables?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@68
PS5, Line 68: table name identifier constraints
nit: maybe just "naming constraints"? Maybe the section should be called 
"Naming Constraints" too? Not sure if the term "table name identifier" has 
caught on elsewhere. WDYT?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@70
PS5, Line 70: underscore
nit: underscores


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@73
PS5, Line 73: table name identifiers
Is this referring to both table names and database names? The configuration 
name makes me think just table names, but I'm not sure.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@78
PS5, Line 78: Kudu will prevent tables from being created whose names only 
differ
            : by case, and table lookup when opening, altering, or dropping 
tables is
            : case-insensitive, for both the Hive database and table name 
portion of the Kudu
            : table name.
nit reword a bit:

"Kudu will prevent tables from being created when a table already exists in the 
Hive Metastore that only differs by case. Additionally, operations that open, 
alter, or drop tables are case-insensitive for both the database and table name 
portion of the Kudu table name when the integration is enabled."


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@83
PS5, Line 83: Enabling the Hive Metastore Integration
            :
There should be a step here that renames existing Kudu tables.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@86
PS5, Line 86: On the Command Line
But this isn't necessarily on the command line? I think this is alluding to the 
expectation that Kudu vendors will provide an automated process for this (e.g. 
going through Cloudera Manager), and this is the process to enable the 
integration manually. I think we should just omit this title until that is the 
case, at which point we can say "If using Cloudera Manager, don't do any of 
this stuff".


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@101
PS5, Line 101: <property>
             :   
<name>hive.metastore.disallow.incompatible.col.type.changes</name>
             :   <value>false</value>
             : </property>
This isn't either of the listed updates to the XML. What is this doing? Can you 
add that in the bullet above?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@108
PS5, Line 108: HIVE_AUX_JARS_PATH configuration
nit: backticks and call it an environment variable? i.e. "`HIVE_AUX_JARS_PATH` 
environment variable"?

Also, in other docs, I think we've provided sample bash lines; that might be 
helpful here. Is there a step missing where the user downloads `kudu-hive.jar` 
from somewhere? Or builds it locally? What is the recommendation for getting 
the JAR?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@116
PS5, Line 116: HMS Thrift URI(s)
For parity, "match hive.metastore.uris"? At least from here: 
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_metastore_configure.html#topic_18_4


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@121
PS5, Line 121:
             : ## Upgrading Existing Tables
I think these sections should be reordered:

Current order:
1. Databases and Table Names
2. Enabling the Hive Metastore Integration
3. Upgrading Existing Tables
4. Administrative Tools

I'd propose:
1. Databases and Table Names
2. Upgrading Existing Tables
3. Administrative Tools
4. Enabling the Hive Metastore Integration

I think this order makes more sense because it reflects the order of 
understanding I'd expect an administrator to want to go through:
1. understand conceptually what changes will have to be made
2. understand physically what changes need to be made
3. understand how to make those changes
4. turn on the thing that depended on those changes


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@125
PS5, Line 125:  ensure the tables are
             : appropriately added to the Hive Metastore catalog
Is this implying that administrators need to add the tables themselves, or is 
renaming sufficient? If the latter, maybe preface this by saying, "Upon 
starting up, Kudu will automatically synchronize its catalog with the Hive 
Metastore. As such, it is important that, before enabling the integration, Kudu 
tables are renamed to abide by the Hive naming conventions."

If you do go with my below suggestion of moving "Administrative tools" higher 
up, I think this preface should be moved into that section as a preface to 
"Administrative tools" section, since it describes the preconditions we should 
have before enabling the integration and how administrative tools can help.


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@136
PS5, Line 136: Table names that only differ by case will prevent the master(s) 
to start.
nit reword a bit: "Kudu tables with names that conflict only by case with a 
table entry in the Hive Metastore will prevent Kudu masters from starting up 
when the integration is enabled."


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@140
PS5, Line 140: Administrators
nit: maybe "administrative" instead? also capitalize "tools" to match other 
sections, i.e. "Administrative Tools"


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@151
PS5, Line 151:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@150
PS5, Line 150: The tool will
             : make suggestions on how to fix any inconsistencies that are 
found.
I'm guessing this tool must be run while Kudu is online, right? And so the 
first step in the upgrade process should be to run this tool with the 
integration disabled?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@158
PS5, Line 158: automatically-fixable issues.
nit: perhaps give an example? "for instance, by renaming a Kudu table"?


http://gerrit.cloudera.org:8080/#/c/11798/5/docs/hive_metastore.adoc@159
PS5, Line 159: automatic fix fails
How can a fix fail? Will a failed fix actually change the state of a cluster? 
Maybe an example here would help here too?

Or does this mean to say, "when no automatic fix is available"?



--
To view, visit http://gerrit.cloudera.org:8080/11798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12939c8f2245450ad46898c2050451b090c7ea01
Gerrit-Change-Number: 11798
Gerrit-PatchSet: 5
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 10 Dec 2018 23:03:59 +0000
Gerrit-HasComments: Yes

Reply via email to