Balazs Jeszenszky has posted comments on this change. ( http://gerrit.cloudera.org:8080/10339 )

Change subject: IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required
......................................................................

Patch Set 1:

(18 comments)

IMO the page overall needs a wider cleanup; I have commented on that too.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml
File docs/topics/impala_invalidate_metadata.xml:

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@47
PS1, Line 47: relatively
replace: very

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@48
PS1, Line 48: in the common scenario of adding new data files to an existing table
replace: whenever possible (link to the INVALIDATE vs. REFRESH usage page, if one exists)

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@59
PS1, Line 59: By default
replace: If there is no table specified

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@60
PS1, Line 60: Even for a single table, <codeph>INVALIDATE METADATA</codeph> is more expensive
            : than <codeph>REFRESH</codeph>, so prefer <codeph>REFRESH</codeph> in the common case where you add new data
            : files for an existing table.
The same thing is mentioned ~10 lines above - remove?

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@69
PS1, Line 69: Therefore, if some other entity modifies information used by Impala in the metastore
            : that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean
            : that all metadata updates require an Impala update.
This is vague. Explicitly state when a manual invalidate is needed (see L100-102), and remove this.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@74
PS1, Line 74: <note>
            : <p conref="../shared/impala_common.xml#common/catalog_server_124"/>
            : <p rev="1.2">
            : In Impala 1.2 and higher, a dedicated daemon (<cmdname>catalogd</cmdname>) broadcasts DDL changes made
            : through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one
            : Impala node, you needed to issue an <codeph>INVALIDATE METADATA</codeph> statement on another Impala node
            : before accessing the new database or table from the other node. Now, newly created or altered objects are
            : picked up automatically by all Impala nodes. You must still use the <codeph>INVALIDATE METADATA</codeph>
            : technique after creating or altering objects through Hive. See
            : <xref href="impala_components.xml#intro_catalogd"/> for more information on the catalog service.
            : </p>
            : <p>
            : The <codeph>INVALIDATE METADATA</codeph> statement is new in Impala 1.1 and higher, and takes over some of
            : the use cases of the Impala 1.0 <codeph>REFRESH</codeph> statement. Because <codeph>REFRESH</codeph> now
            : requires a table name parameter, to flush the metadata for all tables at once, use the <codeph>INVALIDATE
            : METADATA</codeph> statement.
            : </p>
            : <p conref="../shared/impala_common.xml#common/invalidate_then_refresh"/>
            : </note>
This section is very outdated and mixes up the old and new usage of REFRESH; remove.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@99
PS1, Line 99: instance
Don't mention individual instances in this context; it's the service as a whole that needs the update.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@100
PS1, Line 100: required if a change is made from another <codeph>impalad</codeph>
             : instance in your cluster, or through Hive and is distributed by
             : <codeph>catalogd</codeph>.
INVALIDATE METADATA is required when:
* metadata changes to existing tables are made outside of Impala
* new tables that are to be used by Impala are added outside of Impala
* SERVER or DATABASE level Sentry privileges are changed outside of Impala
* block metadata changes outside of Impala, but the files remain the same (HDFS rebalance)
* possibly when UDF jars change (needs verification)

All metadata changes go through catalogd and are distributed by statestored. 'Outside of Impala' for table metadata means Hive and any other Hive client, e.g. SparkSQL.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@106
PS1, Line 106: same Impala node
This is pre-1.2 information. No INVALIDATE is needed as long as the changes are made through the Impala service (using any impalad).

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@127
PS1, Line 127: <codeph>INVALIDATE METADATA</codeph> causes the metadata for that table to be marked as stale, and reloaded
             : the next time the table is referenced. For a huge table, that process could take a noticeable amount of time;
Repeat of L44-47.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@129
PS1, Line 129: thus you might prefer to use <codeph>REFRESH</codeph> where practical, to avoid an unpredictable delay later,
             : for example if the next reference to the table is during a benchmark test.
Key information missing: use REFRESH after invalidating a specific table to separate the metadata load from the first query that's run against that table. No need to mention this here IMO, remove.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@137
PS1, Line 137: (such as SequenceFile or HBase tables)
remove

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@139
PS1, Line 139: DESCRIBE
If L129-130 is left in, use REFRESH here to keep the recommendation consistent.
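
To make the list in the comment on line 100 concrete, here is a rough Python sketch of the decision it describes. This is not Impala code: the names `Change` and `statement_needed` are hypothetical and exist only for illustration. Two choices are my own assumptions: the table-less form for Sentry privilege changes, and the `REFRESH` recommendation being read as applying to files added outside of Impala. The unverified UDF jar case is deliberately left out.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    """Kinds of change discussed in the review (hypothetical names)."""
    TABLE_METADATA = "metadata of an existing table changed"
    NEW_TABLE = "new table added"
    SENTRY_PRIVILEGES = "SERVER or DATABASE level Sentry privileges changed"
    BLOCK_METADATA = "block metadata changed, files unchanged (HDFS rebalance)"
    NEW_DATA_FILES = "new data files added to an existing table"


def statement_needed(change: Change, outside_impala: bool,
                     table: Optional[str] = None) -> Optional[str]:
    """Return the statement to issue manually, or None if none is needed."""
    # Changes made through the Impala service (any impalad) are propagated
    # by catalogd and statestored, so no manual step is required.
    if not outside_impala:
        return None

    # New data files for an existing table: prefer the cheaper REFRESH,
    # which requires a table name.
    if change is Change.NEW_DATA_FILES:
        if table is None:
            raise ValueError("REFRESH requires a table name")
        return f"REFRESH {table}"

    # Assumption: privilege changes are not tied to one table, so this
    # sketch uses the table-less form for them.
    if change is Change.SENTRY_PRIVILEGES or table is None:
        return "INVALIDATE METADATA"

    return f"INVALIDATE METADATA {table}"
```

For example, `statement_needed(Change.NEW_TABLE, True, "db.t")` yields the table-scoped statement, while any change with `outside_impala=False` yields `None`.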

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@154
PS1, Line 154: <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
             : <p rev="">
             : The user ID that the <cmdname>impalad</cmdname> daemon runs under,
             : typically the <codeph>impala</codeph> user, must have execute
             : permissions for all the relevant directories holding table data.
             : (A table could have data spread across multiple directories,
             : or in unexpected paths, if it uses partitioning or
             : specifies a <codeph>LOCATION</codeph> attribute for
             : individual partitions or the entire table.)
             : Issues with permissions might not cause an immediate error for this statement,
             : but subsequent statements such as <codeph>SELECT</codeph>
             : or <codeph>SHOW TABLE STATS</codeph> could fail.
             : </p>
Not specific to INVALIDATE, remove.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@171
PS1, Line 171: By default, the <codeph>INVALIDATE METADATA</codeph> command checks HDFS permissions of the underlying data
             : files and directories, caching this information so that a statement can be cancelled immediately if for
             : example the <codeph>impala</codeph> user does not have permission to write to the data directory for the
             : table. (This checking does not apply when the <cmdname>catalogd</cmdname> configuration option
             : <codeph>--load_catalog_in_background</codeph> is set to <codeph>false</codeph>, which it is by default.)
The sentence and the bracketed part contradict each other; clarify.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@176
PS1, Line 176: Impala reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case
             : that represents an oversight.
Is this INVALIDATE specific at all?

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@178
PS1, Line 178: INVALIDATE METADATA
Verify; wouldn't REFRESH do?

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@186
PS1, Line 186: The ability to specify <codeph>INVALIDATE METADATA
             : <varname>table_name</varname></codeph> for a table created in Hive is a new capability in Impala 1.2.4. In
             : earlier releases, that statement would have returned an error indicating an unknown table, requiring you to
             : do <codeph>INVALIDATE METADATA</codeph> with no table name, a more expensive operation that reloaded metadata
             : for all tables and databases.
Remove, already mentioned above.


--
To view, visit http://gerrit.cloudera.org:8080/10339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36
Gerrit-Change-Number: 10339
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 09 May 2018 12:01:18 +0000
Gerrit-HasComments: Yes
