Balazs Jeszenszky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10339 )

Change subject: IMPALA-6987: [DOCS] Update when INVALIDATE METADATA is required
......................................................................


Patch Set 1:

(18 comments)

IMO the page as a whole needs a wider cleanup; I have commented on that too.

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml
File docs/topics/impala_invalidate_metadata.xml:

http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@47
PS1, Line 47: relatively
replace: very


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@48
PS1, Line 48: in the common scenario of adding new data files to an existing 
table
replace: whenever possible (link to an INVALIDATE vs. REFRESH usage page, if 
one exists)


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@59
PS1, Line 59: By default
replace: If there is no table specified


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@60
PS1, Line 60: Even for a single table, <codeph>INVALIDATE METADATA</codeph> is 
more expensive
            :       than <codeph>REFRESH</codeph>, so prefer 
<codeph>REFRESH</codeph> in the common case where you add new data
            :       files for an existing table.
The same thing is mentioned ~10 lines above. Remove?


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@69
PS1, Line 69: Therefore, if some other entity modifies information used by 
Impala in the metastore
            :       that Impala and Hive share, the information cached by 
Impala must be updated. However, this does not mean
            :       that all metadata updates require an Impala update.
This is vague. Explicitly state when a manual invalidate is needed (see 
L100-102), and remove this.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@74
PS1, Line 74: <note>
            :       <p 
conref="../shared/impala_common.xml#common/catalog_server_124"/>
            :       <p rev="1.2">
            :         In Impala 1.2 and higher, a dedicated daemon 
(<cmdname>catalogd</cmdname>) broadcasts DDL changes made
            :         through Impala to all Impala nodes. Formerly, after you 
created a database or table while connected to one
            :         Impala node, you needed to issue an <codeph>INVALIDATE 
METADATA</codeph> statement on another Impala node
            :         before accessing the new database or table from the other 
node. Now, newly created or altered objects are
            :         picked up automatically by all Impala nodes. You must 
still use the <codeph>INVALIDATE METADATA</codeph>
            :         technique after creating or altering objects through 
Hive. See
            :         <xref href="impala_components.xml#intro_catalogd"/> for 
more information on the catalog service.
            :       </p>
            :       <p>
            :         The <codeph>INVALIDATE METADATA</codeph> statement is new 
in Impala 1.1 and higher, and takes over some of
            :         the use cases of the Impala 1.0 <codeph>REFRESH</codeph> 
statement. Because <codeph>REFRESH</codeph> now
            :         requires a table name parameter, to flush the metadata 
for all tables at once, use the <codeph>INVALIDATE
            :         METADATA</codeph> statement.
            :       </p>
            :       <p 
conref="../shared/impala_common.xml#common/invalidate_then_refresh"/>
            :     </note>
This section is very outdated and mixes up the old and new usage of REFRESH. 
Remove.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@99
PS1, Line 99: instance
Don't mention individual instances in this context; it's the service as a whole 
that needs the update.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@100
PS1, Line 100: required if a change is made from another 
<codeph>impalad</codeph>
             :       instance in your cluster, or through Hive and is 
distributed by
             :         <codeph>catalogd</codeph>.
INVALIDATE METADATA is required when:
* metadata of existing tables is changed outside of Impala
* new tables that are to be used by Impala are added outside of Impala
* SERVER or DATABASE level Sentry privileges are changed outside of Impala
* block metadata changes outside of Impala while the files remain the same 
(e.g. HDFS rebalance)
* possibly when UDF jars change (needs verification)

All metadata changes go through catalogd and are distributed by statestored.
'Outside of Impala' for table metadata means Hive and any other Hive client, 
e.g. SparkSQL.
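
As a rough sketch of the first two cases in the list (the database and table 
names are made up for illustration; only the statement syntax is standard 
Impala):

```sql
-- Hypothetical names: sales_db, orders, new_events, made_in_impala.

-- A table was created through Hive (or another Hive client such as SparkSQL):
INVALIDATE METADATA sales_db.new_events;

-- Metadata of an existing table was changed outside of Impala:
INVALIDATE METADATA sales_db.orders;

-- No table specified: invalidates the cached metadata for all tables,
-- which is the most expensive form.
INVALIDATE METADATA;

-- Changes made through the Impala service itself (using any impalad) need
-- no INVALIDATE; they go through catalogd and are distributed by statestored.
CREATE TABLE sales_db.made_in_impala (id BIGINT);
```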


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@106
PS1, Line 106: same Impala node
This is pre-1.2 information. No INVALIDATE is needed as long as the changes are 
done through the Impala service (using any impalad).


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@127
PS1, Line 127: <codeph>INVALIDATE METADATA</codeph> causes the metadata for 
that table to be marked as stale, and reloaded
             :       the next time the table is referenced. For a huge table, 
that process could take a noticeable amount of time;
Repeat of L44-47


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@129
PS1, Line 129: thus you might prefer to use <codeph>REFRESH</codeph> where 
practical, to avoid an unpredictable delay later,
             :       for example if the next reference to the table is during a 
benchmark test.
Key information missing: use REFRESH after invalidating a specific table to 
separate the metadata load from the first query that's run against that table.

No need to mention this here IMO, remove.
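
For reference, the sequence described in the first paragraph of this comment 
would look roughly like this (hypothetical table name):

```sql
-- Mark the cached metadata for one table as stale.
INVALIDATE METADATA sales_db.orders;

-- Intended to load the metadata now, rather than during the first query.
REFRESH sales_db.orders;

-- The first query should then not pay for the metadata load.
SELECT COUNT(*) FROM sales_db.orders;
```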


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@137
PS1, Line 137: (such as SequenceFile or HBase tables)
remove


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@139
PS1, Line 139: DESCRIBE
If L129-130 is left in, use REFRESH to make the recommendation consistent.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@154
PS1, Line 154: <p 
conref="../shared/impala_common.xml#common/permissions_blurb"/>
             :     <p rev="">
             :       The user ID that the <cmdname>impalad</cmdname> daemon 
runs under,
             :       typically the <codeph>impala</codeph> user, must have 
execute
             :       permissions for all the relevant directories holding table 
data.
             :       (A table could have data spread across multiple 
directories,
             :       or in unexpected paths, if it uses partitioning or
             :       specifies a <codeph>LOCATION</codeph> attribute for
             :       individual partitions or the entire table.)
             :       Issues with permissions might not cause an immediate error 
for this statement,
             :       but subsequent statements such as <codeph>SELECT</codeph>
             :       or <codeph>SHOW TABLE STATS</codeph> could fail.
             :     </p>
Not specific to INVALIDATE, remove.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@171
PS1, Line 171: By default, the <codeph>INVALIDATE METADATA</codeph> command 
checks HDFS permissions of the underlying data
             :       files and directories, caching this information so that a 
statement can be cancelled immediately if for
             :       example the <codeph>impala</codeph> user does not have 
permission to write to the data directory for the
             :       table. (This checking does not apply when the 
<cmdname>catalogd</cmdname> configuration option
             :       <codeph>--load_catalog_in_background</codeph> is set to 
<codeph>false</codeph>, which it is by default.)
The sentence and the bracketed part contradict each other; clarify.


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@176
PS1, Line 176: Impala reports any lack of write permissions as an 
<codeph>INFO</codeph> message in the log file, in case
             :       that represents an oversight.
Is this INVALIDATE specific at all?


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@178
PS1, Line 178: INVALIDATE METADATA
Verify: wouldn't REFRESH do?


http://gerrit.cloudera.org:8080/#/c/10339/1/docs/topics/impala_invalidate_metadata.xml@186
PS1, Line 186: The ability to specify <codeph>INVALIDATE METADATA
             :       <varname>table_name</varname></codeph> for a table created 
in Hive is a new capability in Impala 1.2.4. In
             :       earlier releases, that statement would have returned an 
error indicating an unknown table, requiring you to
             :       do <codeph>INVALIDATE METADATA</codeph> with no table 
name, a more expensive operation that reloaded metadata
             :       for all tables and databases.
Remove, already mentioned above.



--
To view, visit http://gerrit.cloudera.org:8080/10339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36
Gerrit-Change-Number: 10339
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 09 May 2018 12:01:18 +0000
Gerrit-HasComments: Yes
