Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )
Change subject: IMPALA-5429: Multi threaded block metadata loading ...................................................................... Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS7, Line 221: // are excluded because they are considered hidden from Impala's perspecitve Shrink: // Number of hidden files excluded from file metadata loading . More details at isValidDataFile(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226 PS7, Line 226: // Number of files skipped while computeing the block metadata information. Shrink: // Number of files skipped from file metadata loading because the the files have not changed since the last load. More details at hasFileChanged(). http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS7, Line 248: // If set to true, reloads the block metadata only when the files in this path reloads the file metadata (let's try to use the new "file metadata" terminology consistently) http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249 PS7, Line 249: // has changed since last load (more details in hasFileChanged()). have changed http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785 PS7, Line 785: * much higher throughput of RPC calls for listStatus/listFiles. Based on our ... RPC calls for listStatus/listFiles. For simplicity, the filesystem type is determined based on the table's root path and not for each partition individually. Based on our experiments, S3 showed ... http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791 PS7, Line 791: * This method is not optimized for tables with mixed partition types (partitions mapped I don'd think we need this much detail, how about folding a simple sentence into the existing paragraph above, see suggestion above. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794 PS7, Line 794: * This may not work well with the mixed partition types and needs to fixed. Don't think this is needed. http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819 PS7, Line 819: // For tables without partitions, we have no block metadata to load. remove "," http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841 PS7, Line 841: LOG.error("Encountered an error loading block metadata for table: " + Could this flood the log when HDFS is under pressure or there is an intermittent Kerberos issue? I'm thinking we should aggregate these into a single log message. Do you think having every single path helps with supportability? My feeling is it's more important to note how many tasks failed, but I'll defer to you if you think having the paths is useful. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 7 Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Sat, 21 Oct 2017 00:12:39 +0000 Gerrit-HasComments: Yes
