Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11027 )
Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table. In
some testing, each such call took 1-2ms, so this could add many seconds
to the overall table load time for a table with thousands of partitions
and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Reviewed-on: http://gerrit.cloudera.org:8080/11027
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Todd Lipcon <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 218 insertions(+), 59 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
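
For readers who want to see the batching idea from the commit message in code, here is a minimal sketch. It is not the code in FsPermissionCache.java or HdfsTable.java. CountingFs, PermissionPrefetchSketch, and loadPermissions are hypothetical names, permissions are modeled as plain strings, and the "filesystem" is an in-memory map that only counts round trips. The one rule it takes from the commit message is that any parent directory containing more than one partition gets a single bulk listing, and everything else falls back to a per-path lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a NameNode client; it only counts calls. */
class CountingFs {
  private final Map<String, String> permsByPath = new HashMap<>();
  int getFileStatusCalls = 0;
  int listStatusCalls = 0;

  void addDir(String path, String perms) { permsByPath.put(path, perms); }

  /** One simulated round trip per path (the per-partition cost the patch avoids). */
  String getFileStatus(String path) {
    getFileStatusCalls++;
    return permsByPath.get(path);
  }

  /** One simulated round trip returning every direct child of the parent. */
  Map<String, String> listStatus(String parent) {
    listStatusCalls++;
    Map<String, String> children = new HashMap<>();
    for (Map.Entry<String, String> e : permsByPath.entrySet()) {
      if (parentOf(e.getKey()).equals(parent)) {
        children.put(e.getKey(), e.getValue());
      }
    }
    return children;
  }

  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }
}

class PermissionPrefetchSketch {
  /** Maps each partition directory to its permissions, batching by parent. */
  static Map<String, String> loadPermissions(CountingFs fs, List<String> partitionDirs) {
    // Group partition directories by their parent directory.
    Map<String, List<String>> byParent = new HashMap<>();
    for (String dir : partitionDirs) {
      byParent.computeIfAbsent(CountingFs.parentOf(dir), k -> new ArrayList<>()).add(dir);
    }

    // Bulk-fetch only where a parent holds more than one partition.
    Map<String, String> cache = new HashMap<>();
    for (Map.Entry<String, List<String>> e : byParent.entrySet()) {
      if (e.getValue().size() > 1) {
        cache.putAll(fs.listStatus(e.getKey()));
      }
    }

    // Serve from the cache; fall back to a single lookup on a miss.
    Map<String, String> result = new HashMap<>();
    for (String dir : partitionDirs) {
      String perms = cache.get(dir);
      if (perms == null) {
        perms = fs.getFileStatus(dir);
      }
      result.put(dir, perms);
    }
    return result;
  }
}
```

In this sketch, a table whose partitions all sit under one directory costs one listing call instead of one call per partition. A partition stored alone under a different parent still costs one individual lookup, since a bulk listing would save nothing there.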
