Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 )
Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
......................................................................

Patch Set 3:

When I run tests/stress/test_insert_stress.py in a loop, the inserts fail more easily with TableLoadingException, which then leaves the table metadata in a bad state. The exception is:

E0311 11:59:42.633015 20506 ParallelFileMetadataLoader.java:166] Refreshing file and block metadata for 1 paths for table test_inserts_ab08196b.test_concurrent_inserts encountered an error loading data for path hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:163)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
        at org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1296)
        at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1182)
        at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
        at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
        at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1273)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1247)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1192)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1188)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1206)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2126)
        at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2314)
        at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2291)
        at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:813)
        at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
        ... 7 more

The reason is that we have removed the catching of FileNotFoundException in FilterIterator. When listing files with locations, we use FileSystem#listFiles(), which returns a RemoteIterator similar to our RecursingIterator except for the handling of non-existing subdirs in its hasNext(), which throws as shown in the trace above. To make file listing with locations more robust as well, PS3 uses our RecursingIterator when the fs is not S3.

-- 
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 07:38:48 +0000
Gerrit-HasComments: No
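
As a rough illustration of the behaviour the comment asks for, the sketch below walks a directory tree and skips a subdirectory that disappears between being listed by its parent and being opened, as appears to happen to the _impala_insert_staging subdir in the trace. It is not Impala's RecursingIterator and does not use the Hadoop FileSystem API: FakeFs, TolerantLister, demo(), and all paths are hypothetical stand-ins chosen so that the example runs without a cluster.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a file system (NOT the Hadoop API).
class FakeFs {
  private final Map<String, List<String>> children = new HashMap<>();
  private final Set<String> vanished = new HashSet<>();

  void addDir(String dir, String... entries) {
    children.put(dir, Arrays.asList(entries));
  }

  // Simulates a concurrent writer deleting a directory after its parent
  // has already reported it in a listing.
  void markVanished(String dir) {
    vanished.add(dir);
  }

  boolean isDir(String path) {
    return children.containsKey(path);
  }

  List<String> list(String dir) throws FileNotFoundException {
    if (vanished.contains(dir) || !children.containsKey(dir)) {
      throw new FileNotFoundException("File " + dir + " does not exist.");
    }
    return children.get(dir);
  }
}

class TolerantLister {
  // Recursively collects files under root. A missing root is still an error,
  // but a subdirectory that disappears mid-walk is skipped.
  static List<String> listFiles(FakeFs fs, String root) throws FileNotFoundException {
    List<String> files = new ArrayList<>();
    Deque<String> dirs = new ArrayDeque<>();
    dirs.push(root);
    while (!dirs.isEmpty()) {
      String dir = dirs.pop();
      List<String> entries;
      try {
        entries = fs.list(dir);
      } catch (FileNotFoundException e) {
        if (dir.equals(root)) throw e;  // the table directory itself must exist
        continue;                       // subdir vanished: skip it
      }
      for (String entry : entries) {
        if (fs.isDir(entry)) {
          dirs.push(entry);
        } else {
          files.add(entry);
        }
      }
    }
    return files;
  }

  // Builds a small tree in which a staging subdirectory vanishes, then lists it.
  static List<String> demo() {
    FakeFs fs = new FakeFs();
    fs.addDir("/t", "/t/a.parq", "/t/_staging");
    fs.addDir("/t/_staging", "/t/_staging/q1");
    fs.addDir("/t/_staging/q1", "/t/_staging/q1/tmp.parq");
    fs.markVanished("/t/_staging/q1");
    try {
      return listFiles(fs, "/t");
    } catch (FileNotFoundException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling TolerantLister.demo() lists only the files that are still reachable. Without the catch inside the loop, the same walk would fail with FileNotFoundException, which is the failure mode in the trace.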