Quanlong Huang has posted comments on this change. (
http://gerrit.cloudera.org:8080/17171 )
Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
......................................................................
Patch Set 3:
When I run tests/stress/test_insert_stress.py in a loop, I found that the
inserts fail more easily with a TableLoadingException, which then leaves the
table metadata in a bad state. The exception is:
E0311 11:59:42.633015 20506 ParallelFileMetadataLoader.java:166] Refreshing file and block metadata for 1 paths for table test_inserts_ab08196b.test_concurrent_inserts encountered an error loading data for path hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
  at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
  at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
  at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:163)
  at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
  at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
  at org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1296)
  at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1182)
  at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
  at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
  at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1273)
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1247)
  at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1192)
  at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1188)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1206)
  at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2126)
  at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2314)
  at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2291)
  at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:813)
  at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
  at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
  at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
  at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
  at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
  at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
  at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
  at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
  at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
  ... 7 more
The reason is that we have removed the catching of FileNotFoundException in
FilterIterator. When listing files with locations, we use
FileSystem#listFiles(), which returns a RemoteIterator similar to our
RecursingIterator except that its hasNext() does not handle non-existing
subdirs: as the trace above shows, it throws FileNotFoundException when a
staging dir is removed during the listing. To make the file listing with
locations more robust as well, PS3 uses our RecursingIterator when the fs is
not S3.
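
To illustrate the idea of tolerating a subdir that disappears mid-listing,
here is a minimal sketch. It is not the code in FileSystemUtil: TolerantListing
and DirLister are hypothetical names, and DirLister is only a stand-in for a
one-level directory listing call. The assumption is that a missing root is
still an error, while a missing subdir (e.g. a staging dir removed by a
concurrent insert) is simply skipped.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Impala's RecursingIterator.
final class TolerantListing {
  /** Hypothetical stand-in for a file system's one-level directory listing. */
  interface DirLister {
    /** Returns the children of dir; by convention here, directories end with "/". */
    List<String> list(String dir) throws FileNotFoundException;
  }

  /**
   * Lists all files under root. A subdirectory that no longer exists when we
   * descend into it is skipped; a missing root still raises the exception.
   */
  static List<String> listFilesRecursively(DirLister fs, String root)
      throws FileNotFoundException {
    List<String> files = new ArrayList<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      String dir = pending.pop();
      List<String> children;
      try {
        children = fs.list(dir);
      } catch (FileNotFoundException e) {
        // The root itself must exist; propagate the error in that case.
        if (dir.equals(root)) throw e;
        // The subdir was removed after its parent was listed: skip it.
        continue;
      }
      for (String child : children) {
        if (child.endsWith("/")) {
          pending.push(child);  // directory: descend later
        } else {
          files.add(child);     // regular file: report it
        }
      }
    }
    return files;
  }
}
```

With a lister whose staging subdir vanishes between the parent listing and the
descent, the sketch returns the remaining files instead of failing the whole
load.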
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 11 Mar 2021 07:38:48 +0000
Gerrit-HasComments: No