Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
......................................................................


Patch Set 3:

When I ran tests/stress/test_insert_stress.py in a loop several times, I found 
that the inserts fail more easily with TableLoadingException, which then leaves 
the table metadata in a bad state. The exception is:

E0311 11:59:42.633015 20506 ParallelFileMetadataLoader.java:166] Refreshing file and block metadata for 1 paths for table test_inserts_ab08196b.test_concurrent_inserts encountered an error loading data for path hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:163)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
        at org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1296)
        at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1182)
        at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
        at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
        at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa00000000 does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1273)
        at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1247)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1192)
        at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1188)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1206)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2126)
        at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2314)
        at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2291)
        at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:813)
        at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
        ... 7 more

The reason is that we have removed the catching of FileNotFoundException in 
FilterIterator. When listing files with locations, we use 
FileSystem#listFiles(), which returns a RemoteIterator similar to our 
RecursingIterator except for how its hasNext() handles a non-existing subdir: 
it throws FileNotFoundException, as the trace above shows. To make file listing 
with locations more robust as well, PS3 uses our RecursingIterator when the fs 
is not S3.
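
To illustrate the difference in hasNext() described above, here is a minimal 
self-contained sketch. It is not the actual FileSystemUtil code: RemoteIter is 
a stand-in with the same two methods as Hadoop's RemoteIterator, and Lister, 
TolerantRecursingIterator and the in-memory directory tree are hypothetical 
names made up for this example. The only point is that a subdir which 
disappears between listing its parent and descending into it (like a staging 
dir removed by a concurrent insert) is skipped instead of failing the listing.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class TolerantListingSketch {
  /** Stand-in for org.apache.hadoop.fs.RemoteIterator (same two methods). */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Hypothetical lister: children of a dir, or FileNotFoundException. */
  interface Lister {
    List<String> list(String dir) throws IOException;
  }

  /** Yields files recursively; paths ending in "/" are treated as dirs. */
  static class TolerantRecursingIterator implements RemoteIter<String> {
    private final Lister lister;
    private final Deque<Iterator<String>> stack = new ArrayDeque<>();
    private String nextFile;

    TolerantRecursingIterator(Lister lister, String root) throws IOException {
      this.lister = lister;
      // A missing root is still an error; only missing subdirs are tolerated.
      stack.push(lister.list(root).iterator());
    }

    @Override
    public boolean hasNext() throws IOException {
      while (nextFile == null && !stack.isEmpty()) {
        Iterator<String> cur = stack.peek();
        if (!cur.hasNext()) {
          stack.pop();
          continue;
        }
        String path = cur.next();
        if (!path.endsWith("/")) {
          nextFile = path;
          break;
        }
        try {
          stack.push(lister.list(path).iterator());
        } catch (FileNotFoundException e) {
          // Subdir was removed after its parent was listed: skip it.
        }
      }
      return nextFile != null;
    }

    @Override
    public String next() throws IOException {
      if (!hasNext()) throw new NoSuchElementException();
      String f = nextFile;
      nextFile = null;
      return f;
    }
  }

  /** Lists a tiny in-memory tree whose staging subdir has vanished. */
  static List<String> demo() throws IOException {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("/t/", Arrays.asList("/t/a", "/t/_impala_insert_staging/", "/t/b"));
    // No entry for "/t/_impala_insert_staging/": simulates a concurrent delete.
    Lister lister = dir -> {
      List<String> children = tree.get(dir);
      if (children == null) {
        throw new FileNotFoundException("File " + dir + " does not exist.");
      }
      return children;
    };
    List<String> out = new ArrayList<>();
    RemoteIter<String> it = new TolerantRecursingIterator(lister, "/t/");
    while (it.hasNext()) out.add(it.next());
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo());
  }
}
```

In this sketch the two regular files are still returned even though the 
staging dir is gone. An iterator that lets the FileNotFoundException propagate 
out of hasNext() would abort the whole listing at that point.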


--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 07:38:48 +0000
Gerrit-HasComments: No
