Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )
Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81: @SkipIfGCS.jira(reason="IMPALA-10563")
> Yeah, in the time out period (600s), only half of the inserts finish. I'm t
Sorry, it turns out this is not just a slowdown. I found a dead loop in catalogd caused by calling RemoteIterator#hasNext() in FileSystemUtil$FilterIterator#hasNext(). It seems the GCS iterator implementation doesn't skip a nonexistent file after throwing a FileNotFoundException: it keeps throwing the same exception for the same file on every subsequent call of hasNext().

This happens with concurrent inserts into the same table. Some transient tmp files are removed after an insert finishes, which makes the file listing of the other inserts throw FileNotFoundException. The HDFS implementation skips such files on the next call of hasNext(), but the GCS implementation can't.

I'm trying to find a workaround for this issue...

--
To view, visit http://gerrit.cloudera.org:8080/17121

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 10 Mar 2021 02:53:30 +0000
Gerrit-HasComments: Yes
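The dead loop described in the comment can be reproduced without a cluster. The Java sketch below is a hypothetical simulation, not Impala's FileSystemUtil code: `RemoteIter` is a minimal stand-in for Hadoop's `org.apache.hadoop.fs.RemoteIterator`, and `simulatedListing` and `listWithRetryBound` are illustrative names. It assumes the caller catches the FileNotFoundException and calls hasNext() again, which is how we read the dead-loop description. An HDFS-like listing reports the deleted tmp file once and moves past it; a GCS-like listing throws the same exception on every call. The retry cap exists only so the GCS case terminates in the demo; it is not the workaround being searched for.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class ListingSketch {
  // Minimal stand-in for Hadoop's RemoteIterator<E> (assumption: same two methods).
  interface RemoteIter<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Simulated directory listing in which `deleted` was removed mid-listing.
  // skipAfterError == true models the HDFS behavior: the missing entry is
  // reported once and the cursor advances past it. false models the GCS
  // behavior from the comment: the same exception is thrown on every call.
  static RemoteIter<String> simulatedListing(
      List<String> files, String deleted, boolean skipAfterError) {
    return new RemoteIter<String>() {
      private int pos = 0;

      @Override
      public boolean hasNext() throws IOException {
        if (pos < files.size() && files.get(pos).equals(deleted)) {
          if (skipAfterError) pos++; // advance before reporting the error
          throw new FileNotFoundException(deleted);
        }
        return pos < files.size();
      }

      @Override
      public String next() throws IOException {
        if (!hasNext()) throw new NoSuchElementException();
        return files.get(pos++);
      }
    };
  }

  // Hypothetical caller that retries hasNext() after FileNotFoundException.
  // Without maxRetries it would spin forever on the GCS-like iterator.
  static List<String> listWithRetryBound(RemoteIter<String> it, int maxRetries)
      throws IOException {
    List<String> out = new ArrayList<>();
    int consecutiveErrors = 0;
    while (true) {
      boolean more;
      try {
        more = it.hasNext();
        consecutiveErrors = 0;
      } catch (FileNotFoundException e) {
        if (++consecutiveErrors > maxRetries) {
          throw new IOException("listing made no progress past " + e.getMessage(), e);
        }
        continue; // retry: this is where the real catalogd loop never exits
      }
      if (!more) return out;
      out.add(it.next());
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> files = List.of("a.parq", "_tmp.b", "c.parq");
    // HDFS-like: prints [a.parq, c.parq]
    System.out.println(listWithRetryBound(simulatedListing(files, "_tmp.b", true), 3));
    try {
      listWithRetryBound(simulatedListing(files, "_tmp.b", false), 3);
    } catch (IOException e) {
      // GCS-like: prints "gave up: listing made no progress past _tmp.b"
      System.out.println("gave up: " + e.getMessage());
    }
  }
}
```

The HDFS-like listing finishes with the two surviving files, while the GCS-like listing only terminates because of the artificial cap, mirroring the catalogd hang seen in the stress test.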
