Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Yeah, in the time out period (600s), only half of the inserts finish. I'm t
Sorry, it turns out this is not just a slowdown. I found an infinite loop in 
catalogd caused by calling RemoteIterator#hasNext() in 
FileSystemUtil$FilterIterator#hasNext(). It seems the GCS iterator 
implementation does not skip a non-existent file after throwing a 
FileNotFoundException. Instead, it keeps throwing the same exception for the 
same file on the next call to hasNext(), so the loop never advances.

This happens when there are concurrent inserts into the same table. Some 
transient tmp files are removed after an insert finishes, which causes the 
file listings of other inserts to throw FileNotFoundException. The HDFS 
implementation is able to skip them on the next call to hasNext(), but the GCS 
implementation can't.

I'm trying to find a workaround for this issue...
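
To illustrate the behavior described above, here is a minimal, self-contained 
sketch. It is not the actual Impala or GCS connector code. RemoteIter is a 
hypothetical stand-in for Hadoop's RemoteIterator, FakeListing simulates both 
iterator behaviors, and the bounded-retry guard in listSkippingMissing() is 
only one assumed way to avoid spinning forever, not necessarily the workaround 
that will be chosen.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListingLoopSketch {
  /** Hypothetical stand-in for Hadoop's RemoteIterator (no dependencies). */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /**
   * Simulated listing. Names starting with "tmp" stand for files that were
   * deleted concurrently, so hasNext() throws FileNotFoundException on them.
   */
  static class FakeListing implements RemoteIter<String> {
    private final List<String> files;
    private final boolean skipsMissing;
    private int pos = 0;

    FakeListing(List<String> files, boolean skipsMissing) {
      this.files = files;
      this.skipsMissing = skipsMissing;
    }

    @Override
    public boolean hasNext() throws IOException {
      if (pos >= files.size()) return false;
      String f = files.get(pos);
      if (f.startsWith("tmp")) {
        // HDFS-like: move past the missing file before throwing.
        // GCS-like (as described above): stay on it and throw again next time.
        if (skipsMissing) pos++;
        throw new FileNotFoundException(f);
      }
      return true;
    }

    @Override
    public String next() throws IOException {
      return files.get(pos++);
    }
  }

  /**
   * Skips missing files, but gives up after too many consecutive failures.
   * Without the bound, a GCS-like iterator would keep this loop running forever.
   */
  static List<String> listSkippingMissing(RemoteIter<String> it,
      int maxConsecutiveFailures) throws IOException {
    List<String> result = new ArrayList<>();
    int failures = 0;
    while (true) {
      try {
        if (!it.hasNext()) return result;
        result.add(it.next());
        failures = 0;
      } catch (FileNotFoundException e) {
        if (++failures > maxConsecutiveFailures) throw e;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> files = Arrays.asList("a", "tmp1", "b");
    System.out.println(listSkippingMissing(new FakeListing(files, true), 3));
    try {
      listSkippingMissing(new FakeListing(files, false), 3);
    } catch (FileNotFoundException e) {
      System.out.println("gave up on missing file: " + e.getMessage());
    }
  }
}
```

A plain failure counter like this would also give up on a well-behaved 
iterator if more than maxConsecutiveFailures missing files appear in a row, so 
a real fix would need a more careful way to detect that the iterator is stuck.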



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 10 Mar 2021 02:53:30 +0000
Gerrit-HasComments: Yes
