Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 6:

(4 comments)

Thanks for your quick review, Joe! Addressed your comments. Also renamed some 
env vars to match the other var names.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186:
> We'll need to double check that GCS file handles work with the file handle 
Ah, sure. I thought tests/custom_cluster/test_hdfs_fd_caching.py provided the 
coverage. But looking into the gcs-connector code, GoogleHadoopFSInputStream 
doesn't implement the CanUnbuffer interface: 
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/905e45d58a7b331f4b590815f0e6d0706022088d/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java#L31
So it doesn't support unbuffer() yet.

I'll remove this flag and leave it as follow-up work in IMPALA-10568. Also 
filed a feature request for GCS: 
https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/540
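
To illustrate why the missing unbuffer() matters here, a toy Python model may 
help. This is not Impala or gcs-connector code: every class and function name 
below is made up, and it assumes a handle is only kept in the cache if the 
stream can drop its read buffer while idle.

```python
class BufferedStream:
    """Hypothetical stream that holds a read buffer for as long as it is open."""

    def __init__(self, name):
        self.name = name
        self.buffer = bytearray(1024)


class UnbufferableStream(BufferedStream):
    """Hypothetical stream that can release its buffer without being closed."""

    def unbuffer(self):
        self.buffer = bytearray()


def can_cache_handle(stream):
    # Capability check: only streams exposing a callable unbuffer() qualify.
    return callable(getattr(stream, "unbuffer", None))


def release(stream, cache):
    """Return True if the stream was unbuffered and kept in the cache."""
    if can_cache_handle(stream):
        stream.unbuffer()
        cache.append(stream)
        return True
    # Assumption for this sketch: streams without unbuffer() are not cached.
    return False
```

In this model a stream without unbuffer() is simply never cached, which is 
roughly the effect of dropping the flag until the connector supports it.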


http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863
PS5, Line 863:     }
             :
             :     @Ov
> I have seen an issue like this before on older versions of the S3 connector
Good point!


http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80
PS5, Line 80:       hadoop fs -rm -r -skipTrash ${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR}
> I'm assuming that this command to remove any existing warehouse works for G
Yeah, the hadoop CLI works with GCS as well.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
> Does IMPALA-10563 consistently reproduce? Do we have any idea if it is spec
Yes, it's consistently reproducible on GCE instances, even if I use a newer 
Hive version (3.1.3000.7.2.9.0-100).
I can always find exceptions from some write ID allocations in the HMS log. The 
error is retried, but it causes a slowdown. It seems like a Hive bug to me, so 
it needs further investigation.

FWIW, I'm using GCE instance type n1-standard-16 (16cpu, 60GB RAM).
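
For readers unfamiliar with the skip marker used above, here is a minimal 
sketch of how a helper shaped like SkipIfGCS.jira(reason=...) could be built. 
It is an illustration, not the actual helper: the stdlib unittest.skipIf 
stands in for whatever marker the real helper uses, and the factory function, 
its parameter, and the "gs" value are assumptions.

```python
import unittest
from functools import partial


def make_skip_if_gcs(target_filesystem):
    """Build a SkipIfGCS-like helper for the given (hypothetical) filesystem name."""
    # Assumption: "gs" is the value that identifies a GCS-backed test run.
    is_gcs = target_filesystem == "gs"

    class SkipIfGCS:
        # partial() pins the condition, so callers only supply the reason.
        # staticmethod() keeps the partial from ever being bound as a method.
        jira = staticmethod(partial(unittest.skipIf, is_gcs))

    return SkipIfGCS


SkipIfGCS = make_skip_if_gcs("gs")


@SkipIfGCS.jira(reason="IMPALA-10563")
def test_inserts():
    return "ran"
```

When the condition is false, unittest.skipIf returns the function unchanged, 
so the same decorator is a no-op on other filesystems.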



--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 09 Mar 2021 10:22:59 +0000
Gerrit-HasComments: Yes
