Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14311 )

Change subject: IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp
......................................................................


Patch Set 4:

(2 comments)

Thanks for taking this on. I have a couple of nits, but I think this is
looking good.

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/common/impala_test_suite.py@179
PS4, Line 179:     # There are multiple clients for interacting with the underlying storage service.
             :     #
             :     # There are two main types of clients: HTTP clients and CLI clients. CLI clients all
             :     # use the 'hdfs dfs' CLI to execute operations againt a target filesystem. HTTP
             :     # clients issue HTTP requests to execute operations and are filesystem specific. For
             :     # HDFS, the HTTP client uses WebHDFS.
             :     #
             :     # 'hdfs_client' is a wrapper around a HTTP client and CLI client for interacting
             :     # with HDFS. The 'hdfs_client' delegates to the HTTP client when possible, and for
             :     # operations not supported by the HTTP client, it delegates to the CLI client. The
             :     # 'hdfs_client' is specific to HDFS and always points to the local HDFS cluster.
             :     #
             :     # 'filesystem_client' is set depending on the value of the 'TARGET_FILESYSTEM'. For
             :     # HDFS, it is the same as the 'hdfs_client'. For S3 and and ABFS, the client is a
             :     # HadoopFsCommandLineClient which is a simple wrapper around 'hdfs dfs' commands.
             :     # For ADLS, the 'filesystem_client' is an ADLSClient.
These are mostly nits, but here goes:
 - I want to emphasize that 'filesystem_client' is the right thing to use and
'hdfs_client' is only for tests that run exclusively on HDFS. We don't want test
writers to consider using 'hdfs_client' unless they really need it. It is really
only useful for HDFS ACLs.
 - I think the HTTP vs CLI breakdown is better expressed as Hadoop CLI vs
filesystem-specific library.
 - After thinking on this a bit, I think it might be good to have 'hdfs_client'
be None when the target filesystem is not HDFS. This is to discourage people
from using it.
 - I think I would prefer reordering the comment to put the description of
'filesystem_client' up front, as I think that is the most useful part.

Putting it together, here is a sketch of what I was thinking:

'filesystem_client' is a generic interface for doing filesystem operations that
works across all the filesystems that Impala supports. 'filesystem_client' uses
either the HDFS command line or a filesystem-specific library to implement
common HDFS operations. Etc Etc fill this in... Test writers should always use
'filesystem_client' unless they are using filesystem-specific functionality
(e.g. HDFS ACLs).
The implementation of 'filesystem_client' for each filesystem is:
 - HDFS: uses a mixture of PyWebHdfs (which is faster than the HDFS CLI) and the
HDFS CLI
 - S3: uses the HDFS CLI
 - ABFS: uses the HDFS CLI
 - ADLS: uses the ADLSClient (TODO: this should switch to the HDFS CLI once we
test it)

'hdfs_client' is an HDFS-specific client library, and it only works when
running on HDFS. Any test that uses 'hdfs_client' must be skipped on
everything other than HDFS. It is only really useful for tests that do HDFS
ACL operations. It is None on non-HDFS filesystems.
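
As a rough illustration of the proposal above, the selection logic could look something like the following. This is only a sketch under stated assumptions: the three classes are empty placeholders rather than the real clients, 'create_clients' is a hypothetical helper that does not exist in impala_test_suite.py, and the lowercase filesystem names are assumed values for 'TARGET_FILESYSTEM'. Only the mapping from filesystem to client type follows the list above.

```python
class FakeHdfsClient(object):
    """Placeholder for the HDFS client (PyWebHdfs plus the HDFS CLI)."""


class HadoopFsCommandLineClient(object):
    """Placeholder for the wrapper around 'hdfs dfs' commands."""

    def __init__(self, filesystem_type):
        self.filesystem_type = filesystem_type


class ADLSClient(object):
    """Placeholder for the ADLS-specific client."""


def create_clients(target_filesystem):
    """Hypothetical helper: return (filesystem_client, hdfs_client).

    hdfs_client is None for every filesystem except HDFS, so a test that
    uses it by mistake on another filesystem fails immediately.
    """
    if target_filesystem == "hdfs":
        hdfs_client = FakeHdfsClient()
        # On HDFS both names refer to the same client object.
        return hdfs_client, hdfs_client
    if target_filesystem in ("s3", "abfs"):
        return HadoopFsCommandLineClient(target_filesystem), None
    if target_filesystem == "adls":
        return ADLSClient(), None
    raise ValueError("Unsupported filesystem: %s" % target_filesystem)
```

With this shape, a test written against 'filesystem_client' runs unchanged on every filesystem, while any use of 'hdfs_client' on S3, ABFS, or ADLS raises an AttributeError on None instead of silently talking to the local HDFS cluster.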


http://gerrit.cloudera.org:8080/#/c/14311/4/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/util/hdfs_util.py@58
PS4, Line 58: DelegatingHdfsFilesystem
Nit: The other clients all have "Client" in their name rather than Filesystem 
(even though they inherit from BaseFilesystem). For consistency, maybe call 
this DelegatingHdfsClient?
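
For reference, a minimal sketch of the delegation pattern under the suggested name might look like this. The two-method 'BaseFilesystem' below is a hypothetical stand-in, not the real base class, and which operation goes to which client is an assumption made purely for illustration.

```python
class BaseFilesystem(object):
    """Hypothetical two-method interface used only for this sketch."""

    def create_file(self, path, file_data):
        raise NotImplementedError()

    def copy(self, src, dst):
        raise NotImplementedError()


class DelegatingHdfsClient(BaseFilesystem):
    """Sends each operation to either the HTTP client or the CLI client."""

    def __init__(self, http_client, cli_client):
        self.http_client = http_client
        self.cli_client = cli_client

    def create_file(self, path, file_data):
        # Assumed to be supported over HTTP, so use the faster client.
        return self.http_client.create_file(path, file_data)

    def copy(self, src, dst):
        # Assumed to be unsupported over HTTP, so fall back to the CLI.
        return self.cli_client.copy(src, dst)
```

Callers only see the 'BaseFilesystem' methods, so the choice between HTTP and CLI stays an implementation detail of the client.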



--
To view, visit http://gerrit.cloudera.org:8080/14311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Gerrit-Change-Number: 14311
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Wed, 02 Oct 2019 00:33:07 +0000
Gerrit-HasComments: Yes
