Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14311 )
Change subject: IMPALA-8950: Add -d, -f options to hdfs copyFromLocal, put, cp
......................................................................


Patch Set 4:

(2 comments)

Thanks for taking this on. Some nits about a couple of things, but I think this is looking good.

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/common/impala_test_suite.py@179
PS4, Line 179: # There are multiple clients for interacting with the underlying storage service.
: #
: # There are two main types of clients: HTTP clients and CLI clients. CLI clients all
: # use the 'hdfs dfs' CLI to execute operations againt a target filesystem. HTTP
: # clients issue HTTP requests to execute operations and are filesystem specific. For
: # HDFS, the HTTP client uses WebHDFS.
: #
: # 'hdfs_client' is a wrapper around a HTTP client and CLI client for interacting
: # with HDFS. The 'hdfs_client' delegates to the HTTP client when possible, and for
: # operations not supported by the HTTP client, it delegates to the CLI client. The
: # 'hdfs_client' is specific to HDFS and always points to the local HDFS cluster.
: #
: # 'filesystem_client' is set depending on the value of the 'TARGET_FILESYSTEM'. For
: # HDFS, it is the same as the 'hdfs_client'. For S3 and and ABFS, the client is a
: # HadoopFsCommandLineClient which is a simple wrapper around 'hdfs dfs' commands.
: # For ADLS, the 'filesystem_client' is an ADLSClient.
These are mostly nits, but here goes:
- I want to emphasize that 'filesystem_client' is the right thing to use and 'hdfs_client' is only for tests that run exclusively on HDFS. We don't want test writers to consider using hdfs_client unless they really need it. It is really only useful for HDFS ACL operations.
- I think the HTTP vs CLI breakdown is better expressed as Hadoop CLI vs filesystem-specific library.
- After thinking on this a bit, I think it might be good to have 'hdfs_client' be None when this is not HDFS. This is to discourage people from using it.
- I think I would prefer ordering it to put the description of the filesystem_client up front, as I think that is the most useful part.

Putting it together, here is a sketch of what I was thinking:

'filesystem_client' is a generic interface for doing filesystem operations that works across all the filesystems that Impala supports. 'filesystem_client' uses either the HDFS command line or a filesystem-specific library to implement common HDFS operations. Etc Etc fill this in...
Test writers should always use 'filesystem_client' unless they are using filesystem-specific functionality (e.g. HDFS ACL).
The implementation of 'filesystem_client' for each filesystem is:
HDFS - uses a mixture of PyWebHdfs (which is faster than the HDFS CLI) and the HDFS CLI
S3 - uses HDFS CLI
ABFS - uses HDFS CLI
ADLS - uses the ADLSClient (TODO: this should switch to HDFS CLI once we test it)

'hdfs_client' is an HDFS-specific client library, and it only works when running on HDFS. When using 'hdfs_client', the test must be skipped on everything other than HDFS. This is only really useful for tests that do HDFS ACL operations. This is None on non-HDFS systems.


http://gerrit.cloudera.org:8080/#/c/14311/4/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/14311/4/tests/util/hdfs_util.py@58
PS4, Line 58: DelegatingHdfsFilesystem
Nit: The other clients all have "Client" in their name rather than "Filesystem" (even though they inherit from BaseFilesystem). For consistency, maybe call this DelegatingHdfsClient?

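To make the two comments above a bit more concrete, here is a rough Python sketch of the split between 'filesystem_client' and 'hdfs_client', using the suggested DelegatingHdfsClient name. It rests on assumptions: RecordingClient, FakeWebHdfsClient, FakeCliClient, FakeAdlsClient and make_clients are hypothetical stand-ins that only record calls, the BaseFilesystem shown here is trimmed to two methods, and treating setacl as unsupported by the HTTP client is purely for illustration. It is not the code in tests/util.

```python
class BaseFilesystem(object):
  """Hypothetical, trimmed-down interface; the real class has more operations."""

  def create_file(self, path, data):
    raise NotImplementedError()

  def setacl(self, path, acls):
    raise NotImplementedError()


class RecordingClient(BaseFilesystem):
  """Hypothetical stand-in that records calls instead of touching a filesystem."""

  def __init__(self):
    self.calls = []

  def create_file(self, path, data):
    self.calls.append(("create_file", path))
    return True


class FakeWebHdfsClient(RecordingClient):
  """Stand-in for the PyWebHdfs-based client; setacl is assumed unsupported."""


class FakeCliClient(RecordingClient):
  """Stand-in for the 'hdfs dfs' command line client."""

  def setacl(self, path, acls):
    self.calls.append(("setacl", path))
    return True


class FakeAdlsClient(RecordingClient):
  """Stand-in for ADLSClient."""


class DelegatingHdfsClient(BaseFilesystem):
  """Uses the HTTP client when it can and the CLI client for everything else."""

  def __init__(self, http_client, cli_client):
    self.http_client = http_client
    self.cli_client = cli_client

  def create_file(self, path, data):
    return self.http_client.create_file(path, data)

  def setacl(self, path, acls):
    # Assumed not to be available over HTTP in this sketch, so use the CLI.
    return self.cli_client.setacl(path, acls)


def make_clients(target_filesystem):
  """Returns (filesystem_client, hdfs_client); hdfs_client is None off HDFS."""
  if target_filesystem == "hdfs":
    hdfs_client = DelegatingHdfsClient(FakeWebHdfsClient(), FakeCliClient())
    return hdfs_client, hdfs_client
  if target_filesystem == "adls":
    return FakeAdlsClient(), None
  # S3 and ABFS go through the Hadoop command line.
  return FakeCliClient(), None
```

With this shape, a test that needs HDFS ACLs would check that 'hdfs_client' is not None (or be skipped on non-HDFS targets), while every other test only touches 'filesystem_client'.
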
-- 
To view, visit http://gerrit.cloudera.org:8080/14311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d45db1c00554e6fb6bcc0b552596d86d4e30144
Gerrit-Change-Number: 14311
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Wed, 02 Oct 2019 00:33:07 +0000
Gerrit-HasComments: Yes
