Fang-Yu Rao has posted comments on this change. (
http://gerrit.cloudera.org:8080/19119 )
Change subject: IMPALA-11646 IMPALA-11562: Fix
test_unsupported_text_compression in s3
......................................................................
Patch Set 3:
> Patch Set 3:
>
> > Patch Set 3:
> >
> > > Patch Set 3: Code-Review+1
> > >
> > > Thanks for the prompt fix Michael! I do not have any other comment.
> > >
> > > I just like to check if my understanding in the following is correct
> > > after reading your fix and the corresponding functions.
> > >
> > > When the underlying storage is HDFS and we call
> > > filesystem_client.create_file() in test_unsupported_text_compression(),
> > > DelegatingHdfsClient::create_file() is called. In this case,
> > > DelegatingHdfsClient::_normalize_path() will be invoked.
> > > _normalize_path() removes the leading slash in the given path if there is
> > > any and then webhdfs_client.create_file() will be used to create the file.
> > >
> > > In your patch for IMPALA-11562, you removed the leading slash before
> > > calling filesystem_client.create_file(), which still works when the
> > > underlying storage is HDFS but doesn't work in s3 because
> > > HadoopFsCommandLineClient("S3") behaves a bit differently, i.e., it
> > > expects that leading slash, and thus this test failed.
> >
> > My understanding above is not completely accurate. IMPALA-11562 added that
> > _normalize_path() in DelegatingHdfsClient. Thus
> > test_unsupported_text_compression() has to be changed accordingly to pass
> > the test in s3. IMPALA-11562 did not change
> > test_unsupported_text_compression().
> >
> > Now I seem to understand this part of the Python test more. :-)
>
> Yes, tests were written to remove the leading slash, because
> DelegatingHdfsClient expected a path without slash, while the
> HadoopFsCommandLineClient would normalize the path by adding a leading slash
> if it was missing.
>
> For the Ozone work I started also using URIs with a schema, which would have
> complicated the HadoopFsCommandLineClient logic. I decided to invert what
> "normal" is, so you should now use absolute paths and DelegatingHdfsClient
> will normalize the path (by removing the leading slash) to fit the
> expectations of PyWebHdfsClient.
Thanks for the detailed explanation Michael!
--
To view, visit http://gerrit.cloudera.org:8080/19119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74724b2d5af0bd879d92d5e5febe92d01c972525
Gerrit-Change-Number: 19119
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Wed, 12 Oct 2022 16:56:51 +0000
Gerrit-HasComments: No