Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19119 )

Change subject: IMPALA-11646 IMPALA-11562: Fix 
test_unsupported_text_compression in s3
......................................................................


Patch Set 3:

> Patch Set 3:
>
> > Patch Set 3:
> >
> > > Patch Set 3: Code-Review+1
> > >
> > > Thanks for the prompt fix Michael! I do not have any other comment.
> > >
> > > I just like to check if my understanding in the following is correct 
> > > after reading your fix and the corresponding functions.
> > >
> > > When the underlying storage is HDFS and we call 
> > > filesystem_client.create_file() in test_unsupported_text_compression(), 
> > > DelegatingHdfsClient::create_file() is called. In this case, 
> > > DelegatingHdfsClient::_normalize_path() will be invoked. 
> > > _normalize_path() removes the leading slash in the given path if there is 
> > > any and then webhdfs_client.create_file() will be used to create the file.
> > > 
> > > In your patch for IMPALA-11562, you removed the leading slash before 
> > > calling filesystem_client.create_file(), which still works when the 
> > > underlying storage is HDFS but doesn't work in s3 because 
> > > HadoopFsCommandLineClient("S3") behaves a bit differently, i.e., it 
> > > expects that leading slash, and thus this test failed.
> >
> > My understanding above is not completely accurate. IMPALA-11562 added that 
> > _normalize_path() in DelegatingHdfsClient. Thus 
> > test_unsupported_text_compression() has to be changed accordingly to pass 
> > the test in s3. IMPALA-11562 did not change 
> > test_unsupported_text_compression().
> >
> > Now I seem to understand this part of the Python test more. :-)
>
> Yes, tests were written to remove the leading slash, because 
> DelegatingHdfsClient expected a path without slash, while the 
> HadoopFsCommandLineClient would normalize the path by adding a leading slash 
> if it was missing.
>
> For the Ozone work I started also using URIs with a schema, which would have 
> complicated the HadoopFsCommandLineClient logic. I decided to invert what 
> "normal" is, so you should now use absolute paths and DelegatingHdfsClient 
> will normalize the path (by removing the leading slash) to fit the 
> expectations of PyWebHdfsClient.

Thanks for the detailed explanation Michael!


--
To view, visit http://gerrit.cloudera.org:8080/19119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74724b2d5af0bd879d92d5e5febe92d01c972525
Gerrit-Change-Number: 19119
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Wed, 12 Oct 2022 16:56:51 +0000
Gerrit-HasComments: No

Reply via email to