Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20386 )

Change subject: IMPALA-12389: Use -skipTrash for 
HadoopFsCommandLineClient::delete_file_dir
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20386/10/tests/conftest.py
File tests/conftest.py:

http://gerrit.cloudera.org:8080/#/c/20386/10/tests/conftest.py@410
PS10, Line 410:     if request.instance.filesystem_client.exists(db_location):
> I'm unsure how adding this exists check will speed up the delete.  Can we d
There are two different clients. pywebhdfs uses HDFS's REST API and is pure 
Python. The other is a wrapper around the hadoop command line, which is a Java 
program. It is generally slower, as it needs to start up a JVM and load jars.

In some configurations, exists() uses pywebhdfs and doesn't need to start the 
JVM. delete_file_dir() now always uses the Java client. So, if the directory 
doesn't exist (which can be very common), checking with a pywebhdfs exists() 
first avoids the JVM startup.
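
To make the reasoning concrete, here is a minimal sketch of the pattern, not 
the actual conftest.py code. FakeRestClient, FakeCliClient, 
cleanup_db_location and the jvm_launches counter are hypothetical stand-ins I 
made up for illustration; only the exists() and delete_file_dir() names come 
from the real clients.

```python
class FakeRestClient:
    """Hypothetical stand-in for the pywebhdfs-backed client (no JVM needed)."""

    def __init__(self, fs):
        # fs is a shared set of paths simulating the filesystem contents.
        self.fs = fs

    def exists(self, path):
        return path in self.fs


class FakeCliClient:
    """Hypothetical stand-in for the wrapper around the hadoop command line."""

    def __init__(self, fs):
        self.fs = fs
        self.jvm_launches = 0  # counts how often the expensive path is taken

    def delete_file_dir(self, path, recursive=False):
        # In the real wrapper, each call would start a JVM and load jars.
        self.jvm_launches += 1
        self.fs.discard(path)


def cleanup_db_location(rest_client, cli_client, db_location):
    """Delete db_location only when the cheap exists() check finds it."""
    if rest_client.exists(db_location):
        cli_client.delete_file_dir(db_location, recursive=True)
        return True
    return False
```

With a shared set such as fs = {"/test-warehouse/a.db"}, calling 
cleanup_db_location() on a path that is not in the set returns False and 
leaves jvm_launches at 0, which is the saving described above. The gain only 
applies in configurations where exists() really is served by pywebhdfs.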

As Michael noted, there is a cleaner place to put this.

When I ran this, the run time wasn't all that different from a normal run, but 
normal runs vary a bit. I'll move it to the spot Michael suggested and do a new 
upload.



--
To view, visit http://gerrit.cloudera.org:8080/20386
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d304113596aaf70a122202a33276fc7c3d599e8
Gerrit-Change-Number: 20386
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Tue, 18 Nov 2025 23:58:46 +0000
Gerrit-HasComments: Yes
