Tired of waiting 75 seconds in front of Thunar when listing a directory of 7500 
symbolic links over a CIFS share, I tried to improve performance.

I noticed that Thunar (like pretty much all Linux file managers) performs 
network- or disk-intensive stat(2)/lstat(2)/access(2) and readlink(2) 
operations on every file or directory before listing them, even though most of 
the info it needs to pick a proper icon and sort order can be obtained from the 
d_type byte in struct dirent (see readdir(3)) and from the file name.
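
To illustrate the point about d_type, here is a minimal sketch in C of a 
listing that never calls stat(2). It is not code from the patches: the enum 
and the two function names are hypothetical, and it assumes a file system 
that fills in d_type (some return DT_UNKNOWN, in which case lstat(2) is still 
needed).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes d_type and the DT_* constants on glibc */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical coarse classification, enough to choose a generic icon. */
enum entry_kind {
    KIND_UNKNOWN,
    KIND_REGULAR,
    KIND_DIRECTORY,
    KIND_SYMLINK,
    KIND_SPECIAL
};

/* Maps the d_type byte to a coarse kind without touching the inode. */
static enum entry_kind kind_from_d_type(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG:
        return KIND_REGULAR;
    case DT_DIR:
        return KIND_DIRECTORY;
    case DT_LNK:
        return KIND_SYMLINK;
    case DT_CHR:
    case DT_BLK:
    case DT_FIFO:
    case DT_SOCK:
        return KIND_SPECIAL;
    default:
        /* DT_UNKNOWN: the caller would have to fall back to lstat(2). */
        return KIND_UNKNOWN;
    }
}

/* Lists a directory using readdir(3) only. Prints "kind name" lines to
 * out (if not NULL) and returns the number of entries, or -1 on error. */
static long list_directory_cheaply(const char *path, FILE *out)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (out != NULL)
            fprintf(out, "%d %s\n",
                    (int) kind_from_d_type(entry->d_type), entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_directory_cheaply(".", stdout) prints one line per entry, 
including "." and ".." where the file system reports them.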

First, I wrote a set of patches for Thunar-1.0.2, which is basically what I had 
available on my CentOS 5.x system. Then, I discovered that the Thunar-1.2 I/O 
subsystem had been dramatically changed to use the GIO virtual file system 
rather than thunar-vfs.

Not wanting to needlessly fork the Thunar project (because IMO, forking is 
harmful to the OSS community), I wrote similar patches for Thunar-1.2, and I 
wonder whether they can be included in mainline Thunar.

Patches overview:
First, I had to patch the standard file system backend of GIO in GLib 2.28, 
because GIO performed stat(2)/lstat(2) operations even when the requested 
information didn't need them.
I wrote a patch to GIO that makes it lazy: 
stat(2)/lstat(2)/readlink(2)/access(2) operations are only performed when 
required. Moreover, a special optimization reduces the number of access(2) 
calls. For example, in a directory where most files are readable and 
executable, it will perform only access(R_OK|X_OK) and access(W_OK) calls on 
most files, which is 2 syscalls rather than 3.
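
The access(2) trick can be sketched as follows. This is a rough illustration 
in C, not the actual GIO patch: the struct, the function name and the 
likely_mask parameter are hypothetical, and the fallback to one call per bit 
when the combined check fails is an assumption of this sketch.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct access_result {
    int granted;  /* OR of the R_OK/W_OK/X_OK bits that were granted */
    int syscalls; /* number of access(2) calls issued, for illustration */
};

/* Probes read/write/execute permission on path.
 * likely_mask is the combination that most files in the directory are
 * expected to satisfy (e.g. R_OK | X_OK). When the guess is right, all
 * of its bits are settled by a single call. */
static struct access_result probe_access(const char *path, int likely_mask)
{
    static const int bits[] = { R_OK, W_OK, X_OK };
    struct access_result result = { 0, 0 };
    int known = 0; /* bits whose answer is already settled */

    assert(path != NULL);

    if (likely_mask != 0) {
        result.syscalls++;
        if (access(path, likely_mask) == 0) {
            result.granted |= likely_mask;
            known |= likely_mask;
        }
    }

    /* Check the remaining bits one by one. If the combined call failed,
     * this costs one call more than the naive approach. */
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if (known & bits[i])
            continue;
        result.syscalls++;
        if (access(path, bits[i]) == 0)
            result.granted |= bits[i];
    }
    return result;
}
```

With a correct guess of R_OK | X_OK, the function issues two calls (the 
combined one plus W_OK); with a wrong guess it issues four, so the guess only 
pays off when it holds for most files.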

Secondly, I patched Thunar as follows:
1) Changed the ThunarFile implementation to compute only the information that's 
"required" and to fill in the rest as more information is needed.

2) When a directory is loaded, only basic information is retrieved: the file 
type (directory/regular-file/symlink/special-inode) and a fast content type 
computed without following symlinks. This is very fast as it requires very 
little I/O. On the big CIFS directory: 75 seconds -> 2 or 3 seconds.
This is not quite correct for symlinks, as Thunar doesn't even know whether a 
symlink points to a file/special-inode or a directory. So, we go to point (3).

3) When the file is actually viewed, a background thread computes more info 
(and so follows symlinks), and, after a few milliseconds, the icon is updated. 
If a symlink is discovered to point to a directory, it's inserted as a 
subdirectory in the tree view.

4) When a file or set of files is selected, their real content type is computed 
in order to show a correct context menu, although not everything is computed 
in some cases (e.g. if the selection contains a file and a directory, Thunar 
knows that the only verb is "open").

Consequently, the behavior of double-click and of the context menu is 
unchanged. Only icons may be "incorrect" for a short time.
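
The "compute only what is required" idea from points (1) to (4) can be 
sketched in C as below. This is not the ThunarFile code: the struct, its 
fields and the function names are hypothetical, and the real patches run the 
expensive step in a background thread, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical lazily completed file record. */
struct lazy_file {
    const char *path;
    bool is_symlink;      /* cheap: known from the directory listing */
    bool target_resolved; /* has the expensive part been computed yet? */
    bool target_is_dir;   /* valid only once target_resolved is true */
    int stat_calls;       /* counts stat(2) calls, for illustration */
};

/* Creates a record from the cheap information only; performs no I/O. */
static struct lazy_file lazy_file_new(const char *path, bool is_symlink)
{
    assert(path != NULL);
    struct lazy_file file = { path, is_symlink, false, false, 0 };
    return file;
}

/* Tells whether the file, after following symlinks, is a directory.
 * The stat(2) call happens on the first request only; later requests
 * reuse the cached answer. */
static bool lazy_file_target_is_dir(struct lazy_file *file)
{
    assert(file != NULL);
    if (!file->target_resolved) {
        struct stat st;
        file->stat_calls++;
        file->target_is_dir =
            stat(file->path, &st) == 0 && S_ISDIR(st.st_mode);
        file->target_resolved = true;
    }
    return file->target_is_dir;
}
```

Listing a directory would only call lazy_file_new() for each entry; 
lazy_file_target_is_dir() would be called when an entry becomes visible or is 
selected.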

5) The side tree view was extremely slow in some cases. It could freeze Thunar 
for several minutes. This is because Thunar wanted to know whether each 
directory visible in the tree view had any subdirectory (following symlinks) in 
order to display the little cross used to expand the directory and view its 
subdirectories. This was performed in a background thread but, on I/O-bound 
systems, it could severely slow down all other I/O operations.

Actually, this was the "bug" that made me initially write this set of patches.

I changed that to make it behave like Nautilus: don't enter subdirectories 
until the cross is clicked; at that point, if no subdirectory is found, just 
make the cross disappear... This is a "feature regression", but I may update 
the patch to do something fast and correct most of the time: look for a 
subdirectory, examining a limited number (e.g. one hundred) of entries at most, 
and stop as soon as a true subdirectory (not a symlink) is found. When in 
doubt, assume the directory may have subdirectories.

I may also provide a user preference to balance performance against 
correctness.
At the highest level of correctness, it would behave like the old Thunar 
(although twice as fast, because a "bug" in the folder listing function made 
everything listed/stat-ed twice).
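
The bounded search proposed for point (5) might look like the following C 
sketch. It is only an illustration of the idea, not part of the patches: the 
enum and function names are hypothetical, and treating an unreadable 
directory as "maybe" is an assumption of this sketch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes d_type and the DT_* constants on glibc */
#endif
#include <assert.h>
#include <dirent.h>
#include <string.h>

enum subdir_answer { SUBDIRS_NO, SUBDIRS_YES, SUBDIRS_MAYBE };

/* Looks for a subdirectory of path, examining at most max_entries
 * entries and never calling stat(2). Symlinks are not followed: a
 * symlink (or an entry of unknown type) only makes the answer "maybe". */
static enum subdir_answer probe_subdirectories(const char *path,
                                               int max_entries)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return SUBDIRS_MAYBE; /* in doubt, keep the expander visible */

    enum subdir_answer answer = SUBDIRS_NO;
    int examined = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        if (entry->d_type == DT_DIR) {
            answer = SUBDIRS_YES; /* a true subdirectory: stop at once */
            break;
        }
        if (entry->d_type == DT_LNK || entry->d_type == DT_UNKNOWN)
            answer = SUBDIRS_MAYBE; /* could turn out to be a directory */
        if (++examined >= max_entries) {
            answer = SUBDIRS_MAYBE; /* budget exhausted: do not guess */
            break;
        }
    }
    closedir(dir);
    return answer;
}
```

The tree view would show the cross for SUBDIRS_YES and SUBDIRS_MAYBE and hide 
it only for SUBDIRS_NO.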

6) I fixed a few performance bugs. For example, when viewing a directory, it 
was sorted with an O(n^2) algorithm because the directory was initially listed 
as empty, and files, after having been listed in a background job, were treated 
as dynamically added files.

7) IIRC, I fixed a bug that made Thunar crash when encountering broken mount 
points (e.g. a FUSE file system whose user-space process crashed).
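
The sorting problem in point (6) can be shown with a generic C sketch. It 
does not reproduce Thunar's list model: the function names are hypothetical 
and plain strcmp() ordering stands in for the real sort order. Inserting each 
file into an already sorted array costs O(n) per file, so O(n^2) for a whole 
directory, whereas sorting the complete batch once costs O(n log n).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator for an array of C strings. */
static int compare_names(const void *a, const void *b)
{
    const char *const *lhs = a;
    const char *const *rhs = b;
    return strcmp(*lhs, *rhs);
}

/* Slow path: what happens when every listed file is handled like a
 * dynamically added one. Each call shifts up to *count elements.
 * The array must have room for one more element. */
static void insert_sorted(const char **names, size_t *count,
                          const char *name)
{
    assert(names != NULL && count != NULL && name != NULL);
    size_t i = *count;
    while (i > 0 && strcmp(names[i - 1], name) > 0) {
        names[i] = names[i - 1];
        i--;
    }
    names[i] = name;
    (*count)++;
}

/* Fast path: collect the whole listing first, then sort it once. */
static void sort_batch(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1)
        qsort(names, count, sizeof names[0], compare_names);
}
```

Both paths produce the same order; only the cost differs once a directory 
holds thousands of entries.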

My code is neither very pretty nor well commented, but I can improve the code 
quality before submitting the patches. I'd prefer to know whether the Thunar 
maintainers would accept the patches before cleaning them up.

I hope that Thunar's philosophy of correctness doesn't prevent pure 
performance patches like these from being included.

-- 
André Gillibert
_______________________________________________
Thunar-dev mailing list
Thunar-dev@xfce.org
https://mail.xfce.org/mailman/listinfo/thunar-dev
