Hi André,

On Wed, 16 Nov 2011 17:54:11 +0100
André Gillibert <metaentr...@gmail.com> wrote:

> Tired of waiting 75 seconds in front of Thunar when listing a
> directory of 7500 symbolic links over a CIFS share, I tried to
> improve performance.

Cool! We often don't have the time to perform detailed benchmarking and
performance optimizations, so I'm happy to hear you picked this up as a
challenge.

> Patches overview:
> First, I had to patch the standard file system backend of GIO in GLib
> 2.28, because GIO performed stat(2)/lstat(2) operations even when
> the requested information didn't need them. I wrote a patch to GIO that
> makes it lazy. stat(2)/lstat(2)/readlink(2)/access(2) operations are
> only performed when required. Moreover, a special optimization has been
> done to reduce the number of access(2) calls. For example, in a
> directory where most files are read-executable, it will perform only
> access(R_OK|X_OK) and access(W_OK) calls on most files, which is 2
> syscalls rather than 3.

I've seen you reported your GIO/GLib related patches on gtk-devel-list.
You'll have to talk to the GLib developers about getting them approved.

> Secondly, I patched Thunar, as follows:
> 1) Changed the ThunarFile implementation to only compute information
> that's "required", and, as more information is needed, complete this
> information.

The "required" bit is a bit complicated, I guess, so I can only judge
this improvement based on real code.

> 2) When a directory is loaded, only basic information is retrieved:
> File type (directory/regular-file/symlink/special-inode), fast
> content type without following symlinks. This is very fast as it
> requires very little I/O. On the big CIFS directory: 75 seconds -> 2
> or 3 seconds. This is not very correct for symlinks, as it doesn't
> even know whether it points to a file/special-inode or directory. So,
> we go to point (3).

Interesting and good so far.

> 3) When the file is actually viewed, a background thread computes
> more info (and so, follows symlinks), and, after a few milliseconds,
> the icon is changed.
> If a symlink is discovered to point to a
> directory, it's inserted as a subdirectory in the tree view.

Doesn't that mean things will jump up and down in the directory as you
browse it?

> 4) When a file or set of files is selected, their real content-type
> is computed in order to show a correct context menu, although not
> everything is computed in some cases (e.g. if there's a file + a
> directory in the selection, it knows that the only verb is "open").

I wouldn't want to add too many special cases where we load additional
information. A first quick pass, followed by lazily loading all the
additional information at once, sounds simpler to me.

> Consequently, the behavior of double-click or context menu is not
> changed. Only icons may be "incorrect" for a short amount of time.
>
> 5) The side tree view was extremely slow in some cases. It could
> freeze Thunar for several minutes. This is because Thunar wanted to
> know if each directory visible in the tree view had any subdirectory
> (following symlinks) in order to display a little cross to be able to
> expand the directory and view the subdirectories. This was performed
> in a background thread, but, on I/O bound systems, could slow down
> all other I/O operations extremely.
>
> Actually, this was the "bug" that made me initially write this set of
> patches.
>
> I changed that to make it behave like Nautilus: Don't enter
> subdirectories until the cross is clicked; in that case, if no
> subdirectory is found, just make the cross disappear... This is a
> "feature regression", but I may update the patch to make something
> fast and correct most of the time: Seek a subdirectory, parsing a
> limited number (e.g. one hundred) of sub-files at most, and stop as
> soon as a true subdirectory (not symlink) is found. In doubt, assume
> the directory may have subdirectories.

That sounds a little better and not too complicated either. Although I
wonder if seeking a subdirectory won't be much faster if we query less
information.
Maybe that is enough optimization already?

> I may also provide a user preference to balance between performance
> and correctness. At the highest level of correctness, it would behave
> as the old Thunar (although twice as fast because a "bug" in the
> folder listing function would make everything listed/stat-ed twice).

Please don't. No option for technical feature sets like this.

> 6) I fixed a few performance bugs. For example, when viewing a
> directory, it was sorted with an O(n^2) algorithm because the dir was
> initially listed as empty, and files, after having been listed in a
> background job, were seen as dynamically added files.

That itself doesn't imply O(n²), does it? My guess would be that it
depends on how you do the online sorting.

> 7) IIRC, I fixed a bug that made Thunar crash when seeing broken
> mount points (e.g. FUSE file system where the user-space process
> crashed).

Crash fixes are always good.

> My code is neither very pretty nor very commented, but I can improve
> the code quality before submitting the patches. I'd prefer to know
> whether Thunar maintainers would accept the patches before cleaning
> them up.

It would of course be nice if you cleaned things up before we merge
them. I'm interested in your optimizations, although I can't promise to
merge all individual improvements.

Can you, for each of the above points, create a bug on bugzilla.xfce.org
and attach the corresponding patches? Or, alternatively, clone the
Thunar git repository, upload your clone and put the fixes for each
point into a dedicated branch? Then we can continue discussing them on
bugzilla.xfce.org.

> I hope that the philosophy of correctness of Thunar doesn't prevent
> pure performance patches like these ones from being included.

Absolutely not, as stated earlier. Raising these problems in a good
email is already worth quite a lot. Patches: awesome.

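For reference, the bounded subdirectory search from point 5 could look
roughly like the sketch below. It is written in Python rather than
against GIO purely for brevity. The function name
may_have_subdirectories and the max_entries parameter are made up for
illustration, and the limit of 100 is just the example figure from the
quoted proposal. This is an assumption about how such a check might
work, not the code from the actual patches.

```python
import os


def may_have_subdirectories(path, max_entries=100):
    """Cheap, bounded check used to decide whether to show an expander.

    Returns False only if the whole directory was read within the budget
    and no real subdirectory was found. Returns True as soon as a real
    subdirectory is seen, and also whenever the answer is uncertain.
    """
    try:
        with os.scandir(path) as entries:
            for seen, entry in enumerate(entries, start=1):
                if seen > max_entries:
                    # Budget exhausted before the listing ended:
                    # in doubt, assume there may be subdirectories.
                    return True
                # follow_symlinks=False: a symlink pointing at a directory
                # is not counted, so no link target is ever queried.
                if entry.is_dir(follow_symlinks=False):
                    return True
    except OSError:
        # Unreadable directory or broken mount point: in doubt, say yes.
        return True
    return False
```

A tree view using such a check would show the expander whenever it
returns True and remove it later if a full listing turns up nothing.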
- Jannis
_______________________________________________
Thunar-dev mailing list
Thunar-dev@xfce.org
https://mail.xfce.org/mailman/listinfo/thunar-dev