2011/11/17, Jannis Pohlmann <jan...@xfce.org>:
> Hi André,
>
>> Secondly, I patched Thunar, as follows:
>> 1) Changed the ThunarFile implementation to only compute information
>> that's "required", and, as more information is needed, complete this
>> information.
>
> The "required" bit is a bit complicated, I guess, so I can only judge
> about this improvement based on real code.
>
Indeed. I'm now working on commenting and splitting the patches.

>> 3) When the file is actually viewed, a background thread computes
>> more info (and so, follow symlinks), and, after a few milliseconds,
>> the icon is changed. If a symlink is discovered to point to a
>> directory, it's inserted as subdirectory in the tree view.
>
> Doesn't that mean things will jump up and down in the directory as you
> browse it?
>
Good question.
When a file changed, thunar_list_model_file_changed was invoked, and
the file changed its position to keep the directory sorted.
A symlink becoming a directory would have made the entry jump to the
start of the folder, if directories are sorted before files.
To avoid that side effect, plus a re-entrancy issue with
thunar_standard_view_selection_changed (although it was fixable), I
simply don't move files when they change, so the folder is not really
sorted properly.
Currently, leaving a folder and re-entering it sorts it again, so
symlinks to directories that have been seen are moved to the top of
the folder, while symlinks to directories that have not been seen are
still sorted together with files.

This only affects symlinks to directories. Regular directories and
symlinks to files are not affected.
This inconsistency disappears if the "sort folders before files" option
is unset.

>> 4) When a file or set of files is selected, their real content-type
>> is computed in order to show a correct context menu, although, not
>> everything is computed in some cases (e.g. If there's a file + a
>> directory in the selection, it knows that the only verb is "open").
>
> I wouldn't want to add two many special cases where we load additional
> information. A first quick pass and then lazy loading additional
> information all at once sounds more simple to me.
>
The main code change is in thunar/thunar-file.c.
A few "fast" functions, such as thunar_file_is_directory_fast, are
added. They don't follow symlinks unless the symlinks have already
been followed previously.
The lazy-loading logic is in thunar-file.c, behind one central function:

  thunar_file_complete_info(ThunarFile *file, guint flags)

where flags selects among a few categories:

  THUNAR_INFO_BASIC (zero-cost information every ThunarFile gets on
    creation)
  THUNAR_INFO_XSTAT (accurate target file type, content-type, and UNIX
    attributes, which is pretty much all the info you can get with
    lstat(2)/stat(2))
  THUNAR_INFO_ACCESS (info you get with access(2))
  THUNAR_INFO_SYMLINK_TARGET (target of a symlink (readlink(2)))
  THUNAR_INFO_TRASH (info related to trash items)

The idea was that mass operations (viewing, selecting) need only
THUNAR_INFO_XSTAT, and other info would rarely be requested.
But I discovered yesterday that, to be compatible with plugins such as
UCA, complete info must be computed whenever files are selected, unless
the plugin interface is changed to let plugins ask only for the
information they need.
Consequently, it makes sense to limit this info to two levels:

  THUNAR_INFO_BASIC
  THUNAR_INFO_ALL

There is not much code to change. :)

>> Consequently, the behavior of double-click or context menu is not
>> changed. Only icons may be "incorrect" a short amount of time.
>>
>> 5) The side tree view was extremely slow in some cases. It could
>> freeze Thunar for several minutes. This is because Thunar wanted to
>> know if each directory visible in the tree view had any subdirectory
>> (following symlinks) in order to display a little cross to be able to
>> expand the directory and view the subdirectories. This was performed
>> in a background thread, but, on I/O bound systems, could slow down
>> extremely all other I/O operations.
>>
>> Actually, this was the "bug" that made me initially write this set of
>> patches.
>>
>> I changed that to make it behave like Nautilus: Don't enter
>> subdirectories until the cross is clicked, in that case, if no
>> subdirectory is found, just make the cross disappear... This is a
>> "feature regression", but I may update the patch to make something
>> fast and correct most of the time: Seek a subdirectory, parsing a
>> limited number (e.g. one hundred) of sub-files at most, and stop as
>> soon as a true subdirectory (not symlink) is found. In doubt, assume
>> the directory may have subdirectories.
>
> That sounds a little better and not too complicated either. Although I
> wonder if seeking a subdirectory won't be much faster if we query less
> information. Maybe that is enough optimization already?
>
If symlinks are not followed, performance can be acceptable.
readdir(3) is much cheaper than stat(2).
For example (target machine = K6-2 550 MHz, 448 MB RAM, samba, folder
with 7500 symlinks; client machine = Core 2 Duo, 100 Mbps ethernet
network):

  time ls -f /huge_cifs_folder > /dev/null
    -> 0.89 second
  time ls -l /huge_cifs_folder > /dev/null
    -> 14 seconds (even though everything is in server cache)

BTW, NFS is much faster than CIFS. :)

Basically, we want to know whether a directory has regular
sub-directories. Symlink sub-directories are not significant as long
as they are not viewed.

POSSIBLE optimization, but hard to make portable:
Use the st_nlink field on the few well-known file systems where its
behavior is reliable. If st_nlink > 2, then, most probably, there's a
regular sub-directory.

Even with readdir(3), the worst case can be very poor: many viewed
symlinks pointing to the same huge CIFS directory, or a CIFS directory
with many symlinks crossing into each other.
Each one would be parsed independently, so it would take 0.89 second
times the number of symlinks.
This is because we cannot assume that the graph of symlinks is a tree.
It's an arbitrary directed graph.
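
To make the bounded scan above concrete, here is a minimal sketch. It is
not the actual patch: the function name may_have_subdirectory and the
MAX_ENTRIES_SCANNED constant are made up for illustration, and it assumes
a libc (glibc, BSD) that exposes d_type in struct dirent. It never
follows symlinks, stops at the first regular sub-directory, and answers
"maybe" (true) once the scan limit is exceeded.

```c
#define _DEFAULT_SOURCE   /* for d_type / DT_* on glibc (assumption) */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical limit, taken from the "e.g. one hundred" figure above. */
#define MAX_ENTRIES_SCANNED 100

/* Returns true if `path` has, or may have, a regular (non-symlink)
 * sub-directory. Returns false if the directory cannot be opened or
 * if a complete scan found none. */
static bool
may_have_subdirectory (const char *path)
{
  DIR           *dir;
  struct dirent *entry;
  int            scanned = 0;
  bool           result = false;

  assert (path != NULL);

  dir = opendir (path);
  if (dir == NULL)
    return false;

  while ((entry = readdir (dir)) != NULL)
    {
      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;

      /* Too many entries: give up and, in doubt, assume "yes". */
      if (++scanned > MAX_ENTRIES_SCANNED)
        {
          result = true;
          break;
        }

      /* Cheap path: the file system told us the type in readdir(3). */
      if (entry->d_type == DT_DIR)
        {
          result = true;
          break;
        }

      /* Fallback for file systems that don't fill d_type: one
       * lstat-like call that does not follow symlinks. */
      if (entry->d_type == DT_UNKNOWN)
        {
          struct stat st;

          if (fstatat (dirfd (dir), entry->d_name, &st,
                       AT_SYMLINK_NOFOLLOW) == 0
              && S_ISDIR (st.st_mode))
            {
              result = true;
              break;
            }
        }
    }

  closedir (dir);
  return result;
}
```

Symlinks show up as DT_LNK (or as S_ISLNK in the fallback), so they are
simply skipped. That keeps the cost close to the "ls -f" figure rather
than the "ls -l" one, at least when the file system fills in d_type.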

PROPOSED SOLUTION:
It may be possible to save info on each parsed symlink target folder,
in order to avoid recomputing whether there are subfolders.
This doesn't solve the problem of the same file system being mounted
several times (as can be obtained with mount --bind), or of indirection
through a symlink-unaware system (e.g. Thunar on Linux viewing a CIFS
share of a Linux SAMBA directory shared through a Windows SMB share,
and so, masking directory symlinks as if they were regular
directories), but these are corner cases that may not be significant.

Anyway, I wouldn't like Thunar to be one of those applications that
slow down the system's CPU, disk or network for extended periods of
time just because they are running, even though there's no user
interaction.

>> I may also provide a user preference to balance between performances
>> and correctness. At the highest level of correctness, it would behave
>> as the old Thunar (although twice as fast because, a "bug" in the
>> folder listing function would make everything listed/stat-ed twice).
>
> Please don't. No option for technical feature sets like this.
>
Okay. It's always better to let the machine make this decision...
The computer can find its own balance. For example, the policy of not
following symlinks could be moderated by the number of symlinks or by
a time limit.

>> 6) I fixed a few performance bugs. For example, when viewing a
>> directory, it was sorted with a O(n^2) algorithm because the dir was
>> initially listed as empty, and files, after having been listed in a
>> background job, were seen as dynamically added files.
>
> That itself doesn't imply O(n²), does it? My guess would be that it
> depends on how you do the online sorting.
>
Yes, it could be made O(n*log(n)) by sorting the list of new files and
then merging it into the already-sorted list with a linear sorted-list
merge, but it was done through iterative insertions.

This raises a question about events creating new files in unsorted
folders (e.g.
new files created through a shell script and notified via GAMIN).
Symlinks that "become" directories are currently not properly sorted
(not moved), and so make any sorted insertion algorithm non-functional
in one way or another.

Moreover, as an orthogonal point, keeping the folder always sorted is
not so nice in the real world. The Windows Explorer behavior of putting
new files at the end of the folder makes it much easier to keep track
of the last file you just created with a separate command line tool.

-- 
André Gillibert
_______________________________________________
Thunar-dev mailing list
Thunar-dev@xfce.org
https://mail.xfce.org/mailman/listinfo/thunar-dev