This is similar to using fuzzy / -y in a large directory. O(n^2) behaviour occurs and can be incredibly slow. No caching of md5's for the directory occurs, it would seem (or even so, there are O(N^2) comparisons).
/kc On Mon, Jun 15, 2015 at 06:02:14PM -0500, ray vantassle said: > I investigated the rsync code and found the reason why. > For every file in the source, it searches the entire filter-list looking > to see if that filename is on the exclude/include list.** Most aren't, so > it compares (350K - 72K) * 72K names (the non-listed files) plus (72K * > 72K/2) names (the ones that are listed), for a total of about** > 22,608,000,000 strcmp's.** That's 22 BILLION comparisons. (I may have left > off a zero there, it might be 220 B). > > I'm working on a fix to improve this.** The first phase was to just > improve the existing code without changing the methodology. > The set I've been testing with is local-local machine, dry-run, 216K files > in the source directory, 25,000 files in the exclude-from list. > The original rsync takes 488 seconds. > The improved code takes 300 seconds. > > The next phase was to improve the algorithm of handling large > filter_lists.** Change the unsorted linear search to a sorted binary > search (skiplist). > This improved code takes 2 seconds. > > The original code does 4,492,304,682 strcmp's. > The fully improved code does 6,472,564.** 98.5% fewer. > > I am cleaning up the code and will submit a patchfile soon. >-- >Please use reply-all for most replies to avoid omitting the mailing list. >To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync >Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html -- Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W. -- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html