This troubleshooting is great. I was also hoping that the OS's disk
caching would make things acceptably fast; it works that way on
GNU/Linux and Mac OS X.

I really think that Xapian is the way forward on this. I don't think
it'll be too hard to use in the search-plugin, and it will provide much
faster searching.

.hc

On 01/17/2013 03:35 PM, Jonathan Wilkes wrote:
> Another update on this:
> If I pare down the number of files to something that approaches
> Pd-extended's extra folder + doc folder (around 6,500 items):
>
> Results for GNU/Linux:
>
> Getting dirs takes 524 microseconds per iteration
> Getting files takes 324646 microseconds per iteration
> Reading files takes 5483581 microseconds per iteration
> Done.
>
> This is on the Debian Wheezy machine with the 1gig ram
> and fast processor-- it might take slightly longer on the other
> machine but there's probably not much difference.
>
> Unfortunately WinXP still takes an eternity to do the same-- I didn't
> copy the results but getting files was something like 8 seconds, and
> reading them was about 30 seconds. (And that wasn't including the
> "doc" folder so about 700 fewer files!)
>
> Unfortunately this discrepancy is so large that I can think of few
> decent changes to the plugin and its interface that would improve the
> situation for Windows users without bothering unix-derived OS users.
> Five seconds on Debian for the first search (and less than a second
> for subsequent ones) is completely reasonable IMO, and I don't see the
> Pd-extended documentation growing significantly any time soon.
>
> So to end a _long_ answer to your question, I think you have to remove
> the old libs from your search path and simply put up with the minimum
> 35 second initial search time. If subsequent searches for anything
> other than the empty symbol (i.e., "") are taking 2 minutes to complete
> let me know and I'll try to troubleshoot it.
>
> -Jonathan
>
> ----- Original Message -----
>> From: Jonathan Wilkes <[email protected]>
>> To: João Pais <[email protected]>; PD-List <[email protected]>
>> Cc:
>> Sent: Thursday, January 17, 2013 3:21 AM
>> Subject: Re: [PD] search plugin time optimisation
>>
>> If you're describing the time it takes for an _initial_ search, see
>> below. However, subsequent file access is an order of magnitude
>> faster-- on my pd-extended install with ca. 5,000 docs I barely even
>> see the progressbar at all after the first search.
>>
>> I tested the attached tcl script in a folder that had 307 subdirs and
>> about 200megs of files; roughly 18,000 files, 13,000 of which were
>> docs readable by the script.
>>
>> Debian Wheezy amd_64
>> AMD Athlon(tm) II P360 Dual-Core Processor
>> 4gigs ram
>> Results:
>> Getting dirs takes 184543 microseconds per iteration
>> Getting files takes 1387819 microseconds per iteration
>> Reading files takes 14766208 microseconds per iteration
>> Done.
>>
>> In other words gathering up all the directories into a list
>> takes less than 200 milliseconds, getting a list of all the
>> files takes about a second and a half, and actually opening
>> the file and feeding the contents to a variable takes about
>> 15 seconds.
>>
>> Debian Wheezy (32bit)
>> Intel(R) Pentium(R) 4 CPU 3.60GHz
>> 1gig of ram
>> Results:
>> Getting dirs takes 46418 microseconds per iteration
>> Getting files takes 1365663 microseconds per iteration
>> Reading files takes 18203551 microseconds per iteration
>> Done.
>>
>> Similar results on a machine with less ram and 32bit.
>>
>> WinXP
>> Intel Core2 6600 @ 2.4GHz
>> 1gig of ram
>> (NTFS filesystem)
>> Results:
>> Getting dirs takes 0 microseconds per iteration
>> Getting files takes 13109000 microseconds per iteration
>> Reading files takes 41250000 microseconds per iteration
>> Done.
>>
>> Not sure why it doesn't register anything for getting dirs.
>> Also, no idea why Windows takes so much longer to return the
>> list of files. I haven't found any clues on the tcl wiki, tcl docs,
>> or tcl irc. Finally, notice how much longer Windows takes to
>> read the files: it's nearly 3x slower than the Debian 64 machine.
>>
>> Mac OS X 10.7.5
>> 2.33 GHz Intel Core 2 Duo
>> 2gig of ram
>> Results:
>> Getting dirs takes 5158 microseconds per iteration
>> Getting files takes 979583 microseconds per iteration
>> Reading files takes 30045212 microseconds per iteration
>> Done.
>>
>> Still much faster than Windows for a comparable CPU, but
>> reading still takes some time.
>>
>> ***
>>
>> So while I can make some optimizations here and there in
>> the search plugin, the measurements above are best
>> case scenarios. You can try the script on Windows 7 if
>> you want-- unfortunately the script only looks inside directories in
>> its own parent directory so you might have to make a
>> test folder with lots of docs in order to make use of it. However, I
>> suspect you'll see numbers more like my WinXP report above,
>> and that would mean you simply cannot get an initial search below
>> one minute with a comparable amount of docs.
>>
>> Alternatives are:
>> * build an index from [pd META] data. I did this with the original
>>   search tool built in pd, but you lose the ability to do a full text
>>   search and effectively can no longer search text files and html.
>> * build a full text index from the docs. Faster probably but it would
>>   be a large file.
>> * use a search engine library like Xapian. But it requires someone
>>   who wants to do the work of using a search engine library like
>>   Xapian.
>>
>> All those alternatives still have the requirement that you
>> spend time building the index at least once, instead of each time you
>> restart your computer or flush/overwrite wherever your OS caches the
>> dirs/files for the current search plugin.
>> And even with Xapian you're
>> opening files in tcl and sending them to an index through the Xapian
>> interface, so you'd still see the long wait time building the initial
>> index.
>>
>> I'll try to test a Pd-extended nightly to see how the smaller number
>> of docs performs later.
>>
>> -Jonathan
>>
>> ----- Original Message -----
>>> From: João Pais <[email protected]>
>>> To: PD-List <[email protected]>
>>> Cc:
>>> Sent: Wednesday, January 9, 2013 7:05 AM
>>> Subject: [PD] search plugin time optimisation
>>>
>>> Hi,
>>>
>>> I was trying the search plugin, and one search takes around 60s. In
>>> the end, the plugin reports that it had to look through "16337 docs".
>>> I have several work directories in my path, which I don't use that
>>> often. Is there a way of optimising the search plugin?
>>>
>>> The system is W7.
>>>
>>> Best,
>>>
>>> João
>>>
>>> _______________________________________________
>>> [email protected] mailing list
>>> UNSUBSCRIBE and account-management ->
>>> http://lists.puredata.info/listinfo/pd-list
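
The Tcl script behind the timings quoted above was an attachment and is not reproduced in this thread. For anyone who wants comparable numbers on their own machine, here is a minimal Python sketch of the same three phases as described in the quoted message (gather the directories, list the files, read each file into a variable). It is not Jonathan's script: the function name `time_search_phases` and the `DOC_EXTENSIONS` filter are assumptions made for illustration, and the sketch times a single pass rather than averaging over iterations.

```python
import os
import time

# Assumption: which extensions count as "docs". The filter used by the
# original Tcl script is not shown in the thread.
DOC_EXTENSIONS = (".pd", ".txt", ".html", ".htm")


def time_search_phases(root):
    """Time the three phases separately for the tree under `root`.

    Returns (timings, counts): timings are in microseconds, keyed by
    "dirs", "files" and "read"; counts hold the number of directories,
    the number of doc files, and the total bytes read.
    """
    timings = {}

    # Phase 1: gather every directory under root (root included) into a list.
    start = time.perf_counter()
    dirs = [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]
    timings["dirs"] = (time.perf_counter() - start) * 1e6

    # Phase 2: list the doc files found directly in each directory.
    start = time.perf_counter()
    files = []
    for d in dirs:
        try:
            with os.scandir(d) as entries:
                for entry in entries:
                    name = entry.name.lower()
                    if entry.is_file() and name.endswith(DOC_EXTENSIONS):
                        files.append(entry.path)
        except OSError:
            # Skip directories that cannot be listed (e.g. permissions).
            continue
    timings["files"] = (time.perf_counter() - start) * 1e6

    # Phase 3: open each file and read its whole contents into a variable.
    start = time.perf_counter()
    total_bytes = 0
    for path in files:
        with open(path, "rb") as fh:
            contents = fh.read()
        total_bytes += len(contents)
    timings["read"] = (time.perf_counter() - start) * 1e6

    counts = {"dirs": len(dirs), "files": len(files), "bytes": total_bytes}
    return timings, counts
```

Calling `time_search_phases("/path/to/docs")` twice in a row on the same tree should give a rough cold-versus-warm comparison, since the second pass is usually served largely from the OS disk cache. The absolute numbers will not be directly comparable to the Tcl figures above.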
