Re: [Unicon-group] Walk of file directory

Wade Fri, 23 Jan 2015 13:59:02 -0800

Okay, thanks. It's been so long since I've programmed against the Windows API. It definitely sounds like a bug in there. Pity we have to take that into account.

Could I suggest a prefix based on some part of the current timestamp as well as a random factor? That would give a fairly simple but scalable namespace partitioning. I don't have any facility to test it, sorry, so I'm just tossing ideas out here.
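
A minimal sketch of that idea in Unicon, for illustration only. The procedure name tmpprefix, the "u" prefix letter, and the ".tmp" suffix are all hypothetical; the actual fix would live in the C runtime code that calls _tempnam(), which is not shown in this thread.

```unicon
# Hypothetical helper: builds a name prefix from the clock plus a random
# number, so successive calls rarely produce the same name.
procedure tmpprefix()
   local stamp, rnd
   stamp := &now % 100000          # low digits of the seconds-since-epoch clock
   rnd := ?99999                   # random integer in 1..99999
   return "u" || right(stamp, 5, "0") || right(rnd, 5, "0")
end

procedure main()
   local dir, name
   dir := getenv("TMP") | "."      # fall back to the current directory
   # Retry until the candidate name does not exist yet.
   repeat {
      name := dir || "/" || tmpprefix() || ".tmp"
      if not stat(name) then break
   }
   write(name)
end
```

The existence check is only a best effort: another process could still create the same name between the stat() call and a later open(), so this reduces collisions rather than ruling them out.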


Wade.

On Sat, 24 Jan 2015 04:01:03 +1100, Jafar Al-Gharaibeh <[email protected]> wrote:

Wade,
_tempnam(dir, prefix) is provided by Windows; we just use it, and it turned out not to be smart at all, at least on my Windows 7 machine. However, our code that uses it could be made smarter to always use a randomized prefix every time. That is one approach.
Thanks,
Jafar


On Fri, Jan 23, 2015 at 3:44 AM, Wade <[email protected]> wrote:
Sounds like the _tempnam() function could be a lot smarter in creating temporary filenames. Is that our function or is it provided by Windows?
Wade.
On Fri, 23 Jan 2015 20:13:17 +1100, Sergey Logichev <[email protected]> wrote:
Jafar,
I greatly appreciate your investigation! Actually, my Windows %TMP% folder contained ~135000 temporary files, so when I cleaned it my run time decreased from ~40 secs to ~20. The very first open() was instant, then its time increased as the number of temporary files increased. My proposal is to purge all temporary files after the program finishes, or instead to use virtual storage in RAM, since a single temporary file is created for every searched subdirectory. After a very short time the TMP folder will contain a myriad of such files.
Nevertheless, I confirm that the number of threads has practically no influence on execution time. Probably it's the problem of "lazy cleanup", as you mentioned. I hope you can find a solution. Compared with Linux, Windows is quite a bag of different bugs, which crawl out of every hole :-)
Thank you,
Sergey
23.01.2015, 10:19, "Jafar Al-Gharaibeh" <[email protected]>:
Sergey,
Thanks for the report. I had in mind to look at why we don't get much speedup with more threads. I did look and found that the main thread was grabbing most of the "new thread tokens" and not recycling them fast enough. I have to tweak my algorithm to allow quick cleanup and reuse of threads. I will do that when I get a chance.
Now the second issue (and you've gotta love this!): I was able to confirm the slow open(). With the help of gdb, and after spending a couple of hours digging into the C code and the Windows API calls, I found that the problem is in a call to _tempnam() to create a temporary file name. The call was taking very long to finish. It creates the tmp file under your system TMP folder (%TMP% on Windows). I looked in that folder and found that it has more than half a million files (~2.7GB)!
It turned out that every time my program runs, Windows was looping through that huge pile of tmp files to find a name that doesn't exist so that it can give it to the program. Of course, I think most of those tmp files were generated by my program during previous runs over the last couple of days. As a bonus, I discovered a memory leak in the process of tracking the open() problem. I committed a fix for that leak. This only affects Windows.
Short-term solution: flush your TMP folder.
Long term: we will look into ways to improve our tmp file strategy to overcome the shortcomings of the Windows API. This will come at a later date! :)
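
To judge whether a machine is affected before flushing anything, one could count the entries in the TMP folder. The following is a rough sketch, assuming the TMP environment variable is set and that reading from an opened directory yields one entry name per read, as the directory-walking code in this thread does.

```unicon
# Count the entries in the system TMP folder.
procedure main()
   local dir, d, n
   dir := getenv("TMP") | stop("TMP is not set")
   d := open(dir) | stop("cannot open ", dir)
   n := 0
   while read(d) do n +:= 1        # each read returns one directory entry name
   close(d)
   write(dir, " holds ", n, " entries (the count may include . and ..)")
end
```

Counts in the hundreds of thousands, like the ones reported above, would suggest the folder is worth cleaning.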
Cheers,
Jafar
On Thu, Jan 22, 2015 at 4:43 AM, Sergey Logichev <[email protected]> wrote:
Jafar,
You've provided a very interesting version of the walk directory algorithm. Communication with active threads is a great thing!
I have checked your program under Windows 7. I was confused by the fact that execution time depends only negligibly on the number of concurrent threads. I dug in and discovered that the first open(s) operation takes nearly ALL of the execution time! 95% at least. Check it yourself by slightly editing getdirs():
...
if ( stat(s).mode ? ="d" ) & ( tm := &time, d := open(s) ) then {
     if n=1 then write(s," : ",&time-tm)
...
So, if the first open() takes so long, then all other enhancements make no sense. Please correct me if I am wrong.
Best regards,
Sergey
22.01.2015, 00:58, "Jafar Al-Gharaibeh" <[email protected]>:
Here is a slightly tweaked/reformatted version. By default it now auto-detects the number of available cores in the machine and launches twice as many threads.
--Jafar
On Wed, Jan 21, 2015 at 12:17 PM, Jafar Al-Gharaibeh <[email protected]> wrote:
David,
I added a threaded solution at http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
Please review/edit as you see fit. (The source file is attached.) Combining recursion with threads might not be the best solution for this problem. If I were to put this to real use I'd go with an iterative approach using a master/workers model. Anyway, this is an excellent demonstration of how to use threads! The key features are:
1- How to create threads, limit their number, and keep them self-load-balanced (new threads are spawned at the time/place where needed. Once they are done, they vanish, allowing new threads to pop up in new places in the directory structure).
2- How to pass data to the threads and collect results from them using the new language features.
Here is some sample output from my desktop machine (quad-core with a mechanical HDD. I will try another machine with an SSD and see if more threads scale better).
The first argument to the program is the target directory. The second is the maximum number of concurrent threads to use at any given moment. (Soft limit! My counters are "unmutexed", so the actual number might deviate.) Note that this is different from the actual number of threads used during the run, which is reported at the end. The program can create/destroy threads as needed, but cannot use more than "max" # of threads at any given moment, and again "max" is "soft". :)
Cheers,
Jafar
c:\proj>tdir c:\ 1
39708 directories in 99867 ms using 1 threads
c:\proj>tdir c:\ 4
39708 directories in 62222 ms using 4 threads
c:\proj>tdir c:\ 4
39708 directories in 87650 ms using 4 threads
c:\proj>tdir c:\ 1
39708 directories in 92525 ms using 1 threads
c:\proj>tdir c:\ 4
39708 directories in 95655 ms using 4 threads
c:\proj>tdir c:\ 16
39708 directories in 66138 ms using 21 threads
c:\proj>tdir c:\ 8
39708 directories in 69307 ms using 8 threads
c:\proj>tdir c:\ 4
39708 directories in 70539 ms using 4 threads
c:\proj>tdir c:\ 16
39708 directories in 76392 ms using 32 threads
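
For comparison, the iterative approach mentioned in this message could look roughly like the sketch below. It is single-threaded on purpose and is not the attached program: it only shows the explicit work list that would replace recursion. In a master/workers design the workers would take directories from such a list, and that part is not shown. All names here are illustrative.

```unicon
# Iterative directory walk driven by an explicit work list.
procedure main(args)
   local root, work, dir, d, name, path, count
   root := args[1] | "."              # target directory, default: current one
   work := [root]                     # directories still to be scanned
   count := 0
   while dir := get(work) do {
      if not (d := open(dir)) then next   # skip directories we cannot open
      count +:= 1
      while name := read(d) do {
         if name == ("." | "..") then next
         path := dir || "/" || name
         # Queue subdirectories for a later pass instead of recursing.
         if stat(path).mode ? ="d" then put(work, path)
      }
      close(d)
   }
   write(count, " directories under ", root)
end
```

The sketch does nothing special about symbolic links, so a link that points back up the tree could make it loop; a real tool would need to guard against that.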
On Sun, Jan 11, 2015 at 1:25 PM, David Gamey <[email protected]> wrote:
Sergey,
I am responsible for many of the Rosetta Code contributions (thanks also to Steve, Andrew, Matt, Peter, and about 4 others), and this one in particular dates from 2010. As I recall, this was before the multi-threading versions were widely available. I think multi-threading is underrepresented in Rosetta/Unicon.
If you come up with a multi-threading version, we should add it to the post as an alternative version. If you don't feel comfortable doing this, post the code and I can add it.
David
From: Sergey Logichev <[email protected]>
To: Jafar Al-Gharaibeh <[email protected]>
Cc: Unicon group <[email protected]>
Sent: Sunday, January 11, 2015 1:16 AM
Subject: Re: [Unicon-group] Walk of file directory

Jafar,
Thank you for a whole bundle of advice and suggestions! Threads are worth trying. The idea of searching by file attributes is very useful too. Your suggestion about slow I/O is partly right. For UNIX I tried the program on a Raspberry Pi with a Class 6 microSD card as the HDD (it's slow, I agree). But for Windows it was quite a fast HDD. It would be interesting to compare the performance of the program on Windows with the classic approach based on the Win32 _FINDFIRST and _FINDNEXT functions. I have threaded Delphi/Lazarus implementations of this algorithm. I feel that it will be faster, but by how much?
Sergey
10.01.2015, 21:50, "Jafar Al-Gharaibeh" <[email protected]>:
Sergey,
There are so many things that came to mind when I saw your program.
1- At the end of your email, the sourceforge ad says "Go Parallel", which is not a bad idea for this highly parallel application. There is a similar program, "wordcount", listed in my dissertation (available on unicon.org) that goes through directories and counts words in every file using threads (Chapter 7, page 107).
2- Unicon open() already supports pattern matching that would greatly (I believe) speed up your program. For example you can do this:
   L := open("*.icn")
to get a list of all of the Unicon source files in the current directory. Note: It would be nice if there were a way to tell open() to return files based not only on a pattern but also on file attributes, to allow something like "get me all directories in the current directory" or "get me all read-only files". There are a lot of situations where filtering directory names, for example, is very useful, like this program.
3- The program on Rosetta Code is not optimized for speed. You can minimize the number of lists created and put() calls by careful rewriting of the code.
4- Depending on how deep the directory tree is, there might be a lot of I/O going on. A slow disk might limit how fast you can go regardless of how optimized your code is.
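
Until open() can filter by attribute, the "get me all directories in the current directory" case from point 2 can be approximated by hand with stat(), as in the sketch below. The procedure name subdirs is hypothetical, and the sketch assumes that reading an opened directory yields one entry name per read.

```unicon
# Return a list of the subdirectory names found directly under dir.
procedure subdirs(dir)
   local d, name, L
   /dir := "."                        # default to the current directory
   L := []
   if not (d := open(dir)) then fail
   while name := read(d) do {
      if name == ("." | "..") then next
      # Keep only entries whose mode string starts with "d".
      if stat(dir || "/" || name).mode ? ="d" then put(L, name)
   }
   close(d)
   return L
end

procedure main()
   every write(!subdirs("."))
end
```

This still costs one stat() call per entry, which is the overhead an attribute-aware open() would avoid.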
I will share results if I get around to trying any of these options.
Cheers,
Jafar
On Sat, Jan 10, 2015 at 5:51 AM, Sergey Logichev <[email protected]> wrote:
Hello all!
I am now investigating the best approach to get a list of files in a specified directory and beneath it in Unicon.
I found an excellent example at rosettacode.org: http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
I reworked this one to implement matching of filenames against a specified pattern (regular expression). My program recursively walks a directory and prints the matching filenames, the same as dir (ls) does. Everything works fine except performance. If a directory has a lot of subdirs, the search may take 10-20 seconds before starting output. Could you give some advice on how to enhance the performance?
Some notes on how to build and use it. Unpack the content of udir.zip to your local directory. Define which environment you use in the env.icn file: uncomment the line "$define _UNIX 1" in the case of UNIX. Nothing to do in the case of Windows.
Make udir program:
unicon -c futils.icn
unicon -c options.icn
unicon -c regexp.icn
unicon udir.icn
Usage: udir -f<filemask>
for example: udir -f*.icn
will list the icn files in the current dir and all its subdirectories.
Best regards,
Sergey Logichev

------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group