Re: [Unicon-group] Walk of file directory

Jafar Al-Gharaibeh Fri, 23 Jan 2015 09:02:07 -0800

Wade,

   _tempnam(dir, prefix) is provided by Windows; we just use it, and it
turned out not to be smart at all - at least on my Windows 7 machine.
However, our code that uses it could be made smarter by always passing a
randomized prefix - that is one approach.
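
To make the randomized-name idea concrete, here is a minimal sketch. It is
written in Python rather than Unicon or C, and it is not the code in the
Unicon runtime. The function name create_unique_temp, the "uni" prefix and
the ".tmp" suffix are all made up for illustration. The point is that a
random name plus an exclusive create never needs to scan the directory.

```python
import os
import secrets
import tempfile


def create_unique_temp(directory, prefix="uni", attempts=100):
    """Create a new empty file with a random name and return its path.

    Hypothetical helper for illustration only.
    """
    for _ in range(attempts):
        # 16 random hex digits make a collision with an old file very unlikely.
        name = prefix + secrets.token_hex(8) + ".tmp"
        path = os.path.join(directory, name)
        try:
            # O_EXCL makes the create fail if the name already exists,
            # so the existing files never have to be listed or probed.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            continue  # extremely rare: pick another random name
        os.close(fd)
        return path
    raise FileExistsError("no free temporary name found")


if __name__ == "__main__":
    # Demo in a throwaway directory so the real TMP folder is not touched.
    with tempfile.TemporaryDirectory() as scratch:
        print(create_unique_temp(scratch))
```

The cost of one call stays roughly constant no matter how many old files
sit in the folder, which is the property we are after.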


Thanks,
Jafar


On Fri, Jan 23, 2015 at 3:44 AM, Wade <[email protected]> wrote:

>  Sounds like the _tempnam() function could be a lot smarter in creating
> temporary filenames. Is that our function or is it provided by Windows?
>
> Wade.
>
>
> On Fri, 23 Jan 2015 20:13:17 +1100, Sergey Logichev <[email protected]>
> wrote:
>
> Jafar,
>
> I very much appreciate your investigation! Actually, my Windows %TMP%
> folder contained ~135000 temporary files, so when I cleaned it my run time
> decreased from ~40 secs to ~20. The very first open() was instant, and then
> its time grew as the number of temporary files increased. My proposal is
> to purge all temporary files after the program finishes, or instead to use
> virtual storage in RAM, since a single temporary file is created for every
> searched subdirectory. After a very short time the TMP folder will contain
> a myriad of such files.
>
> Nevertheless, I confirm that the number of threads has practically no
> influence on execution time. It is probably the "lazy cleanup" problem you
> mentioned. I hope you can find a solution. Compared with Linux, Windows is
> quite a bag of assorted bugs, which crawl out of every hole :-)
>
> Thank you,
> Sergey
>
> 23.01.2015, 10:19, "Jafar Al-Gharaibeh" <[email protected]>:
>
> Sergey,
>
>    Thanks for the report. I had in mind to look at why we don't get much
> speed up with more threads. I did look and found that the main thread was
> grabbing most "new thread tokens" and not recycling them fast enough. I
> have to tweak my algorithm to allow quick cleanup and reuse of threads. I
> will do that when I get a chance.
>
> Now the second issue - and you've gotta love this! - I was able to confirm
> the slow open(). With the help of gdb and after spending a couple of hours
> digging into the C code and the Windows API calls, I found that the problem
> is in a call to _tempnam() to create a temporary file name. The call was
> taking a very long time to finish. It creates the tmp file under your
> system TMP folder (%TMP% on Windows). I looked in that folder and found
> that it has more than half a million files (~2.7GB)!
>
> It turned out that every time my program ran, Windows was looping through
> that huge pile of tmp files to find a name that doesn't exist so that it
> could give it to the program. Of course, I think most of those tmp files
> were generated by my program during previous runs over the last couple of
> days.
>
> As a bonus, I discovered a memory leak in the process of tracking down the
> open() problem. I committed a fix for that leak. It only affects Windows.
>
> Short term solution: flush your TMP folder.
> Long term: we will look into ways to improve our tmp file strategy to
> overcome the shortcomings of the Windows API. This will come at a later
> date! :)
>
> Cheers,
> Jafar
>
>
> On Thu, Jan 22, 2015 at 4:43 AM, Sergey Logichev <[email protected]>
> wrote:
>
> Jafar,
>
> You've provided a very interesting version of the directory walk algorithm.
> Communication with active threads is a great thing!
> I have checked your program under Windows 7. I was puzzled by the fact that
> execution time barely depends on the number of concurrent threads. I dug
> into it and discovered that the first open(s) operation takes nearly ALL of
> the execution time! 95% at least. Check it yourself by slightly editing
> getdirs():
> ...
> if ( stat(s).mode ? ="d" ) & ( tm := &time, d := open(s) ) then {
>       if n=1 then write(s," : ",&time-tm)
> ...
>
>
> So, if the first open() takes so long, then all other enhancements make no
> sense. Please correct me if I am wrong.
>
> Best regards,
> Sergey
>
> 22.01.2015, 00:58, "Jafar Al-Gharaibeh" <[email protected]>:
>
> Here is a slightly tweaked/reformatted version. By default it now
> auto-detects the number of available cores in the machine and launches
> twice as many threads.
>
> --Jafar
>
> On Wed, Jan 21, 2015 at 12:17 PM, Jafar Al-Gharaibeh <[email protected]>
> wrote:
>
> David,
>
>     I added a threaded solution @
> http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
>    Please review/edit as you see fit. (The source file is attached).
> Combining recursion with threads might not be the best solution for this
> problem. If I were to put this to real use I'd go with an iterative
> approach using a master/workers model. Anyway, this is an excellent
> demonstration of how to use threads! The key features are:
>
>    1- How to create threads, limit their number, and keep them self-load
> balanced (new threads are spawned at the time/place where needed. Once
> they are done, they vanish, allowing new threads to pop up in new places
> in the directory structure)
>    2- How to pass data to, and collect results from, the threads using the
> new language features.
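
For comparison, the iterative master/workers alternative mentioned above
could look roughly like this. It is a Python sketch, not the attached Unicon
program, and the function name count_dirs is made up. Unlike the attached
version, where threads spawn and vanish and the limit is "soft", this one
uses a fixed pool of workers that pull directories from a shared queue.

```python
import os
import queue
import threading


def count_dirs(root, workers=4):
    """Count directories under root (root included) with a fixed worker pool."""
    todo = queue.Queue()      # directories waiting to be scanned
    todo.put(root)
    count = 0
    lock = threading.Lock()   # guards count

    def worker():
        nonlocal count
        while True:
            path = todo.get()
            if path is None:              # sentinel: no more work
                todo.task_done()
                return
            try:
                with os.scandir(path) as entries:
                    subdirs = [e.path for e in entries
                               if e.is_dir(follow_symlinks=False)]
            except OSError:
                subdirs = []              # unreadable directory: skip it
            # Queue the children before marking the parent done, so the
            # queue never looks finished while work is still pending.
            for d in subdirs:
                todo.put(d)
            with lock:
                count += 1
            todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    todo.join()                           # every queued directory is done
    for _ in threads:
        todo.put(None)                    # release the workers
    for t in threads:
        t.join()
    return count
```

Calling count_dirs("c:\\", 4) would correspond to "tdir c:\ 4", except that
here the limit of 4 is hard and the thread count never exceeds it.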
>
>
> Here is some sample output from my desktop machine (quad-core with
> mechanical HDD. I will try another machine with an SSD and see if more
> threads scale better).
>
> The first argument to the program is the target directory. The second is
> the maximum number of concurrent threads to use at any given moment. (Soft
> limit! My counters are "unmutexed", so the actual number might deviate.)
> Note that this is different from the total number of threads used during
> the run, which is reported at the end. The program can create/destroy
> threads as needed, but cannot use more than "max" threads at any given
> moment, and again "max" is "soft". :)
>
> Cheers,
> Jafar
>
> c:\proj>tdir c:\ 1
> 39708 directories in 99867 ms using 1 threads
>
> c:\proj>tdir c:\ 4
> 39708 directories in 62222 ms using 4 threads
>
> c:\proj>tdir c:\ 4
> 39708 directories in 87650 ms using 4 threads
>
> c:\proj>tdir c:\ 1
> 39708 directories in 92525 ms using 1 threads
>
> c:\proj>tdir c:\ 4
> 39708 directories in 95655 ms using 4 threads
>
> c:\proj>tdir c:\ 16
> 39708 directories in 66138 ms using 21 threads
>
> c:\proj>tdir c:\ 8
> 39708 directories in 69307 ms using 8 threads
>
> c:\proj>tdir c:\ 4
> 39708 directories in 70539 ms using 4 threads
>
> c:\proj>tdir c:\ 16
> 39708 directories in 76392 ms using 32 threads
>
>
>
> On Sun, Jan 11, 2015 at 1:25 PM, David Gamey <[email protected]>
> wrote:
>
> Sergey,
>
> I am responsible for much of the Rosetta code contributions (thanks also
> to Steve, Andrew, Matt, Peter, and about 4 others) and this one in
> particular dating from 2010. As I recall this was before the
> multi-threading versions were widely available. I think multi-threading is
> underrepresented in Rosetta/Unicon.
>
> If you come up with a multi-threading version, we should add it to the
> post as an alternative version.  If you don't feel comfortable doing this,
> post the code and I can add it.
>
> David
>
>
> ------------------------------
> *From:* Sergey Logichev <[email protected]>
> *To:* Jafar Al-Gharaibeh <[email protected]>
> *Cc:* Unicon group <[email protected]>
> *Sent:* Sunday, January 11, 2015 1:16 AM
> *Subject:* Re: [Unicon-group] Walk of file directory
>
> Jafar,
>
> Thank you for the whole bundle of advice and suggestions! Threads are worth
> a try. The idea of searching by file attributes is very useful too. Your
> suggestion about slow I/O is partly right. For UNIX I tried the program on
> a Raspberry Pi with a Class 6 microSD as the HDD (it's slow, I agree). But
> on Windows it was quite a fast HDD. It would be interesting to compare the
> performance of the program on Windows with the classic approach based on
> the Win32 _FINDFIRST, _FINDNEXT functions. I have threaded Delphi/Lazarus
> implementations of this algorithm. I feel they will be faster, but by how
> much?
>
> Sergey
>
> 10.01.2015, 21:50, "Jafar Al-Gharaibeh" <[email protected]>:
>
>
> Sergey,
>
>   There are so many things that came to mind when I saw your program.
>
> 1-  At the end of your email, the sourceforge ad says "Go Parallel", which
> is not a bad idea for this highly parallel application.
>
>  There is a similar program, "wordcount", listed in my dissertation
> (available on unicon.org) that goes through directories and counts words in
> every file using threads (Chapter 7, page 107)
>
> 2- Unicon open() already supports pattern matching, which (I believe)
> would greatly speed up your program. For example you can do this:
>     L := open("*.icn")
>
>    to get a list of all of the Unicon source files in the current directory.
>
>   Note: It would be nice if there were a way to tell open() to return
> files based not only on a pattern, but also on file attributes, to allow
> something like "get me all directories in the current directory", or "get
> me all read-only files". There are a lot of situations where filtering
> directory names, for example, is very useful - like this program.
>
> 3- The program on Rosetta Code is not optimized for speed. You can
> minimize the number of lists created and put() by careful rewriting of the
> code.
>
> 4- Depending on how deep the directory tree is, there might be a lot of
> I/O going on. A slow disk might limit how fast you can go regardless of how
> optimized your code is.
>
> I will share results if I get around to trying any of these options.
>
> Cheers,
> Jafar
>
>
>
> On Sat, Jan 10, 2015 at 5:51 AM, Sergey Logichev <[email protected]>
> wrote:
>
> Hello all!
>
> I am investigating the best approach to getting a list of files in a
> specified directory and beneath it in Unicon.
> I found an excellent example at rosettacode.org:
> http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
>
> I reworked it to match filenames against a specified pattern (regular
> expression). My program recursively walks a directory and prints the
> matching filenames, the same as dir (ls) does. Everything works fine
> except performance. If the directory has a lot of subdirs, the search may
> take 10-20 seconds before output starts. Could you give some advice on how
> to enhance the performance?
>
> Some notes on how to build and use it. Unpack the content of udir.zip to a
> local directory. Define which environment you use in the env.icn file -
> uncomment the line "$define _UNIX 1" in the case of UNIX. There is nothing
> to do in the case of Windows.
> Build the udir program:
> unicon -c futils.icn
> unicon -c options.icn
> unicon -c regexp.icn
> unicon udir.icn
>
> Usage: udir -f<filemask>
> for example: udir -f*.icn
> lists the icn files in the current dir and all its subdirectories.
>
> Best regards,
> Sergey Logichev
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming! The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net
> _______________________________________________
> Unicon-group mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/unicon-group
>
> ------------------------------------------------------------------------------
> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
> GigeNET is offering a free month of service with a new server in Ashburn.
> Choose from 2 high performing configs, both with 100TB of bandwidth.
> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
> http://p.sf.net/sfu/gigenet
> _______________________________________________
> Unicon-group mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/unicon-group
>
>
