From what I saw, the temp files are not removed from one run to the next.
Who is responsible for cleaning those up?

The program in question - in my tests at least - traverses the entire C
drive, so it creates a huge number of temp files. I have noticed that only
the first temp file creation is slow. It appears that once _tempnam() finds a
valid name in a given program run, it keeps an internal counter that it
increments with the prefix in subsequent calls, so it finds valid
names quickly after that.

--Jafar


On Fri, Jan 23, 2015 at 2:24 PM, Jeffery, Clint <jeffe...@uidaho.edu> wrote:

>  Are temp files persisting instead of being removed when the process
> terminates? That's a bad leak, if so. Or is it just that we have such large
> file systems now that the common prefix is killing us for the number of
> tempnames generated on a single run?
>
>
>
> -------- Original message --------
> From: Jafar Al-Gharaibeh <to.ja...@gmail.com>
> Date: 01/23/2015 12:01 PM (GMT-05:00)
> To: Wade <sta...@yceran.org>
> Cc: Unicon group <unicon-group@lists.sourceforge.net>
> Subject: [SPAM] Re: [Unicon-group] Walk of file directory
>
>
>  Wade,
>
>     _tempnam(dir, prefix) is provided by Windows; we just use it, and it
> turned out not to be smart at all - at least on my Windows 7 machine. However,
> our code that uses it could be made smarter by using a randomized prefix
> every time - that is one approach.
>
>  Thanks,
> Jafar
>
>
> On Fri, Jan 23, 2015 at 3:44 AM, Wade <sta...@yceran.org> wrote:
>
>>  Sounds like the _tempnam() function could be a lot smarter in creating
>> temporary filenames. Is that our function or is it provided by Windows?
>>
>>  Wade.
>>
>>
>>  On Fri, 23 Jan 2015 20:13:17 +1100, Sergey Logichev <slogic...@yandex.ru>
>> wrote:
>>
>>  Jafar,
>>
>> I very much appreciate your investigations! My Windows %TMP%
>> folder contained ~135000 temporary files, and when I cleaned it out my run
>> time decreased from ~40 seconds to ~20. The very first open() was instant,
>> and its time then increased as the number of temporary files grew. My
>> proposal is to purge all temporary files after the program finishes, or
>> instead to use virtual storage in RAM, since a single temporary file is
>> created for every subdirectory searched. After a very short time the TMP
>> folder will contain a myriad of such files.
>>
>> Nevertheless, I confirm that the number of threads has practically no
>> influence on execution time. Probably it is the "lazy cleanup" problem
>> you mentioned. I hope you can find a solution. Compared with Linux,
>> Windows is quite a bag of bugs, and they crawl out of every hole! :-)
>>
>> Thank you,
>> Sergey
>>
>> 23.01.2015, 10:19, "Jafar Al-Gharaibeh" <to.ja...@gmail.com>:
>>
>> Sergey,
>>
>>    Thanks for the report. I had been meaning to look at why we don't get
>> much speedup with more threads. I did look, and found that the main thread
>> was grabbing most of the "new thread tokens" and not recycling them fast
>> enough. I have to tweak my algorithm to allow quick cleanup and reuse of
>> threads. I will do that when I get a chance.
>>
>> Now for the second issue - and you've gotta love this one - I was able to
>> confirm the slow open(). With the help of gdb, and after spending a couple
>> of hours digging into the C code and the Windows API calls, I found that
>> the problem is in a call to _tempnam() to create a temporary file name. The
>> call was taking a very long time to finish. It creates the temp file under
>> your system TMP folder (%TMP% on Windows). I looked in that folder and
>> found that it has more than half a million files (~2.7GB)!
>>
>> It turns out that every time my program runs, Windows loops
>> through that huge pile of temp files to find a name that doesn't exist yet,
>> so that it can hand it to the program. I think most of those temp
>> files were generated by my program during previous runs over the last
>> couple of days.
>>
>> As a bonus, I discovered a memory leak while tracking down the
>> open() problem. I committed a fix for that leak. It only affects
>> Windows.
>>
>> Short term solution: flush your TMP folder.
>> Long term: we will look into ways to improve our temp file strategy to
>> overcome this shortcoming of the Windows API. That will come at a later
>> date! :)
>>
>> Cheers,
>> Jafar
>>
>>
>> On Thu, Jan 22, 2015 at 4:43 AM, Sergey Logichev <slogic...@yandex.ru>
>> wrote:
>>
>> Jafar,
>>
>> You've provided a very interesting version of the walk-directory algorithm.
>> Communication with the active threads is a great thing!
>> I have checked your program under Windows 7. I was confused by the fact
>> that execution time depends negligibly on the number of concurrent threads.
>> I dug in and discovered that the first open(s) operation takes nearly ALL
>> the execution time! 95% at least. Check it yourself when you slightly edit
>> getdirs():
>>  ...
>> if ( stat(s).mode ? ="d" ) & ( tm := &time, d := open(s) ) then {
>>       if n=1 then write(s," : ",&time-tm)
>> ...
>>
>>
>> So, if the first open() is so slow, then all the other enhancements make
>> no difference. Please correct me if I am wrong.
>>
>> Best regards,
>> Sergey
>>
>> 22.01.2015, 00:58, "Jafar Al-Gharaibeh" <to.ja...@gmail.com>:
>>
>> Here is a slightly tweaked/reformatted version. By default it now
>> auto-detects the number of available cores in the machine and launches
>> twice as many threads.
>>
>> --Jafar
>>
>> On Wed, Jan 21, 2015 at 12:17 PM, Jafar Al-Gharaibeh <to.ja...@gmail.com>
>> wrote:
>>
>> David,
>>
>>     I added a threaded solution @
>> http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
>>    Please review/edit as you see fit. (The source file is attached).
>> Combining recursion with threads might not be the best solution for this
>> problem. If I were to put this to real use I'd go with an iterative
>> approach using a master/workers model. Anyway, this is an excellent
>> demonstration of how to use threads! The key features are:
>>
>>    1- How to create threads, limit their number, and let them self-load
>> balance (new threads are spawned at the time/place where needed; once they
>> are done, they vanish, allowing new threads to pop up in new places in the
>> directory structure).
>>    2- How to pass data to and collect results from the threads using the
>> new language features.
>>
>>
>> Here is some sample output from my desktop machine (quad-core with a
>> mechanical HDD; I will try another machine with an SSD and see if more
>> threads scale better).
>>
>> The first argument to the program is the target directory. The second is
>> the maximum number of concurrent threads to use at any given moment (a soft
>> limit! my counters are "unmutexed", so the actual number might deviate).
>> Note that this is different from the actual number of threads used during
>> the run, which is reported at the end. The program can create/destroy
>> threads as needed, but it cannot use more than the "max" number of threads
>> at any given moment, and again, "max" is soft. :)
>>
>> Cheers,
>> Jafar
>>
>>  c:\proj>tdir c:\ 1
>> 39708 directories in 99867 ms using 1 threads
>>
>> c:\proj>tdir c:\ 4
>> 39708 directories in 62222 ms using 4 threads
>>
>> c:\proj>tdir c:\ 4
>> 39708 directories in 87650 ms using 4 threads
>>
>> c:\proj>tdir c:\ 1
>> 39708 directories in 92525 ms using 1 threads
>>
>> c:\proj>tdir c:\ 4
>> 39708 directories in 95655 ms using 4 threads
>>
>> c:\proj>tdir c:\ 16
>> 39708 directories in 66138 ms using 21 threads
>>
>> c:\proj>tdir c:\ 8
>> 39708 directories in 69307 ms using 8 threads
>>
>> c:\proj>tdir c:\ 4
>> 39708 directories in 70539 ms using 4 threads
>>
>> c:\proj>tdir c:\ 16
>> 39708 directories in 76392 ms using 32 threads
>>
>>
>>
>> On Sun, Jan 11, 2015 at 1:25 PM, David Gamey <david.ga...@rogers.com>
>> wrote:
>>
>>  Sergey,
>>
>> I am responsible for much of the Rosetta Code contributions (thanks also
>> to Steve, Andrew, Matt, Peter, and about 4 others); this one in
>> particular dates from 2010. As I recall, this was before the
>> multi-threading versions were widely available. I think multi-threading is
>> underrepresented in Rosetta/Unicon.
>>
>> If you come up with a multi-threading version, we should add it to the
>> post as an alternative version.  If you don't feel comfortable doing this,
>> post the code and I can add it.
>>
>> David
>>
>>
>>   ------------------------------
>> *From:* Sergey Logichev <slogic...@yandex.ru>
>> *To:* Jafar Al-Gharaibeh <to.ja...@gmail.com>
>> *Cc:* Unicon group <unicon-group@lists.sourceforge.net>
>> *Sent:* Sunday, January 11, 2015 1:16 AM
>> *Subject:* Re: [Unicon-group] Walk of file directory
>>
>>  Jafar,
>>
>> Thank you for a whole bundle of advice and suggestions! Threads are
>> worth trying. The thought of searching by file attributes is very useful
>> too. Your suggestion about slow I/O is partly right. For UNIX I tried the
>> program on a Raspberry Pi with a Class 6 microSD card as the disk (it's
>> slow, I agree), but for Windows it was quite a fast HDD. It would be
>> interesting to compare the performance of the program on Windows with the
>> classic approach based on the Win32 _FINDFIRST/_FINDNEXT functions. I have
>> threaded Delphi/Lazarus implementations of this algorithm. I feel it would
>> be faster, but to what degree?
>>
>> Sergey
>>
>> 10.01.2015, 21:50, "Jafar Al-Gharaibeh" <to.ja...@gmail.com>:
>>
>>
>>   Sergey,
>>
>>   There are so many things that came to mind when I saw your program.
>>
>> 1- At the end of your email, the SourceForge ad says "Go Parallel", which
>> is not a bad idea for this highly parallel application.
>>
>>  There is a similar program, "wordcount", listed in my dissertation
>> (available on unicon.org) that goes through directories and counts the
>> words in every file using threads (Chapter 7, page 107).
>>
>> 2- Unicon's open() already supports pattern matching, which (I believe)
>> would greatly speed up your program. For example, you can do this:
>>     L := open("*.icn")
>>
>>    to get a list of all the Unicon source files in the current directory.
>>
>>   Note: It would be nice if there were a way to tell open() to return
>> files not only based on a pattern, but also on file attributes, to allow
>> something like "get me all directories in the current directory" or "get
>> me all read-only files". There are a lot of situations, like this program,
>> where filtering on directory names, for example, is very useful.
>>
>> 3- The program on Rosetta Code is not optimized for speed. You can
>> minimize the number of lists created and put() calls by carefully
>> rewriting the code.
>>
>> 4- Depending on how deep the directory tree is, there might be a lot of
>> I/O going on. A slow disk might limit how fast you can go regardless of how
>> optimized your code is.
>>
>> I will share results if I get around to trying any of these options.
>>
>> Cheers,
>> Jafar
>>
>>
>>
>> On Sat, Jan 10, 2015 at 5:51 AM, Sergey Logichev <slogic...@yandex.ru>
>> wrote:
>>
>> Hello all!
>>
>> I am investigating the best approach to get a list of the files in a
>> specified directory and below in Unicon.
>> I found an excellent example at rosettacode.org:
>> http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
>>
>> I reworked it to match filenames against a specified pattern (regular
>> expression). My program recursively walks a directory and prints the
>> matching filenames, the same as dir (ls) does. Everything works fine
>> except performance: if a directory has a lot of subdirectories, the
>> search may take 10-20 seconds before output starts. Could you give some
>> advice on how to enhance the performance?
>>
>> Some notes on how to build and use it. Unpack the contents of udir.zip
>> into a local directory. Define which environment you use in env.icn:
>> uncomment the line "$define _UNIX 1" in the case of UNIX; nothing needs
>> to change in the case of Windows.
>> Build the udir program:
>> unicon -c futils.icn
>> unicon -c options.icn
>> unicon -c regexp.icn
>> unicon udir.icn
>>
>> Usage: udir -f<filemask>
>> For example, udir -f*.icn
>> lists the .icn files in the current directory and all its subdirectories.
>>
>> Best regards,
>> Sergey Logichev
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming! The Go Parallel Website,
>> sponsored by Intel and developed in partnership with Slashdot Media, is
>> your
>> hub for all things parallel software development, from weekly thought
>> leadership blogs to news, videos, case studies, tutorials and more. Take a
>> look and join the conversation now. http://goparallel.sourceforge.net
>> _______________________________________________
>> Unicon-group mailing list
>> Unicon-group@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/unicon-group
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
>> GigeNET is offering a free month of service with a new server in Ashburn.
>> Choose from 2 high performing configs, both with 100TB of bandwidth.
>> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
>> http://p.sf.net/sfu/gigenet
>> _______________________________________________
>> Unicon-group mailing list
>> Unicon-group@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/unicon-group
>>
>>
>
