Sounds like the _tempnam() function could be a lot smarter in creating
temporary filenames. Is that our function or is it provided by Windows?
Wade.
On Fri, 23 Jan 2015 20:13:17 +1100, Sergey Logichev <[email protected]>
wrote:
Jafar,
I really appreciate your investigation! Indeed, my Windows %TMP%
folder contained ~135000 temporary files, so when I cleaned it my run
time decreased from ~40 secs to ~20. The very first open() was
instant, but its time grew as the number of temporary files increased
again. My proposal is to purge all temporary files when the program
finishes, or to use virtual storage in RAM instead, since a single
temporary file is created for every searched subdirectory. After a very
short time the TMP folder will contain a myriad of such files.
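A minimal sketch of the two options proposed here, written in Python
rather than Unicon purely for illustration. The helper names
(listing_via_scratch_file, listing_in_memory) and the scratch_dir
parameter are hypothetical and are not part of the Unicon runtime; the
sketch only shows a scratch file that removes itself on close, next to
a buffer that never touches the disk.

```python
import io
import os
import tempfile

def listing_via_scratch_file(path, scratch_dir=None):
    """Spool the names in `path` through a scratch file that removes itself.

    tempfile.TemporaryFile deletes the file as soon as it is closed, so
    nothing is left behind in scratch_dir (default: the system temp folder).
    """
    with tempfile.TemporaryFile(mode="w+", dir=scratch_dir) as scratch:
        for name in sorted(os.listdir(path)):
            scratch.write(name + "\n")
        scratch.seek(0)
        return [line.rstrip("\n") for line in scratch]

def listing_in_memory(path):
    """Same result, but the scratch space is a RAM buffer, not a file."""
    scratch = io.StringIO()
    for name in sorted(os.listdir(path)):
        scratch.write(name + "\n")
    scratch.seek(0)
    return [line.rstrip("\n") for line in scratch]
```

Either way the temp folder stays the same size no matter how many
subdirectories are searched.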
Nevertheless, I confirm that the number of threads has practically no
influence on execution time. Probably it's the problem of "lazy
cleanup", as you mentioned. I hope you can find a solution. Compared
with Linux, Windows is quite a bag of assorted bugs, which crawl out of
every hole :-)
Thank you,
Sergey
23.01.2015, 10:19, "Jafar Al-Gharaibeh" <[email protected]>:
Sergey,
Thanks for the report. I had in mind to look at why we don't get much
speedup with more threads. I did look and found that the main thread
was grabbing most of the "new thread tokens" and not recycling them
fast enough. I have to tweak my algorithm to allow quick cleanup and
reuse of threads. I will do that when I get a chance.
Now the second issue (and you've gotta love this!): I was able to
confirm the slow open(). With the help of gdb, and after spending a
couple of hours digging into the C code and the Windows API calls, I
found that the problem is in a call to _tempnam() to create a
temporary file name. The call was taking a very long time to finish. It
creates the tmp file under your system TMP folder (%TMP% on Windows).
I looked in that folder and found that it had more than half a million
files (~2.7GB)! It turned out that every time my program runs, Windows
loops through that huge pile of tmp files to find a name that
doesn't exist so that it can give it to the program. Of course, I
think most of those tmp files were generated by my program during
previous runs over the last couple of days. As a bonus, I discovered a
memory leak in the process of tracking down the open() problem. I
committed a fix for that leak. This only affects Windows.
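The slowdown described above can be modelled with a small Python
sketch. This is not the source of _tempnam(); it only assumes, as the
message reports, that names are probed one after another until a free
one turns up. Both function names and the "tmp" prefix are
hypothetical. The second function shows one possible alternative
strategy: pick a random name, so the number of probes does not depend
on how many files are already in the folder.

```python
import os
import uuid

def sequential_name(directory, prefix="tmp"):
    """Probe prefix0, prefix1, ... until a free name is found.

    Returns (candidate_path, number_of_probes). With N leftover files
    named prefix0..prefix(N-1), this needs N + 1 existence checks.
    """
    probes = 0
    n = 0
    while True:
        probes += 1
        candidate = os.path.join(directory, f"{prefix}{n}")
        if not os.path.exists(candidate):
            return candidate, probes
        n += 1

def random_name(directory, prefix="tmp"):
    """Pick a random 128-bit suffix; a collision is practically impossible."""
    probes = 0
    while True:
        probes += 1
        candidate = os.path.join(directory, prefix + uuid.uuid4().hex)
        if not os.path.exists(candidate):
            return candidate, probes
```

With 200 leftover files, sequential_name() needs 201 checks while
random_name() needs one; at half a million files the gap is what shows
up as a slow open().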
Short term solution: flush your TMP folder.
Long term: we will look into ways to improve our tmp file strategy to
overcome the shortcomings of the Windows API. This will come at a later
date! :)
Cheers,
Jafar
On Thu, Jan 22, 2015 at 4:43 AM, Sergey Logichev <[email protected]>
wrote:
Jafar,
You've provided a very interesting version of the directory-walk
algorithm. Communication with active threads is a great thing!
I have checked your program under Windows 7. I was puzzled by the fact
that execution time depends only negligibly on the number of concurrent
threads. I dug in and discovered that the first open(s) operation
takes nearly ALL of the execution time! 95% at least. Check it yourself
by slightly editing getdirs():
...
if ( stat(s).mode ? ="d" ) & ( tm := &time, d := open(s) ) then {
if n=1 then write(s," : ",&time-tm)
...
So, if the first open() takes that long, all the other enhancements make
no difference. Please correct me if I am wrong.
Best regards,
Sergey
22.01.2015, 00:58, "Jafar Al-Gharaibeh" <[email protected]>:
Here is a slightly tweaked/reformatted version. By default it now
auto-detects the number of available cores in the machine and
launches twice as many threads.
--Jafar
On Wed, Jan 21, 2015 at 12:17 PM, Jafar Al-Gharaibeh
<[email protected]> wrote:
David,
I added a threaded solution @
http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
Please review/edit as you see fit. (The source file is attached).
Combining recursion with threads might not be the best solution
for this problem. If I were to put this to real use I'd go with an
iterative approach using a master/workers model. Anyway, this is
an excellent demonstration of how to use threads! The key features
are:
1- How to create threads, limit their number, and balance the load
automatically (new threads are spawned at the time/place where they
are needed. Once they are done, they vanish, allowing new threads to
pop up in new places in the directory structure)
2- How to pass data to the threads and collect results from them
using the new language features.
Here is some sample output from my desktop machine (quad-core with a
mechanical HDD; I will try another machine with an SSD and see
if more threads scale better). The first argument to the program is
the target directory. The second is the maximum number of
concurrent threads to use at any given moment (a soft limit! My
counters are "unmutexed", so the actual number might deviate). Note
that this is different from the actual number of threads used
during the run, which is reported at the end. The program can
create/destroy threads as needed, but cannot use more than "max"
threads at any given moment, and again "max" is "soft". :)
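For comparison, here is a rough sketch of the iterative master/workers
alternative mentioned above, in Python rather than Unicon. It is not
the attached tdir program: count_directories and max_workers are
hypothetical names, and unlike the "soft" limit described here the
worker count is a hard cap, because the threads are created once and
the shared counter is protected by a lock.

```python
import os
import queue
import threading

def count_directories(root, max_workers=4):
    """Count all directories below `root` using a fixed pool of workers."""
    pending = queue.Queue()      # directories still waiting to be scanned
    pending.put(root)
    count = 0
    lock = threading.Lock()      # guards `count`

    def worker():
        nonlocal count
        while True:
            path = pending.get()
            if path is None:     # sentinel: no more work
                pending.task_done()
                return
            try:
                with os.scandir(path) as entries:
                    subdirs = [e.path for e in entries
                               if e.is_dir(follow_symlinks=False)]
            except OSError:      # unreadable directory: skip it
                subdirs = []
            with lock:
                count += len(subdirs)
            for sub in subdirs:  # queue children before finishing this item
                pending.put(sub)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for t in threads:
        t.start()
    pending.join()               # returns once every queued directory is done
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return count
```

Because each worker queues the subdirectories it finds before marking
its own item as done, pending.join() cannot return while work is still
outstanding.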
Cheers,
Jafar
c:\proj>tdir c:\ 1
39708 directories in 99867 ms using 1 threads
c:\proj>tdir c:\ 4
39708 directories in 62222 ms using 4 threads
c:\proj>tdir c:\ 4
39708 directories in 87650 ms using 4 threads
c:\proj>tdir c:\ 1
39708 directories in 92525 ms using 1 threads
c:\proj>tdir c:\ 4
39708 directories in 95655 ms using 4 threads
c:\proj>tdir c:\ 16
39708 directories in 66138 ms using 21 threads
c:\proj>tdir c:\ 8
39708 directories in 69307 ms using 8 threads
c:\proj>tdir c:\ 4
39708 directories in 70539 ms using 4 threads
c:\proj>tdir c:\ 16
39708 directories in 76392 ms using 32 threads
On Sun, Jan 11, 2015 at 1:25 PM, David Gamey
<[email protected]> wrote:
Sergey,
I am responsible for many of the Rosetta Code contributions (thanks
also to Steve, Andrew, Matt, Peter, and about 4 others), and
this one in particular dates from 2010. As I recall, this was
before the multi-threading versions were widely available. I
think multi-threading is underrepresented in Rosetta/Unicon.
If you come up with a multi-threading version, we should add it to
the post as an alternative version. If you don't feel
comfortable doing this, post the code and I can add it.
David
From: Sergey Logichev <[email protected]>
To: Jafar Al-Gharaibeh <[email protected]>
Cc: Unicon group <[email protected]>
Sent: Sunday, January 11, 2015 1:16 AM
Subject: Re: [Unicon-group] Walk of file directory
Jafar,
Thank you for a whole bundle of advice and suggestions! Threads
are worth a try. The idea of searching by file attributes
is very useful too. Your suggestion about slow I/O is partly
right. For UNIX I tried the program on a Raspberry Pi with
a Class 6 microSD card as its HDD (it's slow, I agree). But for
Windows it was quite a fast HDD. It would be interesting to compare
the performance of the program on Windows with the classic approach
based on the Win32 _FINDFIRST, _FINDNEXT functions. I have
threaded Delphi/Lazarus implementations of this algorithm.
I feel that it will be faster, but by how much?
Sergey
10.01.2015, 21:50, "Jafar Al-Gharaibeh" <[email protected]>:
Sergey,
There are so many things that came to mind when I saw your
program.
1- At the end of your email, the sourceforge ad says "Go Parallel",
which is not a bad idea for this highly parallel application.
There is a similar program, "wordcount", listed in my
dissertation (available on unicon.org) that goes through
directories and counts words in every file using threads
(Chapter 7, page 107).
2- Unicon open() already supports pattern matching that would
greatly (I believe) speed up your program. For example you
can do this:
L := open("*.icn")
to get a list of all the Unicon source files in the current
directory. Note: It would be nice if there were a way to tell
open() to return files based not only on a pattern, but
also on file attributes, to allow something like "get me
all directories in the current directory", or "get me all
read-only files". There are a lot of situations where filtering
directory names, for example, is very useful, like this
program.
3- The program on Rosetta Code is not optimized for speed. You
can minimize the number of lists created and put() calls by
carefully rewriting the code.
4- Depending on how deep the directory tree is, there might be a
lot of I/O going on. A slow disk might limit how fast you
can go regardless of how optimized your code is.
I will share results if I get around to trying any of these options.
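To make the wish in item 2 concrete, here is a small Python sketch of
listing a directory by name pattern and, optionally, by one file
attribute. It says nothing about how Unicon's open() works; the
function list_entries and its only_dirs flag are hypothetical and
merely illustrate the "get me all directories in the current
directory" kind of query.

```python
import fnmatch
import os

def list_entries(directory, pattern="*", only_dirs=False):
    """Return sorted names in `directory` that match a shell-style pattern.

    With only_dirs=True, entries that are not directories are dropped.
    """
    with os.scandir(directory) as entries:
        return sorted(
            e.name for e in entries
            if fnmatch.fnmatch(e.name, pattern)
            and (not only_dirs or e.is_dir())
        )
```

For example, list_entries(".", "*.icn") plays the role of the
open("*.icn") call above, and list_entries(".", only_dirs=True)
returns just the subdirectories that a recursive walk needs to visit.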
Cheers,
Jafar
On Sat, Jan 10, 2015 at 5:51 AM, Sergey Logichev
<[email protected]> wrote:
Hello all!
I am investigating the best approach to getting a list of the files
in a specified directory and beneath it in Unicon.
I found an excellent example at rosettacode.org:
http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
I reworked it to match filenames against a specified pattern
(regular expression). My program recursively walks a directory
and prints the matching filenames, the same as dir (ls) does.
Everything works fine except performance. If a directory has
a lot of subdirs, the search may take 10-20 seconds before
output starts. Could you give me some advice on how to enhance
the performance?
Some notes on how to build and use it. Unpack the contents of udir.zip
into your local directory. Define which environment you use
in the env.icn file: uncomment the line "$define _UNIX 1" in
the case of UNIX. There is nothing to do in the case of Windows.
Make udir program:
unicon -c futils.icn
unicon -c options.icn
unicon -c regexp.icn
unicon udir.icn
Usage: udir -f<filemask>
for example: udir -f*.icn
lists the .icn files in the current directory and all its
subdirectories.
Best regards,
Sergey Logichev
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now.
http://goparallel.sourceforge.net
_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group