Re: [Unicon-group] [SPAM] Re: Walk of file directory

ptho Fri, 23 Jan 2015 14:12:13 -0800

I'm seeing the uni files from compilation in the /tmp directory under Fedora 21. I thought we had that fixed early last year. As it happens, all 15 are from a short period on Jan. 16 when I was using "unicon -C", so perhaps Linux cleans up (or I do) for some unobvious reason. I believe there is an old UNIX specification that files not owned by root and found in the /tmp directory can be removed at will by the operating system. That doesn't seem to happen, but our immediate problem is removing the files when compilation is successful.

For Linux, there is a utility named tmpwatch that can do automatic, configurable cleanups of /tmp directories.


On Windows, who knows?

--Phillip

On Fri, 23 Jan 2015, Jafar Al-Gharaibeh wrote:

From what I saw, the temp files are not removed from one run to another. Who
is responsible for cleaning those up?
The program in question - in my tests at least - traverses the entire C:
drive, so it creates a huge number of temp files. I have noticed that only the
first temp file creation is slow. It is as if, once _tempnam() finds a valid
name in a given program run, it keeps an internal counter that it
uses/increments with the prefix in subsequent calls to quickly find valid
names.

--Jafar
 

On Fri, Jan 23, 2015 at 2:24 PM, Jeffery, Clint ([email protected])
<[email protected]> wrote:
      Are temp files persisting instead of being removed when the
      process terminates? That's a bad leak, if so. Or is it just that
      we have such large file systems now that the common prefix is
      killing us for the number of tempnames generated on a single
      run?


      -------- Original message --------
      From: Jafar Al-Gharaibeh <[email protected]>
      Date: 01/23/2015 12:01 PM (GMT-05:00)
      To: Wade <[email protected]>
      Cc: Unicon group <[email protected]>
      Subject: [SPAM] Re: [Unicon-group] Walk of file directory


      Wade,
   _tempnam(dir, prefix) is provided by Windows; we just use it, and it
turned out not to be smart at all - at least on my Windows 7 machine.
However, our code that uses it could be made smarter by always using a
randomized prefix - that is one approach.
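
A minimal sketch of the randomized-name idea, written in Python for illustration rather than in Unicon or in the runtime's C code. The names make_temp_file and remove_temp_files and the "uni" prefix are hypothetical. The point is only that a random suffix plus exclusive creation avoids scanning existing names, and that the process can remove what it created when it exits.

```python
import atexit
import os
import secrets
import tempfile

_created = []  # paths handed out by make_temp_file during this run


def make_temp_file(prefix="uni", directory=None):
    """Create an empty, uniquely named temp file and remember it for cleanup.

    The suffix is random rather than counter-based, so the cost of finding
    a free name does not grow with the number of files sharing the prefix.
    """
    directory = directory or tempfile.gettempdir()
    while True:
        path = os.path.join(directory, f"{prefix}{secrets.token_hex(8)}.tmp")
        try:
            # O_EXCL makes creation fail if the name already exists,
            # so there is no separate "does it exist?" scan or race.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
        except FileExistsError:
            continue  # extremely unlikely collision: draw another name
        os.close(fd)
        _created.append(path)
        return path


def remove_temp_files():
    """Delete every file that make_temp_file created in this process."""
    while _created:
        path = _created.pop()
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already removed by someone else


# Run the cleanup automatically at normal interpreter exit.
atexit.register(remove_temp_files)
```

Python's own tempfile.mkstemp() follows the same pattern; the hand-written version is shown only to make the two steps (random name, cleanup at exit) visible.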

Thanks,
Jafar


On Fri, Jan 23, 2015 at 3:44 AM, Wade <[email protected]> wrote:
      Sounds like the _tempnam() function could be a lot smarter
      in creating temporary filenames. Is that our function or
      is it provided by Windows?

Wade.


On Fri, 23 Jan 2015 20:13:17 +1100, Sergey Logichev
<[email protected]> wrote:

      Jafar,
 
I very much appreciate your investigations! Actually, my
Windows %TMP% folder held ~135000 temporary files, so
when I cleaned it my run time decreased from ~40 secs to
~20. The very first open() was then instant, and its time
increased again as the number of temporary files grew. My
proposal is to purge all temporary files after the program
finishes, or to use virtual storage in RAM instead, because
a single temporary file is created for every searched
subdirectory. After a very short time the TMP folder will
contain a myriad of such files.
 
Nevertheless, I confirm that the number of threads has
practically no influence on execution time. Probably it's
the problem of "lazy cleanup", as you mentioned. I hope you
can find a solution. Compared with Linux, Windows is
quite a bag of different bugs, which crawl out of every hole
:-)
 
Thank you,
Sergey
 
23.01.2015, 10:19, "Jafar Al-Gharaibeh"
<[email protected]>:
      Sergey,  
   Thanks for the report. I had in mind to look at
why we don't get much speed up with more threads. I
did look and found that the main thread was grabbing
most "new thread tokens" and not recycling them fast
enough. I have to tweak my algorithm to allow quick
cleanup and reuse of threads. I will do that when I
get a chance.
 
Now the second issue - and you've gotta love this!-,
I was able to confirm the slow open(). With the help
of gdb and after spending a couple of hours digging
into the C code and the Windows API calls, I found
that the problem is in a call to _tempnam() to
create a temporary file name. The call was taking a very
long time to finish. It creates the tmp file under your
system TMP folder (%TMP% on Windows). I looked in
that folder and found that it has more than half a
million files (~2.7GB)! 
 
It turned out that every time my program runs,
Windows was looping through that huge pile of tmp
files to find a name that doesn't exist so that it
can give it to the program. Of course I think most
of those tmp files were generated by my program
during previous runs the last couple of days.  
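
The effect described above can be illustrated with a toy model. This is a Python sketch of the behaviour guessed at in this thread (a counter that restarts in each process and is remembered within it), not the actual Windows implementation of _tempnam(). The function names are made up for the example.

```python
def first_free_index(existing, start=0):
    """Scan upward from `start`; return (free index, number of names probed)."""
    probes = 0
    i = start
    while True:
        probes += 1
        if i not in existing:
            return i, probes
        i += 1


def simulate(runs, files_per_run):
    """Probes needed by the FIRST name request of each run.

    Assumptions of the model: leftover files are never deleted, each new
    process restarts its counter at 0, and later requests in the same
    process continue from the remembered counter.
    """
    existing = set()
    first_call_probes = []
    for _ in range(runs):
        index, probes = first_free_index(existing)  # fresh process
        first_call_probes.append(probes)
        existing.add(index)
        for _ in range(files_per_run - 1):
            # Remembered counter: the next index is free after one probe.
            index, _ = first_free_index(existing, index + 1)
            existing.add(index)
    return first_call_probes


print(simulate(3, 1000))  # [1, 1001, 2001]
```

In this model only the first request of a run pays for the leftovers, and its cost grows by one probe for every file left behind by earlier runs.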
 
As a bonus, I discovered a memory leak in the
process of tracking the open() problem. I committed
a fix for that leak. This is only affecting Windows.
 
Short term solution: flush your TMP folder.
Long term: we will look into ways to improve our tmp file
strategy to overcome the shortcomings of the Windows API.
This will come at a later date! :)
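
As a sketch of the short-term fix, the following Python function removes old files with a given name prefix from a directory, roughly what a tmpwatch-style cleanup does. The function name, the prefix argument and the age threshold are assumptions for the example; the actual prefix used by the Unicon runtime is not stated in this thread.

```python
import os
import time


def purge_stale(directory, prefix, max_age_seconds, now=None):
    """Delete regular files in `directory` whose names start with `prefix`
    and whose modification time is older than `max_age_seconds`.

    Returns the number of files removed. Subdirectories are not entered.
    """
    now = time.time() if now is None else now
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                age = now - entry.stat(follow_symlinks=False).st_mtime
                if age < max_age_seconds:
                    continue  # recent file, possibly still in use
                os.remove(entry.path)
                removed += 1
            except OSError:
                continue  # locked or already gone: skip it
    return removed
```

For example, purge_stale(tempfile.gettempdir(), "uni", 24 * 3600) would remove matching files older than one day; the age check is there so that files belonging to a program that is still running are left alone.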
 
Cheers,
Jafar
 

On Thu, Jan 22, 2015 at 4:43 AM, Sergey Logichev
<[email protected]> wrote:
      Jafar,
 
You've provided a very interesting version of the
walk directory algorithm. Communication with
active threads is a great thing!
I have checked your program under Windows 7. I
was puzzled by the fact that execution time
depends only negligibly on the number of concurrent
threads. I dug into it and discovered that the
first open(s) operation takes nearly ALL of the
execution time! 95% at least. Check it
yourself by slightly editing getdirs():
...
if ( stat(s).mode ? ="d" ) & ( tm := &time, d := open(s) ) then {
      if n=1 then write(s," : ",&time-tm)
...
 
 
So, if the first open() takes so long then all other
enhancements make no sense. Please correct me if
I am wrong.
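
The same kind of measurement can be sketched outside Unicon. The Python function below times only the directory-open step for the first few directories under a root, so that an unusually slow first open stands out. It is an illustration with hypothetical names; Python's os.scandir() does not create temp files, so it will not reproduce the Unicon behaviour itself.

```python
import os
import time


def time_directory_opens(root, limit=5):
    """Return [(path, seconds)] for opening the first `limit` directories
    reached from `root`, measuring the open call only."""
    timings = []
    pending = [root]
    while pending and len(timings) < limit:
        path = pending.pop()
        start = time.perf_counter()
        try:
            with os.scandir(path) as it:
                elapsed = time.perf_counter() - start  # cost of the open alone
                subdirs = [e.path for e in it if e.is_dir(follow_symlinks=False)]
        except OSError:
            continue  # unreadable directory: skip it
        timings.append((path, elapsed))
        pending.extend(subdirs)
    return timings


if __name__ == "__main__":
    for path, seconds in time_directory_opens(os.getcwd()):
        print(f"{path} : {seconds * 1000:.3f} ms")
```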
 
Best regards,
Sergey
 
22.01.2015, 00:58, "Jafar Al-Gharaibeh"
<[email protected]>:
      Here is a slightly
      tweaked/reformatted version. By
      default it now auto-detects the
      number of available cores in the
      machine and launches twice as many
      threads.
--Jafar

On Wed, Jan 21, 2015 at 12:17 PM, Jafar
Al-Gharaibeh <[email protected]> wrote:
      David,  
    I added a threaded solution at
http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon

Please review/edit as you see
fit. (The source file is
attached.) Combining recursion
with threads might not be the best
solution for this problem. If I
were to put this to real use I'd
go with an iterative approach
using a master/workers model.
Anyway, this is an excellent
demonstration of how to use
threads! The key features are:
 
   1- How to create threads, limit
their number, and self-load-balance
(new threads are spawned at the
time/place where needed. Once they
are done, they vanish, allowing new
threads to pop up in new places in
the directory structure)
   2- How to pass data to and collect
results from the threads using
the new language features.
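
For comparison, the iterative master/workers alternative mentioned above could look roughly like the following. This is a Python sketch, not the attached Unicon program: a fixed pool of worker threads takes directory paths from a shared queue, and each worker puts the subdirectories it finds back on the queue. The function name and the default of 4 workers are assumptions.

```python
import os
import queue
import threading


def walk_with_workers(root, workers=4):
    """Return a sorted list of all directories under `root` (root included),
    discovered by a fixed pool of worker threads."""
    todo = queue.Queue()
    found = []
    found_lock = threading.Lock()
    todo.put(root)

    def worker():
        while True:
            path = todo.get()
            if path is None:          # sentinel: no more work
                todo.task_done()
                return
            try:
                with os.scandir(path) as it:
                    subdirs = [e.path for e in it
                               if e.is_dir(follow_symlinks=False)]
            except OSError:
                subdirs = []          # unreadable directory: record it anyway
            with found_lock:
                found.append(path)
            for sub in subdirs:
                todo.put(sub)         # queued before task_done, so join() waits
            todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    todo.join()                       # every queued directory has been handled
    for _ in threads:
        todo.put(None)                # release the workers
    for t in threads:
        t.join()
    return sorted(found)
```

With this layout the number of threads is fixed for the whole run, so there is no thread creation or teardown per directory and no recursion depth to worry about.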
 
 
Here is some sample output from my
desktop machine (quad-core with
mechanical HDD. I will try another
machine with an SSD and see if
more threads scale better). 
 
The first argument to the program
is the target directory. The
second is the maximum number of
concurrent threads to use at any
given moment. (Soft limit! My
counters are "unmutexed", so the
actual number might deviate.) Note
that this is different from the
actual number of threads used
during the run, which is reported
at the end. The program can
create/destroy threads as needed,
but cannot use more than "max"
threads at any given moment,
and again "max" is "soft". :)
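
To show the difference between a "soft" and a "hard" limit, here is a Python sketch of a recursive walker whose limit on concurrent threads is enforced with a semaphore. It is not a translation of the attached program; the class and attribute names are hypothetical. A new thread is spawned for a subdirectory only if a permit is free, otherwise the current thread handles that subdirectory itself.

```python
import os
import threading


class LimitedWalker:
    """Recursive directory walk with at most `max_threads` extra threads
    alive at any moment (the calling thread is not counted)."""

    def __init__(self, max_threads):
        self._permits = threading.BoundedSemaphore(max_threads)
        self._lock = threading.Lock()
        self.dirs = 0             # directories visited
        self.threads_started = 0  # total threads created during the run

    def walk(self, path):
        with self._lock:          # counters are mutexed here
            self.dirs += 1
        try:
            with os.scandir(path) as it:
                subdirs = [e.path for e in it
                           if e.is_dir(follow_symlinks=False)]
        except OSError:
            return
        spawned = []
        for sub in subdirs:
            # Non-blocking acquire: never wait for a permit, so a thread
            # holding a permit can never deadlock waiting for another.
            if self._permits.acquire(blocking=False):
                with self._lock:
                    self.threads_started += 1
                t = threading.Thread(target=self._run, args=(sub,))
                t.start()
                spawned.append(t)
            else:
                self.walk(sub)    # no permit free: do it in this thread
        for t in spawned:
            t.join()

    def _run(self, path):
        try:
            self.walk(path)
        finally:
            self._permits.release()  # permit returns when the thread ends
```

As in the description above, threads_started can end up larger than max_threads, because permits are reused as threads finish; only the number alive at one time is capped.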
 
Cheers,
Jafar
 
c:\proj>tdir c:\ 1
39708 directories in 99867 ms
using 1 threads
 
c:\proj>tdir c:\ 4
39708 directories in 62222 ms
using 4 threads
 
c:\proj>tdir c:\ 4
39708 directories in 87650 ms
using 4 threads
 
c:\proj>tdir c:\ 1
39708 directories in 92525 ms
using 1 threads
 
c:\proj>tdir c:\ 4
39708 directories in 95655 ms
using 4 threads
 
c:\proj>tdir c:\ 16
39708 directories in 66138 ms
using 21 threads
 
c:\proj>tdir c:\ 8
39708 directories in 69307 ms
using 8 threads
 
c:\proj>tdir c:\ 4
39708 directories in 70539 ms
using 4 threads
 
c:\proj>tdir c:\ 16
39708 directories in 76392 ms
using 32 threads
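
The runs above are easier to compare when averaged per requested thread limit. The short Python snippet below does only that arithmetic on the figures copied from the output; nothing is measured.

```python
from collections import defaultdict

# (maximum threads requested, elapsed ms), copied from the runs above
runs = [(1, 99867), (4, 62222), (4, 87650), (1, 92525), (4, 95655),
        (16, 66138), (8, 69307), (4, 70539), (16, 76392)]


def mean_ms_by_limit(samples):
    """Group elapsed times by thread limit and return their averages."""
    grouped = defaultdict(list)
    for limit, ms in samples:
        grouped[limit].append(ms)
    return {limit: sum(times) / len(times)
            for limit, times in sorted(grouped.items())}


summary = mean_ms_by_limit(runs)
for limit, avg in summary.items():
    print(f"max {limit:>2} threads: {avg:.1f} ms")
```

The averages are 96196 ms for 1 thread, 79016.5 ms for 4, 69307 ms for 8 (a single run) and 71265 ms for 16. The four 4-thread runs alone range from 62222 ms to 95655 ms, so the run-to-run variation is about as large as the difference between thread counts.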
 
 

On Sun, Jan 11, 2015 at 1:25 PM,
David Gamey
<[email protected]> wrote:
      Sergey,
 
I am responsible for much of
the Rosetta code
contributions (thanks also
to Steve, Andrew, Matt,
Peter, and about 4 others)
and this one in particular
dating from 2010. As I
recall this was before the
multi-threading versions
were widely available. I
think multi-threading is
underrepresented in
Rosetta/Unicon.
 
If you come up with a
multi-threading version, we
should add it to the post as
an alternative version.  If
you don't feel comfortable
doing this, post the code
and I can add it.
 
David
 

____________________________________________________________________________
      From: Sergey Logichev <[email protected]>
      To: Jafar Al-Gharaibeh <[email protected]>
      Cc: Unicon group <[email protected]>
      Sent: Sunday, January 11, 2015 1:16 AM
      Subject: Re: [Unicon-group] Walk of file directory

Jafar,
 
Thank you for a whole bundle of advice and suggestions!
Threads are worth trying. The idea of searching by file
attributes is very useful too. Your suggestion about slow
I/O is partly right. For UNIX I tried the program on a
Raspberry Pi with a Class 6 microSD as HDD (it's slow,
agreed). But for Windows it was quite a fast HDD. It would
be interesting to compare the performance of the program on
Windows with the classic approach based on the Win32
_FINDFIRST, _FINDNEXT functions. I have threaded
Delphi/Lazarus implementations of this algorithm. I feel
that it will be faster, but to what degree?
 
Sergey
 
10.01.2015, 21:50,
"Jafar Al-Gharaibeh"
<[email protected]>:


      Sergey,  
  There are so many things that came to mind when I saw your
program.
 
1- At the end of your email, the sourceforge ad says "Go
Parallel", which is not a bad idea for this highly parallel
application.
 
 There is a similar program, "wordcount", listed in my
dissertation (available on unicon.org) that goes through
directories and counts words in every file using threads
(Chapter 7, page 107).
 
2- Unicon open() already supports pattern matching that
would greatly (I believe) speed up your program. For example
you can do this:
    L := open("*.icn")
 
   to get a list of all of the Unicon source files in the
current directory.
 
  Note: It would be nice if there were a way to tell open()
to return files based not only on a pattern, but also on a
file attribute, to allow something like "get me all
directories in the current directory", or "get me all
read-only files". There are a lot of situations where
filtering directory names, for example, is very useful -
like this program.
 
3- The program on Rosetta Code is not optimized for speed.
You can minimize the number of lists created and put() calls
by careful rewriting of the code.
 
4- Depending on how deep the directory tree is, there might
be a lot of I/O going on. A slow disk might limit how fast
you can go regardless of how optimized your code is.
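
Points 2 and 3 can be sketched together. The Python generator below (an illustration, not Unicon's open()) filters one directory by name pattern and, optionally, by the "is a directory" attribute, and yields names one at a time instead of building intermediate lists. The function name and its parameters are hypothetical.

```python
import fnmatch
import os


def entries(directory=".", pattern="*", only_dirs=False):
    """Yield names in `directory` that match `pattern`.

    With only_dirs=True, only subdirectories are yielded. No list is
    built: each matching name is produced as soon as it is read.
    """
    with os.scandir(directory) as it:
        for entry in it:
            if only_dirs and not entry.is_dir(follow_symlinks=False):
                continue
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name
```

For example, list(entries(".", "*.icn")) gives the Unicon source files in the current directory, and entries(".", only_dirs=True) gives just the subdirectories that a recursive walk needs to descend into.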
 
I will share results if I get around to trying any of
these options.
 
Cheers,
Jafar
 
 

On Sat, Jan 10, 2015 at 5:51 AM, Sergey Logichev
<[email protected]> wrote:
      Hello all!
 
I am now investigating the best approach to getting a list
of files in a specified directory and beneath it in Unicon.
I found an excellent example at rosettacode.org:
http://rosettacode.org/wiki/Walk_a_directory/Recursively#Icon_and_Unicon
 
I reworked it to implement matching of filenames against a
specified pattern (regular expression). My program
recursively walks a directory and prints the matching
filenames, the same as dir (ls) does. All works fine except
performance. If the directory has a lot of subdirs, the
search may take 10-20 seconds before output starts. Could
you provide some advice on how to enhance the performance?
 
Some notes on how to build and use it. Unpack the contents
of udir.zip to your local directory. Define which
environment you use in the env.icn file: uncomment the line
"$define _UNIX 1" in the case of UNIX. There is nothing to
do in the case of Windows.
Make the udir program:
unicon -c futils.icn
unicon -c options.icn
unicon -c regexp.icn
unicon udir.icn
 
Usage:
udir -f<filemask>
for example:
udir -f*.icn
lists the icn files in the current dir and all its
subdirectories.
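
One way to avoid a long wait before the first line of output is to report each match as soon as it is found instead of collecting everything first. The Python generator below sketches that idea with a shell-style file mask; it is not the udir program, and find_files is a hypothetical name.

```python
import fnmatch
import os


def find_files(root, mask):
    """Yield the path of every file under `root` whose name matches `mask`,
    as soon as it is found, without collecting results first."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, mask):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Rough equivalent of: udir -f*.icn
    for path in find_files(".", "*.icn"):
        print(path)
```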
 
Best
regards,
Sergey
Logichev

------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group


------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet

_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group
