Re: Can I "prune" directories with walkDirRect?

2020-06-08 Thread cblake
For what it's worth, at least for systems where one can assume post-POSIX.1-2008 
APIs like `openat` and `fstatat` (really any vaguely recent Linux/BSD), it is 
possible to roll your own recursion that, in my timings, is about 4x faster with 
a hot cache than `walkDirRec` (note no trailing 't'). What boost one gets depends on 
whether you need that `Stat` metadata (e.g. file times, sizes, owner, perms, 
etc.) or just path names. Those ideas are in the current 
`cligen/dents.nim:forPath` template for Unix users. (It could actually be sped 
up a couple ways still, but not very portably.)

Of course, depending on the scenario/hotness of caches, the boost may not 
matter much. Costs from the recursion may be tiny compared to IO/other work. Or 
it could dominate. Personally, I do a lot of work out of a `tmpfs /dev/shm` 
bind mount to `/tmp` which never has any IO.

Mostly I was just giving yet another syntax for packaging up recursions: one 
that lets the guts hang out more, and the calling code has to/**_gets to_** be 
aware of that while maybe having delegated the low-level system stuff to the 
template author. Nim is pretty great like that.

BTW, I did re-arrange the order of the 4 event clauses to `always, preRec, 
postRec, recFail` and provide a `recFailDefault` template to make things read 
more nicely. So, my above code example won't quite work as written anymore. 
Best to start from one of the 4 worked out examples after the template in 
`dents.nim` if you want to use it.


Re: Can I "prune" directories with walkDirRect?

2020-06-01 Thread timothee
see also walkDirRecFilter since 
[https://github.com/nim-lang/Nim/pull/14501](https://github.com/nim-lang/Nim/pull/14501)
 but for now it's internal use only until API is deemed good 


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
And just to close out the example more fully for @chalybeum, since he said he 
was a beginning programmer, he could probably start from the code below (after 
a `nimble install 'cligen@#head'`) to do whatever it was he wanted to six 
months ago if he even stuck with programming, with Nim and/or this forum: 


import sets, posix, cligen/[dents, statx, posixUt]
proc chalybeum(prunePath="", recurse=0, chase=false,
               xdev=false, roots: seq[string]) =
  ## ``prunePath`` file fmt is one base name per line.
  var prune: HashSet[string]
  for line in lines(prunePath): prune.incl line
  for root in roots:
    forPath(root, recurse, false, chase, xdev, depth,
            path, nameAt, ino, dt, lst, st, recFail):
      case errno
      of EXDEV, ENOTDIR: discard # Expected sometimes
      of EMFILE, ENFILE: return  # Too deep;Stop recurse
      else:
        let m = "chalybeum: \"" & path & "\")"
        perror cstring(m), m.len
    do:
      echo path # chalybeum logic on `path` goes here
    do: # Pre-recurse: skip dirs w/base names in prune
      if dt == DT_DIR and path[nameAt..^1] in prune:
        continue
    do: # Post-recurse; only `path` valid now
      if recFail: echo "did not recurse into ", path
when isMainModule:
  import cligen; cligen.dispatch(chalybeum)



Replacing HashSet checking with regex prunes/excludes is not so hard. It is 
fast, does avoid symlink infinite loops (when optionally chasing), and 
conditionally avoids cross-device links which is kind of the standard set of 
Unix tree walk functionality, BUT I totally admit it's not a very easy to use 
programming interface. The logic of the recursive loop leaks out plenty. I just 
threw it together. I'm not sure it's much better than the example expanded 
recursion code would be. Maybe a little.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
I put various things that "might be useful" for Unix CLI utilities under 
`cligen/` so client code can just have `cligen` as a leaf/sole dependency. 
Directory tree recursion fits that pattern, and stdlib `walkDir*` never seemed 
quite right to me. There is already `cligen/posixUt.recEntries`, for example.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread kaushalmodi
Wow!.. why is it in **cligen** though? It looks like it can be a separate 
`find`-competition package :)


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
Of course, that doesn't control recursive descent which was @chalybeum's 
driving use case but you did use a smiley. :-)

Based on possible broader interest and a general trend lately of trying to be 
less abstract, I just added a template-based tree iteration to 
`cligen/dents.nim`: 
[https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530](https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530)
 with a fully worked out example utility in `examples/chom.nim` as well as 4 
inline `cligen.dispatchMulti`-driven example usages.

The short of it is that you can make things about 2x-8x faster on Linux if 
you just trust `d_type` and you only need path names, not, say, i-node data 
from lstat/stat/etc. Performance only matters for large directory hierarchies, 
obviously.
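
To make the `d_type` point concrete, here is a minimal sketch (not the code in `cligen/dents.nim`) that classifies directory entries straight from `readdir` results via Nim's `posix` module, with no per-entry `lstat`/`stat` call unless the filesystem reports `DT_UNKNOWN`. The name `listDirTypes` is made up for illustration, and the sketch assumes Linux or a BSD where `Dirent` has a `d_type` field.

```nim
import os, posix

proc listDirTypes(dir: string): seq[tuple[name: string, isDir: bool]] =
  ## Hypothetical helper: lists the entries of `dir`, taking the
  ## file type from `d_type` instead of calling stat on each entry.
  let d = opendir(dir)
  if d == nil: return              # unreadable or missing directory
  defer: discard closedir(d)
  while true:
    let ent = readdir(d)
    if ent == nil: break           # end of directory stream
    let name = $cast[cstring](addr ent.d_name)
    if name == "." or name == "..": continue
    let isDir =
      if ent.d_type == DT_UNKNOWN:
        # Some filesystems do not fill in d_type; only then pay for a
        # stat-based check (note: this one follows symlinks).
        dirExists(dir / name)
      else:
        ent.d_type == DT_DIR
    result.add((name, isDir))
```

A recursion built on something like this only needs to descend into entries with `isDir` set, which is where the saved syscalls come from when path names are all you need.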


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread juancarlospaco
... but walkPattern() does take a glob pattern. :) 


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
@chalybeum ..the feature being mentioned seems to be the `FilterDescend` 
predicate function of the referenced package. You would just load up a Nim 
`HashSet` from `sets` with to-be-skipped paths and pass some predicate like 
`path notin blacklist`, with `blacklist` probably being a captured closure 
variable.
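
A rough sketch of that predicate idea using only the stdlib follows. This is not the `glob` package's actual API; `walkFiltered` and `filesOutside` are names invented for illustration, and the predicate is a closure that captures the `blacklist` set.

```nim
import os, sets

iterator walkFiltered(root: string,
                      descend: proc (dir: string): bool): string =
  ## Yields file paths under `root`, entering a directory only when
  ## `descend(dir)` returns true. Uses an explicit stack, not recursion.
  var stack = @[root]
  while stack.len > 0:
    let dir = stack.pop()
    for kind, path in walkDir(dir):
      if kind == pcDir:
        if descend(path): stack.add path
      elif kind in {pcFile, pcLinkToFile}:
        yield path
      # pcLinkToDir is not followed in this sketch

proc filesOutside(root: string, blacklist: HashSet[string]): seq[string] =
  ## Example use: the predicate closure captures `blacklist`.
  let descend = proc (dir: string): bool = dir notin blacklist
  for f in walkFiltered(root, descend):
    result.add f
```

Note that the paths in `blacklist` have to be spelled the way `walkDir` produces them (for example `./folder_1` when the root is `.`), since the check is a plain string comparison.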

While we are resurrecting a zombie-ish thread to promote a package ;-), I can 
say something about performance expectations that may be uncommon knowledge. 
The GNU `find` (from findutils) goes through contortions to be able to traverse file 
hierarchies that are deeper than the limit on open file descriptors. As a 
result, that `find` uses about 3.5x the syscalls, 2.5x the CPU time, and 3x 
the RAM of more direct implementations. If that data must be read off a 
persistent IO device those usages will not be bottlenecks, but on a fully 
cached run they will be. So, a decent speed-up relative to GNU `find` on 
non-pathological file trees is sometimes possible, if that sort of speed-up 
motivates anyone.


Re: Can I "prune" directories with walkDirRect?

2020-05-26 Thread timothee
this is probably what you're looking for 
[https://github.com/citycide/glob](https://github.com/citycide/glob) but IMO 
there should be something equivalent in stdlib


Re: Can I "prune" directories with walkDirRect?

2020-05-25 Thread HVN
I was asking the same question on IRC and just found this. Coming from Python, I 
tried my first Nim program by converting an existing script which scans a directory 
of 300k files to filter out 25k files. The Nim version takes ~17s to 
run, as it scans all the directories, while using `find -prune`, or Python's `os.walk` 
with the excluded dirs removed from `dirs`, runs in 1s. This would be a really great 
feature to have in stdlib.


Re: Can I "prune" directories with walkDirRect?

2019-11-17 Thread sschwarzer
I guess the link to the Python 2 version of the library was only by accident. 
If some new functionality in Nim should be modeled after Python, refer to the 
documentation for Python 3.

For most older libraries, there shouldn't be a big difference, but for newer 
libraries there may be, and even older modules might be improved in Python 3.

So, here's the link: 
[https://docs.python.org/3/library/os.html#os.walk](https://docs.python.org/3/library/os.html#os.walk)
 . Note that I used `/3/` in the URL, so you'll get the documentation for the 
most recent Python 3 version. If you select a specific version (e. g. 3.8) from 
the drop-down menu at the top of the page, you'll get the documentation as of 
this version. 


Re: Can I "prune" directories with walkDirRect?

2019-11-17 Thread cumulonimbus
The python os.walk is exceptionally convenient and supports such a use case - 
the iterator returns 3 components: "path", "dirs" and "files"; the user has to 
enumerate "files" (or dirs) themselves, and join them to the "path" for the 
list of files, but can also ignore dirs or modify it - the iterator will only 
recurse into those still listed in dirs when re-called, so: if you ignore 
dirs, you get a standard recursion; if you empty it out, you get no recursion 
down from this path; and if you filter it, you get selective recursion. It has 
a few more bells and whistles that cover just about all use cases I've 
encountered: 
[https://docs.python.org/2/library/os.html?highlight=walk#os.walk](https://docs.python.org/2/library/os.html?highlight=walk#os.walk)

Worth adding to standard library, I think. 
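
For illustration, here is a minimal Nim sketch of that `os.walk`-style contract, built on the stdlib `walkDir`. It is an approximation under stated assumptions rather than a port: the name `walkLikePython` and the callback shape are invented here, and symlinked directories are simply skipped.

```nim
import os

proc walkLikePython(top: string,
                    visit: proc (path: string, dirs: var seq[string],
                                 files: seq[string])) =
  ## Calls `visit` once per directory with the base names of its
  ## sub-directories and files. `visit` may edit `dirs` in place;
  ## only the names still in `dirs` afterwards are descended into.
  var dirs, files: seq[string]
  for kind, name in walkDir(top, relative = true):
    if kind == pcDir: dirs.add name
    elif kind in {pcFile, pcLinkToFile}: files.add name
    # pcLinkToDir entries are skipped in this sketch
  visit(top, dirs, files)
  for d in dirs:
    walkLikePython(top / d, visit)
```

Leaving `dirs` alone gives the standard recursion, setting it to `@[]` stops descent below that path, and deleting selected names gives the selective recursion described above.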


Re: Can I "prune" directories with walkDirRect?

2019-11-15 Thread chalybeum
But that would still not omit certain directories, or am I mistaken about something?


Re: Can I "prune" directories with walkDirRect?

2019-11-14 Thread juancarlospaco
One of the overloads of walkdir takes a Posix Glob that can be recursive and 
filtering at the same time, kinda `"**/*.pyc"` or similar. 


Re: Can I "prune" directories with walkDirRect?

2019-11-14 Thread chalybeum
Thanks, I was hoping to avoid that. But it will be a good exercise. I am also 
thinking of just calling the existing find and reading what I need from a temp 
file. Or could I get significantly faster by implementing it myself?


Re: Can I "prune" directories with walkDirRect?

2019-11-14 Thread sky_khan
walkDirRec does recursively search all files/dirs. If you don't want to enter 
some directories, I guess you need to implement your own recursive "walking" 
logic with walkDir, then you can have an "exclude_dirs : seq[string]" variable 


Can I "prune" directories with walkDirRect?

2019-11-13 Thread chalybeum
Hi there, aspiring programmer and total Nim-Noob is asking for your wisdom. In 
order to dive deeper into the adventure that programming is, I decided to have 
a go at Nim. Thought it would be a good idea to reimplement something I already 
did in Bash, just to have the logic out of the way. Now this project involves 
indexing large parts of / and ~ but I want to leave out logs, caches and some 
other stuff.

Given following dir-structure, 


.__ file_a
|
|__folder_1
| |__file_1.a
| |__file_1.b
|
|__folder_2
   |__file_2.a
   |__file_2.b



I tried the following: 


import os, re

for file in walkDirRec ".":
  if file.match (re"\S*folder_1\S*"):
    echo "NO!"
    continue
  echo file




this would output: 


./file_a
NO!
NO!
./folder_2/file_2.a
./folder_2/file_2.b



Now I can achieve my goal with that and e.g. write only the paths I want to a 
file or something. But there are two things bugging me about this: First: In 
Bash, I used find to pipe the paths into a file. With the prune flag I could 
stop find from descending into those directories completely and therefore save 
quite a bit of time. The above method would iterate over every single file 
anyway. Second: I'd prefer to have an array where the folders to be excluded 
are stored. As strings maybe? (I played a bit with Python and their os.walk can 
do that). But my attempts to get this working based on string comparison were 
futile, to say the least.

Now I know that as a newcomer to programming I might very well be off on a 
completely wrong track and I have to tackle things in another way to begin 
with. But it somehow stumps me, that I was able to figure this out in two other 
languages and am so lost here in Nim. I guess the easy to pick up part is more 
in the syntactic part than the approach?

Anyway, any little bit of guidance, regardless of direction, would be much 
appreciated. Greetings, Markus