Re: Can I "prune" directories with walkDirRect?
For what it's worth, at least for systems where one can assume post-POSIX.2008 APIs like `openat` and `fstatat` (really any vaguely recent Linux/BSD), it is possible to roll your own recursion that, in my timings, is about 4x faster with a hot cache than `walkDirRec` (note: no trailing 't'). How much of a boost one gets depends on whether you need the `Stat` metadata (e.g. file times, sizes, owner, perms, etc.) or just path names. Those ideas are in the current `cligen/dents.nim:forPath` template for Unix users. (It could actually still be sped up in a couple of ways, but not very portably.)

Of course, depending on the scenario/hotness of caches, the boost may not matter much. Costs from the recursion may be tiny compared to IO/other work. Or they could dominate. Personally, I do a lot of work out of a `tmpfs /dev/shm` bind mount to `/tmp`, which never has any IO.

Mostly I was just giving yet another syntax for packaging up recursions: one that lets the guts hang out more, where the calling code has to/**_gets to_** be aware of that while maybe having delegated the low-level system stuff to the template author. Nim is pretty great like that.

BTW, I did re-arrange the order of the 4 event clauses to `always, preRec, postRec, recFail` and provide a `recFailDefault` template to make things read more nicely. So, my above code example won't quite work as written anymore. Best to start from one of the 4 worked-out examples after the template in `dents.nim` if you want to use it.
Re: Can I "prune" directories with walkDirRect?
See also `walkDirRecFilter`, available since [https://github.com/nim-lang/Nim/pull/14501](https://github.com/nim-lang/Nim/pull/14501), but for now it's for internal use only until the API is deemed good.
Re: Can I "prune" directories with walkDirRect?
And just to close out the example more fully for @chalybeum, since he said he was a beginning programmer, he could probably start from the code below (after a `nimble install 'cligen@#head'`) to do whatever it was he wanted to six months ago if he even stuck with programming, with Nim and/or this forum:

```nim
import sets, posix, cligen/[dents, statx, posixUt]

proc chalybeum(prunePath="", recurse=0, chase=false, xdev=false, roots: seq[string]) =
  ## ``prunePath`` file fmt is one base name per line.
  var prune: HashSet[string]
  for line in lines(prunePath): prune.incl line
  for root in roots:
    forPath(root, recurse, false, chase, xdev, depth, path, nameAt, ino, dt, lst, st, recFail):
      case errno
      of EXDEV, ENOTDIR: discard  # Expected sometimes
      of EMFILE, ENFILE: return   # Too deep;Stop recurse
      else:
        let m = "chalybeum: \"" & path & "\")"
        perror cstring(m), m.len
    do: echo path   # chalybeum logic on `path` goes here
    do:             # Pre-recurse: skip dirs w/base names in prune
      if dt == DT_DIR and path[nameAt..^1] in prune: continue
    do:             # Post-recurse; only `path` valid now
      if recFail: echo "did not recurse into ", path

when isMainModule: import cligen; cligen.dispatch(chalybeum)
```

Replacing HashSet checking with regex prunes/excludes is not so hard. It is fast, does avoid symlink infinite loops (when optionally chasing), and conditionally avoids cross-device links, which is kind of the standard set of Unix tree walk functionality, BUT I totally admit it's not a very easy-to-use programming interface. The logic of the recursive loop leaks out plenty. I just threw it together. I'm not sure it's much better than the example expanded recursion code would be. Maybe a little.
Re: Can I "prune" directories with walkDirRect?
I put various things that "might be useful" for Unix CLI utilities under `cligen/` so client code can just have `cligen` as a leaf/sole dependency. Directory tree recursion fits that pattern, and stdlib `walkDir*` never seemed quite right to me. There is already `cligen/posixUt.recEntries`, for example.
Re: Can I "prune" directories with walkDirRect?
Wow!.. why is it in **cligen** though? It looks like it can be a separate `find`-competition package :)
Re: Can I "prune" directories with walkDirRect?
Of course, that doesn't control recursive descent, which was @chalybeum's driving use case, but you did use a smiley. :-)

Based on possible broader interest and a general trend lately of trying to be less abstract, I just added a template-based tree iteration to `cligen/dents.nim`: [https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530](https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530) with a fully worked-out example utility in `examples/chom.nim` as well as 4 inline `cligen.dispatchMulti`-driven example usages.

The short of it is that you can make things about 2x-8x faster on Linux if you just trust `d_type` and you only need path names, not, say, i-node data from lstat/stat/etc. Performance only matters for large directory hierarchies, obviously.
Re: Can I "prune" directories with walkDirRect?
... but walkPattern() does take a glob pattern. :)
Re: Can I "prune" directories with walkDirRect?
@chalybeum: the feature being mentioned seems to be the `FilterDescend` predicate function of the referenced package. You would just load up a Nim `HashSet` from `sets` with the paths to be skipped and pass some predicate like `path notin blacklist`, with `blacklist` probably being a captured closure variable.

While we are resurrecting a zombie-ish thread to promote a package ;-), I can say something about performance expectations that may be uncommon knowledge. The GNU findutils `find` goes through contortions to be able to traverse file hierarchies that are deeper than the limit on open file descriptors. This results in that `find` using about 3.5x the syscalls, 2.5x the CPU time, and 3x the RAM of more direct implementations. If the data must be read off a persistent IO device, those costs will not be the bottleneck, but on a fully cached run they will be. So, a decent speed-up relative to GNU `find` on non-pathological file trees is sometimes possible, if that sort of speed-up motivates anyone.
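A minimal sketch of the predicate-plus-`HashSet` idea using only the stdlib. The names `walkFiltered` and `skipping` are hypothetical and are not the API of the referenced package; its real `FilterDescend` signature is not reproduced here. The set holds full directory paths, spelled the way `walkDir` reports them.

```nim
import std/[os, sets]

iterator walkFiltered(root: string,
                      descend: proc (dir: string): bool): string =
  ## Yields file paths under `root`. A directory is entered only if
  ## the (hypothetical) `descend` predicate returns true for its path.
  var stack = @[root]
  while stack.len > 0:
    let dir = stack.pop()
    for kind, path in walkDir(dir):
      case kind
      of pcFile, pcLinkToFile:
        yield path
      of pcDir:
        if descend(path): stack.add path
      of pcLinkToDir:
        discard  # directory symlinks are not followed in this sketch

proc skipping(blacklist: HashSet[string]): proc (dir: string): bool =
  ## Builds a predicate that captures `blacklist` in a closure.
  result = proc (dir: string): bool =
    dir notin blacklist
```

Usage would look like `let pred = skipping(toHashSet(["./folder_1"]))` followed by `for f in walkFiltered(".", pred): echo f`. Because skipped directories are never opened, the cost of the walk scales with the kept part of the tree only.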
Re: Can I "prune" directories with walkDirRect?
this is probably what you're looking for [https://github.com/citycide/glob](https://github.com/citycide/glob) but IMO there should be something equivalent in stdlib
Re: Can I "prune" directories with walkDirRect?
I was asking the same question on IRC and just found this. Coming from Python, I tried my first Nim program by converting an existing script which scans a directory of 300k files to filter out 25k files. The Nim version takes ~17s to run, as it scans all the directories, while using `find -prune`, or Python's `os.walk` with the excluded dirs removed from `dirs`, runs in 1s. This would be a really great feature to have in stdlib.
Re: Can I "prune" directories with walkDirRect?
I guess the link to the Python 2 version of the library was only by accident. If some new functionality in Nim should be modeled after Python, refer to the documentation for Python 3. For most older libraries, there shouldn't be a big difference, but for newer libraries there may be, and even older modules might be improved in Python 3. So, here's the link: [https://docs.python.org/3/library/os.html#os.walk](https://docs.python.org/3/library/os.html#os.walk). Note that I used `/3/` in the URL, so you'll get the documentation for the most recent Python 3 version. If you select a specific version (e.g. 3.8) from the drop-down menu at the top of the page, you'll get the documentation as of that version.
Re: Can I "prune" directories with walkDirRect?
Python's `os.walk` is exceptionally convenient and supports such a use case. The iterator returns 3 components: "path", "dirs" and "files". The user has to enumerate "files" (or "dirs") themselves and join them to "path" to get the full file names, but can also ignore "dirs" or modify it in place: with the default `topdown=True`, the iterator will only recurse into those entries still listed in "dirs" when it is resumed. So: if you ignore "dirs", you get a standard recursion; if you empty it out, you get no recursion down from this path; and if you filter it, you get selective recursion. It has a few more bells and whistles that cover just about all use cases I've encountered: [https://docs.python.org/2/library/os.html?highlight=walk#os.walk](https://docs.python.org/2/library/os.html?highlight=walk#os.walk) Worth adding to the standard library, I think.
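A rough Nim analogue of that protocol, as a sketch only. `walkLikePython` and `WalkVisitor` are hypothetical names, not stdlib API; a callback with a `var` parameter stands in for Python's mutable `dirs` list, and directory symlinks are simply ignored.

```nim
import std/os

type WalkVisitor = proc (path: string, dirs: var seq[string],
                         files: seq[string])

proc walkLikePython(top: string, visit: WalkVisitor) =
  ## Calls `visit` once per directory, top-down. Whatever the visitor
  ## leaves in `dirs` is what gets recursed into afterwards.
  var dirs, files: seq[string]
  for kind, name in walkDir(top, relative = true):
    case kind
    of pcDir: dirs.add name
    of pcFile, pcLinkToFile: files.add name
    of pcLinkToDir: discard
  visit(top, dirs, files)  # visitor may remove entries from `dirs`
  for d in dirs:
    walkLikePython(top / d, visit)
```

A visitor that leaves `dirs` alone gets a full recursion, one that sets `dirs = @[]` stops below the current path, and one that filters `dirs` gets selective recursion, mirroring the three cases described above.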
Re: Can I "prune" directories with walkDirRect?
But that would still not omit certain directories, or am I missing something?
Re: Can I "prune" directories with walkDirRect?
One of the overloads of `walkDir` takes a POSIX glob that can be recursive and filtering at the same time, something like `"**/*.pyc"` or similar.
Re: Can I "prune" directories with walkDirRect?
Thanks, I was hoping to avoid that. But it will be a good exercise. I am also thinking of just calling the existing `find` and reading what I need from a temp file. Or could I get significantly faster by implementing it myself?
Re: Can I "prune" directories with walkDirRect?
`walkDirRec` recursively visits all files/dirs. If you don't want to enter some directories, I guess you need to implement your own recursive "walking" logic with `walkDir`; then you can have an `exclude_dirs: seq[string]` variable.
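One way that suggestion could look, as a minimal sketch: `walkPruned` and `excludeDirs` are made-up names, and matching on the directory's base name is an assumption about what "exclude" should mean here.

```nim
import std/os

iterator walkPruned(root: string, excludeDirs: seq[string]): string =
  ## Yields file paths under `root`, never entering a directory whose
  ## base name is listed in `excludeDirs`.
  var stack = @[root]            # directories still to be listed
  while stack.len > 0:
    let dir = stack.pop()
    for kind, path in walkDir(dir):
      case kind
      of pcFile, pcLinkToFile:
        yield path
      of pcDir:
        if lastPathPart(path) notin excludeDirs:
          stack.add path         # only non-excluded dirs get visited
      of pcLinkToDir:
        discard                  # not followed, which avoids symlink loops
```

For example, `for f in walkPruned(".", @["folder_1"]): echo f` prints every file except those below any directory named `folder_1`, and that directory's contents are never read at all.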
Can I "prune" directories with walkDirRect?
Hi there, an aspiring programmer and total Nim noob is asking for your wisdom. In order to dive deeper into the adventure that programming is, I decided to have a go at Nim. I thought it would be a good idea to reimplement something I already did in Bash, just to have the logic out of the way. Now this project involves indexing large parts of / and ~, but I want to leave out logs, caches and some other stuff. Given the following dir structure,

```
.__ file_a
|
|__folder_1
|    |__file_1.a
|    |__file_1.b
|
|__folder_2
     |__file_2.a
     |__file_2.b
```

I tried the following:

```nim
import os, re

for file in walkDirRec ".":
  if file.match(re"\S*folder_1\S*"):
    echo "NO!"
    continue
  echo file
```

This would output:

```
./file_a
NO!
NO!
./folder_2/file_2.a
./folder_2/file_2.b
```

Now I can achieve my goal with that and e.g. write only the paths I want to a file or something. But there are two things bugging me about this:

First: In Bash, I used `find` to pipe the paths into a file. With the prune flag I could stop `find` from descending into those directories completely and therefore save quite a bit of time. The above method would iterate over every single file anyway.

Second: I'd prefer to have an array where the folders to be excluded are stored. As strings maybe? (I played a bit with Python, and its `os.walk` can do that.) But my attempts to get this working based on string comparison were futile, to say the least.

Now I know that as a newcomer to programming I might very well be off on a completely wrong track and I have to tackle things in another way to begin with. But it somehow stumps me that I was able to figure this out in two other languages and am so lost here in Nim. I guess the easy-to-pick-up part is more in the syntax than in the approach?

Anyway, any little bit of guidance, regardless of direction, would be much appreciated.

Greetings, Markus