Re: Python transpiler
I think part of the "sales pitch" for Nim out in the world, for good or ill, is "sorta like Python but compiled with static types". It isn't so strange that people show up asking how to automate "sorta like" into "more so". ;-) The vast majority of Python code that I have seen uses precious little of Python's very dynamic nature. So, doing a half-baked py2nim with a **_slew_** of caveats might have some value. It's not like `c2nim` is perfect, either. The questions are more A) if the caveats are beyond the grasp of the typical person porting such code and B) if the rather open-ended nature (ever changing Py3) would leave it half-done forever. @juancarlospaco may be right that a group effort here could yield the best results. It isn't my question to answer, but as to "why?", it is likely "easier good performance". Python unassisted by C/Cython/Pythran/etc. or Pypy is one of the slowest programming languages around. Combined with piles and piles of Python code out there doing xyz useful thing that could be safer/faster/etc. with Nim. Nim just invoking/calling that code is one way to access functionality. If it was too slow to begin with, it won't help. People may have misplaced expectations about how little they have to think in order to improve performance, of course.
Re: Python transpiler
@ShalokShalom can perhaps clarify what he meant, but I took him to be asking for a `c2nim`-like program `python2nim` to help facilitate porting pure Python code to pure Nim code (as opposed to using already written Nim code from within Python or already written Python code from within Nim). Some runtime libraries to bridge API differences might also help along those lines. Some quick web searchers turn up a few partial moves along this direction.
Re: Introducing --gc:arc
You should also try PGO both with gcc and clang. [https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) { yes, I know you posted that. :-) xref for others. :-) }
Re: Why does wrapping the code in a top level procedure make it faster?
In my experience batteries are almost never fully charged and it's hard to get feedback if you only release a perfect cathedral. With no feedback, it's kind of unlikely you will build something popular. To take a topical example, even after 30 years of tuning, the core language `dict` in Python still seems to have no easy way to "pre-size" an instance. There is no power of 2, no `rightSize`, nuthin'. So, one of your rough edges literally cannot arise due to an inflexibility/API design flaw (IMO). Yes, there must be 3rd party replacements or ways to pre-size in some slow subclass or whatever, but you could also just write a `proc initTab` that always calls `rightSize` for your own code. What's at issue here is "out of the box" (and personally I think the Nim workaround is less onerous than workarounds in the Python world). Do you have to learn your way around? Sure. A sales pitch of "just like Python but compiled/with native types" is definitely not quite right. That's Cython/similar. Analogies and oversimplifications help some people while others want the real story, and you strike me as the latter. Nim gives programmers more power and flexibility, but responsibilities also come with that. Cue Spider Man's Uncle Ben. ;-) It is yet a comparatively tiny community. Bugs, workarounds, and rough edges come with that. Unused patterns like your `&` are just unused, unoptimized things waiting to be discovered/fixed. There was a time when `C` had no proper namespacing of struct members and that is why some core Unix types have prefixes like the `st_*` in `struct stat` (run "man 2 stat" on any unix machine). No one using Nim _wants_ it to be hard, but there may also be good reasons some things are the way they are (often flexibility, performance, or safety, but yes sometimes neglect). I'm really sorry to hear your entry has been tough, but the flip side of that is you could probably make a HUGE impact to future similar-but-not-quite-you's, and I again encourage you to try! Even just documenting everything somewhere would probably help at least a few other Python Refugees. :-) Cheers and best wishes whatever you decide.
Re: Why does wrapping the code in a top level procedure make it faster?
As the person who added `rightSize`, I agree with those points actually. The only slight gotcha is that doing as you suggest would result in old code that calls it with (the correct power of two to avoid a resize, or maybe already a `rightSize`) would allocate the _next higher_ power, wasting space. That might be a price worth paying ease-of-use wise. We _could_ only call `rightSize` if the argument is not a power of two, but it's probably simpler to just always use `rightSize` and just tell people. Wasting space isn't quite a "failure". Anyway, you should feel free to file a github issue and/or work on a pull request. Maybe I should have pushed harder at the time to change the parameter semantics. There is definitely a learning curve coming from other languages. There are definitely more examples of the stdlib not providing the "best in class" for various functionalities. There will likely be even more with time. Nim core only has so much manpower to write/maintain/support such things. We discussed doing a "distribution" over at [https://github.com/nim-lang/RFCs/issues/173](https://github.com/nim-lang/RFCs/issues/173) and elsewhere. This "fusion" repository (see end of that thread) seems to have been the output, but not much has been happening over there. Anyway, I think if you persevere there are good chances you could be a happy Nimmer once you've learned your way around. You seem pretty capable of finding these issues and you could probably help round some of the sharp edges, if you can sustain the patience. There is probably a Porting to Nim From Python guidebook or maybe you could help do one!
Re: Why does wrapping the code in a top level procedure make it faster?
Ah. Sorry that I misunderstood the sarcasm. Maybe my content/xref was not worthless anyway. ;-)
Re: Why does wrapping the code in a top level procedure make it faster?
@Stefan_Salewski \- The idea of defaulting to an optimizing mode has come up before. It seems contrary to what almost any user coming from a "compiled language" world would expect. For various reasons (debugability, compile-time performance, etc.), almost all compilers default to a non-optimized (or very weakly optimized) output and allow various knobs to crank up optimization, as does the current `nim` setup. There is even a whole dark art of "best optimization flags" which can be taken [to severe extremes](https://github.com/Acovea/libacovea). More simply/efficiently/some might say intelligently, you can often use PGO [https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) to get 1.25..2.0x boosts on object code generated from nim-generated C. Some flags like `-ffast-math` can even change semantics in subtle ways that can impact correctness or not depending on use cases. I don't know what to do about people publicizing misleading benchmarks. That seems an ineliminable hazard, not only for Nim, but actually **_everywhere and all the time_** , and often not on purpose (though most "marketing" wants to do strawman comparisons on purpose). Besides compiling wrong, they could also use a bad algorithm, misrepresentative inputs, weird machines, benchmarks naive relative to intended measurements, and probably several other mistake categories. :-/ The best strategy may be trying to be supportive where & when we can, educating as we go, though I agree/admit that is a neverending battle.
Re: NvP: s.add('x') 100M times
Github issue: [https://github.com/nim-lang/Nim/issues/14811](https://github.com/nim-lang/Nim/issues/14811)
Re: NvP: s.add('x') 100M times
I was mistaken. I was compiling my `seq` test with `-d:useMalloc` which fixes the problem. Sorry..fiddling with too many knobs. `string` and `seq[char]` behave identically with `gc:arc` and both get fixed (139MB) with `--gc:arc -d:useMalloc`. Other GCs (including `none`) still beat `gc:arc`-without-`useMalloc` on `string`. However, other gc's spend more memory (like 420MB) than `gc:arc` on `seq[char]`. So, whatever is going on, `seq` is actually worse than `string`, not better. { But also a side note for @HashBackupJim to try `-d:useMalloc` with `--gc:arc`. } At this point, I should raise an issue over at Github. I'll link it back here.
Re: NvP: s.add('x') 100M times
One other Nim-level thing I can say is that things work as expected for `seq[int]` of the same memory scale (100MB). I.e., proc main() = var s: seq[int] for i in 0..12_500_000: s.add 1 echo len(s) main() Run produces a memory report (using /usr/bin/time on Linux) like: 187MB seq2-arc 250MB seq2-default 250MB seq2-msweep 265MB seq2-boehm 300MB seq2-none Run So, this problem is **_only_** for Nim `string`. Indeed, if one changes `string` to `seq[char]` in the original example, usage goes down to 139MB, roughly what one would expect for a 3/2 growth policy.
Re: NvP: s.add('x') 100M times
The same problem happens on nim-1.2.0 (well, nim-devel git hash ed44e524b055b985b0941227a61f84dd8fdfcb37). So, this is a long-lived, maybe since the beginning behavior of `gc:arc` (but we should still have a memory overuse regression test). Probably time to look at generated C.
Re: NvP: s.add('x') 100M times
@HashBackupJim \- `newSeqOfCap[T](someLen)` also exists and, yes, pre-sizing can help a lot in Nim (and almost any lang that supports it). Profile-guided-optimization at the gcc level can also help Nim run timings a lot..In this case 1.6x to 1.9x for various `gc` modes. [https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) explains how. LTO also helps since most of the boost of PGO is probably from well chosen inlining. @sschwartzer \- not only string benchmarks...Interpreter start-up/etc. Anyway, this isn't a Python forum, and benchmarking "always depends". :-) :-) Someone else should reproduce my `--gc:arc` uses more memory than `gc:none` for the original str1.nim or one with a `main()` (or both). I think this kicking the tires has probably uncovered a real problem.
Re: NvP: s.add('x') 100M times
I also usually find Py2 much faster than Py3. Pypy usually helps. Cython more so, but with much more work. Anyway, the obvious better way to do it in Nim (which I always assumed was "never the point") is var s = newStringOfCap(100_000_001) # or whatever for i in 0..100_000_000: s.add('x') echo len(s) Run which runs 2x as fast as otherwise and uses exactly the right amount of memory. I mention it just in case @HashBackupJim was unaware.
Re: NvP: s.add('x') 100M times
I don't see how your linked algo explains deltas across gcs if that `3 div 2` growth happens for all of them. The memory `gc:arc` uses here seems more like the sum of all prior allocations, not "up to 1.5x what's needed". Actually, `gc:arc` uses 1.25x more mem than `gc:none` (312MB) in a test I just tried.
Re: NvP: s.add('x') 100M times
Yup. Just what I was seeing, @b3liever. No `main()`-difference to the RSS delta, and a very noticable delta in a non-intuitive direction. So, either our intuitions are wrong in a way which should be clarified or there's a problem which should be fixed. Maybe a github issue?
Re: NvP: s.add('x') 100M times
@cumulonimbus \- I tried that. Didn't alter the behavior I was seeing. If this behavior was not always there then my guess is that some arc bug was causing a crash, got fixed, and now the fix causes this. Regardless of whether it was always there or appeared by bug-jello-squishing accident as I theorize, we should probably have a little suite of "memory use regression" tests to prevent stuff like the scenario I described. Such a suite would be a kind of "correctness testing" for deterministic memory management. Could have a "fuzzy/ball park compare". Maybe we have such already, perhaps informally? If so, we should add this `str1` to it. If not, it can be the first test. :-)
Re: NvP: s.add('x') 100M times
I don't disagree. Might need delving into the generated C to figure out, but I'm guessing my results are not hard to reproduce. If they are let me know how I can best help.
Re: NvP: s.add('x') 100M times
Just did a non-PGO regular `-d:danger` run. Times went up 1.9x but memory usage patterns were the same with `gc:arc` using much more RSS than `gc:boehm` or `gc:markAndSweep`. It's a pretty tiny program.
Re: NvP: s.add('x') 100M times
For this particular benchmark `--gc:boehm` uses the least memory and time for me on nim 28510a9da9bf2a6b02590ba27b64e951a208b23d with gcc-10.1 and PGO but that least is still 2.5x the RSS of python-2.7.18. Not sure why, but yeah it is 35x faster than Python.
Re: Dictionary syntax
In other words, it seems "excessive" to you because you do not imagine that you would never use anything but the stdlib `Table` "behind" `{}`. Never switching out something like that is "famous last words" in some circles. :-) So, the verbosity guards against the regret of hard-coding.
Re: Dictionary syntax
The `{}` syntax is what the Lisp world calls an "association-list" and is more general than what the Python world calls a dictionary literal/constructor syntax. In Nim `{a:b,c:d, ...}` is just syntactic sugar for `[ (a,b), (c,d), ... ]` (or `@[]` if you prefer a seq analogy, but it's at compile time when either seq or array lengths are "known"). This is a good in that you can use the same syntax for multiple back-end associative array implementations (like a simple sorted array, trees, alternate hash tables, etc.). In a sense the syntax is only loosely coupled to the semantics here. The cost is the slight extra verbosity of tacking on a `.toTable`, `toSortedArray`, `.toBST`, `.toBTree` or whatever which is deemed an acceptable trade off for the generality. Nim is not Python and you do have to put types/type-related things in various places.
Re: bizarre name clash with template
Re: "shadow"...Eh, there are a few very specific usages of it like a local shadowing a parameter or nested scope shadowing outer scope and so on, but the word/idea generally captures the ' must both be "x" ' idea fundamental to this bug, at least to my ears/eyeballs. Anyway your reproductions are very small/self-contained. I expect Nim core can fix it or at least produce a better error message or at the very least document the limitation (unless it already is documented).
Re: bizarre name clash with template
Sorry - I read more your 2nd whole comment than your 2nd sentence and replied too quickly. Oops. It does indeed seem weird that `for y in [1]: bod(y)` works while `for x in [1]: bod(x)` fails. I'd file a bug called something like "template-local symbol mishandled when shadowing a sub-proc-scope inject" or something similar.
Re: bizarre name clash with template
You need another `{.inject.}` at the site of the 2nd let for the call site, i.e., inside the `template`: let x {.inject.} = 1 bod(x) Run Or you could mark the whole template `{.dirty.}`.
Re: Can I "prune" directories with walkDirRect?
For what it's worth, at least for systems where one can assume post-POSIX.2008 APIs like `openat` and `fstatat` (really any vaguely recent Linux/BSD), it is possible to roll your own recursion that, in my timings, is about 4x faster hot cache than `walkDirRec` (note no trailing 't'). What boost one gets depends on whether you need that `Stat` metadata (e.g. file times, sizes, owner, perms, etc.) or just path names. Those ideas are in the current `cligen/dents.nim:forPath` template for Unix users. (It could actually be sped up a couple ways still, but not very portably.) Of course, depending on the scenario/hotness of caches, the boost may not matter much. Costs from the recursion may be tiny compared to IO/other work. Or it could dominate. Personally, I do a lot of work out of a `tmpfs /dev/shm` bind mount to `/tmp` which never has any IO. Mostly I was just giving yet another syntax for packaging up recursions..one that lets the guts hang out more and the calling code has to/ ** _gets to_** be aware of that while maybe having delegated the low-level system stuff to the template author. Nim is pretty great like that. BTW, I did re-arrange the order of the 4 event clauses to `always, preRec, postRec, recFail` and provide a `recFailDefault` template to make things read more nicely. So, my above code example won't quite work as written anymore. Best to start from one of the 4 worked out examples after the template in `dents.nim` if you want to use it.
Re: Further development of a tree iterator that allows mutations
I think not descending into a just-inserted-node makes total sense. By definition, you had to either unlink such a subtree from somewhere else or you had to forge it/create it. So, treating it like an atom in this iterator makes sense just as not descending the deleted node does. This solves the delete-then-insert right at the same spot problem, too. I think if you treat the inserts as "atoms" like the "deletes" it works out. You might want to provide an API call named "replace" that does the delete/insert pair so client code can be clear about that. Not having the "iteration contract" mean descending into either seems the clean semantic. Just my opinion, of course. It might be worth searching for XML Tree (or HTML DOM) surgery APIs in other programming languages for ideas. Another possible use case of all this might be editing Nim ASTs in macros which I don't think has been mentioned thus far. (I didn't check IRC.)
Re: Walking trees without recursive iterators
Yeah..By "either order" I just meant inserting after deleting and vice-versa as you figured out. I like your idea of "do" being spelled differently for the various sections! (If that can be done easily.) No doubt modifying structures while iterating can be hairy tricky and is tricky to have good interfaces for client code.
Re: Walking trees without recursive iterators
I don't see anything super obviously wrong, but I only just read through it. Your test suite looks like it could maybe use a mixture of inserts & deletes in either order while iterating, but @e / @spip / @GordonBGood / @rayman22201 should probably take a look. It may not be so appropriate to your "tree structure surgery while iterating" use case, but I happened to do Yet Another Idea just the other day over at [https://forum.nim-lang.org/t/5506](https://forum.nim-lang.org/t/5506) which bypasses the Nim iterator syntax framework for a Nim template that accepts multiple body clauses. So, `for path in itr(params): perElement` becomes: forPath(params): cannotRecurseWork do: perElementWork do: preRecurseWork do: postRecurseWork Run Internally, that `template forPath` is just a regular recursive function that exposes these "hooks" for the clauses. The end of that other thread has a more complete example of using that `forPath` defined in `https://github.com/c-blake/cligen/` in the `cligen/dents.nim` submodule. Something like XML would not need to worry about file system permission issues and that first clause. It is not very easy to use/safe, but it is yet another way to package up walking trees without recursive iterators that "looks similar" to Nim iterators that a reader of this thread may care to be aware of.
Re: Can I "prune" directories with walkDirRect?
And just to close out the example more fully for @chalybeum, since he said he was a beginning programmer, he could probably start from the code below (after a `nimble install 'cligen@#head'`) to do whatever it was he wanted to six months ago if he even stuck with programming, with Nim and/or this forum: import sets, posix, cligen/[dents, statx, posixUt] proc chalybeum(prunePath="", recurse=0, chase=false, xdev=false, roots: seq[string]) = ## ``prunePath`` file fmt is one base name per line. var prune: HashSet[string] for line in lines(prunePath): prune.incl line for root in roots: forPath(root, recurse, false, chase, xdev, depth, path, nameAt, ino, dt, lst, st, recFail): case errno of EXDEV, ENOTDIR: discard # Expected sometimes of EMFILE, ENFILE: return # Too deep;Stop recurse else: let m = "chalybeum: \"" & path & "\")" perror cstring(m), m.len do: echo path # chalybeum logic on `path` goes here do: # Pre-recurse: skip dirs w/base names in prune if dt == DT_DIR and path[nameAt..^1] in prune: continue do: # Post-recurse; only `path` valid now if recFail: echo "did not recurse into ", path when isMainModule: import cligen; cligen.dispatch(chalybeum) Run Replacing HashSet checking with regex prunes/excludes is not so hard. It is fast, does avoid symlink infinite loops (when optionally chasing), and conditionally avoids cross-device links which is kind of the standard set of Unix tree walk functionality, BUT I totally admit it's not a very easy to use programming interface. The logic of the recursive loop leaks out plenty. I just threw it together. I'm not sure it's much better than the example expanded recursion code would be. Maybe a little.
Re: Can I "prune" directories with walkDirRect?
I put various things that "might be useful" for Unix CLI utilities under `cligen/` so client code can just have `cligen` as a leaf/sole dependency. Directory tree recursion fits that pattern, and stdlib `walkDir*` never seemed quite right to me. There is already `cligen/posixUt.recEntries`, for example.
Re: Can I "prune" directories with walkDirRect?
Of course, that doesn't control recursive descent which was @chalybeum's driving use case but you did use a smiley. :-) Based on possible broader interest and a general trend lately of trying to be less abstract, I just added a template-based tree iteration to `cligen/dents.nim`: [https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530](https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530) with a fully worked out example utility in `examples/chom.nim` as well as 4 inline `cligen.dispatchMulti`-driven example usages. The short of if it is that you can make things about 2x-8x faster on Linux if you just trust `d_type` and you only need path names, not, say, i-node data from lstat/stat/etc. Performance only matters for large directory hierarchies, obviously.
Re: Can I "prune" directories with walkDirRect?
@chalybeum ..the feature being mentioned seems to be the `FilterDescend` predicate function of the referenced package. You would just load up a Nim `HashSet` from `sets` with to be skipped paths and pass some predicate like `path notin blacklist`, with `blacklist` probably being a captured closure variable. While we are resurrecting a zombie-ish thread to promote a package ;-), I can say something about performance expectations that may be uncommon knowledge. The GNU coreutils `find` goes through contortions to be able to traverse file hierarchies that are more deep than the limit on open file descriptors. This results in that `find` using like 3.5x the syscalls, 2.5x the CPU time, and 3x the RAM of more direct implementations. If that data must be read off a persistent IO device those usages will not be bottlenecks, but on a fully cached run they will be. So, a decent speed-up relative to GNU `find` on non-pathological file trees is sometimes possible, if that sort of speed-up motivates anyone.
Re: Introducing therapist, a new commandline parsing library
You might also find [https://github.com/c-blake/cligen](https://github.com/c-blake/cligen) interesting. The central insight is that a Nim proc signature (or Python or C++ or any language with default values...) already contains an adequate "spec". More meaningful per parameter help is about all that is missing, but can be easily "added in" when identifier names & types alone are insufficient descriptions of their purpose. There are other bells & whistles in `cligen` like prefix matching (bridging the gap from short to long options) that might also inspire you.
Re: A good word for idiomatic nim?
For those who may not know, NIMBY is an acronym for Not In My BackYard that is used quite a lot in some domains. Could confuse. Another possibility along the fun dimension might be ["nimmy"](https://www.urbandictionary.com/define.php?term=Nimmy).
Re: HashSet performance
Cool. Glad you got it working. That multiply-rotate hash in my last post does yield a noticeable boost. I get Python-3.8.2 times of 1.255..1.337, Nim-1.3.3 with default hash times of 0.391..0.531, and Nim-1.3.3 with that multiply-rotate of 0.205..0.347. So, with a fast, good enough hash Nim is over 6x faster than Python here (1.255/.205 = 6.12x). Using a faster hash yielding 0.391/0.205 = 1.9x speed-up just Nim-to-Nim is unusual, though not shocking. Usually, more work per-lookup is happening that masks the cost of the hash function calculation. The new default `hash` is "safer" against many hostile situations, such as this one you stumbled upon in your first post. One pays a little for that safety/generality, but with Nim it's easy to claw back that performance whenever it matters. Just 3 lines of code in this case.
Re: HashSet performance
You seem to be using an older version of Nim which has as a default the identity hash function (hash(x) = x) which leads, in this case, to poor performance. Just add this code (after your imports and before `sumA`): import hashes, bitops proc hash*(x: int): Hash {.inline.} = Hash(rotateLeftBits(uint64(x) * 15241094284759029579'u64, 27)) Run Alternatively, you could update Nim-1.2 or the head of Nim-devel. Also, that hash is just one that works. There are several other integer hashes available at: [https://github.com/c-blake/adix](https://github.com/c-blake/adix) in the file `althash.nim`.
Re: How to use Clang LTO + PGO with Nim
Yeah. I'm not sure what it is, exactly, about Nim-generated C code, but PGO tends to benefit it more than "my typical C". So, it makes sense the compiler itself could get a 1.25x speed-up. Might be more|less on gcc. I've probably mentioned this about 10x over the past 5 years, but here's one from about a year ago: [https://forum.nim-lang.org/t/4517](https://forum.nim-lang.org/t/4517)
Re: How to use Clang LTO + PGO with Nim
(Of course at least two C++ backends can also do PGO..So, probably just `nim BE --pgo prog args` is probably best if slightly more verbose.)
Re: How to use Clang LTO + PGO with Nim
I think simplifying life with a couple of wrapper scripts (`nim-pgo-gcc` and `nim-pgo-clang`, say) would help would-be PGO'ers, even if said scripts were only a starting point for the users' own scripts. In theory, the Nim compiler could itself grow a 2-phase compilation mode. Things are already set up so you can say `nim c --run prog args`. So, all we would really need to do is add a `--pgo` that works just like `--run` but A) uses a modified namespace/extra name for `compiler.options` (a place to put `-fprofile-generate`) and B) re-compiles everything again afterward with another modified namespace/extra name for `compiler.options` (a place to put `-fprofile-use`). I'd say that kind of fits well with existing nim CLI workflow. There are only minor risks like biasing folks to write programs that do a benchmark run when given no args or accidental infinite loop/giant benchmarks but those are both intrinsic to PGO in general. Maybe there is already a way to hack `config.nims` to be this smart? I think a lot more people would use it if it were as simple as `nim c --pgo -d:danger prog args`.
Re: Performance issues with "complex" module
With the gcc backend doing "profile guided optimization" can often help (especially with measurements to drive inlining choices). e.g., doing just @Stefan_Salewski's command-line I get: julia1: 84 ms (sum of pixels: 27677748) julia2: 83 ms (sum of pixels: 27677748) julia3: 82 ms (sum of pixels: 27677748) Run while doing this nim c -d:danger --panics:on -c t.nim gcc -O3 -flto -fprofile-generate -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o pg ./pg gcc -O3 -flto -fprofile-use -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o t-final Run I get: julia1: 82 ms (sum of pixels: 27677748) julia2: 82 ms (sum of pixels: 27677748) julia3: 82 ms (sum of pixels: 27677748) Run So, the PGO "flattened" the performance a bit more. In this example the PGO speed boost was close to zero/measurement error, but I have seen as high as 2x speed-ups for more complicated programs. So, it's worth having some little "nim-pgo" wrapper script to automate the above if you are writing programs that have an easy "benchmark run".
Re: How to download a file concurrently?
Sounds like an application of the "byte range requests" introduced in HTTP/1.1. You can definitely use those + multiple TCP connections to be as self-optimized/unfriendly to other internet traffic/robust to unusual packet loss/etc. situations as desired, but I don't know what Nim ecosystem support for that there is. (Just providing keywords to search for really..)
Re: Nim's strutils.split() slower than Python's string split()?
If you are not redirecting to a file but letting it display to the terminal then a big variable is what GUI terminal emulator you are using - XTerm, rxvt-unicode, st, etc. I think that could create a lot of varying numbers..why it may even depend on if you use TrueType fonts or cheaper to render fonts. Not sure what the goals are here, but if you really want to figure out what's going on at the core Nim IO layer then you want clean separation of sources of time spent (like input/split IO and output IO). E.g, forget about locking stdout buffers - just don't output at all. And on the output side forget about reading input - just generate the data like your generator and output and realize that if you are outputting to a terminal then you are benchmarking how that terminal handles extreme high update rates.
Re: Nim's strutils.split() slower than Python's string split()?
Locking can indeed slowdown output as can exception handling and is unnecessary for single-threaded output and the exceptions may be unhelpful if there is nothing you can do on out of space errors. If you are on Linux/glibc then you can also just use `fwrite_unlocked` directly as in `cligen/osUt.nim:urite` or `cligen/mslice.nim:urite` as shown in the `examples/cols.nim` program in [https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)
Re: Nim's strutils.split() slower than Python's string split()?
This comes up a lot. Here are a couple recent ones: [https://forum.nim-lang.org/t/4738](https://forum.nim-lang.org/t/4738) and [https://forum.nim-lang.org/t/5103](https://forum.nim-lang.org/t/5103).
Re: Why does &"""{"total":23}""" compile but &"""{"total:":23}""" not compile?
Quoted closers are indeed another wrinkle. If you want a one pass algo you can just track the index of the last `':'` and use it only after you pass the final unquoted closer having framed the two parts. Still doesn't sound hard to have less brittle framing, but effort worth is always subjective. Nim does use `':'` a lot. Maybe @TKurtBond could do the PR? :-)
Re: Why does &"""{"total":23}""" compile but &"""{"total:":23}""" not compile?
As long as `':'` is not allowed in `fmt` part of `{expr:fmt}`, you can probably just frame the `expr` by doing an `rfind` on ':' instead of a `find` (last rather than first), no?
Re: `{}` syntax
It's a good-ish summary and your range `[3..5]` is another great example why we might want two operators since one could also do a range of `{"RFCs" .. "the"}`. The augmented tree actually keeps things like "the number of nodes below a node" (or "to the left") and can do a height-of-tree time search to find a node of a given rank. You don't really need much more than a `SortedSeq` to really demonstrate the interface issue, as mentioned in the long discussion, though. (If you want a simpler algorithmic mental model.) Some clarifications. We don't ever need to break how `Table` is accessed (or `seq`). We could only do the `[]` and `{}` for the new bag/sorted bag thing (which is not yet impled in Nim (I have C impls to port)). I think the extent/impact of this idea was misunderstood because there is so little exposure to unkeyed "search" trees. (I've seen some AVL "rope" libraries, the Python blist, etc., but it's definitely off the beaten path). A lot of words were spilled with me trying to explain that. Mostly, what I think I am getting pushback on (it's not exactly clear) is providing `{}/{}=` for `Table` merely as an "alias" after these impls are all in play. That way `{}` could mean "associative-ish" while `[]` could mean `positional-ish` as a more global convention like the `add` vs `append` proc or other things in `doc/apis.rst`. We could preserve backward compatibility forever. Right now `{}=` is only used in the `json` module. The `{}` for `Table` is just to make it easier to have the object be either a tree or a table. `CritBitTree` and maybe something else might also benefit from that kind of operator alias (and might be algorithmically updatable to an instrumented tree, actually...Haven't thought about that). It's possible push back is more general, like "don't ever use two indexing operators like `[]/{}` for some core thing like a tree/sortedseq!" or "don't ever add an alias for `Table.[]!!`". I don't mean to speak for anyone. It's not like this is a huge deal to me (however much I may write). It's a neglected interface detail because few libraries have multiassociative things until late in the syntax game, but we soon will, and Nim's fabulously flexible syntax already supports it. It's more consistent with initializer syntax like `{:}` and highlights possibly slow associative lookup. I feel like if you like `{key:val, ..}` at all more than `@[(key,val),..` then this kind of move would be welcome. Tutorials/intros/explainers of the language do have to decide "Do I favor internal consistency/clarity of mentioning `{}` early or do I favor similarity to other prog.langs". I think that about covers the real questions. Oh, and the alternatives to these are to always use `[]` but with type constructors inside to distinguish the "access mode".
Re: Help with hash sets needed
Oh, and of course `a == b` must imply `hash(a) == hash(b)`.
Re: Help with hash sets needed
`hash` and `==` must agree. So, you need: proc `==`(a, b: ComponentPoint): bool = a.id == b.id Run
`{}` syntax
Some people don't read the RFCs often while everyone working with some new prog.lang _loves_ syntax debates. :-) So, to get broader exposure, here: [https://github.com/nim-lang/RFCs/issues/207](https://github.com/nim-lang/RFCs/issues/207)
Re: Compile time FFI
Slightly off topic, but as part of a recent job recruitment process I wrote my own libpcap-like packet processor and in benchmarking it I noticed it was about 6x faster than the actual libpcap which uses a bunch of expensive callbacks via function pointers internally. It's also pretty easy in Nim to define some `object` s with bitfields and do swapping as needed from network to host byte order, and might be surprisingly few lines of code. You can still use `tcpdump -w file` to capture data. Just pointing out there may be less value in libpcap vs writing it yourself than you might think at first glance.
Re: setting file owner and running process as another user
Also, in terms of `setuid`, you may also find `cligen/posixUt.dropPrivilegeTo` useful in the `cligen` nimble package (also at [https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)) used for example by the nimble `kslog` package (also at [https://github.com/c-blake/kslog)](https://github.com/c-blake/kslog\)).
Re: setting file owner and running process as another user
You wan the `posix` module. It has `chown`, `chmod`, `setuid`, etc.
Re: Ternary conditional operator
Also see this recent thread [https://forum.nim-lang.org/t/6060](https://forum.nim-lang.org/t/6060) and note that "compile-time if" aka `when` can also be used in expressions as in `let i = when defined(nimRulez): 1 else: 2`.
Re: Unchecked arrays
As a trivial correction, @Stefan_Salewski wrote `add()` in one spot where he meant `addr`, but `add` is totally within the "safe zones" of Nim and even applies to `seq[T]` types when you cannot `setLen` ahead of time. (`setLen` is faster when you can use it, though.) Just trying to block potential confusion.
Re: Is it possible to share a Hashmap between threads?
Someday destructors may be defined/automatically called, but today there will be no clean-up without the `close` call. You may not need to/want to clean-up anyway. need to: `memfiles.open()` creates a memory mapping, but does not (by default) keep the file descriptor open. The only thing the `mf.close` above does is call `munmap`. Memory maps are a less scarce resource than file handles due more to default OS limits than anything else. Being much less scarce, it's not quite as bad to leak it. want to: You might even want to leak it on purpose if you wanted to keep the string data alive beyond the lifetime of the procedure invocation. E.g., you may want to return an `MemSlice` or a `Table` with them as keys or values. They are just pointers to that memory mapped region (hence zero copy) which means if you `mf.close/munmap` the pointers become invalid. Of course, if you are leaking millions of memory mappings then even small leaks can grow to be burdensome on the OS -- if you don't need continued access to the data. So, as with all resource management there is a balance of concerns and trade offs.
Re: Is it possible to share a Hashmap between threads?
Well, the data "on disk" is not NUL-terminated. So, though the type I am using is called "cstring" the string data is not terminated. If anything, the data is more "newline terminated" in your file format. Those newlines are mapped out by the `memSlices` iterator, though with the result stored in `.size`. So, I just use that to bound the loop. I used that type because Nim defines convenient `[]` style access to the bytes of a `cstring` giving back `char` s. It isn't strictly necessary, but that was my reason. (You would have to do your own pointer arithmetic and dereferencing otherwise).
Re: Is it possible to share a Hashmap between threads?
One vexing thing about modern CPUs is that a single CPU core cannot saturate RAM bandwidth. So, while throughput in L1 can be 100..200 GB/sec, and RAM bandwidth might be 50..150 GB/s, the amount just one L1 cache system can read from L2/L3/DIMMs might be only 5-30 GB/sec. This puts (often much simpler) single-threaded code that could otherwise saturate RAM throughput (thereby running as fast as possible anyway) at a major throughput disadvantage. But I totally agree with @mratsim here that staying with ST-code is the best idea. The cynic inside me guesses the throughput disparity incentivizes people to write more parallel code, thusly selling more CPUs. So, these days CPU manufacturers are sadly incentivized to complicate lives of programmers. At the dawn of the industry, it was the other way around. Maybe there is some less cynical "scaling" problem, but usually with throughput/bandwidth you can just add more lanes of traffic.
Re: Is it possible to share a Hashmap between threads?
Oh, and if you did want to use a zero copy IO interface like `memfiles` on this, you can and indeed it is over 3X faster, but you have to write your own integer parser because `parseUint` and all its analogues take a Nim `string` not `MemSlice`. You can use the dollar operator `$` to convert, but that is no faster than just using `lines` in the first place. Parsing an integer in ASCII is actually pretty easy, though. E.g., import tables, os, memfiles proc parseUInt(data: cstring, size: int): uint {.inline.} = let zero = ord('0') let nine = ord('9') for i in 0 ..< size: let code = ord(data[i]) if code >= zero and code <= nine: result *= 10 result += uint(code - zero) else: return 0 #non-digit; parse error proc ytz(path: string): uint = var table = initTable[uint, uint]() mf = memfiles.open(path) if mf.mem == nil: quit("Cannot open " & path) for s in memSlices(mf): let n = parseUInt(cast[cstring](s.data), s.size) if n == 0: quit("Parse error: " & $s) table.mgetOrPut(n, 0) += n for key, val in table: result = max(result, val) mf.close proc main() = echo ytz(paramStr(1)) main() Run Not a big deal here, but if the file were really large that factor of 3 might be more appreciated.
Re: Is it possible to share a Hashmap between threads?
This topic can be confusing to beginners because the word "read" is overloaded. There is a "system call" named "read", but also many higher level APIs that are "buffered" which are also called "read", and people often refer to the general activity as just "reading". The buffering means those higher level APIs are doing "big reads" into a buffer and then copy small parts of that out for you for each line. If data is copied out shortly after it was read into that big buffer with a system-call read then the copy is cheap because it was from CPU caches (L1/L2/L3). There are also "zero copy" APIs to do IO like "mmap" on Unix, or in the Nim world "memfiles". These let the OS itself do the "buffering" of what is in RAM vs "on disk" (or other persistent device like "flash memory" these days). Your linked yahtzee file is only 900 KB with average line length of about 9 bytes, but only 790 unique entries (so a small Table). With such a short line length you might profit from not creating new strings at all but re-using the same one over and over in the loop. That is because string creation and anything you might do with only 9 bytes take comparable time. The Nim stdlib also has a `split` iterator which lets you iterate over the split fields without creating a new `seq[string]`. Looking at your input file format there is no need to even `split` at all, though. So, there are simpler things than `memfiles` you can do to speed it up. Also, because you only have 790 unique lines, it should be much faster to get the max value by looping over the `Table` after you are done creating it (both no 2nd lookup, and 790 <<< 100,000). So, this is a faster version of your program: import tables, os, parseutils proc ytz(path: string): uint = var table = initTable[uint, uint]() n: uint for numStr in lines(path): if parseUInt(numStr, n) == 0: quit("Parse error: " & numStr) table.mgetOrPut(n, 0) += n # here for key, val in table: result = max(result, val) proc main() = echo ytz(paramStr(1)) main() Run Incidentally, when posting code to this forum, if you put `Nim` after the triple backquote it will be rendered with color syntax highlighting.
Re: Unknown performance pitfall in tables?
It was a tad melodramatic, but at least it had a '?'. And it even turned out to be 2-3x in the other direction (consistent with other Go table benchmarks I've tried).
Re: Unknown performance pitfall in tables?
PGO = profile-guided optimization. I mentioned how to use it already.. `-fprofile-generate...` . It's not on by default because how it works is gcc instruments your code, counts what happens based on your sample program, then uses this information to make good judgements (unlike usual guesswork heuristics). To be on by default would require a sample program often with sample input data which cannot be assumed by default. It wouldn't be crazy for gcc to accept some "sample program invocation" string as a command argument, but it does not. That's how my `nim-pgo` wrapper script works, though. I just say `nim-pgo prog './prog data' '--gc:arc'`.
Re: Unknown performance pitfall in tables?
It also wouldn't be crazy for the `nim c` / `nim cpp` compiler drivers to accept some sample program invocation and do the program - sample-run - re-compile three stage process more automatically, but that isn't in place either. My `nim-pgo` script just assumes the usual `~/.cache/nim/r` locations I mentioned. The short of my script is: nim $be -c --cc:gcc -d:release -d:danger $nimOpts "$prog.nim" || exit 1 inputs=$(echo $nimcache/*.c*) $cc -fprofile-generate $inputs -o $prog $ccL || exit 1 rm -f *.gcda eval "$cmd" $cc -fprofile-use $inputs -o $prog $ccL Run but I also echo some things like PWD and set up those `$` environment vars.
Re: Unknown performance pitfall in tables?
Anyway, once you reproduce my times you should maybe tell those Reddit people how fast Nim code can be. :-) Perusing the thread, I didn't see people doing PGO on the C/C++ solutions. (It may well make less of a difference for those solutions, too.)
Re: Unknown performance pitfall in tables?
Oh, and just for reproducibility's sake: myshell$ nim --version Nim Compiler Version 1.1.1 [Linux: amd64] Compiled at 2020-03-12 Copyright (c) 2006-2019 by Andreas Rumpf git hash: bbc231f8e06c18fe640fa56a927df476c2426ccb active boot switches: -d:release -d:nimUseLinenoise -d:nativeStackTrace -d:nimHasLibFFI Run
Re: Unknown performance pitfall in tables?
Oh, and just as a sanity check that I'm using the right data/algo, my output is that `@["estop", "pesto", "stope", "topes"]` which I think is right reading a little more of that original reddit thread you posted.
Re: Unknown performance pitfall in tables?
Anyway, my code with no PGO and `-d:danger --gc:arc` only takes 90 ms. My conclusion from this is that the Nim code can be 211/90 or 211/66 =~ 2.3x or 3.2x faster than the Go, depending on what you think is fare to compare. BUT you need to manually guide inlining, and if you want that last 90/66 = 1.36x boost you need PGO.
Re: Unknown performance pitfall in tables?
Actually, just taking your original code and adding `{.inline.}` to the `func` and compiling with `-d:danger --gc:arc` makes it only 118ms. So, that inlining must be the biggest boost the PGO is discovering.
Re: Unknown performance pitfall in tables?
BTW, I haven't tried "every version of the code in between" to isolate all the various causes & effects which might be instructive, but my version with only `-d:release` takes 630 ms while `-d:danger` takes 446 ms while `-d:danger --gc:arc` takes 255 ms, and then that 66 ms with PGO. So, in short, the performance "is there" in Nim, but you might need to compile with more optimization turned on. Just the above sequence is 9.5x and that last 4x comes from gcc PGO which I have been finding unusually effective on Nim generated C code. On that Go code with the same input and `gccgo -O3 nklc.go` I get 255 ms as well. I don't have a convenient shells script driver to do PGO with gccgo, though.
Re: Unknown performance pitfall in tables?
Oh, and one other number - your original code with PGO on that same input - 72 ms. So, the double/triple table lookup/slice stuff is really all only costing around 72/66 =~ 9%.
Re: Unknown performance pitfall in tables?
So, with your original version doing only `-d:release` I get 1030 ms (I.e. 1.03 sec) while on the same machine with my version doing gcc PGO and `--gc:arc` I get 66 ms, about 15x speed-up.
Re: Unknown performance pitfall in tables?
Well, maybe and via the `let` could be. I'm not sure the slice assignment is optimized down to a `copyMem`, though which works fine for your string data. Could be slice compare and `copyMem` in the if clause is best. Those Table use and how-to-compile modifications may well pack much more of a punch.
Re: Unknown performance pitfall in tables?
Do you have an example input file somewhere you could link to?
Re: Unknown performance pitfall in tables?
Besides @Stefan_Salewski's comments, it also bears mentioning that A) you need to be careful to compile the Nim with `-d:danger` before making Nim performance conclusions, B) `--gc:arc` may also help but may or may not be in your particular version of Nim and C) the algorithm in your Nim case is not quite the same - in the Go you just do an unconditional append and then a second loop while in the Nim you do a triple lookup when present and double lookup when missing . Avoiding string creation with the slicing and doing a more direct translation of the Go would look like: func smallestRepr(arg: string): string = let doubled = arg & arg result = arg var slice = result # I *think* I have this right but have not tested.. for i in 1 .. arg.high: copyMem slice[0].addr, doubled[i].addr, arg.len if slice < result: result = slice proc main() = var seen = initTable[string, seq[string]]() for word in paramStr(1).lines: let key = word.smallestRepr seen.mgetOrPut(key, @[]).add word for key, words in seen: if words.len == 4: echo words main() Run At least in the past when I've benchmarked things, Go's tables were much slower than Nim's. In the past I have noticed that, at least with the gcc backend, you can get a big boost (near 2X) from profile-guided optimization (`gcc -fprofile-generate ...; Run sample prog; gcc -fprofile-use ...`) where the `...` s would be all the C files Nim wants to compile and the sample prog will probably be named `a.out`. (Usually everything in `~/.cache/nim/r/` on Unix, but visible via `nim c --verbosity:2` everywhere.)
Re: Can if statements be used as expressions everywhere?
Hmm. Ok - it was not your **exact** form. Sorry! All of these also work, though: let test = 0 + (if true: 3 else: 4) let test = 0 + (if true: 3 else: 4) let test = 0 + (if true: 3 else: 4) let test = 0 + (if true: 3 else: 4) let test = 0 + (if true: 3 else: 4) Run but yeah, the one you list does not. Maybe this is an undocumented (or even documented?) parser limitation or maybe it's a bug? I suspect it's some hard to resolve ambiguity in that exact case. The `block` workaround you found isn't so bad.
Re: Can if statements be used as expressions everywhere?
That exact form works for me with nim-0.19.2, nim-0.20.0, nim-devel. Don't recall exactly what version it landed in, but it sounds like maybe some limitation in either a very old or a bug in a very specific intermediate version of Nim you may be using.
Re: Can if statements be used as expressions everywhere?
It is intentional and you just need () around the middle commented out example.
Re: How to use integer generics in types?
There's no easy way to provide default values for such integer generic parameters, is there? E.g. (this doesn't work): type MatchPool[N: static[int] = 3] = object Run
Re: How to use pointer to mmaped file?
You can probably use `cstring` if on a C backend, but if you wanted to index some other element type than `char` (like `int` or `Foo`) you can also do let foos = cast[ptr UncheckedArray[Foo]](mmaped_file.mem) if i * Foo.sizeof < mmapped_file.size: doSomethingWith foos[i] # e.g. i == 0 for 1st Foo Run You may well be able to skip the range check every time if you are, e.g. looping over the file & know you won't overflow from that, etc. To recover automatic bounds checks, you can write new/use existing procs accepting `openArray[Foo]` (e.g. `openArray[T]`) and pass via `toOpenArray` explicitly in the parameter list: fun(toOpenArray(mf.mem, 0, mf.size div Foo.size - 1), a, b, c) Run That can be abbreviated with a `template oaFoo(mf)` if you need to pass it or various files in several places or something. If such procs want to modify the data the file needs to have been opened writable or you will get segmentation violations.
Re: Why whitespace?
Sure. I follow @sschwarzer's approach personally. The reasons for @Tutbadut or (someone else passionate about just one issue) **not** to fork something go back to network effects and the value of communities. Sometimes reiteration of pros & cons, etc. has value for new people.
Re: Why whitespace?
Well, an exclusive indent char might be nice, but TAB is very easy for almost everyone to enter by default already. Used the way @Tutbadut and @Araq were discussing only at the beginning of lines ('^TAB*[^TAB]' in Regex-ese) it could function that way. It's true that a brand new character might side-step the diverse ways to configure the amount of rendered blank space per tab, but at the non-zero cost of mandating unicode. While source code "in the large" is not tabular data, in a leading-whitespace-for-block-structure language like Python or Nim, that part of the text has "regimented spacing concerns" much like a table. TAB characters of any fixed width are actually less useful for tables than the old early 20th century mechanical typewriter tab-stops that inspired the TAB character. Tab-stops were totally configurable - first at 4, next at 8, next at 10 or 20, etc. That flexibility was largely lost in the transition to ASCII-oriented software. { You think _one_ conversion ratio is a pain in the ass...try _dozens_! ;-) } Anyway, detecting the very first '^[some-kinda-white]' and mandating whole-file consistency afterward seems "not so hard", especially if you allow comment-only lines as candidate leading white-space determiners. In spite of not being so hard, most text editors punt on it, only doing line-global/file-global type transformations leading to Araq's observations. It would be a really easy CLI text filter to do leading-space only..probably less work than reading this whole thread and also of value to non-`.nim` files. Probably exists already, too. Using hard tab for leading indent only and letting divergent users adjust as they like seems philosophically much like Nim's style insensitivity, only with far more common environmental support for tunable rendering. The only real problems are defaults of 8-spaces per TAB being 4x more than what Nim people are used to and diversity/unavailability of TAB size setting. Combined that creates trouble for the many to please the few. I'm pretty sure the diversity/unavailability of adjustment/having to tell Python TAB-size conversion complexity was what led to Guido's regret, mixed in with some Wadler's Law can't-believe-we're-arguing-about-it. Anyway, just food for thought. I'm not really advocating anything.
Re: Why whitespace?
Well, only something like 20% of the `17 shl 16` code points are allocated. So, there's surely hope in the future. (A disaster of address space overallocation if you ask me..A round 5 nibbles/16 shl 16 wasn' t enough? Really? Oh well.)
Re: Why whitespace?
Communities like consistency in many forms both explicit and implicit almost by their very nature. There are always cooperators and mavericks. Building consensus is often hard to impossible, and if you reflect upon it, consensus building is **not** an enviable task - programmer's are a very opinionated demographic, often with incompatible concerns. Nim is more flexible & powerful than most prog languages/compilers. I hope that you give it a serious try regardless of any initial misgivings about this or that specific inflexibility here or there. If you like writing little CLI tools as a way to get your feet wet in a language then [https://github.com/c-blake/cligen](https://github.com/c-blake/cligen) might save you some time.
Re: Why whitespace?
[https://stackoverflow.com/questions/18962581/why-is-it-recommended-that-i-replace-tabs-by-spaces](https://stackoverflow.com/questions/18962581/why-is-it-recommended-that-i-replace-tabs-by-spaces) If you follow the link in the top scored answer, you will see an email thread from 1994. There are definitely pros & cons. Insisting on only either spaces or only TABs for indent in a given file might be workable in terms of preventing half & half messed up files. Choosing just one is surely the simplest approach, though.
Re: Why whitespace?
Line length/text density is mostly limited by visual acuity in arc seconds or arc minutes, not by pixels on a display. There is, of course, doubtless a wide range of human visual acuity, but most people wear corrective lenses to target a much narrower range. Hence you hear "120 vs 80" and never "800 vs 40". Font choices influence this, too. It's just a "complicated unit" to translate into how things are selected and even varies by "sitting distance" from displays. So, people don't use it, but it's absolutely what drives all this. Book/Newspaper/magazines have had very high DPI (really dots per arc degree!) for many more decades or even centuries. They limit to what is "easy to read" for their market, but their problem is a little different than "logically stiff" source code. Computer displays (combined with font designs) basically got to 80 columns by the mid to late 1980s and stayed there even as pixel densities have ever increased since then. People do wider aspect ratio displays now and so a pair of side by side 80 column windows is not so rare. Had "book page" aspect ratio like "portait mode" displays been more common earlier on, things might have settled on 64 character wide windows. So, maybe be happy it's at least 80. ;-)
Re: Suggestions for optimization?
Incidentally, @lagerratrobe, once you save a lot of time by replacing sorting by a perfect commutative hash, the next optimization is making the IO part faster (you need cligen-0.9.43 for this): import strutils, tables, cligen/[mfile, mslice] const prime: array[26, uint64] = [ #9/267751 oflow 7'u64, 61, 41, 53, 2, 71, 47, 29, 3, 97, 89, 17, 59, 19, 5, 31, 101, 11, 13, 23, 37, 79, 73, 67, 43, 83 ] proc product(word: MSlice): uint64 = result = 1'u64 #Assumes whole file is uppercase for ch in word: #iterator needs cligen>=0.9.43 result *= prime[ord(ch) - ord('A')] proc pBuildQry(dict="words", query: seq[string]) = var anas = initTable[uint64, seq[MSlice]]() let mf = mopen(dict) if mf == nil: return defer: mf.close for word in mf.mSlices(): anas.mgetOrPut(word.product, @[]).add word for word in query: let word = word.toUpperAscii let prod = word.toMSlice.product echo word, ":" try: for ana in anas[prod]: echo " ", ana except: discard import cligen; dispatch(pBuildQry) Run With that my 129 ms goes down by 57ms to 72 ms (1.8x better) on that SOWPODS dictionary. Then if you Nim compile with `--gc:arc`, it goes down to about 47 ms (1.53x better). Then if you use gcc's PGO it goes down to 42 ms (1.12x better). I suspect with your dictionary and CPU that you would see a similar overall 4x to 5x improvement taking you from 286 ms down to more like <70ms. Language comparison-wise, I do not believe R or CPython would be able to be sped up similarly. (Perhaps in PyPy the product might be able to become fast, but perhaps not if they are worrying about overflow switching to arbitrary precision ints.)
Re: Why whitespace?
I don't really disagree. I only really use it for patches/diffs, myself. I think the real issue is just the the wild inconsistency of both how to adjust tab-size settings and whether it is even possible across a wide universe of text rendering situations. Avoidance lets you bypass that complexity from the "abstractness" of how much space a TAB represents, but of course, you do lose the abstractness, too.
Re: Why whitespace?
Another argument not yet mentioned is that having spaces rather than TABs gives one flexibility in (probably only temporarily) wedging in a control line governing a set of sub-blocks at some "half-indent" amount (such as only 1 or only 2 spaces). You can do some `if` or `when` for example. It could also make certain patches that effect such changes smaller (by adding one line instead of adding 1-line and shifting a ton of stuff) and more robust (since diffs are usually contextual and this would require less context to match). Admittedly, not a huge deal and a very rare case compared to reading code everyday, but still something concrete which would not be possible with exclusively TAB indents.
Re: Why whitespace?
I think you are misinformed. Any number of spaces is fine. Tabs are not allowed.
Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc
Like this, just for 100% clarity: assert toSeq(permutations([1,2,3,4,5])).len == fac(5) Run
Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc
Yeah. Sorry if I was unclear. I meant just ditch the `count` template entirely and change all the asserts to call `toSeq` directly. There's probably another way to do it, too. It's not like the lengths of any of those test cases warrant conserving memory..biggest is like 1024.
Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc
If you change the `doAssert count(...) == number` to `assert toSeq(...).len == number` it all works (in ARC mode or not) except for the first test case with an empty sequence which has `.len == 0` (which actually seems right to me, though it's a degenerate case kind of like `0!`..personally I would probably change that to check for 0 not 1 and have the semantics be to be an empty loop/empty iterator unless that breaks something else).
Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc
stdlib `algorithm` has `nextPermutation` (and `prevPermutation`), but not combinations. Combinations might make sense to add. Not sure about the full spectrum of iterators in that package.
Re: Access to Iterators of Sequences
You can also avoid sorting the entire table if you only care about the top 2 or 10 values by using the Nim stdlib's spiffy `heapqueue` module as in [https://github.com/c-blake/cligen/blob/master/examples/newest.nim](https://github.com/c-blake/cligen/blob/master/examples/newest.nim) \-- might be worth adding a `top`/`bottom`/`most` type procs to `heapqueue`.
Re: How does one check if a string is numeric?
Minor nitpick - you should call it `isFloat` so `isInt` can also work since both take just a `string`, or maybe `isNumber(string, enum)` with different kinds of number.
Re: Suggestions for optimization?
Incidentally, in the small alphabet case (and realistically anagrams are usually related to word games over small alphabets) there is a neat trick for rapid signature computation: map the 26 letters to the first 26 primes (2, 3, 5, 7, .., 97, 101). Then make the signature of a word the product of the primes. If you never overflow the product this is guaranteed to be a valid anagram signature by the uniqueness of prime factorization and commutativity of multiplication. Note that this **removes sorting from the equation entirely**. In the worst case, that product can start to overflow a 64-bit integer past about 9 letters (since 101^9 =~ 2^60 and 101^10 > 2^66). However, neither worst nor average cases really matter since we have a static dictionary of every possible case. So, it's actually easy to just test a given dictionary to see if any overflows collide in our exact case. While it is easy to defeat by an attacker, for "organic/natural" dictionaries this is likely to be very rare because 2^64 is much more than the square of the number of words in most dictionaries. So, we are in a regime very far from birthday paradox collisions. Really, we aren't random, but also really the "load on the address space" is much less than the square of the number of words - it is the product of the number of non-product overflowing words and product overflowing words which is much smaller (assuming most words do not overflow). So, this is really a somewhat empirical question about words and dictionaries and letters. With just a naive 'A' .. 'Z' \--> 2 .. 101 mapping and this dictionary: [https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt](https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt) I got 15_294 words that overflowed (and no collisions!) for 267_751 words. One can do better, though. At least in English, letters have a very skewed usage distribution. So, we can make overflows even less likely. You can just look that frequency up on Wikipedia ([https://en.wikipedia.org/wiki/Letter_frequency](https://en.wikipedia.org/wiki/Letter_frequency)) and sort the first 26 primes by that frequency. Doing that with the above SOWPODS dictionary, we reduce the number of overflows to 223. You can do better still with a pre-pass to _measure_ this frequency in the dictionary, sort the primes list by that measured frequency. I get 175 overflows that way. You can do better still by measuring only the frequency of letter usage in words _longer than 9 letters_ (where it actually matters to contain overflow). I get only 136 overflows (over 100x better than the initial 15,000) that way. You might be able to do slightly better by measuring the frequency of letters in just the 136 overflow words and adjusting, but that is more work and upsets the apple cart of the prior measurement perhaps requiring iteration. I think that 136 yields some a priori probability of any collision of under 2e-12 (under certain not quite right randomness assumptions). So, while you might worry this "product signature" technique is not applicable due to overflow, it probably is in a language where anagrams are interesting Applying all these ideas (except the measurement which I did in a 15 line Python script for its convenient arbitrary precision arithmetic) in Nim we get: import strutils,tables,sets #toUpperAscii,Table,HashSet const prime: array[26, uint] = [ # >.len==9 frequency 7'u, 59, 29, 31, 2, 67, 47, 53, 5, 101, 73, 23, 43, 13, 17, 41, 97, 11, 3, 19, 37, 71, 79, 89, 61, 83 ] proc product(word: string): uint = result = 1 #You might worry about overflow, BUT for ch in word: #..long anagrams are so rare it's ok! result *= prime[ord(ch) - ord('A')] #223/267751oflow proc sortByLetter(word: string): string = var cnt: array[26, int16] for ch in word: cnt[ord(ch) - ord('A')].inc for i, c in cnt: for n in 0 ..< c: result.add chr(ord('A') + i) proc pBuildQry(dict="words", query: seq[string]) = var anas = initTable[uint, seq[string]]() for line in lines(dict): let word = line.toUpperAscii anas.mgetOrPut(word.product, @[]).add word for word in query: let word = word.toUpperAscii echo word, ":" try: for ana in anas[word.product]: echo " ", ana except: discard proc sBuildQry(dict="words", query: seq[string]) = var anas = initTable[string, seq[string]]() for line in lines(dict): let word = line.toUpperAscii anas.mgetOrPut(word.sortByLetter, @[]).add word for word in query: let word = word.toUpperAscii echo word, ":" try: for ana in anas[word.sortByLetter]: echo " ", ana except: discard proc collisions(dict="words") = var t = initTable[uint, seq[string]]() for line in lines(dict): let word =
Re: Programming help - Binary Search
I think you just need `hi >= lo` instead of just `>` (the Java code you posted has that). You should also be told that this is in the Nim stdlib as `algorithm.binarySearch`, although it does use the simpler -1 return to signal "not found" rather than `Option[T]` stuff.
Re: Suggestions for optimization?
To clarify how @cdome's comment differs from @SolitudeSF's and @lagerratrobe's own observation, the intent was always for `-d:danger` to imply `-d:release`, but I guess there are several versions of Nim out there for which this implication does not hold. You have to define both for those Nim versions to disable all checking. Going forward (from `devel` and probably from version 1.2) that will be redundant upon simply `-d:danger`. Another thing you can do along the lines of just how you build like `-d:release -d:danger`, though is using GCC's profile-guided optimization in the C backend. I've seen up to 2X speed boosts for Nim generated code with that. I suspect there is much more improvement to be had by the various algorithmic improvements I mentioned, though, and some are pretty easy. They do start to seem more like "cheating" in cross-programming language comparisons, especially moving all the hard work to compile-time. So, it depends if you're actually doing anything real with this code.
Re: Suggestions for optimization?
BTW, the Unicode version of this with 65536 to 2e6 "letters" (depending upon if you can restrict to "one Unicode plane") is definitely trickier. You might think a 2 pass radix sort would help, but since most words are short most of your sorts are _very_ small. Indeed, even insertion sort might well beat the `algorithm.sort` merge sort in some average sense for this problem. Insertion sort tends to beat almost everything for N < 16..32 on modern CPUs due to cache effects. Almost any dictionary will have average, median, and mode word lengths below 16 just because long words are unpopular. So, just having a "wrapper sort" that switches to insertion for N < 20 and falls back to `algorithm.sort` for bigger is probably best. It would also be possibly valuable to the community if you looked into the stdlib `algorithm.sort` and had it switch to insertion at small N. It doesn't right now. The trouble you run into with most array/table based approaches is that the alphabet is so much larger than the word length. So, a Fenwick Tree or anything with "implicit order" from the indices is just too big to iterate over effectively. Without implicit order you wind up still having to sort "used letters" and letter repeats are not usually very dramatic. About the only thing I can think of that _might_ help Unicode (beyond just in-place/d-danger tricks) in that counting-sort-style approach is another structure from Preston Briggs in a 1993 Rice University tech report/thesis. That sparse-dense thing can sometimes be useful to manage iteration over very sparse subsets of small-ish universes like this Unicode code point space. It's conceivable you could use that as a Unicode histogram, but you would still have to sort a small array (number of _unique_ letters). So, it's hard to say it'd be faster than the insertion-or-merge idea, but it's obscure enough to be worth mentioning as also neglected. Of course, **if you find yourself running this calculation a lot on a small-ish fixed set of basically static dictionaries** , the single biggest optimization you might do is to simply **save the signature index** to a file. You could extend @juancarlospaco 's idea and have a `const Table` built into the binary executable on a per dictionary basis (e.g. 1 program file per dict). You would absolutely have to time several hundred or even thousand anagram queries to get a reading. Or if you wanted a generic program to work with many dictionary files then instead of a `Table` you could sort the signatures themselves paired with their words. Then you could just save that sorted list to a file and do lookup via binary search on the file with at most O(lg(Nwords)) disk probes per anagram query (plus time to open/mmap the file). You could also hash-structure that file to get that down to 1 probe at substantial code baggage. ([https://github.com/c-blake/suggest](https://github.com/c-blake/suggest)/ has a fully worked out example of a much more complex persistent hash-structure store along those lines.) You _could_ try using Nim's `marshal` module to save your `Table` to disk and load it back whenever you want it, but in this case I think the loading of the table would be no faster than just building it from scratch. You could also "save in memory" by having a long-running server that processes each dictionary just once and then answers anagram queries over a local network or pipe or something. Personally, I think _all_ of the above are good coding exercises.
Re: Suggestions for optimization?
Besides @SolitudeSF's important suggestion, an algorithmic improvement you can use is to replace `algorithm.sorted(lower, system.cmp)` with a hand-written "counting sort". For ASCII (i.e. 1 byte textual characters) this is (very?) easy and should be quite a bit faster than `algorithm.sorted`. Something like this: proc sortByLetter(word: string): string = var cnt: array[26, int8] for ch in word: #simple counting sort cnt[ord(ch) - ord('A')].inc for i, c in cnt: for n in 0 ..< c: result.add chr(ord('A') + i) Run You probably need some `toUpper` in there if your dictionary is stored in lowercase (although that could also be done prior to entry to the above `proc` or you could also just change `A` to `a`). That `int8` type covers words up to 255 characters long, but I think the longest real word in any spoken language with dictionaries is less than that. You could always count and print a warning or switch that to `int16` or `int32` which could help if chemical elements or really crazy stuff is possibly in play. The same idea could be adapted to unicode but then you would probably want a `Table`, not an `array` which would slow it down more than the above minor modifications. I actually think this is a good problem context in which to introduce the oft neglected counting sort which can also be used with (unlike here) non-empty satellite data (at some slightly increased code baggage).