Re: Python transpiler

2020-07-13 Thread cblake
I think part of the "sales pitch" for Nim out in the world, for good or ill, is 
"sorta like Python but compiled with static types". It isn't so strange that 
people show up asking how to automate "sorta like" into "more so". ;-)

The vast majority of Python code that I have seen uses precious little of 
Python's very dynamic nature. So, doing a half-baked py2nim with a **_slew_** 
of caveats might have some value. It's not like `c2nim` is perfect, either. The 
questions are more A) if the caveats are beyond the grasp of the typical person 
porting such code and B) if the rather open-ended nature (ever changing Py3) 
would leave it half-done forever. @juancarlospaco may be right that a group 
effort here could yield the best results.

It isn't my question to answer, but as to "why?", it is likely "easier good 
performance". Python unassisted by C/Cython/Pythran/etc. or Pypy is one of the 
slowest programming languages around. Combined with piles and piles of Python 
code out there doing xyz useful thing that could be safer/faster/etc. with Nim. 
Nim just invoking/calling that code is one way to access functionality. If it 
was too slow to begin with, it won't help. People may have misplaced 
expectations about how little they have to think in order to improve 
performance, of course.


Re: Python transpiler

2020-07-13 Thread cblake
@ShalokShalom can perhaps clarify what he meant, but I took him to be asking 
for a `c2nim`-like program `python2nim` to help facilitate porting pure Python 
code to pure Nim code (as opposed to using already written Nim code from within 
Python or already written Python code from within Nim). Some runtime libraries 
to bridge API differences might also help along those lines. Some quick web 
searchers turn up a few partial moves along this direction.


Re: Introducing --gc:arc

2020-07-05 Thread cblake
You should also try PGO both with gcc and clang. 
[https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) { yes, I 
know you posted that. :-) xref for others. :-) }


Re: Why does wrapping the code in a top level procedure make it faster?

2020-07-03 Thread cblake
In my experience batteries are almost never fully charged and it's hard to get 
feedback if you only release a perfect cathedral. With no feedback, it's kind 
of unlikely you will build something popular.

To take a topical example, even after 30 years of tuning, the core language 
`dict` in Python still seems to have no easy way to "pre-size" an instance. 
There is no power of 2, no `rightSize`, nuthin'. So, one of your rough edges 
literally cannot arise due to an inflexibility/API design flaw (IMO).

Yes, there must be 3rd party replacements or ways to pre-size in some slow 
subclass or whatever, but you could also just write a `proc initTab` that 
always calls `rightSize` for your own code. What's at issue here is "out of the 
box" (and personally I think the Nim workaround is less onerous than 
workarounds in the Python world).

Do you have to learn your way around? Sure. A sales pitch of "just like Python 
but compiled/with native types" is definitely not quite right. That's 
Cython/similar. Analogies and oversimplifications help some people while others 
want the real story, and you strike me as the latter.

Nim gives programmers more power and flexibility, but responsibilities also 
come with that. Cue Spider Man's Uncle Ben. ;-) It is yet a comparatively tiny 
community. Bugs, workarounds, and rough edges come with that. Unused patterns 
like your `&` are just unused, unoptimized things waiting to be 
discovered/fixed. There was a time when `C` had no proper namespacing of struct 
members and that is why some core Unix types have prefixes like the `st_*` in 
`struct stat` (run "man 2 stat" on any unix machine).

No one using Nim _wants_ it to be hard, but there may also be good reasons some 
things are the way they are (often flexibility, performance, or safety, but yes 
sometimes neglect). I'm really sorry to hear your entry has been tough, but the 
flip side of that is you could probably make a HUGE impact to future 
similar-but-not-quite-you's, and I again encourage you to try! Even just 
documenting everything somewhere would probably help at least a few other 
Python Refugees. :-) Cheers and best wishes whatever you decide.


Re: Why does wrapping the code in a top level procedure make it faster?

2020-07-03 Thread cblake
As the person who added `rightSize`, I agree with those points actually. The 
only slight gotcha is that doing as you suggest would result in old code that 
calls it with (the correct power of two to avoid a resize, or maybe already a 
`rightSize`) would allocate the _next higher_ power, wasting space. That might 
be a price worth paying ease-of-use wise.

We _could_ only call `rightSize` if the argument is not a power of two, but 
it's probably simpler to just always use `rightSize` and just tell people. 
Wasting space isn't quite a "failure". Anyway, you should feel free to file a 
github issue and/or work on a pull request. Maybe I should have pushed harder 
at the time to change the parameter semantics.

There is definitely a learning curve coming from other languages. There are 
definitely more examples of the stdlib not providing the "best in class" for 
various functionalities. There will likely be even more with time. Nim core 
only has so much manpower to write/maintain/support such things.

We discussed doing a "distribution" over at 
[https://github.com/nim-lang/RFCs/issues/173](https://github.com/nim-lang/RFCs/issues/173)
 and elsewhere. This "fusion" repository (see end of that thread) seems to have 
been the output, but not much has been happening over there.

Anyway, I think if you persevere there are good chances you could be a happy 
Nimmer once you've learned your way around. You seem pretty capable of finding 
these issues and you could probably help round some of the sharp edges, if you 
can sustain the patience. There is probably a Porting to Nim From Python 
guidebook or maybe you could help do one!


Re: Why does wrapping the code in a top level procedure make it faster?

2020-07-03 Thread cblake
Ah. Sorry that I misunderstood the sarcasm. Maybe my content/xref was not 
worthless anyway. ;-)


Re: Why does wrapping the code in a top level procedure make it faster?

2020-07-03 Thread cblake
@Stefan_Salewski \- The idea of defaulting to an optimizing mode has come up 
before. It seems contrary to what almost any user coming from a "compiled 
language" world would expect. For various reasons (debugability, compile-time 
performance, etc.), almost all compilers default to a non-optimized (or very 
weakly optimized) output and allow various knobs to crank up optimization, as 
does the current `nim` setup.

There is even a whole dark art of "best optimization flags" which can be taken 
[to severe extremes](https://github.com/Acovea/libacovea). More 
simply/efficiently/some might say intelligently, you can often use PGO 
[https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) to get 
1.25..2.0x boosts on object code generated from nim-generated C. Some flags 
like `-ffast-math` can even change semantics in subtle ways that can impact 
correctness or not depending on use cases.

I don't know what to do about people publicizing misleading benchmarks. That 
seems an ineliminable hazard, not only for Nim, but actually **_everywhere and 
all the time_** , and often not on purpose (though most "marketing" wants to do 
strawman comparisons on purpose). Besides compiling wrong, they could also use 
a bad algorithm, misrepresentative inputs, weird machines, benchmarks naive 
relative to intended measurements, and probably several other mistake 
categories. :-/ The best strategy may be trying to be supportive where & when 
we can, educating as we go, though I agree/admit that is a neverending battle.


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
Github issue: 
[https://github.com/nim-lang/Nim/issues/14811](https://github.com/nim-lang/Nim/issues/14811)


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
I was mistaken. I was compiling my `seq` test with `-d:useMalloc` which fixes 
the problem. Sorry..fiddling with too many knobs.

`string` and `seq[char]` behave identically with `gc:arc` and both get fixed 
(139MB) with `--gc:arc -d:useMalloc`. Other GCs (including `none`) still beat 
`gc:arc`-without-`useMalloc` on `string`. However, other gc's spend more memory 
(like 420MB) than `gc:arc` on `seq[char]`. So, whatever is going on, `seq` is 
actually worse than `string`, not better. { But also a side note for 
@HashBackupJim to try `-d:useMalloc` with `--gc:arc`. }

At this point, I should raise an issue over at Github. I'll link it back here.


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
One other Nim-level thing I can say is that things work as expected for 
`seq[int]` of the same memory scale (100MB). I.e., 


proc main() =
  var s: seq[int]
  for i in 0..12_500_000: s.add 1
  echo len(s)
main()


Run

produces a memory report (using /usr/bin/time on Linux) like: 


187MB seq2-arc
250MB seq2-default
250MB seq2-msweep
265MB seq2-boehm
300MB seq2-none


Run

So, this problem is **_only_** for Nim `string`. Indeed, if one changes 
`string` to `seq[char]` in the original example, usage goes down to 139MB, 
roughly what one would expect for a 3/2 growth policy.


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
The same problem happens on nim-1.2.0 (well, nim-devel git hash 
ed44e524b055b985b0941227a61f84dd8fdfcb37). So, this is a long-lived, maybe 
since the beginning behavior of `gc:arc` (but we should still have a memory 
overuse regression test). Probably time to look at generated C.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
@HashBackupJim \- `newSeqOfCap[T](someLen)` also exists and, yes, pre-sizing 
can help a lot in Nim (and almost any lang that supports it).

Profile-guided-optimization at the gcc level can also help Nim run timings a 
lot..In this case 1.6x to 1.9x for various `gc` modes. 
[https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) explains 
how. LTO also helps since most of the boost of PGO is probably from well chosen 
inlining.

@sschwartzer \- not only string benchmarks...Interpreter start-up/etc. Anyway, 
this isn't a Python forum, and benchmarking "always depends". :-) :-)

Someone else should reproduce my `--gc:arc` uses more memory than `gc:none` for 
the original str1.nim or one with a `main()` (or both). I think this kicking 
the tires has probably uncovered a real problem.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I also usually find Py2 much faster than Py3. Pypy usually helps. Cython more 
so, but with much more work.

Anyway, the obvious better way to do it in Nim (which I always assumed was 
"never the point") is 


var s = newStringOfCap(100_000_001) # or whatever
for i in 0..100_000_000: s.add('x')
echo len(s)


Run

which runs 2x as fast as otherwise and uses exactly the right amount of memory. 
I mention it just in case @HashBackupJim was unaware.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I don't see how your linked algo explains deltas across gcs if that `3 div 2` 
growth happens for all of them. The memory `gc:arc` uses here seems more like 
the sum of all prior allocations, not "up to 1.5x what's needed". Actually, 
`gc:arc` uses 1.25x more mem than `gc:none` (312MB) in a test I just tried.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
Yup. Just what I was seeing, @b3liever. No `main()`-difference to the RSS 
delta, and a very noticable delta in a non-intuitive direction. So, either our 
intuitions are wrong in a way which should be clarified or there's a problem 
which should be fixed. Maybe a github issue?


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
@cumulonimbus \- I tried that. Didn't alter the behavior I was seeing.

If this behavior was not always there then my guess is that some arc bug was 
causing a crash, got fixed, and now the fix causes this. Regardless of whether 
it was always there or appeared by bug-jello-squishing accident as I theorize, 
we should probably have a little suite of "memory use regression" tests to 
prevent stuff like the scenario I described. Such a suite would be a kind of 
"correctness testing" for deterministic memory management. Could have a 
"fuzzy/ball park compare".

Maybe we have such already, perhaps informally? If so, we should add this 
`str1` to it. If not, it can be the first test. :-)


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I don't disagree. Might need delving into the generated C to figure out, but 
I'm guessing my results are not hard to reproduce. If they are let me know how 
I can best help.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
Just did a non-PGO regular `-d:danger` run. Times went up 1.9x but memory usage 
patterns were the same with `gc:arc` using much more RSS than `gc:boehm` or 
`gc:markAndSweep`. It's a pretty tiny program.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
For this particular benchmark `--gc:boehm` uses the least memory and time for 
me on nim 28510a9da9bf2a6b02590ba27b64e951a208b23d with gcc-10.1 and PGO but 
that least is still 2.5x the RSS of python-2.7.18. Not sure why, but yeah it is 
35x faster than Python.


Re: Dictionary syntax

2020-06-24 Thread cblake
In other words, it seems "excessive" to you because you do not imagine that you 
would never use anything but the stdlib `Table` "behind" `{}`. Never switching 
out something like that is "famous last words" in some circles. :-) So, the 
verbosity guards against the regret of hard-coding.


Re: Dictionary syntax

2020-06-24 Thread cblake
The `{}` syntax is what the Lisp world calls an "association-list" and is more 
general than what the Python world calls a dictionary literal/constructor 
syntax. In Nim `{a:b,c:d, ...}` is just syntactic sugar for `[ (a,b), (c,d), 
... ]` (or `@[]` if you prefer a seq analogy, but it's at compile time when 
either seq or array lengths are "known").

This is a good in that you can use the same syntax for multiple back-end 
associative array implementations (like a simple sorted array, trees, alternate 
hash tables, etc.). In a sense the syntax is only loosely coupled to the 
semantics here. The cost is the slight extra verbosity of tacking on a 
`.toTable`, `toSortedArray`, `.toBST`, `.toBTree` or whatever which is deemed 
an acceptable trade off for the generality. Nim is not Python and you do have 
to put types/type-related things in various places.


Re: bizarre name clash with template

2020-06-10 Thread cblake
Re: "shadow"...Eh, there are a few very specific usages of it like a local 
shadowing a parameter or nested scope shadowing outer scope and so on, but the 
word/idea generally captures the ' must both be "x" ' idea fundamental to this 
bug, at least to my ears/eyeballs.

Anyway your reproductions are very small/self-contained. I expect Nim core can 
fix it or at least produce a better error message or at the very least document 
the limitation (unless it already is documented).


Re: bizarre name clash with template

2020-06-09 Thread cblake
Sorry - I read more your 2nd whole comment than your 2nd sentence and replied 
too quickly. Oops.

It does indeed seem weird that `for y in [1]: bod(y)` works while `for x in 
[1]: bod(x)` fails.

I'd file a bug called something like "template-local symbol mishandled when 
shadowing a sub-proc-scope inject" or something similar.


Re: bizarre name clash with template

2020-06-09 Thread cblake
You need another `{.inject.}` at the site of the 2nd let for the call site, 
i.e., inside the `template`: 


let x {.inject.} = 1
bod(x)


Run

Or you could mark the whole template `{.dirty.}`.


Re: Can I "prune" directories with walkDirRect?

2020-06-08 Thread cblake
For what it's worth, at least for systems where one can assume post-POSIX.2008 
APIs like `openat` and `fstatat` (really any vaguely recent Linux/BSD), it is 
possible to roll your own recursion that, in my timings, is about 4x faster hot 
cache than `walkDirRec` (note no trailing 't'). What boost one gets depends on 
whether you need that `Stat` metadata (e.g. file times, sizes, owner, perms, 
etc.) or just path names. Those ideas are in the current 
`cligen/dents.nim:forPath` template for Unix users. (It could actually be sped 
up a couple ways still, but not very portably.)

Of course, depending on the scenario/hotness of caches, the boost may not 
matter much. Costs from the recursion may be tiny compared to IO/other work. Or 
it could dominate. Personally, I do a lot of work out of a `tmpfs /dev/shm` 
bind mount to `/tmp` which never has any IO.

Mostly I was just giving yet another syntax for packaging up recursions..one 
that lets the guts hang out more and the calling code has to/ ** _gets to_** be 
aware of that while maybe having delegated the low-level system stuff to the 
template author. Nim is pretty great like that.

BTW, I did re-arrange the order of the 4 event clauses to `always, preRec, 
postRec, recFail` and provide a `recFailDefault` template to make things read 
more nicely. So, my above code example won't quite work as written anymore. 
Best to start from one of the 4 worked out examples after the template in 
`dents.nim` if you want to use it.


Re: Further development of a tree iterator that allows mutations

2020-05-31 Thread cblake
I think not descending into a just-inserted-node makes total sense. By 
definition, you had to either unlink such a subtree from somewhere else or you 
had to forge it/create it. So, treating it like an atom in this iterator makes 
sense just as not descending the deleted node does. This solves the 
delete-then-insert right at the same spot problem, too. I think if you treat 
the inserts as "atoms" like the "deletes" it works out.

You might want to provide an API call named "replace" that does the 
delete/insert pair so client code can be clear about that. Not having the 
"iteration contract" mean descending into either seems the clean semantic. Just 
my opinion, of course.

It might be worth searching for XML Tree (or HTML DOM) surgery APIs in other 
programming languages for ideas. Another possible use case of all this might be 
editing Nim ASTs in macros which I don't think has been mentioned thus far. (I 
didn't check IRC.)


Re: Walking trees without recursive iterators

2020-05-30 Thread cblake
Yeah..By "either order" I just meant inserting after deleting and vice-versa as 
you figured out.

I like your idea of "do" being spelled differently for the various sections! 
(If that can be done easily.)

No doubt modifying structures while iterating can be hairy tricky and is tricky 
to have good interfaces for client code.


Re: Walking trees without recursive iterators

2020-05-30 Thread cblake
I don't see anything super obviously wrong, but I only just read through it. 
Your test suite looks like it could maybe use a mixture of inserts & deletes in 
either order while iterating, but @e / @spip / @GordonBGood / @rayman22201 
should probably take a look.

It may not be so appropriate to your "tree structure surgery while iterating" 
use case, but I happened to do Yet Another Idea just the other day over at 
[https://forum.nim-lang.org/t/5506](https://forum.nim-lang.org/t/5506) which 
bypasses the Nim iterator syntax framework for a Nim template that accepts 
multiple body clauses. So, `for path in itr(params): perElement` becomes: 


forPath(params): cannotRecurseWork
do: perElementWork
do: preRecurseWork
do: postRecurseWork


Run

Internally, that `template forPath` is just a regular recursive function that 
exposes these "hooks" for the clauses. The end of that other thread has a more 
complete example of using that `forPath` defined in 
`https://github.com/c-blake/cligen/` in the `cligen/dents.nim` submodule. 
Something like XML would not need to worry about file system permission issues 
and that first clause.

It is not very easy to use/safe, but it is yet another way to package up 
walking trees without recursive iterators that "looks similar" to Nim iterators 
that a reader of this thread may care to be aware of.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
And just to close out the example more fully for @chalybeum, since he said he 
was a beginning programmer, he could probably start from the code below (after 
a `nimble install 'cligen@#head'`) to do whatever it was he wanted to six 
months ago if he even stuck with programming, with Nim and/or this forum: 


import sets, posix, cligen/[dents, statx, posixUt]
proc chalybeum(prunePath="", recurse=0, chase=false,
  xdev=false, roots: seq[string]) =
  ## ``prunePath`` file fmt is one base name per line.
  var prune: HashSet[string]
  for line in lines(prunePath): prune.incl line
  for root in roots:
forPath(root, recurse, false, chase, xdev, depth,
path, nameAt, ino, dt, lst, st, recFail):
  case errno
  of EXDEV, ENOTDIR: discard # Expected sometimes
  of EMFILE, ENFILE: return  # Too deep;Stop recurse
  else:
let m = "chalybeum: \"" & path & "\")"
perror cstring(m), m.len
do:
  echo path # chalybeum logic on `path` goes here
do: # Pre-recurse: skip dirs w/base names in prune
  if dt == DT_DIR and path[nameAt..^1] in prune:
continue
do: # Post-recurse; only `path` valid now
  if recFail: echo "did not recurse into ", path
when isMainModule:
  import cligen; cligen.dispatch(chalybeum)


Run

Replacing HashSet checking with regex prunes/excludes is not so hard. It is 
fast, does avoid symlink infinite loops (when optionally chasing), and 
conditionally avoids cross-device links which is kind of the standard set of 
Unix tree walk functionality, BUT I totally admit it's not a very easy to use 
programming interface. The logic of the recursive loop leaks out plenty. I just 
threw it together. I'm not sure it's much better than the example expanded 
recursion code would be. Maybe a little.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
I put various things that "might be useful" for Unix CLI utilities under 
`cligen/` so client code can just have `cligen` as a leaf/sole dependency. 
Directory tree recursion fits that pattern, and stdlib `walkDir*` never seemed 
quite right to me. There is already `cligen/posixUt.recEntries`, for example.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
Of course, that doesn't control recursive descent which was @chalybeum's 
driving use case but you did use a smiley. :-)

Based on possible broader interest and a general trend lately of trying to be 
less abstract, I just added a template-based tree iteration to 
`cligen/dents.nim`: 
[https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530](https://github.com/c-blake/cligen/commit/633da63a997269486f3e00432ec4ce37521fb530)
 with a fully worked out example utility in `examples/chom.nim` as well as 4 
inline `cligen.dispatchMulti`-driven example usages.

The short of if it is that you can make things about 2x-8x faster on Linux if 
you just trust `d_type` and you only need path names, not, say, i-node data 
from lstat/stat/etc. Performance only matters for large directory hierarchies, 
obviously.


Re: Can I "prune" directories with walkDirRect?

2020-05-27 Thread cblake
@chalybeum ..the feature being mentioned seems to be the `FilterDescend` 
predicate function of the referenced package. You would just load up a Nim 
`HashSet` from `sets` with to be skipped paths and pass some predicate like 
`path notin blacklist`, with `blacklist` probably being a captured closure 
variable.

While we are resurrecting a zombie-ish thread to promote a package ;-), I can 
say something about performance expectations that may be uncommon knowledge. 
The GNU coreutils `find` goes through contortions to be able to traverse file 
hierarchies that are more deep than the limit on open file descriptors. This 
results in that `find` using like 3.5x the syscalls, 2.5x the CPU time, and 3x 
the RAM of more direct implementations. If that data must be read off a 
persistent IO device those usages will not be bottlenecks, but on a fully 
cached run they will be. So, a decent speed-up relative to GNU `find` on 
non-pathological file trees is sometimes possible, if that sort of speed-up 
motivates anyone.


Re: Introducing therapist, a new commandline parsing library

2020-05-25 Thread cblake
You might also find 
[https://github.com/c-blake/cligen](https://github.com/c-blake/cligen) 
interesting. The central insight is that a Nim proc signature (or Python or C++ 
or any language with default values...) already contains an adequate "spec".

More meaningful per parameter help is about all that is missing, but can be 
easily "added in" when identifier names & types alone are insufficient 
descriptions of their purpose.

There are other bells & whistles in `cligen` like prefix matching (bridging the 
gap from short to long options) that might also inspire you.


Re: A good word for idiomatic nim?

2020-05-18 Thread cblake
For those who may not know, NIMBY is an acronym for Not In My BackYard that is 
used quite a lot in some domains. Could confuse. Another possibility along the 
fun dimension might be 
["nimmy"](https://www.urbandictionary.com/define.php?term=Nimmy).


Re: HashSet performance

2020-05-12 Thread cblake
Cool. Glad you got it working.

That multiply-rotate hash in my last post does yield a noticeable boost. I get 
Python-3.8.2 times of 1.255..1.337, Nim-1.3.3 with default hash times of 
0.391..0.531, and Nim-1.3.3 with that multiply-rotate of 0.205..0.347. So, with 
a fast, good enough hash Nim is over 6x faster than Python here (1.255/.205 = 
6.12x).

Using a faster hash yielding 0.391/0.205 = 1.9x speed-up just Nim-to-Nim is 
unusual, though not shocking. Usually, more work per-lookup is happening that 
masks the cost of the hash function calculation. The new default `hash` is 
"safer" against many hostile situations, such as this one you stumbled upon in 
your first post. One pays a little for that safety/generality, but with Nim 
it's easy to claw back that performance whenever it matters. Just 3 lines of 
code in this case.


Re: HashSet performance

2020-05-10 Thread cblake
You seem to be using an older version of Nim which has as a default the 
identity hash function (hash(x) = x) which leads, in this case, to poor 
performance. Just add this code (after your imports and before `sumA`): 


import hashes, bitops
proc hash*(x: int): Hash {.inline.} =
  Hash(rotateLeftBits(uint64(x) * 15241094284759029579'u64, 27))


Run

Alternatively, you could update Nim-1.2 or the head of Nim-devel. Also, that 
hash is just one that works. There are several other integer hashes available 
at: [https://github.com/c-blake/adix](https://github.com/c-blake/adix) in the 
file `althash.nim`.


Re: How to use Clang LTO + PGO with Nim

2020-05-06 Thread cblake
Yeah. I'm not sure what it is, exactly, about Nim-generated C code, but PGO 
tends to benefit it more than "my typical C". So, it makes sense the compiler 
itself could get a 1.25x speed-up. Might be more|less on gcc. I've probably 
mentioned this about 10x over the past 5 years, but here's one from about a 
year ago: [https://forum.nim-lang.org/t/4517](https://forum.nim-lang.org/t/4517)


Re: How to use Clang LTO + PGO with Nim

2020-05-06 Thread cblake
(Of course at least two C++ backends can also do PGO..So, probably just `nim BE 
--pgo prog args` is probably best if slightly more verbose.)


Re: How to use Clang LTO + PGO with Nim

2020-05-06 Thread cblake
I think simplifying life with a couple of wrapper scripts (`nim-pgo-gcc` and 
`nim-pgo-clang`, say) would help would-be PGO'ers, even if said scripts were 
only a starting point for the users' own scripts.

In theory, the Nim compiler could itself grow a 2-phase compilation mode. 
Things are already set up so you can say `nim c --run prog args`. So, all we 
would really need to do is add a `--pgo` that works just like `--run` but A) 
uses a modified namespace/extra name for `compiler.options` (a place to put 
`-fprofile-generate`) and B) re-compiles everything again afterward with 
another modified namespace/extra name for `compiler.options` (a place to put 
`-fprofile-use`).

I'd say that kind of fits well with existing nim CLI workflow. There are only 
minor risks like biasing folks to write programs that do a benchmark run when 
given no args or accidental infinite loop/giant benchmarks but those are both 
intrinsic to PGO in general.

Maybe there is already a way to hack `config.nims` to be this smart? I think a 
lot more people would use it if it were as simple as `nim c --pgo -d:danger 
prog args`.


Re: Performance issues with "complex" module

2020-05-02 Thread cblake
With the gcc backend doing "profile guided optimization" can often help 
(especially with measurements to drive inlining choices). e.g., doing just 
@Stefan_Salewski's command-line I get: 


julia1: 84 ms (sum of pixels: 27677748)
julia2: 83 ms (sum of pixels: 27677748)
julia3: 82 ms (sum of pixels: 27677748)


Run

while doing this 


nim c -d:danger --panics:on -c t.nim
gcc -O3 -flto -fprofile-generate -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o 
pg
./pg
gcc -O3 -flto -fprofile-use -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o 
t-final


Run

I get: 


julia1: 82 ms (sum of pixels: 27677748)
julia2: 82 ms (sum of pixels: 27677748)
julia3: 82 ms (sum of pixels: 27677748)


Run

So, the PGO "flattened" the performance a bit more. In this example the PGO 
speed boost was close to zero/measurement error, but I have seen as high as 2x 
speed-ups for more complicated programs. So, it's worth having some little 
"nim-pgo" wrapper script to automate the above if you are writing programs that 
have an easy "benchmark run".


Re: How to download a file concurrently?

2020-04-26 Thread cblake
Sounds like an application of the "byte range requests" introduced in HTTP/1.1. 
You can definitely use those + multiple TCP connections to be as 
self-optimized/unfriendly to other internet traffic/robust to unusual packet 
loss/etc. situations as desired, but I don't know what Nim ecosystem support 
for that there is. (Just providing keywords to search for really..)


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-26 Thread cblake
If you are not redirecting to a file but letting it display to the terminal 
then a big variable is what GUI terminal emulator you are using - XTerm, 
rxvt-unicode, st, etc. I think that could create a lot of varying numbers..why 
it may even depend on if you use TrueType fonts or cheaper to render fonts.

Not sure what the goals are here, but if you really want to figure out what's 
going on at the core Nim IO layer then you want clean separation of sources of 
time spent (like input/split IO and output IO). E.g, forget about locking 
stdout buffers - just don't output at all. And on the output side forget about 
reading input - just generate the data like your generator and output and 
realize that if you are outputting to a terminal then you are benchmarking how 
that terminal handles extreme high update rates.


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread cblake
Locking can indeed slowdown output as can exception handling and is unnecessary 
for single-threaded output and the exceptions may be unhelpful if there is 
nothing you can do on out of space errors. If you are on Linux/glibc then you 
can also just use `fwrite_unlocked` directly as in `cligen/osUt.nim:urite` or 
`cligen/mslice.nim:urite` as shown in the `examples/cols.nim` program in 
[https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread cblake
This comes up a lot. Here are a couple recent ones: 
[https://forum.nim-lang.org/t/4738](https://forum.nim-lang.org/t/4738) and 
[https://forum.nim-lang.org/t/5103](https://forum.nim-lang.org/t/5103).


Re: Why does &"""{"total":23}""" compile but &"""{"total:":23}""" not compile?

2020-04-24 Thread cblake
Quoted closers are indeed another wrinkle. If you want a one pass algo you can 
just track the index of the last `':'` and use it only after you pass the final 
unquoted closer having framed the two parts. Still doesn't sound hard to have 
less brittle framing, but effort worth is always subjective. Nim does use `':'` 
a lot. Maybe @TKurtBond could do the PR? :-)


Re: Why does &"""{"total":23}""" compile but &"""{"total:":23}""" not compile?

2020-04-23 Thread cblake
As long as `':'` is not allowed in `fmt` part of `{expr:fmt}`, you can probably 
just frame the `expr` by doing an `rfind` on ':' instead of a `find` (last 
rather than first), no?


Re: `{}` syntax

2020-04-08 Thread cblake
It's a good-ish summary and your range `[3..5]` is another great example why we 
might want two operators since one could also do a range of `{"RFCs" .. 
"the"}`. The augmented tree actually keeps things like "the number of nodes 
below a node" (or "to the left") and can do a height-of-tree time search to 
find a node of a given rank. You don't really need much more than a `SortedSeq` 
to really demonstrate the interface issue, as mentioned in the long discussion, 
though. (If you want a simpler algorithmic mental model.)

Some clarifications. We don't ever need to break how `Table` is accessed (or 
`seq`). We could only do the `[]` and `{}` for the new bag/sorted bag thing 
(which is not yet impled in Nim (I have C impls to port)). I think the 
extent/impact of this idea was misunderstood because there is so little 
exposure to unkeyed "search" trees. (I've seen some AVL "rope" libraries, the 
Python blist, etc., but it's definitely off the beaten path). A lot of words 
were spilled with me trying to explain that.

Mostly, what I think I am getting pushback on (it's not exactly clear) is 
providing `{}/{}=` for `Table` merely as an "alias" after these impls are all 
in play. That way `{}` could mean "associative-ish" while `[]` could mean 
`positional-ish` as a more global convention like the `add` vs `append` proc or 
other things in `doc/apis.rst`. We could preserve backward compatibility 
forever. Right now `{}=` is only used in the `json` module. The `{}` for 
`Table` is just to make it easier to have the object be either a tree or a 
table. `CritBitTree` and maybe something else might also benefit from that kind 
of operator alias (and might be algorithmically updatable to an instrumented 
tree, actually...Haven't thought about that).

It's possible push back is more general, like "don't ever use two indexing 
operators like `[]/{}` for some core thing like a tree/sortedseq!" or "don't 
ever add an alias for `Table.[]!!`". I don't mean to speak for anyone.

It's not like this is a huge deal to me (however much I may write). It's a 
neglected interface detail because few libraries have multiassociative things 
until late in the syntax game, but we soon will, and Nim's fabulously flexible 
syntax already supports it. It's more consistent with initializer syntax like 
`{:}` and highlights possibly slow associative lookup. I feel like if you like 
`{key:val, ..}` at all more than `@[(key,val),..` then this kind of move would 
be welcome.

Tutorials/intros/explainers of the language do have to decide "Do I favor 
internal consistency/clarity of mentioning `{}` early or do I favor similarity 
to other prog.langs". I think that about covers the real questions. Oh, and the 
alternatives to these are to always use `[]` but with type constructors inside 
to distinguish the "access mode".


Re: Help with hash sets needed

2020-04-07 Thread cblake
Oh, and of course `a == b` must imply `hash(a) == hash(b)`.


Re: Help with hash sets needed

2020-04-07 Thread cblake
`hash` and `==` must agree. So, you need: 


proc `==`(a, b: ComponentPoint): bool =
  a.id == b.id


Run


`{}` syntax

2020-04-07 Thread cblake
Some people don't read the RFCs often while everyone working with some new 
prog.lang _loves_ syntax debates. :-) So, to get broader exposure, here: 
[https://github.com/nim-lang/RFCs/issues/207](https://github.com/nim-lang/RFCs/issues/207)


Re: Compile time FFI

2020-04-04 Thread cblake
Slightly off topic, but as part of a recent job recruitment process I wrote my 
own libpcap-like packet processor and in benchmarking it I noticed it was about 
6x faster than the actual libpcap which uses a bunch of expensive callbacks via 
function pointers internally. It's also pretty easy in Nim to define some 
`object` s with bitfields and do swapping as needed from network to host byte 
order, and might be surprisingly few lines of code. You can still use `tcpdump 
-w file` to capture data. Just pointing out there may be less value in libpcap 
vs writing it yourself than you might think at first glance.


Re: setting file owner and running process as another user

2020-04-03 Thread cblake
Also, in terms of `setuid`, you may also find `cligen/posixUt.dropPrivilegeTo` 
useful in the `cligen` nimble package (also at 
[https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)) used 
for example by the nimble `kslog` package (also at 
[https://github.com/c-blake/kslog)](https://github.com/c-blake/kslog\)).


Re: setting file owner and running process as another user

2020-04-03 Thread cblake
You wan the `posix` module. It has `chown`, `chmod`, `setuid`, etc.


Re: Ternary conditional operator

2020-03-26 Thread cblake
Also see this recent thread 
[https://forum.nim-lang.org/t/6060](https://forum.nim-lang.org/t/6060) and note 
that "compile-time if" aka `when` can also be used in expressions as in `let i 
= when defined(nimRulez): 1 else: 2`.


Re: Unchecked arrays

2020-03-23 Thread cblake
As a trivial correction, @Stefan_Salewski wrote `add()` in one spot where he 
meant `addr`, but `add` is totally within the "safe zones" of Nim and even 
applies to `seq[T]` types when you cannot `setLen` ahead of time. (`setLen` is 
faster when you can use it, though.) Just trying to block potential confusion.


Re: Is it possible to share a Hashmap between threads?

2020-03-13 Thread cblake
Someday destructors may be defined/automatically called, but today there will 
be no clean-up without the `close` call. You may not need to/want to clean-up 
anyway.

need to: `memfiles.open()` creates a memory mapping, but does not (by default) 
keep the file descriptor open. The only thing the `mf.close` above does is call 
`munmap`. Memory maps are a less scarce resource than file handles due more to 
default OS limits than anything else. Being much less scarce, it's not quite as 
bad to leak it.

want to: You might even want to leak it on purpose if you wanted to keep the 
string data alive beyond the lifetime of the procedure invocation. E.g., you 
may want to return an `MemSlice` or a `Table` with them as keys or values. They 
are just pointers to that memory mapped region (hence zero copy) which means if 
you `mf.close/munmap` the pointers become invalid.

Of course, if you are leaking millions of memory mappings then even small leaks 
can grow to be burdensome on the OS -- if you don't need continued access to 
the data. So, as with all resource management there is a balance of concerns 
and trade offs.


Re: Is it possible to share a Hashmap between threads?

2020-03-13 Thread cblake
Well, the data "on disk" is not NUL-terminated. So, though the type I am using 
is called "cstring" the string data is not terminated. If anything, the data is 
more "newline terminated" in your file format. Those newlines are mapped out by 
the `memSlices` iterator, though with the result stored in `.size`. So, I just 
use that to bound the loop.

I used that type because Nim defines convenient `[]` style access to the bytes 
of a `cstring` giving back `char` s. It isn't strictly necessary, but that was 
my reason. (You would have to do your own pointer arithmetic and dereferencing 
otherwise).


Re: Is it possible to share a Hashmap between threads?

2020-03-13 Thread cblake
One vexing thing about modern CPUs is that a single CPU core cannot saturate 
RAM bandwidth. So, while throughput in L1 can be 100..200 GB/sec, and RAM 
bandwidth might be 50..150 GB/s, the amount just one L1 cache system can read 
from L2/L3/DIMMs might be only 5-30 GB/sec. This puts (often much simpler) 
single-threaded code that could otherwise saturate RAM throughput (thereby 
running as fast as possible anyway) at a major throughput disadvantage. But I 
totally agree with @mratsim here that staying with ST-code is the best idea.

The cynic inside me guesses the throughput disparity incentivizes people to 
write more parallel code, thusly selling more CPUs. So, these days CPU 
manufacturers are sadly incentivized to complicate lives of programmers. At the 
dawn of the industry, it was the other way around. Maybe there is some less 
cynical "scaling" problem, but usually with throughput/bandwidth you can just 
add more lanes of traffic.


Re: Is it possible to share a Hashmap between threads?

2020-03-13 Thread cblake
Oh, and if you did want to use a zero copy IO interface like `memfiles` on 
this, you can and indeed it is over 3X faster, but you have to write your own 
integer parser because `parseUint` and all its analogues take a Nim `string` 
not `MemSlice`. You can use the dollar operator `$` to convert, but that is no 
faster than just using `lines` in the first place. Parsing an integer in ASCII 
is actually pretty easy, though. E.g., 


import tables, os, memfiles

proc parseUInt(data: cstring, size: int): uint {.inline.} =
  let zero = ord('0')
  let nine = ord('9')
  for i in 0 ..< size:
let code = ord(data[i])
if code >= zero and code <= nine:
  result *= 10
  result += uint(code - zero)
else:
  return 0  #non-digit; parse error

proc ytz(path: string): uint =
var
  table = initTable[uint, uint]()
  mf = memfiles.open(path)
if mf.mem == nil:
quit("Cannot open " & path)
for s in memSlices(mf):
let n = parseUInt(cast[cstring](s.data), s.size)
if n == 0:
quit("Parse error: " & $s)
table.mgetOrPut(n, 0) += n
for key, val in table:
result = max(result, val)
mf.close

proc main() =
echo ytz(paramStr(1))

main()


Run

Not a big deal here, but if the file were really large that factor of 3 might 
be more appreciated.


Re: Is it possible to share a Hashmap between threads?

2020-03-13 Thread cblake
This topic can be confusing to beginners because the word "read" is overloaded. 
There is a "system call" named "read", but also many higher level APIs that are 
"buffered" which are also called "read", and people often refer to the general 
activity as just "reading". The buffering means those higher level APIs are 
doing "big reads" into a buffer and then copy small parts of that out for you 
for each line. If data is copied out shortly after it was read into that big 
buffer with a system-call read then the copy is cheap because it was from CPU 
caches (L1/L2/L3).

There are also "zero copy" APIs to do IO like "mmap" on Unix, or in the Nim 
world "memfiles". These let the OS itself do the "buffering" of what is in RAM 
vs "on disk" (or other persistent device like "flash memory" these days).

Your linked yahtzee file is only 900 KB with average line length of about 9 
bytes, but only 790 unique entries (so a small Table). With such a short line 
length you might profit from not creating new strings at all but re-using the 
same one over and over in the loop. That is because string creation and 
anything you might do with only 9 bytes take comparable time. The Nim stdlib 
also has a `split` iterator which lets you iterate over the split fields 
without creating a new `seq[string]`. Looking at your input file format there 
is no need to even `split` at all, though. So, there are simpler things than 
`memfiles` you can do to speed it up.

Also, because you only have 790 unique lines, it should be much faster to get 
the max value by looping over the `Table` after you are done creating it (both 
no 2nd lookup, and 790 <<< 100,000).

So, this is a faster version of your program: 


import tables, os, parseutils

proc ytz(path: string): uint =
var
table = initTable[uint, uint]()
n: uint
for numStr in lines(path):
if parseUInt(numStr, n) == 0:
quit("Parse error: " & numStr)
table.mgetOrPut(n, 0) += n # here
for key, val in table:
result = max(result, val)

proc main() =
echo ytz(paramStr(1))

main()


Run

Incidentally, when posting code to this forum, if you put `Nim` after the 
triple backquote it will be rendered with color syntax highlighting.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
It was a tad melodramatic, but at least it had a '?'. And it even turned out to 
be 2-3x in the other direction (consistent with other Go table benchmarks I've 
tried).


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
PGO = profile-guided optimization. I mentioned how to use it already.. 
`-fprofile-generate...` .

It's not on by default because how it works is gcc instruments your code, 
counts what happens based on your sample program, then uses this information to 
make good judgements (unlike usual guesswork heuristics). To be on by default 
would require a sample program often with sample input data which cannot be 
assumed by default.

It wouldn't be crazy for gcc to accept some "sample program invocation" string 
as a command argument, but it does not. That's how my `nim-pgo` wrapper script 
works, though. I just say `nim-pgo prog './prog data' '--gc:arc'`.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
It also wouldn't be crazy for the `nim c` / `nim cpp` compiler drivers to 
accept some sample program invocation and do the program - sample-run - 
re-compile three stage process more automatically, but that isn't in place 
either. My `nim-pgo` script just assumes the usual `~/.cache/nim/r` locations I 
mentioned. The short of my script is: 


nim $be -c --cc:gcc -d:release -d:danger $nimOpts "$prog.nim" || exit 1
inputs=$(echo $nimcache/*.c*)
$cc -fprofile-generate $inputs -o $prog $ccL || exit 1
rm -f *.gcda
eval "$cmd"
$cc -fprofile-use  $inputs -o $prog $ccL


Run

but I also echo some things like PWD and set up those `$` environment vars.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Anyway, once you reproduce my times you should maybe tell those Reddit people 
how fast Nim code can be. :-) Perusing the thread, I didn't see people doing 
PGO on the C/C++ solutions. (It may well make less of a difference for those 
solutions, too.)


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Oh, and just for reproducibility's sake: 


myshell$ nim --version
Nim Compiler Version 1.1.1 [Linux: amd64]
Compiled at 2020-03-12
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: bbc231f8e06c18fe640fa56a927df476c2426ccb
active boot switches: -d:release -d:nimUseLinenoise -d:nativeStackTrace 
-d:nimHasLibFFI

Run


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Oh, and just as a sanity check that I'm using the right data/algo, my output is 
that `@["estop", "pesto", "stope", "topes"]` which I think is right reading a 
little more of that original reddit thread you posted.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Anyway, my code with no PGO and `-d:danger --gc:arc` only takes 90 ms. My 
conclusion from this is that the Nim code can be 211/90 or 211/66 =~ 2.3x or 
3.2x faster than the Go, depending on what you think is fare to compare. BUT 
you need to manually guide inlining, and if you want that last 90/66 = 1.36x 
boost you need PGO.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Actually, just taking your original code and adding `{.inline.}` to the `func` 
and compiling with `-d:danger --gc:arc` makes it only 118ms. So, that inlining 
must be the biggest boost the PGO is discovering.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
BTW, I haven't tried "every version of the code in between" to isolate all the 
various causes & effects which might be instructive, but my version with only 
`-d:release` takes 630 ms while `-d:danger` takes 446 ms while `-d:danger 
--gc:arc` takes 255 ms, and then that 66 ms with PGO. So, in short, the 
performance "is there" in Nim, but you might need to compile with more 
optimization turned on. Just the above sequence is 9.5x and that last 4x comes 
from gcc PGO which I have been finding unusually effective on Nim generated C 
code. On that Go code with the same input and `gccgo -O3 nklc.go` I get 255 ms 
as well. I don't have a convenient shells script driver to do PGO with gccgo, 
though.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Oh, and one other number - your original code with PGO on that same input - 72 
ms. So, the double/triple table lookup/slice stuff is really all only costing 
around 72/66 =~ 9%.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
So, with your original version doing only `-d:release` I get 1030 ms (I.e. 1.03 
sec) while on the same machine with my version doing gcc PGO and `--gc:arc` I 
get 66 ms, about 15x speed-up.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Well, maybe and via the `let` could be. I'm not sure the slice assignment is 
optimized down to a `copyMem`, though which works fine for your string data. 
Could be slice compare and `copyMem` in the if clause is best. Those Table use 
and how-to-compile modifications may well pack much more of a punch.


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Do you have an example input file somewhere you could link to?


Re: Unknown performance pitfall in tables?

2020-03-12 Thread cblake
Besides @Stefan_Salewski's comments, it also bears mentioning that A) you need 
to be careful to compile the Nim with `-d:danger` before making Nim performance 
conclusions, B) `--gc:arc` may also help but may or may not be in your 
particular version of Nim and C) the algorithm in your Nim case is not quite 
the same - in the Go you just do an unconditional append and then a second loop 
while in the Nim you do a triple lookup when present and double lookup when 
missing . Avoiding string creation with the slicing and doing a more direct 
translation of the Go would look like: 


func smallestRepr(arg: string): string =
  let doubled = arg & arg
  result = arg
  var slice = result # I *think* I have this right but have not tested..
  for i in 1 .. arg.high:
copyMem slice[0].addr, doubled[i].addr, arg.len
if slice < result:
  result = slice

proc main() =
  var seen = initTable[string, seq[string]]()
  for word in paramStr(1).lines:
let key = word.smallestRepr
seen.mgetOrPut(key, @[]).add word
  for key, words in seen:
if words.len == 4: echo words
main()


Run

At least in the past when I've benchmarked things, Go's tables were much slower 
than Nim's.

In the past I have noticed that, at least with the gcc backend, you can get a 
big boost (near 2X) from profile-guided optimization (`gcc -fprofile-generate 
...; Run sample prog; gcc -fprofile-use ...`) where the `...` s would be all 
the C files Nim wants to compile and the sample prog will probably be named 
`a.out`. (Usually everything in `~/.cache/nim/r/` on Unix, but visible via `nim 
c --verbosity:2` everywhere.)


Re: Can if statements be used as expressions everywhere?

2020-03-10 Thread cblake
Hmm. Ok - it was not your **exact** form. Sorry! All of these also work, 
though: 


let test = 0 + (if true: 3 else: 4)
let test = 0 + (if
true: 3 else: 4)
let test = 0 + (if true:
  3 else: 4)
let test = 0 + (if true: 3 else:
  4)
let test = 0 + (if true: 3
else: 4)


Run

but yeah, the one you list does not.

Maybe this is an undocumented (or even documented?) parser limitation or maybe 
it's a bug? I suspect it's some hard to resolve ambiguity in that exact case. 
The `block` workaround you found isn't so bad.


Re: Can if statements be used as expressions everywhere?

2020-03-09 Thread cblake
That exact form works for me with nim-0.19.2, nim-0.20.0, nim-devel. Don't 
recall exactly what version it landed in, but it sounds like maybe some 
limitation in either a very old or a bug in a very specific intermediate 
version of Nim you may be using.


Re: Can if statements be used as expressions everywhere?

2020-03-09 Thread cblake
It is intentional and you just need () around the middle commented out example.


Re: How to use integer generics in types?

2020-02-24 Thread cblake
There's no easy way to provide default values for such integer generic 
parameters, is there? E.g. (this doesn't work): 


type MatchPool[N: static[int] = 3] = object 


Run


Re: How to use pointer to mmaped file?

2020-02-20 Thread cblake
You can probably use `cstring` if on a C backend, but if you wanted to index 
some other element type than `char` (like `int` or `Foo`) you can also do 


let foos = cast[ptr UncheckedArray[Foo]](mmaped_file.mem)
if i * Foo.sizeof < mmapped_file.size:
  doSomethingWith foos[i]  # e.g. i == 0 for 1st Foo


Run

You may well be able to skip the range check every time if you are, e.g. 
looping over the file & know you won't overflow from that, etc. To recover 
automatic bounds checks, you can write new/use existing procs accepting 
`openArray[Foo]` (e.g. `openArray[T]`) and pass via `toOpenArray` explicitly in 
the parameter list: 


fun(toOpenArray(mf.mem, 0, mf.size div Foo.size - 1), a, b, c)


Run

That can be abbreviated with a `template oaFoo(mf)` if you need to pass it or 
various files in several places or something. If such procs want to modify the 
data the file needs to have been opened writable or you will get segmentation 
violations.


Re: Why whitespace?

2020-02-19 Thread cblake
Sure. I follow @sschwarzer's approach personally. The reasons for @Tutbadut or 
(someone else passionate about just one issue) **not** to fork something go 
back to network effects and the value of communities. Sometimes reiteration of 
pros & cons, etc. has value for new people.


Re: Why whitespace?

2020-02-18 Thread cblake
Well, an exclusive indent char might be nice, but TAB is very easy for almost 
everyone to enter by default already. Used the way @Tutbadut and @Araq were 
discussing only at the beginning of lines ('^TAB*[^TAB]' in Regex-ese) it could 
function that way. It's true that a brand new character might side-step the 
diverse ways to configure the amount of rendered blank space per tab, but at 
the non-zero cost of mandating unicode.

While source code "in the large" is not tabular data, in a 
leading-whitespace-for-block-structure language like Python or Nim, that part 
of the text has "regimented spacing concerns" much like a table. TAB characters 
of any fixed width are actually less useful for tables than the old early 20th 
century mechanical typewriter tab-stops that inspired the TAB character. 
Tab-stops were totally configurable - first at 4, next at 8, next at 10 or 20, 
etc. That flexibility was largely lost in the transition to ASCII-oriented 
software. { You think _one_ conversion ratio is a pain in the ass...try 
_dozens_! ;-) }

Anyway, detecting the very first '^[some-kinda-white]' and mandating whole-file 
consistency afterward seems "not so hard", especially if you allow comment-only 
lines as candidate leading white-space determiners. In spite of not being so 
hard, most text editors punt on it, only doing line-global/file-global type 
transformations leading to Araq's observations. It would be a really easy CLI 
text filter to do leading-space only..probably less work than reading this 
whole thread and also of value to non-`.nim` files. Probably exists already, 
too. Using hard tab for leading indent only and letting divergent users adjust 
as they like seems philosophically much like Nim's style insensitivity, only 
with far more common environmental support for tunable rendering.

The only real problems are defaults of 8-spaces per TAB being 4x more than what 
Nim people are used to and diversity/unavailability of TAB size setting. 
Combined that creates trouble for the many to please the few. I'm pretty sure 
the diversity/unavailability of adjustment/having to tell Python TAB-size 
conversion complexity was what led to Guido's regret, mixed in with some 
Wadler's Law can't-believe-we're-arguing-about-it.

Anyway, just food for thought. I'm not really advocating anything.


Re: Why whitespace?

2020-02-18 Thread cblake
Well, only something like 20% of the `17 shl 16` code points are allocated. So, 
there's surely hope in the future. (A disaster of address space overallocation 
if you ask me..A round 5 nibbles/16 shl 16 wasn' t enough? Really? Oh well.)


Re: Why whitespace?

2020-02-18 Thread cblake
Communities like consistency in many forms both explicit and implicit almost by 
their very nature. There are always cooperators and mavericks. Building 
consensus is often hard to impossible, and if you reflect upon it, consensus 
building is **not** an enviable task - programmer's are a very opinionated 
demographic, often with incompatible concerns.

Nim is more flexible & powerful than most prog languages/compilers. I hope that 
you give it a serious try regardless of any initial misgivings about this or 
that specific inflexibility here or there. If you like writing little CLI tools 
as a way to get your feet wet in a language then 
[https://github.com/c-blake/cligen](https://github.com/c-blake/cligen) might 
save you some time.


Re: Why whitespace?

2020-02-18 Thread cblake
[https://stackoverflow.com/questions/18962581/why-is-it-recommended-that-i-replace-tabs-by-spaces](https://stackoverflow.com/questions/18962581/why-is-it-recommended-that-i-replace-tabs-by-spaces)

If you follow the link in the top scored answer, you will see an email thread 
from 1994. There are definitely pros & cons. Insisting on only either spaces or 
only TABs for indent in a given file might be workable in terms of preventing 
half & half messed up files. Choosing just one is surely the simplest approach, 
though.


Re: Why whitespace?

2020-02-18 Thread cblake
Line length/text density is mostly limited by visual acuity in arc seconds or 
arc minutes, not by pixels on a display. There is, of course, doubtless a wide 
range of human visual acuity, but most people wear corrective lenses to target 
a much narrower range. Hence you hear "120 vs 80" and never "800 vs 40". Font 
choices influence this, too.

It's just a "complicated unit" to translate into how things are selected and 
even varies by "sitting distance" from displays. So, people don't use it, but 
it's absolutely what drives all this. Book/Newspaper/magazines have had very 
high DPI (really dots per arc degree!) for many more decades or even centuries. 
They limit to what is "easy to read" for their market, but their problem is a 
little different than "logically stiff" source code.

Computer displays (combined with font designs) basically got to 80 columns by 
the mid to late 1980s and stayed there even as pixel densities have ever 
increased since then. People do wider aspect ratio displays now and so a pair 
of side by side 80 column windows is not so rare. Had "book page" aspect ratio 
like "portait mode" displays been more common earlier on, things might have 
settled on 64 character wide windows. So, maybe be happy it's at least 80. ;-)


Re: Suggestions for optimization?

2020-02-18 Thread cblake
Incidentally, @lagerratrobe, once you save a lot of time by replacing sorting 
by a perfect commutative hash, the next optimization is making the IO part 
faster (you need cligen-0.9.43 for this): 


import strutils, tables, cligen/[mfile, mslice]

const prime: array[26, uint64] = [ #9/267751 oflow
  7'u64, 61, 41, 53, 2, 71, 47, 29, 3, 97, 89, 17, 59,
  19, 5, 31, 101, 11, 13, 23, 37, 79, 73, 67, 43, 83 ]

proc product(word: MSlice): uint64 =
  result = 1'u64  #Assumes whole file is uppercase
  for ch in word: #iterator needs cligen>=0.9.43
result *= prime[ord(ch) - ord('A')]

proc pBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[uint64, seq[MSlice]]()
  let mf = mopen(dict)
  if mf == nil: return
  defer: mf.close
  for word in mf.mSlices():
anas.mgetOrPut(word.product, @[]).add word
  for word in query:
let word = word.toUpperAscii
let prod = word.toMSlice.product
echo word, ":"
try:
  for ana in anas[prod]: echo "  ", ana
except: discard
import cligen; dispatch(pBuildQry)


Run

With that my 129 ms goes down by 57ms to 72 ms (1.8x better) on that SOWPODS 
dictionary. Then if you Nim compile with `--gc:arc`, it goes down to about 47 
ms (1.53x better). Then if you use gcc's PGO it goes down to 42 ms (1.12x 
better).

I suspect with your dictionary and CPU that you would see a similar overall 4x 
to 5x improvement taking you from 286 ms down to more like <70ms. Language 
comparison-wise, I do not believe R or CPython would be able to be sped up 
similarly. (Perhaps in PyPy the product might be able to become fast, but 
perhaps not if they are worrying about overflow switching to arbitrary 
precision ints.)


Re: Why whitespace?

2020-02-18 Thread cblake
I don't really disagree. I only really use it for patches/diffs, myself. I 
think the real issue is just the the wild inconsistency of both how to adjust 
tab-size settings and whether it is even possible across a wide universe of 
text rendering situations. Avoidance lets you bypass that complexity from the 
"abstractness" of how much space a TAB represents, but of course, you do lose 
the abstractness, too.


Re: Why whitespace?

2020-02-18 Thread cblake
Another argument not yet mentioned is that having spaces rather than TABs gives 
one flexibility in (probably only temporarily) wedging in a control line 
governing a set of sub-blocks at some "half-indent" amount (such as only 1 or 
only 2 spaces). You can do some `if` or `when` for example. It could also make 
certain patches that effect such changes smaller (by adding one line instead of 
adding 1-line and shifting a ton of stuff) and more robust (since diffs are 
usually contextual and this would require less context to match). Admittedly, 
not a huge deal and a very rare case compared to reading code everyday, but 
still something concrete which would not be possible with exclusively TAB 
indents. 


Re: Why whitespace?

2020-02-18 Thread cblake
I think you are misinformed. Any number of spaces is fine. Tabs are not allowed.


Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc

2020-02-17 Thread cblake
Like this, just for 100% clarity: 


assert toSeq(permutations([1,2,3,4,5])).len == fac(5)


Run


Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc

2020-02-17 Thread cblake
Yeah. Sorry if I was unclear. I meant just ditch the `count` template entirely 
and change all the asserts to call `toSeq` directly. There's probably another 
way to do it, too. It's not like the lengths of any of those test cases warrant 
conserving memory..biggest is like 1024.


Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc

2020-02-17 Thread cblake
If you change the `doAssert count(...) == number` to `assert toSeq(...).len == 
number` it all works (in ARC mode or not) except for the first test case with 
an empty sequence which has `.len == 0` (which actually seems right to me, 
though it's a degenerate case kind of like `0!`..personally I would probably 
change that to check for 0 not 1 and have the semantics be to be an empty 
loop/empty iterator unless that breaks something else).


Re: Nimrod Combinatorics Module of Reimer Behrends does not work with --gc:arc

2020-02-17 Thread cblake
stdlib `algorithm` has `nextPermutation` (and `prevPermutation`), but not 
combinations. Combinations might make sense to add. Not sure about the full 
spectrum of iterators in that package.


Re: Access to Iterators of Sequences

2020-02-17 Thread cblake
You can also avoid sorting the entire table if you only care about the top 2 or 
10 values by using the Nim stdlib's spiffy `heapqueue` module as in 
[https://github.com/c-blake/cligen/blob/master/examples/newest.nim](https://github.com/c-blake/cligen/blob/master/examples/newest.nim)
 \-- might be worth adding a `top`/`bottom`/`most` type procs to `heapqueue`.


Re: How does one check if a string is numeric?

2020-02-17 Thread cblake
Minor nitpick - you should call it `isFloat` so `isInt` can also work since 
both take just a `string`, or maybe `isNumber(string, enum)` with different 
kinds of number.


Re: Suggestions for optimization?

2020-02-14 Thread cblake
Incidentally, in the small alphabet case (and realistically anagrams are 
usually related to word games over small alphabets) there is a neat trick for 
rapid signature computation: map the 26 letters to the first 26 primes (2, 3, 
5, 7, .., 97, 101). Then make the signature of a word the product of the 
primes. If you never overflow the product this is guaranteed to be a valid 
anagram signature by the uniqueness of prime factorization and commutativity of 
multiplication. Note that this **removes sorting from the equation entirely**.

In the worst case, that product can start to overflow a 64-bit integer past 
about 9 letters (since 101^9 =~ 2^60 and 101^10 > 2^66). However, neither worst 
nor average cases really matter since we have a static dictionary of every 
possible case. So, it's actually easy to just test a given dictionary to see if 
any overflows collide in our exact case. While it is easy to defeat by an 
attacker, for "organic/natural" dictionaries this is likely to be very rare 
because 2^64 is much more than the square of the number of words in most 
dictionaries. So, we are in a regime very far from birthday paradox collisions. 
Really, we aren't random, but also really the "load on the address space" is 
much less than the square of the number of words - it is the product of the 
number of non-product overflowing words and product overflowing words which is 
much smaller (assuming most words do not overflow). So, this is really a 
somewhat empirical question about words and dictionaries and letters.

With just a naive 'A' .. 'Z' \--> 2 .. 101 mapping and this dictionary: 
[https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt](https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt)
 I got 15_294 words that overflowed (and no collisions!) for 267_751 words. One 
can do better, though. At least in English, letters have a very skewed usage 
distribution. So, we can make overflows even less likely. You can just look 
that frequency up on Wikipedia 
([https://en.wikipedia.org/wiki/Letter_frequency](https://en.wikipedia.org/wiki/Letter_frequency))
 and sort the first 26 primes by that frequency. Doing that with the above 
SOWPODS dictionary, we reduce the number of overflows to 223.

You can do better still with a pre-pass to _measure_ this frequency in the 
dictionary, sort the primes list by that measured frequency. I get 175 
overflows that way. You can do better still by measuring only the frequency of 
letter usage in words _longer than 9 letters_ (where it actually matters to 
contain overflow). I get only 136 overflows (over 100x better than the initial 
15,000) that way. You might be able to do slightly better by measuring the 
frequency of letters in just the 136 overflow words and adjusting, but that is 
more work and upsets the apple cart of the prior measurement perhaps requiring 
iteration. I think that 136 yields some a priori probability of any collision 
of under 2e-12 (under certain not quite right randomness assumptions).

So, while you might worry this "product signature" technique is not applicable 
due to overflow, it probably is in a language where anagrams are interesting 
Applying all these ideas (except the measurement which I did in a 15 line 
Python script for its convenient arbitrary precision arithmetic) in Nim we get: 


import strutils,tables,sets #toUpperAscii,Table,HashSet

const prime: array[26, uint] = [  # >.len==9 frequency
  7'u, 59, 29, 31, 2, 67, 47, 53, 5, 101, 73, 23, 43,
  13, 17, 41, 97, 11, 3, 19, 37, 71, 79, 89, 61, 83 ]

proc product(word: string): uint =
  result = 1   #You might worry about overflow, BUT
  for ch in word:  #..long anagrams are so rare it's ok!
result *= prime[ord(ch) - ord('A')] #223/267751oflow

proc sortByLetter(word: string): string =
  var cnt: array[26, int16]
  for ch in word: cnt[ord(ch) - ord('A')].inc
  for i, c in cnt:
for n in 0 ..< c: result.add chr(ord('A') + i)

proc pBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[uint, seq[string]]()
  for line in lines(dict):
let word = line.toUpperAscii
anas.mgetOrPut(word.product, @[]).add word
  for word in query:
let word = word.toUpperAscii
echo word, ":"
try:
  for ana in anas[word.product]: echo "  ", ana
except: discard

proc sBuildQry(dict="words", query: seq[string]) =
  var anas = initTable[string, seq[string]]()
  for line in lines(dict):
let word = line.toUpperAscii
anas.mgetOrPut(word.sortByLetter, @[]).add word
  for word in query:
let word = word.toUpperAscii
echo word, ":"
try:
  for ana in anas[word.sortByLetter]: echo "  ", ana
except: discard

proc collisions(dict="words") =
  var t = initTable[uint, seq[string]]()
  for line in lines(dict):
let word = 

Re: Programming help - Binary Search

2020-02-13 Thread cblake
I think you just need `hi >= lo` instead of just `>` (the Java code you posted 
has that). You should also be told that this is in the Nim stdlib as 
`algorithm.binarySearch`, although it does use the simpler -1 return to signal 
"not found" rather than `Option[T]` stuff.


Re: Suggestions for optimization?

2020-02-11 Thread cblake
To clarify how @cdome's comment differs from @SolitudeSF's and @lagerratrobe's 
own observation, the intent was always for `-d:danger` to imply `-d:release`, 
but I guess there are several versions of Nim out there for which this 
implication does not hold. You have to define both for those Nim versions to 
disable all checking. Going forward (from `devel` and probably from version 
1.2) that will be redundant upon simply `-d:danger`.

Another thing you can do along the lines of just how you build like `-d:release 
-d:danger`, though is using GCC's profile-guided optimization in the C backend. 
I've seen up to 2X speed boosts for Nim generated code with that.

I suspect there is much more improvement to be had by the various algorithmic 
improvements I mentioned, though, and some are pretty easy. They do start to 
seem more like "cheating" in cross-programming language comparisons, especially 
moving all the hard work to compile-time. So, it depends if you're actually 
doing anything real with this code.


Re: Suggestions for optimization?

2020-02-11 Thread cblake
BTW, the Unicode version of this with 65536 to 2e6 "letters" (depending upon if 
you can restrict to "one Unicode plane") is definitely trickier. You might 
think a 2 pass radix sort would help, but since most words are short most of 
your sorts are _very_ small. Indeed, even insertion sort might well beat the 
`algorithm.sort` merge sort in some average sense for this problem. Insertion 
sort tends to beat almost everything for N < 16..32 on modern CPUs due to cache 
effects. Almost any dictionary will have average, median, and mode word lengths 
below 16 just because long words are unpopular. So, just having a "wrapper 
sort" that switches to insertion for N < 20 and falls back to `algorithm.sort` 
for bigger is probably best. It would also be possibly valuable to the 
community if you looked into the stdlib `algorithm.sort` and had it switch to 
insertion at small N. It doesn't right now.

The trouble you run into with most array/table based approaches is that the 
alphabet is so much larger than the word length. So, a Fenwick Tree or anything 
with "implicit order" from the indices is just too big to iterate over 
effectively. Without implicit order you wind up still having to sort "used 
letters" and letter repeats are not usually very dramatic. About the only thing 
I can think of that _might_ help Unicode (beyond just in-place/d-danger tricks) 
in that counting-sort-style approach is another structure from Preston Briggs 
in a 1993 Rice University tech report/thesis. That sparse-dense thing can 
sometimes be useful to manage iteration over very sparse subsets of small-ish 
universes like this Unicode code point space. It's conceivable you could use 
that as a Unicode histogram, but you would still have to sort a small array 
(number of _unique_ letters). So, it's hard to say it'd be faster than the 
insertion-or-merge idea, but it's obscure enough to be worth mentioning as also 
neglected.

Of course, **if you find yourself running this calculation a lot on a small-ish 
fixed set of basically static dictionaries** , the single biggest optimization 
you might do is to simply **save the signature index** to a file. You could 
extend @juancarlospaco 's idea and have a `const Table` built into the binary 
executable on a per dictionary basis (e.g. 1 program file per dict). You would 
absolutely have to time several hundred or even thousand anagram queries to get 
a reading. Or if you wanted a generic program to work with many dictionary 
files then instead of a `Table` you could sort the signatures themselves paired 
with their words. Then you could just save that sorted list to a file and do 
lookup via binary search on the file with at most O(lg(Nwords)) disk probes per 
anagram query (plus time to open/mmap the file). You could also hash-structure 
that file to get that down to 1 probe at substantial code baggage. 
([https://github.com/c-blake/suggest](https://github.com/c-blake/suggest)/ has 
a fully worked out example of a much more complex persistent hash-structure 
store along those lines.) You _could_ try using Nim's `marshal` module to save 
your `Table` to disk and load it back whenever you want it, but in this case I 
think the loading of the table would be no faster than just building it from 
scratch. You could also "save in memory" by having a long-running server that 
processes each dictionary just once and then answers anagram queries over a 
local network or pipe or something.

Personally, I think _all_ of the above are good coding exercises.


Re: Suggestions for optimization?

2020-02-11 Thread cblake
Besides @SolitudeSF's important suggestion, an algorithmic improvement you can 
use is to replace `algorithm.sorted(lower, system.cmp)` with a hand-written 
"counting sort". For ASCII (i.e. 1 byte textual characters) this is (very?) 
easy and should be quite a bit faster than `algorithm.sorted`. Something like 
this:


proc sortByLetter(word: string): string =
var cnt: array[26, int8]
for ch in word:   #simple counting sort
  cnt[ord(ch) - ord('A')].inc
for i, c in cnt:
  for n in 0 ..< c:
result.add chr(ord('A') + i)


Run

You probably need some `toUpper` in there if your dictionary is stored in 
lowercase (although that could also be done prior to entry to the above `proc` 
or you could also just change `A` to `a`). That `int8` type covers words up to 
255 characters long, but I think the longest real word in any spoken language 
with dictionaries is less than that. You could always count and print a warning 
or switch that to `int16` or `int32` which could help if chemical elements or 
really crazy stuff is possibly in play.

The same idea could be adapted to unicode but then you would probably want a 
`Table`, not an `array` which would slow it down more than the above minor 
modifications.

I actually think this is a good problem context in which to introduce the oft 
neglected counting sort which can also be used with (unlike here) non-empty 
satellite data (at some slightly increased code baggage).


  1   2   3   >