Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
Github issue: 
[https://github.com/nim-lang/Nim/issues/14811](https://github.com/nim-lang/Nim/issues/14811)


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
I was mistaken. I was compiling my `seq` test with `-d:useMalloc`, which fixes 
the problem. Sorry, fiddling with too many knobs.

`string` and `seq[char]` behave identically with `gc:arc`, and both get fixed 
(139MB) with `--gc:arc -d:useMalloc`. Other GCs (including `none`) still beat 
`gc:arc`-without-`useMalloc` on `string`. However, the other GCs use more memory 
(around 420MB) than `gc:arc` on `seq[char]`. So, whatever is going on, `seq` is 
actually worse than `string`, not better. { But also a side note for 
@HashBackupJim: try `-d:useMalloc` with `--gc:arc`. }

At this point, I should raise an issue over at Github. I'll link it back here.


Re: NvP: s.add('x') 100M times

2020-06-25 Thread Araq
Nim's strings do a mild form of copy-on-write so that string literals can be 
copied in O(1). Probably the logic is flawed.
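A minimal sketch of the behavior being described (my illustration, not the 
actual runtime code): 


# Illustration only: with --gc:arc, a string backed by a literal can be
# "copied" without duplicating the payload; a real allocation + copy should
# only happen once the copy is mutated.
let a = "hello"   # payload comes from a static literal
var b = a         # cheap copy: may still share the literal's payload
b.add('!')        # first mutation forces b to get its own buffer
echo a, " ", b    # -> hello hello!


Run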


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
One other Nim-level thing I can say is that things work as expected for 
`seq[int]` of the same memory scale (100MB). I.e., 


proc main() =
  var s: seq[int]
  for i in 0..12_500_000: s.add 1
  echo len(s)
main()


Run

produces a memory report (using /usr/bin/time on Linux) like: 


187MB seq2-arc
250MB seq2-default
250MB seq2-msweep
265MB seq2-boehm
300MB seq2-none


Run

So, this problem is **_only_** for Nim `string`. Indeed, if one changes 
`string` to `seq[char]` in the original example, usage goes down to 139MB, 
roughly what one would expect for a 3/2 growth policy.
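For reference, the `seq[char]` variant of the original example is just: 


var s: seq[char]
for i in 0..100_000_000:
  s.add('x')
echo len(s)


Run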


Re: NvP: s.add('x') 100M times

2020-06-25 Thread cblake
The same problem happens on nim-1.2.0 (well, nim-devel git hash 
ed44e524b055b985b0941227a61f84dd8fdfcb37). So this is a long-lived behavior of 
`gc:arc`, maybe present since the beginning (but we should still have a memory 
overuse regression test). Probably time to look at the generated C.


Re: NvP: s.add('x') 100M times

2020-06-25 Thread Araq
> I think this kicking the tires has probably uncovered a real problem.

Indeed, there is a high priority bug lurking here, please keep investigating!


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
@HashBackupJim - `newSeqOfCap[T](someLen)` also exists and, yes, pre-sizing 
can help a lot in Nim (and in almost any language that supports it).

Profile-guided optimization at the gcc level can also help Nim run timings a 
lot; in this case, 1.6x to 1.9x for various `gc` modes. 
[https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) explains 
how. LTO also helps, since most of the boost from PGO is probably from 
well-chosen inlining.

@sschwarzer - it's not only string benchmarks; interpreter start-up and so on 
differ, too. Anyway, this isn't a Python forum, and benchmarking "always 
depends". :-) :-)

Someone else should reproduce my finding that `--gc:arc` uses more memory than 
`gc:none` for the original str1.nim, or a version with a `main()` (or both). I 
think this kicking the tires has probably uncovered a real problem.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread HashBackupJim
Thanks for the tip. I knew about this sizing trick for tables, and it did save 
a lot of RAM in a small test (large table) because it avoided resizes, but I 
wasn't aware strings had a similar thing. I read the Nim manual, but stuff only 
sticks when doing a lot of coding in a new language, and I'm not there yet.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread sschwarzer
The reason I suggest comparing against Python 3 is that Python 2 is no longer 
supported by the CPython project. Also, by far most of the people who start 
with Python will use Python 3.

If Python 2 is faster in many string benchmarks, that's most likely because the 
default string type in Python 2 is simpler (just bytes) than in Python 3 (code 
points). If you see your data as just bytes and want to compare on those 
grounds, compare with Python 3's `bytes` type.

Now, when benchmarking Nim vs. Python, should you use a Python version and/or 
code style because it's more similar in implementation to Nim or should you use 
a Python version and/or code style because that's how most people would use 
Python? :-)

By the way, I think it's similar to the question: When benchmarking Nim, should 
you use the fastest implementation or the most idiomatic/straightforward 
implementation? I guess it depends.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I also usually find Py2 much faster than Py3. PyPy usually helps. Cython more 
so, but with much more work.

Anyway, the obvious better way to do it in Nim (which I always assumed was 
"never the point") is 


var s = newStringOfCap(100_000_001) # or whatever
for i in 0..100_000_000: s.add('x')
echo len(s)


Run

which runs 2x as fast as otherwise and uses exactly the right amount of memory. 
I mention it just in case @HashBackupJim was unaware.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread HashBackupJim
Thanks for the tips. As I mentioned, I am kicking the tires with Nim to see how 
it behaves. My goal isn't to find the fastest way to create a 100M string 
containing all x's in either Python or Nim, but rather for me to get a feel for 
how one behaves vs the other. If I run stupid tests like these and they all 
come out great with Nim, fantastic! That gives me a lot of confidence in it. If 
I get unexpected results, I'd like to understand why.

I have a 200K line Python 2.7 app. When I have run small tests comparing Python 
2 vs 3, Python 2 is often 50% faster (I don't need or want Unicode). Maybe if 
the whole app was running in Py3 it would overall be faster, but based on what 
I've seen, that seems uncertain. So for my particular situation, I don't care 
how Nim compares to Py3.

For grins, I ran the `s = s + 'x'` string test on Python 2 and Python 3.6.8 (all 
I have), and Py3 was 44% slower.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread sschwarzer
For the record: In Python 3, "some string" is a unicode string where the items 
are code points. The model more similar in semantics to the Nim version is the 
`bytes` type. That said, I get the same time for multiplying `b"x"` (`bytes`) 
as for `"x"` (`str`).


Re: NvP: s.add('x') 100M times

2020-06-24 Thread sschwarzer
Two things about the Python version:

  * Using `xrange` tells me you're on Python 2. I suggest you use a 
current/recent Python 3 version for your benchmarks.
  * The recommended way to concatenate a big number of strings is with 
`separator.join(iterable)`.



So you could use `s = "".join(("x" for _ in range(100_000_000)))`, but the 
Pythonic version would actually be `s = 100_000_000 * "x"`.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I don't see how your linked algo explains the deltas across GCs if that `3 div 2` 
growth happens for all of them. The memory `gc:arc` uses here seems more like 
the sum of all prior allocations, not "up to 1.5x what's needed". Actually, 
`gc:arc` uses 1.25x more memory than `gc:none` (312MB) in a test I just tried.
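A rough back-of-envelope on that "sum of all prior allocations" idea (my 
arithmetic, assuming a pure 3/2 growth policy and no reuse of freed blocks): the 
capacities form a geometric series, so the total bytes ever allocated come to 
about finalCap * 1.5/(1.5 - 1) = 3 * finalCap. With a final capacity a bit over 
100MB, that is 300+MB, which is in the same ballpark as the ~390MB RSS seen 
here with `gc:arc`.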


Re: NvP: s.add('x') 100M times

2020-06-24 Thread b3liever
I don't know much about Python, but it seems its strings are immutable, meaning 
each time you add a character it allocates a new string of length `len+1`, which 
explains why memory usage is about 100MB and why it's slow.

In Nim, on the other hand, strings are mutable. They are 
[resized](https://github.com/nim-lang/Nim/blob/devel/lib/system/seqs_v2.nim#L103)
 only when `len+1` becomes bigger than the capacity, and the new capacity follows 
this [algo](https://github.com/nim-lang/Nim/blob/devel/lib/system/sysstr.nim#L25), 
which explains the extra space and why it's faster.
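For reference, the growth rule at that second link is roughly the following (my 
paraphrase of sysstr.nim at the time; the exact code may differ): 


# Paraphrase of the capacity-growth rule, not the verbatim stdlib code.
proc resize(old: int): int =
  if old <= 0: result = 4              # start small
  elif old < 65536: result = old * 2   # double while the buffer is small
  else: result = old * 3 div 2         # grow by 3/2 once it is large


Run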


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
Yup. Just what I was seeing, @b3liever. Wrapping in `main()` makes no difference 
to the RSS delta, and there's a very noticeable delta in a non-intuitive 
direction. So, either our intuitions are wrong in a way which should be 
clarified, or there's a problem which should be fixed. Maybe a github issue?


Re: NvP: s.add('x') 100M times

2020-06-24 Thread b3liever
nim -v


Compiled at 2020-06-23
git hash: c3459c7b14

With `nim c -d:danger --panics:on --gc:arc`:


Maximum resident set size (kbytes): 395352
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 99524
Voluntary context switches: 1
Involuntary context switches: 20

With `nim c -d:danger --panics:on` (default gc)


Maximum resident set size (kbytes): 282964
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 70471
Voluntary context switches: 1
Involuntary context switches: 10

Both tests are with a main function and it makes no difference.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
@cumulonimbus - I tried that. It didn't alter the behavior I was seeing.

If this behavior was not always there then my guess is that some arc bug was 
causing a crash, got fixed, and now the fix causes this. Regardless of whether 
it was always there or appeared by bug-jello-squishing accident as I theorize, 
we should probably have a little suite of "memory use regression" tests to 
prevent stuff like the scenario I described. Such a suite would be a kind of 
"correctness testing" for deterministic memory management. Could have a 
"fuzzy/ball park compare".

Maybe we have such already, perhaps informally? If so, we should add this 
`str1` to it. If not, it can be the first test. :-)


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cumulonimbus
Possibly something to do with this being top-level code and not inside a 
function? I can't think of a reason why for this one, but many benchmarks change 
significantly (for the better) when put inside a function.
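I.e., the original benchmark wrapped in a proc, something like: 


proc main() =
  var s: string
  for i in 0..100_000_000:
    s.add('x')
  echo len(s)

main()


Run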


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
I don't disagree. It might need delving into the generated C to figure out, but 
I'm guessing my results are not hard to reproduce. If they are, let me know how 
I can best help.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
Just did a non-PGO regular `-d:danger` run. Times went up 1.9x but memory usage 
patterns were the same with `gc:arc` using much more RSS than `gc:boehm` or 
`gc:markAndSweep`. It's a pretty tiny program.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread cblake
For this particular benchmark `--gc:boehm` uses the least memory and time for 
me on Nim 28510a9da9bf2a6b02590ba27b64e951a208b23d with gcc-10.1 and PGO, but 
even that is still 2.5x the RSS of python-2.7.18. Not sure why, but yeah, it is 
35x faster than Python.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread Araq
Huh? Tracing GCs should never win this. Something strange is going on... :-) 


Re: NvP: s.add('x') 100M times

2020-06-24 Thread HashBackupJim
Thanks. I tried that just now: 


ms:nim jim$ nim c -d:danger --gc:arc str1
Hint: 11937 LOC; 0.390 sec; 12.988MiB peakmem; Dangerous Release build;
proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]

ms:nim jim$ /usr/bin/time -l ./str1
100000001
      0.90 real      0.73 user      0.15 sys
 440176640  maximum resident set size
    107478  page reclaims
         5  page faults
         1  voluntary context switches
         4  involuntary context switches


Run

Does this need 1.3x?


NvP: s.add('x') 100M times

2020-06-24 Thread HashBackupJim
This string test uses `s.add('x')` instead of `s = s & 'x'` for Nim, and 
`s += 'x'` for Python. 


ms:nim jim$ cat str1.nim
var
  s: string

for i in 0..100_000_000:
  s.add('x')
echo len(s)

ms:nim jim$ nim c -d:danger str1
Hint: 14210 LOC; 0.275 sec; 15.977MiB peakmem; Dangerous Release build;
proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]

ms:nim jim$ /usr/bin/time -l ./str1
100000001
      0.68 real      0.56 user      0.10 sys
 326627328  maximum resident set size
     79753  page reclaims
         8  page faults
         1  voluntary context switches
         6  involuntary context switches

ms:nim jim$ cat str1.py
s = ''
for i in xrange(100000000):
  s += 'x'
print len(s)

ms:nim jim$ /usr/bin/time -l py str1.py
100000000
     20.74 real     20.67 user      0.06 sys
 105099264  maximum resident set size
     25834  page reclaims
         9  involuntary context switches


Run

Nim blows Python out of the water on this, though it uses 326M of RAM to create 
a 100M string.

Python's memory use is good, only 105M for a 100M string, but it's slow.

For these tests, I'm not so much looking to find the best way to create a 100M 
string in Nim or Python. I'm comparing the two to find out where there may be 
large performance differences, hopefully in Nim's favor, and to get a better 
understanding of how Nim works.


Re: NvP: s.add('x') 100M times

2020-06-24 Thread Araq
Memory consumption is usually _much_ better with `--gc:arc`.