Re: NvP: s.add('x') 100M times
GitHub issue: [https://github.com/nim-lang/Nim/issues/14811](https://github.com/nim-lang/Nim/issues/14811)
Re: NvP: s.add('x') 100M times
I was mistaken. I was compiling my `seq` test with `-d:useMalloc`, which fixes the problem. Sorry, I was fiddling with too many knobs. `string` and `seq[char]` behave identically with `gc:arc`, and both get fixed (139MB) with `--gc:arc -d:useMalloc`. Other GCs (including `none`) still beat `gc:arc`-without-`useMalloc` on `string`. However, other GCs use more memory (like 420MB) than `gc:arc` on `seq[char]`. So, whatever is going on, `seq` is actually worse than `string`, not better. { But also a side note for @HashBackupJim to try `-d:useMalloc` with `--gc:arc`. } At this point, I should raise an issue over at GitHub. I'll link it back here.
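For reference, the combination being suggested to @HashBackupJim is just the flags already discussed in this thread; `/usr/bin/time -l` is the macOS form used elsewhere here (GNU time on Linux uses `-v`):

```
nim c -d:danger --gc:arc -d:useMalloc str1
/usr/bin/time -l ./str1   # expect ~139MB maximum resident set size per the numbers above
```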
Re: NvP: s.add('x') 100M times
Nim's strings do a mild form of copy-on-write so that string literals can be copied in O(1). Probably the logic is flawed.
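A minimal sketch of the kind of situation that logic has to handle, assuming the copy-on-write behavior described above (the exact un-sharing rules are what's in question):

```nim
var a = "hello"   # string literal
var b = a         # with literal copy-on-write this copy can be O(1), sharing storage
b.add('!')        # a mutation must un-share and actually allocate
echo a            # "hello"  - the literal must be unchanged
echo b            # "hello!"
```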
Re: NvP: s.add('x') 100M times
One other Nim-level thing I can say is that things work as expected for `seq[int]` of the same memory scale (100MB). I.e.,

```nim
proc main() =
  var s: seq[int]
  for i in 0..12_500_000:
    s.add 1
  echo len(s)

main()
```

produces a memory report (using /usr/bin/time on Linux) like:

```
187MB seq2-arc
250MB seq2-default
250MB seq2-msweep
265MB seq2-boehm
300MB seq2-none
```

So, this problem is **_only_** for Nim `string`. Indeed, if one changes `string` to `seq[char]` in the original example, usage goes down to 139MB, roughly what one would expect for a 3/2 growth policy.
Re: NvP: s.add('x') 100M times
The same problem happens on nim-1.2.0 (well, nim-devel git hash ed44e524b055b985b0941227a61f84dd8fdfcb37). So, this is long-lived behavior of `gc:arc`, maybe present since the beginning (but we should still have a memory-overuse regression test). Probably time to look at the generated C.
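For anyone wanting to follow along with that last step, the standard `--nimcache` flag puts the generated C somewhere easy to find (the directory name is whatever you pass):

```
nim c -d:danger --gc:arc --nimcache:./nimcache str1.nim
ls ./nimcache/*.c   # the string append/resize code ends up in here
```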
Re: NvP: s.add('x') 100M times
> I think this kicking the tires has probably uncovered a real problem.

Indeed, there is a high priority bug lurking here, please keep investigating!
Re: NvP: s.add('x') 100M times
@HashBackupJim - `newSeqOfCap[T](someLen)` also exists and, yes, pre-sizing can help a lot in Nim (and almost any language that supports it); see the sketch after this post. Profile-guided optimization at the gcc level can also help Nim run timings a lot. In this case, 1.6x to 1.9x for various `gc` modes. [https://forum.nim-lang.org/t/6295](https://forum.nim-lang.org/t/6295) explains how. LTO also helps, since most of the boost of PGO is probably from well-chosen inlining.

@sschwartzer - not only string benchmarks... interpreter start-up, etc. Anyway, this isn't a Python forum, and benchmarking "always depends". :-) :-)

Someone else should reproduce my `--gc:arc` uses more memory than `gc:none` result for the original str1.nim, or one with a `main()` (or both). I think this kicking the tires has probably uncovered a real problem.
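A hedged sketch of the `newSeqOfCap` pre-sizing mentioned above, mirroring the `newStringOfCap` version later in this thread (the capacity is just the final length of this particular benchmark):

```nim
# Pre-size the seq so the loop never triggers a capacity grow/realloc.
var s = newSeqOfCap[char](100_000_001)
for i in 0..100_000_000:
  s.add('x')
echo s.len
```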
Re: NvP: s.add('x') 100M times
Thanks for the tip. I knew about this sizing trick for tables, and it did save a lot of RAM in a small test (large table) because it avoided resizes, but I wasn't aware strings had a similar thing. I read the Nim manual, but stuff only sticks when doing a lot of coding in a new language, and I'm not there yet.
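For comparison, the table-sizing trick referred to here presumably looks something like this (`initTable`'s `initialSize` parameter is a standard library feature; the concrete size and data are made up for illustration):

```nim
import tables

# Pre-sizing avoids repeated rehash/resize cycles, like newStringOfCap for strings.
var t = initTable[int, int](16_384)   # power-of-two initial size
for i in 0 ..< 10_000:
  t[i] = i * i
echo t.len
```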
Re: NvP: s.add('x') 100M times
The reason I suggest comparing against Python 3 is that Python 2 is no longer supported by the CPython project. Also, by far most of the people who start with Python will use Python 3.

If Python 2 is faster in many string benchmarks, that's most likely because the default string type in Python 2 is simpler (just bytes) vs. Python 3 (code points). If you see your data as just bytes and want to compare on these grounds, compare with Python 3's `bytes` type.

Now, when benchmarking Nim vs. Python, should you use a Python version and/or code style because it's more similar in implementation to Nim, or should you use a Python version and/or code style because that's how most people would use Python? :-)

By the way, I think it's similar to the question: when benchmarking Nim, should you use the fastest implementation or the most idiomatic/straightforward implementation? I guess it depends.
Re: NvP: s.add('x') 100M times
I also usually find Py2 much faster than Py3. PyPy usually helps; Cython more so, but with much more work. Anyway, the obvious better way to do it in Nim (which I always assumed was "never the point") is

```nim
var s = newStringOfCap(100_000_001)  # or whatever
for i in 0..100_000_000:
  s.add('x')
echo len(s)
```

which runs 2x as fast as otherwise and uses exactly the right amount of memory. I mention it just in case @HashBackupJim was unaware.
Re: NvP: s.add('x') 100M times
Thanks for the tips. As I mentioned, I am kicking the tires with Nim to see how it behaves. My goal isn't to find the fastest way to create a 100M string containing all x's in either Python or Nim, but rather to get a feel for how one behaves vs. the other. If I run stupid tests like these and they all come out great with Nim, fantastic! That gives me a lot of confidence in it. If I get unexpected results, I'd like to understand why.

I have a 200K-line Python 2.7 app. When I have run small tests comparing Python 2 vs 3, Python 2 is often 50% faster (I don't need or want Unicode). Maybe if the whole app was running in Py3 it would be faster overall, but based on what I've seen, that seems uncertain. So for my particular situation, I don't care how Nim compares to Py3. For grins, I ran the `s = s + 'x'` string test on Python 2 and Python 3.6.8 (all I have), and Py3 was 44% slower.
Re: NvP: s.add('x') 100M times
For the record: In Python 3, "some string" is a unicode string where the items are code points. The model more similar in semantics to the Nim version is the `bytes` type. That said, I get the same time for multiplying `b"x"` (`bytes`) as for `"x"` (`str`).
Re: NvP: s.add('x') 100M times
Two things about the Python version:

* Using `xrange` tells me you're on Python 2. I suggest you use a current/recent Python 3 version for your benchmarks.
* The recommended way to concatenate a big number of strings is with `separator.join(iterable)`. So you could use `s = "".join("x" for _ in range(100_000_000))`, but the Pythonic version would actually be `s = 100_000_000 * "x"`.
Re: NvP: s.add('x') 100M times
I don't see how your linked algo explains deltas across GCs if that `3 div 2` growth happens for all of them. The memory `gc:arc` uses here looks more like the sum of all prior allocations, not "up to 1.5x what's needed". In fact, `gc:arc` uses 1.25x the memory of `gc:none` (312MB) in a test I just tried.
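To put a rough number on that intuition: with 3/2 growth, the total of all allocations ever made for a final capacity C is the geometric series C * (1 + 2/3 + (2/3)^2 + ...) = 3C. So if freed buffers were never returned or reused, a ~100MB string would drag RSS toward ~300MB, which is much closer to the `gc:arc` numbers in this thread than the ~150MB a working 1.5x policy should peak at.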
Re: NvP: s.add('x') 100M times
I don't know much about Python, but it seems strings are immutable, meaning each time you add, it allocates a new string with `len+1`, which explains why memory usage is about 100MB and why it's slow. In Nim, on the other hand, strings are mutable. They are [resized](https://github.com/nim-lang/Nim/blob/devel/lib/system/seqs_v2.nim#L103) only when `len+1` becomes bigger than the capacity, and the new capacity follows this [algo](https://github.com/nim-lang/Nim/blob/devel/lib/system/sysstr.nim#L25). That explains the extra space and why it's faster.
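A minimal sketch of what a growth policy in that spirit computes (illustrative only; the real thresholds and factors live in the linked sysstr.nim):

```nim
proc nextCap(oldCap, needed: int): int =
  ## Grow capacity ~1.5x at a time until `needed` elements fit.
  result = max(oldCap, 4)
  while result < needed:
    result = result * 3 div 2

echo nextCap(100, 101)   # 150: one 3/2 growth step
```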
Re: NvP: s.add('x') 100M times
Yup. Just what I was seeing, @b3liever. No `main()` difference to the RSS delta, and a very noticeable delta in a non-intuitive direction. So, either our intuitions are wrong in a way which should be clarified, or there's a problem which should be fixed. Maybe a GitHub issue?
Re: NvP: s.add('x') 100M times
```
nim -v
Compiled at 2020-06-23
git hash: c3459c7b14
```

With `nim c -d:danger --panics:on --gc:arc`:

```
Maximum resident set size (kbytes): 395352
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 99524
Voluntary context switches: 1
Involuntary context switches: 20
```

With `nim c -d:danger --panics:on` (default GC):

```
Maximum resident set size (kbytes): 282964
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 70471
Voluntary context switches: 1
Involuntary context switches: 10
```

Both tests are with a main function and it makes no difference.
Re: NvP: s.add('x') 100M times
@cumulonimbus - I tried that. It didn't alter the behavior I was seeing. If this behavior was not always there, then my guess is that some arc bug was causing a crash, got fixed, and now the fix causes this. Regardless of whether it was always there or appeared by bug-jello-squishing accident as I theorize, we should probably have a little suite of "memory use regression" tests to prevent stuff like the scenario I described. Such a suite would be a kind of "correctness testing" for deterministic memory management. It could have a "fuzzy/ballpark compare", something like the sketch below. Maybe we have such already, perhaps informally? If so, we should add this `str1` to it. If not, it can be the first test. :-)
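Something like this is the shape I have in mind for the fuzzy compare (hypothetical: the 1.6x tolerance is arbitrary, and `getOccupiedMem` only tracks Nim's own allocator, so it would not apply under `-d:useMalloc`):

```nim
# Hypothetical ballpark memory-regression test for the str1 case.
var s: string
for i in 0 ..< 100_000_000:
  s.add 'x'
let used = getOccupiedMem()   # bytes currently owned by Nim's allocator
# ~1.6x payload is generous for a 3/2 growth policy.
doAssert used < 160_000_000, "str1 memory regression: " & $used & " bytes occupied"
echo "ok: ", used
```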
Re: NvP: s.add('x') 100M times
Possibly something to do with this being at module scope and not inside a function? I can't think of a reason why for this one, but many benchmarks change significantly (for the better) when put inside a function; see the wrapped form below.
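For anyone who wants to check that quickly, the wrapped form of the original benchmark is just:

```nim
proc main() =
  var s: string
  for i in 0..100_000_000:
    s.add('x')
  echo len(s)

main()
```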
Re: NvP: s.add('x') 100M times
I don't disagree. It might need delving into the generated C to figure out, but I'm guessing my results are not hard to reproduce. If they are, let me know how I can best help.
Re: NvP: s.add('x') 100M times
Just did a non-PGO, regular `-d:danger` run. Times went up 1.9x, but the memory usage patterns were the same, with `gc:arc` using much more RSS than `gc:boehm` or `gc:markAndSweep`. It's a pretty tiny program.
Re: NvP: s.add('x') 100M times
For this particular benchmark, `--gc:boehm` uses the least memory and time for me on nim 28510a9da9bf2a6b02590ba27b64e951a208b23d with gcc-10.1 and PGO, but that least is still 2.5x the RSS of python-2.7.18. Not sure why, but yeah, it is 35x faster than Python.
Re: NvP: s.add('x') 100M times
Huh? Tracing GCs should never win this. Something strange is going on... :-)
Re: NvP: s.add('x') 100M times
Thanks. I tried that just now:

```
ms:nim jim$ nim c -d:danger --gc:arc str1
Hint: 11937 LOC; 0.390 sec; 12.988MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./str1
100000001
        0.90 real         0.73 user         0.15 sys
 440176640  maximum resident set size
    107478  page reclaims
         5  page faults
         1  voluntary context switches
         4  involuntary context switches
```

Does this need 1.3x?
NvP: s.add('x') 100M times
This string test uses `s.add('x')` instead of `s = s & 'x'` for Nim, and `s += 'x'` for Python.

```
ms:nim jim$ cat str1.nim
var s: string
for i in 0..100_000_000:
  s.add('x')
echo len(s)
ms:nim jim$ nim c -d:danger str1
Hint: 14210 LOC; 0.275 sec; 15.977MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/str1; out: /Users/jim/nim/str1 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./str1
100000001
        0.68 real         0.56 user         0.10 sys
 326627328  maximum resident set size
     79753  page reclaims
         8  page faults
         1  voluntary context switches
         6  involuntary context switches
ms:nim jim$ cat str1.py
s = ''
for i in xrange(100000000):
    s += 'x'
print len(s)
ms:nim jim$ /usr/bin/time -l py str1.py
100000000
       20.74 real        20.67 user         0.06 sys
 105099264  maximum resident set size
     25834  page reclaims
         9  involuntary context switches
```

Nim blows Python out of the water on this, though it uses 326MB of RAM to create a 100M string. Python's memory use is good, only 105MB for a 100M string, but it's slow.

For these tests, I'm not so much looking to find the best way to create a 100M string in Nim or Python. I'm comparing the two to find out where there may be large performance differences, hopefully in Nim's favor, and to get a better understanding of how Nim works.
Re: NvP: s.add('x') 100M times
Memory consumption is usually _much_ better with `--gc:arc`.