Re: Nim's strutils.split() slower than Python's string split()?
You are right. That's why outputs are sent to /dev/null in my tests.
Re: Nim's strutils.split() slower than Python's string split()?
If you are not redirecting to a file but letting the output display in the terminal, then a big variable is which GUI terminal emulator you are using: XTerm, rxvt-unicode, st, etc. I think that could create a lot of variation in the numbers. It may even depend on whether you use TrueType fonts or fonts that are cheaper to render.

Not sure what the goals are here, but if you really want to figure out what's going on at the core Nim IO layer, then you want clean separation of the sources of time spent (input/split IO versus output IO). E.g., forget about locking stdout buffers and just don't output at all. On the output side, forget about reading input: just generate the data like your generator does and output it, and realize that if you are outputting to a terminal, then you are benchmarking how that terminal handles extremely high update rates.
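One way to isolate the split cost could look like the sketch below. It generates synthetic TSV lines in memory and times only the splitting, with no input or output inside the timed region. The proc names `makeLines` and `splitOnly`, the line layout, and the line count are assumptions made for illustration, not anything from the original benchmark.

```nim
import strutils, times

proc makeLines(n: int): seq[string] =
  ## Generates n synthetic TSV lines in memory, so no file or pipe is timed.
  result = newSeqOfCap[string](n)
  for i in 0 ..< n:
    result.add "a\tb\tc\td\t" & $i & "\tf"

proc splitOnly(lines: seq[string]): int =
  ## Splits every line and reads field 4, but prints nothing.
  ## Returns the summed field lengths so the work has an observable result.
  for ln in lines:
    result += ln.split('\t')[4].len

when isMainModule:
  let lines = makeLines(100_000)
  let t0 = cpuTime()
  let total = splitOnly(lines)
  let elapsed = cpuTime() - t0
  # Report on stderr so stdout stays out of the measurement entirely.
  stderr.writeLine "split only: " & $elapsed & " s (checksum " & $total & ")"
```

A matching output-only test would write pre-generated strings and skip the split, so the two numbers can be compared separately.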
Re: Nim's strutils.split() slower than Python's string split()?
But OP is not going to divert the output into `/dev/null`. In a Linux terminal you don't really gain much benefit from avoiding flushing, because it will flush on a newline regardless.

```
$ time cat bigfile.txt | ./mine
# ...
real	0m3.707s
user	0m0.662s
sys	0m1.494s
```

`mine.nim`:

```nim
import strutils

proc main() =
  for ln in stdin.lines:
    var i = 0
    # Iterate over fields lazily and stop at the fifth one.
    for f in ln.split('\t'):
      if i == 4:
        echo f
        break
      else:
        inc(i)

main()
```

```
$ time cat bigfile.txt | ./krtekz
# ...
real	0m4.328s
user	0m0.654s
sys	0m1.656s
```

`krtekz.nim`:

```nim
import strutils

proc main() =
  for ln in stdin.lines:
    var i = 0
    for f in ln.split('\t'):
      if i == 4:
        # writeLine does not flush after every line, unlike echo.
        writeLine(stdout, f)
        break
      else:
        inc(i)

main()
```

`filegen.nim`:

```nim
import os

var file = open(paramStr(1), fmWrite)

for lineNum in 0 .. 100:
  for wordNum in 0 .. 10:
    file.write(wordNum, '\t')
  file.writeLine("")
```

* * *

Ok, I checked OP's code against the bigfile and it takes around the same time on my machine (4.8s). All the programs are compiled with the `-d:danger` flag. The Python code takes ~4.1s on my machine. I can't seem to figure out how you guys get such varying numbers. All the programs produce the same output too.
Re: Nim's strutils.split() slower than Python's string split()?
All my tests were done by redirecting stdout to /dev/null, so the performance of terminal doesn't matter.
Re: Nim's strutils.split() slower than Python's string split()?
I was under the impression that Linux terminals usually flush on newlines regardless. There's a chance that the main performance benefit you are gaining comes from not building the full seq of split strings and using the `split` iterator instead, rather than from avoiding the flush. I could be wrong.
Re: Nim's strutils.split() slower than Python's string split()?
You could also concatenate all entries to a string variable and then do a single writeLine to avoid I/O traffic -- a classic optimization technique.
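A minimal sketch of that batching idea, assuming the fifth tab-separated field is wanted as in the original question. The helper name `fifthFields` and the sample input are made up for illustration. For very large inputs you would probably write the buffer out every so often rather than hold everything in memory.

```nim
import strutils

proc fifthFields(lines: openArray[string]): string =
  ## Collects field 4 (0-based) of every line into one newline-terminated buffer.
  result = ""
  for ln in lines:
    var i = 0
    for f in ln.split('\t'):
      if i == 4:
        result.add f
        result.add '\n'
        break
      inc i

when isMainModule:
  let sample = ["a\tb\tc\td\te\tf", "0\t1\t2\t3\t4\t5"]
  # One write call for the whole batch instead of one per line.
  stdout.write fifthFields(sample)
```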
Re: Nim's strutils.split() slower than Python's string split()?
Locking can indeed slow down output, as can exception handling. Locking is unnecessary for single-threaded output, and the exceptions may be unhelpful if there is nothing you can do about out-of-space errors. If you are on Linux/glibc then you can also just use `fwrite_unlocked` directly, as in `cligen/osUt.nim:urite` or `cligen/mslice.nim:urite`, as shown in the `examples/cols.nim` program in [https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)
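For readers who have not bound a C function from Nim before, a rough sketch of calling glibc's `fwrite_unlocked` directly might look like this. It is not cligen's `urite`; the wrapper name `uwrite` is hypothetical, and the binding assumes Linux with glibc and a Nim version that has `csize_t`.

```nim
# Binding to glibc's fwrite_unlocked (assumes Linux/glibc; not portable).
proc fwrite_unlocked(buf: pointer; size, n: csize_t; f: File): csize_t {.
  importc: "fwrite_unlocked", header: "<stdio.h>".}

proc uwrite(f: File; s: string) =
  ## Hypothetical helper: writes `s` to `f` without taking the stdio lock.
  ## Short writes are ignored instead of raising an exception.
  if s.len > 0:
    discard fwrite_unlocked(unsafeAddr s[0], 1, csize_t(s.len), f)

when isMainModule:
  for word in ["alpha", "beta", "gamma"]:
    stdout.uwrite word
    stdout.uwrite "\n"
```

Skipping the lock is only safe when a single thread writes to the stream.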
Re: Nim's strutils.split() slower than Python's string split()?
See updated code above
Re: Nim's strutils.split() slower than Python's string split()?
@krtekz Awesome! Also make sure you share your optimized final code here for anyone who ends up on this thread in future :)
Re: Nim's strutils.split() slower than Python's string split()?
Also try `--gc:arc --panics:on`
Re: Nim's strutils.split() slower than Python's string split()?
Just found out something people haven't mentioned yet (or I missed it?). According to [https://nim-lang.org/docs/system.html](https://nim-lang.org/docs/system.html), `echo` is equivalent to a `writeLine` followed by a `flushFile`, so each time I call `echo` it flushes the output, which is not necessary. After I replaced `echo x` in my code with `writeLine(stdout, x)`, it helped quite a bit. Using the `split` iterator instead of assigning the return value of `split()` to a variable helps too.
Re: Nim's strutils.split() slower than Python's string split()?
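In Rust, sometimes the biggest performance overhead is repeated locking of stdout. To avoid it, one would lock stdout before the loop. I wonder whether Nim requires locking as well.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

// `write_report` and its parameters are placeholder names for this snippet.
fn write_report(header: &str, lines: &[String], footer: &str) -> io::Result<()> {
    let out = File::create("test.out")?;
    let mut buf = BufWriter::new(out);
    let stdout = io::stdout();
    let mut lock = stdout.lock(); // lock once, before the loop
    writeln!(lock, "{}", header)?;
    for line in lines {
        writeln!(lock, "{}", line)?;
        writeln!(buf, "{}", line)?;
    }
    writeln!(lock, "{}", footer)?;
    Ok(())
} // end of scope unlocks stdout and flushes/closes buf
```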
Re: Nim's strutils.split() slower than Python's string split()?
This comes up a lot. Here are a couple recent ones: [https://forum.nim-lang.org/t/4738](https://forum.nim-lang.org/t/4738) and [https://forum.nim-lang.org/t/5103](https://forum.nim-lang.org/t/5103).
Re: Nim's strutils.split() slower than Python's string split()?
Thanks a lot! Total newbie here
Re: Nim's strutils.split() slower than Python's string split()?
Your code is slow for several reasons:

1. You should wrap your code in a proc. Otherwise all variables are globals, and globals are much harder to optimize; in particular, they have the same lifetime as your program, so you can't reuse their memory.
2. You call `split` in each loop iteration: you are allocating 1 string for the line, then 4 more strings for your split and 1 sequence to hold those strings. Memory allocation is accompanied by resetting your strings and sequence to binary zero. This is a recipe for slowness. Python is faster because its GC reuses already allocated memory.

Unfortunately, following the Python code works for quick scripting but is a performance pitfall. If you want a fast CSV/TSV parser you can use the tips from this blog post: [https://nim-lang.org/blog/2017/05/25/faster-command-line-tools-in-nim.html](https://nim-lang.org/blog/2017/05/25/faster-command-line-tools-in-nim.html)

I should note that the issue with `split` is true for all languages with "vanilla" memory management (C and C++ require the same techniques as well). This might change in the future if strutils is rewritten with more in-place procedures and the `dup` and `collect` macros provide the functional high-level API without its cost (see the v1.2.0 announcement: [https://nim-lang.org/blog/2020/04/03/version-120-released.html](https://nim-lang.org/blog/2020/04/03/version-120-released.html))
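The two points above could be combined roughly as follows: everything lives in procs, one line buffer is reused for every input line, and the wanted field is located by scanning for tabs instead of calling `split`. This is only a sketch under those assumptions. The names `nthField` and `printField` are invented for illustration, and, unlike the original code, a line with too few fields prints an empty line rather than raising an error.

```nim
proc nthField(line: string; n: int; sep = '\t'): tuple[first, last: int] =
  ## Returns the inclusive index range of the n-th (0-based) field in `line`,
  ## or (0, -1) when the line has fewer fields. No strings are allocated.
  var field = 0
  var start = 0
  for i in 0 .. line.len:
    # i == line.len acts as a virtual separator that closes the last field.
    if i == line.len or line[i] == sep:
      if field == n:
        return (start, i - 1)
      inc field
      start = i + 1
  result = (0, -1)

proc printField(input, output: File; n: int) =
  ## Copies field n of every input line to output, one field per line.
  var line = newStringOfCap(256)   # one buffer, reused for every line
  while input.readLine(line):
    let r = nthField(line, n)
    if r.last >= r.first:
      # Write the field's bytes straight from the line buffer.
      discard output.writeChars(line, r.first, r.last - r.first + 1)
    output.write '\n'
```

Calling `printField(stdin, stdout, 4)` from the main module would mirror what the original one-liner prints for well-formed input.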
Re: Nim's strutils.split() slower than Python's string split()?
With `-d:danger`, it improves a bit but is still much slower than the Python version. Nim: 2.65s, Python: 1.26s
Re: Nim's strutils.split() slower than Python's string split()?
Compile with `-d:danger`.
Nim's strutils.split() slower than Python's string split()?
My Nim code:

```nim
import strutils

for ln in stdin.lines:
  echo ln.split('\t')[4]
```

And the Python code:

```python
import sys

for ln in sys.stdin:
    print(ln.split('\t')[4])
```

Tested on a TSV file of 1M lines, the Nim version (compiled with `-d:release`) consistently uses more than twice the time required by the Python version. Is it because of `split()` or something else?