Re: Nim's strutils.split() slower than Python's string split()?

2020-04-26 Thread krtekz
You are right. That's why output is sent to /dev/null in my tests.


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-26 Thread cblake
If you are not redirecting to a file but letting output display in the terminal, 
then a big variable is which GUI terminal emulator you are using (XTerm, 
rxvt-unicode, st, etc.). I think that alone could produce widely varying numbers; 
it may even depend on whether you use TrueType fonts or cheaper-to-render fonts.

Not sure what the goals are here, but if you really want to figure out what's 
going on at the core Nim IO layer, then you want a clean separation of the 
sources of time spent (input/split IO versus output IO). E.g., on the input 
side, forget about locking stdout buffers: just don't output at all. On the 
output side, forget about reading input: just generate the data as your 
generator does and output it, and realize that if you are outputting to a 
terminal, then you are benchmarking how that terminal handles extremely high 
update rates.
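A minimal sketch of the input-only half of that separation. It assumes the TSV path is passed as the first command-line argument; the argument handling and the `fifthFieldBytes` name are illustrative, not from the thread. It reads and splits every line, writes nothing per line, and reports a single timing on stderr:

```nim
# Input-side benchmark sketch: read and split every line but write nothing
# per line, so the measured time excludes terminal/output costs.
# Assumption: the TSV file path is the first command-line argument.
import std/[os, strutils, monotimes, times]

proc fifthFieldBytes(f: File): int =
  ## Sums the lengths of every 5th (index 4) tab-separated field, so the
  ## split results are actually used rather than discarded.
  for ln in f.lines:
    var i = 0
    for field in ln.split('\t'):
      if i == 4:
        result += field.len
        break
      inc i

when isMainModule:
  if paramCount() >= 1:
    let f = open(paramStr(1))
    let t0 = getMonoTime()
    let bytes = fifthFieldBytes(f)
    let elapsed = getMonoTime() - t0
    f.close()
    # One report at the end, on stderr, so output cost stays negligible.
    stderr.writeLine(bytes, " bytes in ", elapsed.inMilliseconds, " ms")
```

Comparing this timing with the full program's timing gives a rough split between input/split cost and output cost.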


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-26 Thread adnan
But OP is not going to divert the output to `/dev/null`. In a Linux terminal 
you don't really gain much benefit from avoiding flushing, because it will 
flush on every newline regardless.
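Whether each newline triggers a flush is a C stdio buffering policy (Nim's `File` wraps C's `FILE*`). The terminal does not force it. C libraries such as glibc line-buffer stdout when it is a terminal and fully buffer it when it is redirected to a pipe, a file, or /dev/null. The sketch below asks for full buffering even on a terminal. It assumes a C library that provides the standard `setvbuf` and `_IOFBF`, and `fullyBuffer` is a hypothetical helper name:

```nim
# Sketch: request full buffering from C stdio via ISO C setvbuf.
# Assumption: the C library declares setvbuf and _IOFBF in <stdio.h>.
# setvbuf must be called before anything is written to the stream.
proc c_setvbuf(f: File, buf: pointer, mode: cint, size: csize_t): cint {.
  importc: "setvbuf", header: "<stdio.h>".}

var IOFBF {.importc: "_IOFBF", header: "<stdio.h>".}: cint

proc fullyBuffer(f: File, size = 65536): bool =
  ## True if stdio accepted the request; a nil buffer lets stdio allocate it.
  c_setvbuf(f, nil, IOFBF, csize_t(size)) == 0

when isMainModule:
  discard fullyBuffer(stdout)
  for i in 1 .. 3:
    stdout.writeLine(i)   # no flush per line; `echo` would still flush
```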


$ time cat bigfile.txt | ./mine ; cat -n mine.nim
# ...
real    0m3.707s
user    0m0.662s
sys     0m1.494s
 1  import strutils
 2
 3  proc main() =
 4    for ln in stdin.lines:
 5      var i = 0
 6      for f in ln.split('\t'):
 7        if i == 4:
 8          echo f
 9          break
10        else:
11          inc(i)
12
13  main()




$ time cat bigfile.txt | ./krtekz ; cat -n krtekz.nim
# ...
real    0m4.328s
user    0m0.654s
sys     0m1.656s
 1  import strutils
 2
 3  proc main() =
 4    for ln in stdin.lines:
 5      var i = 0
 6      for f in ln.split('\t'):
 7        if i == 4:
 8          writeLine(stdout, f)
 9          break
10        else:
11          inc(i)
12
13  main()




$ cat -n filegen.nim
 1  import os
 2
 3  var file = open(paramStr(1), fmWrite)
 4
 5  for lineNum in 0 .. 100:
 6    for wordNum in 0 .. 10:
 7      file.write(wordNum, '\t')
 8    file.writeLine("")



* * *

OK, I checked OP's code against the bigfile and it takes around the same time on 
my machine (4.8s). All programs were compiled with the `-d:danger` flag. The 
Python code takes ~4.1s on my machine. I can't figure out how you all get such 
varying numbers. All the programs produce the same output, too.


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread krtekz
All my tests were done by redirecting stdout to /dev/null, so the terminal's 
performance doesn't matter.


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread adnan
I was under the impression that Linux terminals usually flush on newlines 
regardless. There's a chance that the main performance benefit you are seeing 
comes not from avoiding the flush but from using the split iterator instead of 
splitting each line into a seq. I could be wrong.


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread ksandvik
You could also concatenate all entries into a string variable and then do a 
single writeLine to reduce I/O traffic; it's a classic optimization technique.
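The sketch below shows that idea: build one string and hand it to a single write call. It assumes the whole output fits comfortably in memory, and the proc name is hypothetical:

```nim
# Sketch: accumulate all output in one string, then write it once.
# Assumption: the collected output fits comfortably in memory.
import std/strutils

proc collectFifthFields(lines: openArray[string]): string =
  ## Appends each line's 5th (index 4) tab-separated field plus a newline
  ## to one string; lines with fewer than 5 fields are skipped.
  for ln in lines:
    var i = 0
    for field in ln.split('\t'):
      if i == 4:
        result.add field
        result.add '\n'
        break
      inc i

when isMainModule:
  let buf = collectFifthFields(["a\tb\tc\td\te", "0\t1\t2\t3\t4\t5"])
  stdout.write(buf)   # one write call for all lines
```

For inputs as large as the 1M-line file in this thread, a practical variant would write and clear the buffer whenever it grows past some size threshold. That keeps memory bounded while still avoiding per-line writes.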


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread cblake
Locking can indeed slow down output, as can exception handling. Locking is 
unnecessary for single-threaded output, and the exceptions may be unhelpful if 
there is nothing you can do about out-of-space errors anyway. If you are on 
Linux/glibc, then you can also just use `fwrite_unlocked` directly, as in 
`cligen/osUt.nim:urite` or `cligen/mslice.nim:urite`, as shown in the 
`examples/cols.nim` program in 
[https://github.com/c-blake/cligen](https://github.com/c-blake/cligen)
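Here is a minimal sketch of the same idea. It is not cligen's actual `urite`, and `uwrite` is a hypothetical name. It assumes Linux with glibc, where `fwrite_unlocked` is declared in `<stdio.h>`. It skips the per-call lock and never raises, so it is only appropriate when a single thread writes to the `File`:

```nim
# Sketch, Linux/glibc only: write through glibc's fwrite_unlocked, skipping
# stdio's per-call locking. Only safe when one thread uses the File.
# `uwrite` is a hypothetical helper, not cligen's actual `urite`.
proc fwriteUnlocked(p: pointer, size, n: csize_t, f: File): csize_t {.
  importc: "fwrite_unlocked", header: "<stdio.h>".}

proc uwrite(f: File, s: string) =
  ## No lock and no exception: a short write (e.g. disk full) is ignored.
  if s.len > 0:
    discard fwriteUnlocked(unsafeAddr s[0], 1, csize_t(s.len), f)

when isMainModule:
  for i in 0 ..< 3:
    stdout.uwrite($i & "\n")
```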


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread krtekz
See updated code above


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread kaushalmodi
@krtekz Awesome! Also make sure you share your optimized final code here for 
anyone who ends up on this thread in the future :)


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread Araq
Also try `--gc:arc --panics:on`
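One way to apply these flags together with `-d:danger` without retyping them is a project-level `config.nims`. This is a minimal sketch that assumes the file sits in the same directory as the program being compiled. On Nim 2.x, `--mm:arc` is the preferred spelling of `--gc:arc`, and ORC is already the default:

```nim
# config.nims: NimScript config read automatically from the project directory
switch("define", "danger")
switch("gc", "arc")
switch("panics", "on")
```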


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread krtekz
Just found out something people haven't mentioned yet (or did I miss it?). 
According to 
[https://nim-lang.org/docs/system.html](https://nim-lang.org/docs/system.html), 
`echo` is equivalent to a `writeLine` followed by a `flushFile`, so each time I 
call `echo` it flushes the output, which is not necessary here. After I replaced 
`echo x` in my code with `writeLine(stdout, x)`, it helped quite a bit.

Using the split iterator instead of assigning the return value of split() to a 
variable helps too.
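Both observations fit into one small helper. This sketch assumes a hypothetical `nthField` proc. It uses the `strutils.split` iterator, so no `seq[string]` is built and only the fields up to the wanted one are produced. It emits with `writeLine`, which does not flush the way `echo` does:

```nim
import std/strutils

proc nthField(line: string, n: int, sep = '\t'): string =
  ## Returns the n-th (0-based) field, or "" if the line has fewer fields.
  ## The split *iterator* stops as soon as field n is reached, and no
  ## seq[string] is ever built.
  var i = 0
  for field in line.split(sep):
    if i == n:
      return field
    inc i

when isMainModule:
  # writeLine, unlike echo, does not flush after each line.
  stdout.writeLine("a\tb\tc\td\te\tf".nthField(4))   # prints: e
```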


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-25 Thread adnan
In Rust, sometimes the biggest performance overhead is repeated locking of 
stdout. To avoid it, one would lock stdout once before the loop. I wonder 
whether Nim requires locking as well.


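use std::fs::File;
use std::io::{self, BufWriter, Write};

fn emit(header: &str, lines: &[&str], footer: &str) -> io::Result<()> {
    let out = File::create("test.out")?;
    let mut buf = BufWriter::new(out);
    let stdout = io::stdout();
    let mut lock = stdout.lock(); // take the stdout lock once, before the loop
    writeln!(lock, "{}", header)?;
    for line in lines {
        writeln!(lock, "{}", line)?;
        writeln!(buf, "{}", line)?;
    }
    writeln!(lock, "{}", footer)?;
    Ok(())
}   // end scope to unlock stdout and flush/close buf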




Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread cblake
This comes up a lot. Here are a couple recent ones: 
[https://forum.nim-lang.org/t/4738](https://forum.nim-lang.org/t/4738) and 
[https://forum.nim-lang.org/t/5103](https://forum.nim-lang.org/t/5103).


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread krtekz
Thanks a lot! Total newbie here


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread mratsim
Your code is slow for several reasons:

1. You should wrap your code in a proc; otherwise all variables are globals, 
and globals are much harder to optimize. In particular, they have the same 
lifetime as your program, so you can't reuse their memory.

2. `split`: in each loop iteration, you are allocating 1 string for the line, 
then 4 more strings for your split and 1 sequence to hold those strings. Memory 
allocation is accompanied by resetting your strings and sequence to binary 
zero. This is a recipe for slowness. Python is faster because its GC reuses 
already allocated memory.
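The following is a sketch of the allocation-avoiding direction described in point 2. It is not the code from the blog post linked below. It assumes a hypothetical `fieldBounds` helper that locates the wanted field with `strutils.find` instead of `split`, so no `seq` or per-field strings are allocated. It also passes the same string to `readLine(f, line)` for every line, which gives the runtime a chance to reuse its buffer. Everything runs inside a proc, as point 1 recommends:

```nim
# Sketch: find field bounds with `find` instead of allocating via `split`.
# Assumption: the TSV path is the first command-line argument.
import std/[os, strutils]

proc fieldBounds(line: string, n: int, sep = '\t'): Slice[int] =
  ## Index range of the n-th (0-based) field; empty range if it is missing.
  var start = 0
  for k in 0 ..< n:
    let p = line.find(sep, start)
    if p < 0:
      return 1 .. 0
    start = p + 1
  var stop = line.find(sep, start)
  if stop < 0:
    stop = line.len
  result = start .. stop - 1

proc main(path: string) =
  let f = open(path)
  var line = newStringOfCap(256)   # one string, refilled for every line
  while f.readLine(line):
    let b = line.fieldBounds(4)
    if b.len > 0:
      discard stdout.writeChars(line, b.a, b.len)   # no substring allocated
    stdout.write('\n')
  f.close()

when isMainModule:
  if paramCount() >= 1:
    main(paramStr(1))
```

Unlike the original `ln.split('\t')[4]`, this prints an empty line when a line has fewer than 5 fields.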

Unfortunately, writing Nim the way the Python code is written works for quick 
scripting but is a performance pitfall. If you want a fast csv/tsv parser, you 
can use the tips from this blog post 
[https://nim-lang.org/blog/2017/05/25/faster-command-line-tools-in-nim.html](https://nim-lang.org/blog/2017/05/25/faster-command-line-tools-in-nim.html)

I should note that the issue with `split` is true for all languages with 
"vanilla" memory management (C and C++ require the same techniques as well).

This might change in the future if strutils is rewritten with more in-place 
procedures, plus the `dup` and `collect` macros for a functional high-level API 
without its cost (see the v1.2.0 announcement 
[https://nim-lang.org/blog/2020/04/03/version-120-released.html](https://nim-lang.org/blog/2020/04/03/version-120-released.html))


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread krtekz
With `-d:danger`, it improves a bit, but it is still much slower than the 
Python version.

Nim: 2.65s, Python: 1.26s


Re: Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread juancarlospaco
Compile with `-d:danger`. 


Nim's strutils.split() slower than Python's string split()?

2020-04-24 Thread krtekz
My Nim code: 


import strutils

for ln in stdin.lines:
  echo ln.split('\t')[4]



And the Python code: 


import sys

for ln in sys.stdin:
    print(ln.split('\t')[4])



Tested on a TSV file of 1M lines, the Nim version (compiled with `-d:release`) 
consistently takes more than twice as long as the Python version.

Is it because of split() or something else?