Re: beat c in fefes "competition"

2019-03-25 Thread Araq
Reusing the string allocated for the word inside `for word in line.split():` is rather easy to do and can be a stdlib patch. Nim's strings are awesome here: allocations are expensive, and mutability helps with re-using memory aggressively.

Re: beat c in fefes "competition"

2019-03-25 Thread cblake
I wrote: "most lookups are failed lookups", but Oops! 291e3 unique/4.8e6 total = 6% ==> 94% of lookups are successful. So, the `memcmp` that happens only after hash codes match does happen almost all the time, and so my particular data set is more like 16B/(32B+9B) = 39% cached, not 50%. This do
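
The correction above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a word-count table where the first lookup of each unique word fails (and inserts it) while every later lookup of that word succeeds, and it takes the byte figures exactly as quoted without interpreting them further. The variable names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative only.
unique_words = 291e3
total_words = 4.8e6

# Assumption: one failed lookup per unique word (its first occurrence).
failed = unique_words / total_words
successful = 1 - failed

# Ratio quoted in the post: 16B / (32B + 9B).
cached = 16 / (32 + 9)

print(f"failed lookups:     {failed:.0%}")      # 6%
print(f"successful lookups: {successful:.0%}")  # 94%
print(f"cached fraction:    {cached:.0%}")      # 39%
```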

Re: beat c in fefes "competition"

2019-03-25 Thread cblake
Oh, and two more stats about my input data that are important for reasoning about my timings: 43 MB and 4.8e6 words in total (coincidentally close to 1e-3 times the cycles per second of my 4.8 GHz clock). So, average string length is around 43e6/4.8e6 =~ 9 bytes (also a web server log), and about 150e-3/4.8e6 =~ 31 nsec =~
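
For anyone who wants to redo these two estimates, here is a small sketch using only the numbers quoted in the post (43 MB, 4.8e6 words, 150 ms). The variable names are illustrative, and nothing here is measured.

```python
# Numbers as quoted in the post; names are illustrative only.
total_bytes = 43e6    # input size, 43 MB
total_words = 4.8e6   # total words in the input
run_time_s = 150e-3   # quoted run time, 150 ms

avg_word_bytes = total_bytes / total_words    # average string length
ns_per_word = run_time_s / total_words * 1e9  # time spent per word

print(f"average word length: {avg_word_bytes:.1f} bytes")
print(f"time per word:       {ns_per_word:.0f} ns")
```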

Re: beat c in fefes "competition"

2019-03-25 Thread cblake
Just a quick follow-up: with gcc-8.3 on a Linux x86_64 Skylake CPU, profile-guided optimization did get that 176 ms time down to 150 ms (with `-d:useNimHash` it was 152 ms), a 1.17x boost, to 1.43x faster than the C in @enthus1ast's `wp.c`. Of course, with 291,000 unique keys the average external ch

Re: beat c in fefes "competition"

2019-03-25 Thread cblake
As @Stefan mentioned the string stuff can be slow. My `MemSlice` techniques might be more broadly interesting. The tokenizer below only works for "strict" one-character delimiting, not, e.g., repeated whitespace. Also, I agree completely with @arnetheduck that algorithm choices matter more than
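
To illustrate what "strict" one-character delimiting means, here is a rough Python analogue. It is not cblake's `MemSlice` code: it uses Python's built-in `memoryview` as a stand-in for a slice into an existing buffer, and the function name `split_strict` is made up for this sketch. Each token is a view into the original bytes, so no per-token copy is made, and two adjacent delimiters yield an empty token instead of being collapsed.

```python
def split_strict(buf: bytes, delim: bytes = b" "):
    """Yield views into buf between occurrences of a one-byte delimiter.

    Hypothetical helper: every delimiter ends a token, so repeated
    delimiters produce empty tokens rather than being merged.
    """
    view = memoryview(buf)  # slicing a memoryview does not copy the bytes
    start = 0
    while True:
        end = buf.find(delim, start)
        if end == -1:
            yield view[start:]  # last token runs to the end of the buffer
            return
        yield view[start:end]
        start = end + 1  # skip exactly one delimiter byte


line = b"GET /a  200"  # note the two spaces before 200
strict = [bytes(tok) for tok in split_strict(line)]
loose = line.split()   # whitespace splitting collapses repeated spaces

print(strict)  # [b'GET', b'/a', b'', b'200']
print(loose)   # [b'GET', b'/a', b'200']
```

The empty token in the strict result is the behaviour the post warns about: input with repeated whitespace needs a different tokenizer or a filter on empty tokens.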

Re: beat c in fefes "competition"

2019-03-25 Thread mratsim
Optimising strings is a pain :P. I don't have enough experience in that yet, but I plan to acquire it as I would need to do that for text analysis/natural language processing anyway.

Re: beat c in fefes "competition"

2019-03-24 Thread arnetheduck
Given that Nim compiles to C, then a C compiler compiles the resulting code, I always find these faster-than-C benchmark claims slightly amusing ;) I'd venture ahead and say that it's mostly the skill of the developer and not so much the language that determines performance, by and large - that

Re: beat c in fefes "competition"

2019-03-24 Thread Stefan_Salewski
The obvious problem in your approach seems to be that the split iterator allocates a new string for each call. There may be other, faster but uglier solutions, basically reusing the same string. Was that available in parseutils? I cannot remember, but I guess faster examples may be a

Re: beat c in fefes "competition"

2019-03-24 Thread miran
> So can one beat the c version?

A long time ago I concluded that when it comes to optimizations, I can optimize all day and night, but then at some point @mratsim will come and improve it by 50+% :D So if you want to beat C, basically wait for @mratsim to show up and do his magic :)

beat c in fefes "competition"

2019-03-24 Thread enthus1ast
Hi, fefe (a German blogger, [http://blog.fefe.de/](http://blog.fefe.de/)) has called for a little benchmark. The challenge is to count words in a web server logfile (from stdin), then output them ordered like so: `150 foobaa`, `80 faa`, `12 www`. Blog entries:
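
As a baseline for the task described above, a minimal sketch might look like the following. It makes several assumptions that are not in the post: words are separated by whitespace, the output is sorted by descending count (inferred from the sample), and ties are broken alphabetically. The function names are made up, and this is a readability-first reference, not a tuned entry.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def format_counts(counts):
    """Return 'count word' rows, highest count first (ties sorted by word)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{n} {word}" for word, n in ordered]


# Small in-memory sample standing in for the log file.
sample = ["foobaa faa foobaa", "www foobaa faa"]
rows = format_counts(count_words(sample))
for row in rows:
    print(row)
```

Passing `sys.stdin` instead of `sample` would read the log from standard input, since a file object iterates line by line.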