Re: Faster Command Line Tools in D
On Tuesday, 8 August 2017 at 21:51:30 UTC, Joakim wrote: On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D There was also a Haskell version on Reddit.
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D
Re: Faster Command Line Tools in D
On 5/31/17 1:09 AM, Patrick Schluter wrote: In any case, you can download the dataset from [1] if you like. There are several 100 Mb big zip files containing a collection of tmx files (translation memory exchange) with European Legislation. The files contain multi-alignment texts in up to 24 languages. The files are encoded in UCS-2 little-endian. I know for a fact (because I compiled the data) that they don't contain characters outside of the BMP. The data is public and can be used freely (as in beer). When I get some time, I will try to port the java app that is distributed with it to D (partially done yet). [1]: https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory Thanks, I'll bookmark it for later use. -Steve
Re: Faster Command Line Tools in D
On 5/30/17 5:57 PM, Patrick Schluter wrote: On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote: On 5/26/17 11:20 AM, John Colvin wrote: On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote: [...] This version also has the advantage of being (discounting any bugs in iopipe) correct for arbitrary unicode in all common UTF encodings. I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues. I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :) If you want UCS-2 (aka UTF-16 without surrogates) data I can give you gigabytes of files in tmx format. The data I can (and have) generated from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it though like I tuned the UTF-8 version. Is there a memchr for wide characters? ;) What I really haven't done is compared my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't really looked at all in C++, so I should probably look there before giving up. -Steve
Re: Faster Command Line Tools in D
On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote: On 5/26/17 11:20 AM, John Colvin wrote: On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote: [...] This version also has the advantage of being (discounting any bugs in iopipe) correct for arbitrary unicode in all common UTF encodings. I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues. I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :) -Steve If you want UCS-2 (aka UTF-16 without surrogates) data I can give you gigabytes of files in tmx format.
Re: Faster Command Line Tools in D
On 5/26/17 11:20 AM, John Colvin wrote: On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote: I spent some time fiddling with my own manual approaches to making this as fast, wasn't satisfied and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. Results were excellent. https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242 This version also has the advantage of being (discounting any bugs in iopipe) correct for arbitrary unicode in all common UTF encodings. I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues. I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :) -Steve
Re: Faster Command Line Tools in D
On 5/26/17 10:41 AM, John Colvin wrote: On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ I spent some time fiddling with my own manual approaches to making this as fast, wasn't satisfied and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. Results were excellent. https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242 nice! hm /** something vaguely like this should be in iopipe, users shouldn't need to write it */ auto ref runWithEncoding(alias process, FileT, Args...)(FileT file, auto ref Args args) stealing for iopipe, thanks :) I'll need to dedicate another slide to you... On my machine: python takes a little over 20s, pypy wobbles around 3.5s, v1 from the blog takes about 3.9s, v4b took 1.45s, a version of my own that is hideous* manages 0.78s at best, the above version with iopipe hits below 0.67s most runs. Not bad for a process that most people would call "IO-bound" (code for "I don't want to have to write fast code & it's all the disk's fault"). Obviously this version is a bit more code than is ideal, iopipe is currently quite "barebones", but I don't see why with some clever abstractions and wrappers it couldn't be the default thing that one does even for small scripts. The idea behind iopipe is to give you the building blocks to create exactly the pipeline you need, without a lot of effort. 
Once you have those blocks, then you make higher level functions out of it. Like you have above :) BTW, there is a byLineRange function that handles slicing off the newline character inside iopipe.textpipe. -Steve
Re: Faster Command Line Tools in D
On 05/25/2017 08:30 AM, xtreak wrote: There are repeated references over usage of D at Netflix for machine learning. It will be a very helpful boost if someone comes up with any reference or a post regarding how D is used at Netflix and addition of Netflix to https://dlang.org/orgs-using-d.html will be amazing. I've used netflix. If its "suggestion" features are any indication, I'm not sure such a thing would be a feather in D's cap ;)
Re: Faster Command Line Tools in D
On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote: I spent some time fiddling with my own manual approaches to making this as fast, wasn't satisfied and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. Results were excellent. https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242 This version also has the advantage of being (discounting any bugs in iopipe) correct for arbitrary unicode in all common UTF encodings.
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ I spent some time fiddling with my own manual approaches to making this as fast, wasn't satisfied and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. Results were excellent. https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242 On my machine: python takes a little over 20s, pypy wobbles around 3.5s, v1 from the blog takes about 3.9s, v4b took 1.45s, a version of my own that is hideous* manages 0.78s at best, the above version with iopipe hits below 0.67s most runs. Not bad for a process that most people would call "IO-bound" (code for "I don't want to have to write fast code & it's all the disk's fault"). Obviously this version is a bit more code than is ideal, iopipe is currently quite "barebones", but I don't see why with some clever abstractions and wrappers it couldn't be the default thing that one does even for small scripts. *using byChunk and manually managing linesplits over chunks, very nasty.
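For readers wondering what "using byChunk and manually managing linesplits over chunks" looks like in practice, here is a minimal sketch. It is not John's actual code (which was not posted in this message); the helper name `eachLine` and the callback style are assumptions made purely for illustration. The point it shows is the awkward part: a line can straddle two chunks, so the unfinished tail of one chunk has to be carried into the next.

```d
import std.stdio : stdin, writeln;

/// Hypothetical helper: calls `onLine` once per '\n'-terminated line found in
/// a sequence of chunks (for example the ubyte[] buffers from File.byChunk).
/// The slice passed to `onLine` is only valid during the call, and a trailing
/// '\r' is not stripped.
void eachLine(Chunks)(Chunks chunks, scope void delegate(const(char)[]) onLine)
{
    char[] carry; // partial line left over from earlier chunks
    foreach (chunk; chunks)
    {
        auto data = cast(const(char)[]) chunk;
        size_t start = 0;
        foreach (i, c; data)
        {
            if (c != '\n')
                continue;
            if (carry.length)
            {
                // The line began in an earlier chunk: finish it in `carry`.
                carry ~= data[start .. i];
                onLine(carry);
                carry.length = 0;
            }
            else
            {
                // The whole line lies inside this chunk: no copy needed.
                onLine(data[start .. i]);
            }
            start = i + 1;
        }
        // Whatever follows the last newline is an unfinished line.
        carry ~= data[start .. $];
    }
    if (carry.length)
        onLine(carry); // final line without a trailing newline
}

void main()
{
    size_t count;
    eachLine(stdin.byChunk(64 * 1024), (const(char)[] line) { ++count; });
    writeln(count, " lines");
}
```

The 64 KiB chunk size is an arbitrary choice for the sketch, not a tuned value.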
Re: Faster Command Line Tools in D
On Friday, 26 May 2017 at 06:05:11 UTC, Basile B. wrote: On Thursday, 25 May 2017 at 22:04:36 UTC, Ali Çehreli wrote: On 05/24/2017 06:39 AM, Mike Parker wrote: Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ Inspired Nim version, found on Reddit: https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/ Ali Wow, the D blog post opened Pandora's box. I guess programmers will do comparisons of language speed independent of whether it makes sense for that problem.
Re: Faster Command Line Tools in D
On Thursday, 25 May 2017 at 22:04:36 UTC, Ali Çehreli wrote: On 05/24/2017 06:39 AM, Mike Parker wrote: Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ Inspired Nim version, found on Reddit: https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/ Ali Wow, the D blog post opened Pandora's box.
Re: Faster Command Line Tools in D
On 05/24/2017 06:39 AM, Mike Parker wrote: Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ Inspired Nim version, found on Reddit: https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/ Ali
Re: Faster Command Line Tools in D
On Thursday, May 25, 2017 14:17:27 Suliman via Digitalmars-d-announce wrote:
> > std.string, std.array, and std.algorithm all have
> > cross-pollination when it comes to array operations. It has to
> > do with the history of when the modules were introduced.
>
> Is there any plan to deprecate all splitters and make one single.
> Because now as I understand we have 4 functions that make same
> task.

I wouldn't expect any of the split-related functions to be going away. We often have a function that operates on arrays or strings and another which operates on more general ranges. It may mainly be for historical reasons, but removing the array-based functions would break existing code, and we'd get a whole other set of complaints about folks not understanding that you need to slap array() on the end of a call to splitter to get the split that they were looking for (especially if they're coming from another language and don't understand ranges yet). And ultimately, the array-based functions continue to serve as a way to have simpler code when you don't care about (or you actually need) the additional memory allocations.

Also, splitLines/lineSplitter can't actually be written in terms of split/splitter, because split/splitter does not have a way to provide multiple delimiters (let alone multiple delimiters where one includes the other, which is what you get with "\n" and "\r\n"). So, that distinction isn't going away. It's also a common enough operation that having a function for it rather than having to pass all of the delimiters to a more general function is arguably worth it, just like having the overload of split/splitter which takes no delimiter and then splits on whitespace is arguably worth it over having a more general function where you have to feed it every variation of whitespace.

- Jonathan M Davis
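To make the distinctions above concrete, here is a small sketch of the four functions side by side. The wrapper names `eagerFields` and `lazyFieldsAsArray` are made up for this example; the Phobos calls themselves (`split`, `splitter`, `splitLines`, `array`) are the real ones.

```d
import std.algorithm.iteration : splitter;
import std.array : array, split;
import std.stdio : writeln;
import std.string : splitLines;

/// Eager: std.array.split allocates and returns an array of slices.
string[] eagerFields(string line)
{
    return line.split(",");
}

/// Lazy: std.algorithm's splitter returns a range; `.array` is what turns it
/// into the array that newcomers usually expect.
string[] lazyFieldsAsArray(string line)
{
    return line.splitter(',').array;
}

void main()
{
    writeln(eagerFields("a,b,c"));
    writeln(lazyFieldsAsArray("a,b,c"));

    // The overload without a delimiter splits on runs of whitespace.
    writeln("one  two\tthree".split);

    // splitLines treats both "\n" and "\r\n" as line endings...
    writeln("x\ny\r\nz".splitLines);
    // ...whereas splitting on "\n" alone leaves the "\r" attached to "y".
    writeln("x\ny\r\nz".split("\n"));
}
```

If the fields are only going to be iterated once, the `.array` call (and its allocation) can simply be dropped.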
Re: Faster Command Line Tools in D
On Thursday, May 25, 2017 08:46:17 Steven Schveighoffer via Digitalmars-d-announce wrote:
> std.string, std.array, and std.algorithm all have cross-pollination when
> it comes to array operations. It has to do with the history of when the
> modules were introduced.

Not only that, but over time, there has been a push to generalize functions. So, something that might have originally gotten put in std.string (because you'd normally think of it as a string function) got moved to std.array, because it could easily be generalized to work on arrays in general and not just string operations (I believe that split is an example of this). And something which was in std.array or std.string might have been generalized for ranges in general, in which case, we ended up with a new function in std.algorithm (hence, we have splitter in std.algorithm but split in std.array).

The end result tends to make sense if you understand that functions that only operate on strings go in std.string, functions that operate on dynamic arrays in general (but not ranges) go in std.array, and functions which could have gone in std.string or std.array except that they operate on ranges in general go in std.algorithm. But if you don't understand that, it tends to be quite confusing, and even if you do, it's often the case that when you want to find a function to operate on a string, you're going to need to look in std.string, std.array, and std.algorithm. So, in part, it's an evolution thing, and in part, it's often just plain hard to find stuff when you're focused on a specific use case, and the library writer is focused on making the function that you need as general as possible.

- Jonathan M Davis
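As a rough illustration of that module layout, the sketch below pulls one function from each of the three modules. The choice of `strip` as the string-only example is mine, not something taken from the thread.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.array : split;
import std.string : strip;

void main()
{
    // std.string: an operation that only makes sense for strings.
    assert("  padded  ".strip == "padded");

    // std.array: split is not string-specific, it also works on an int[].
    assert([1, 2, 0, 3, 0, 4].split(0) == [[1, 2], [3], [4]]);

    // std.algorithm: splitter is the lazy, range-based counterpart.
    assert([1, 2, 0, 3].splitter(0).equal([[1, 2], [3]]));
}
```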
Re: Faster Command Line Tools in D
std.string, std.array, and std.algorithm all have cross-pollination when it comes to array operations. It has to do with the history of when the modules were introduced. Is there any plan to deprecate all the splitters and keep a single one? As I understand it, we now have 4 functions that do the same task.
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/ There are repeated references over usage of D at Netflix for machine learning. It will be a very helpful boost if someone comes up with any reference or a post regarding how D is used at Netflix and addition of Netflix to https://dlang.org/orgs-using-d.html will be amazing. References : https://news.ycombinator.com/item?id=14064012 https://news.ycombinator.com/item?id=14413546
Re: Faster Command Line Tools in D
On Thursday, 25 May 2017 at 06:22:28 UTC, Jon Degenhardt wrote: Thanks Walter, I appreciate your comments. And correct, as multiple people noted, a speed comparison with other languages was not at all a goal of the article. The real intent was to tell a story of how several of D's features play together to enable optimizations like this, without having to write low-level code or step outside the core language features and standard library.

Maybe to a more casual observer the article did feel more like Python vs D. I have not yet read the ycombinator comments; this is just my personal observation after reading the article. My thinking was:
- Python's PyPy is surprisingly fast.
- Surprised that D was slower in version 1.
- Kind of surprised again that it took so many versions to figure out the best approach.
- Also wondering why one needed the std.algorithm splitter, when you expect string split to be the fastest. Even the fact that you need to import std.array to split a string simply felt strange.
- So much effort for relatively little gain (after the v2 splitter). The time spent on finding a faster solution is, in a business sense, not worth it. But not finding a faster way is simply wasting performance, just on this simple function.
- Started to wonder if Python's PyPy is so optimized that, without any effort, you're even faster than D. What other idiomatic D functions are slow?

I am not criticizing your article Jon, just mentioning how I felt when reading it yesterday. It felt like the solution was overly complex to find and required too much deep D knowledge. Going to read the ycombinator comments now.

Off-topic: Yesterday I was struggling with split, but for a whole different reason. Take into account that I am new to D. I needed to split a string. Simple, right? I searched Google for "split string dlang" and landed on the https://dlang.org/phobos/std_string.html page. I saw splitLines and started experimenting with it. Half an hour later I realized that I was using the wrong function and needed to import the split function from std.array. Call it an issue with the documentation or my own stupidity. But the fact that split was only listed as an imported function, in this mass of text, totally sent me in the wrong direction. As stated above, I expected split to be part of std.string, because I am manipulating a string, not that I needed to import std.array, which was the end result. I simply find the documentation confusing with its wall of text. When you search for string split, you expect to arrive on the string.split page. Not only that, the split examples call split as a standalone function, when I was looking for variable.split(). Veteran D programmers are probably going to laugh at me for this, but one does feel a bit salty after that.
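On the last point, the two spellings are the same call: D's Uniform Function Call Syntax (UFCS) lets any free function be written as if it were a method of its first argument. A minimal sketch (the wrapper function names are invented for the example):

```d
import std.array : split;

/// Free-function syntax, as the documentation examples tend to show it.
string[] viaFreeFunction(string s)
{
    return split(s, ",");
}

/// Method-style syntax; with UFCS the compiler rewrites this to split(s, ",").
string[] viaUfcs(string s)
{
    return s.split(",");
}

void main()
{
    assert(viaFreeFunction("a,b,c") == viaUfcs("a,b,c"));
}
```

So a documentation example written as `split(s, ",")` can be used as `s.split(",")` once std.array is imported.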
Re: Faster Command Line Tools in D
On Thursday, 25 May 2017 at 05:17:29 UTC, Walter Bright wrote: Any time one writes an article comparing speed between languages X and Y, someone gets their ox gored and will bitterly complain about how unfair the article is (though I noticed that none of the complainers wrote a faster Python version). Even if you tried to optimize the Python program, you'll be inevitably accused of deliberately not doing it right. The nadir of this for me was when I compared Digital Mars C++ code with DMD. Both share the same optimizer and back end, yet I was accused of "sabotaging" my own C++ compiler in order to make D look better !! Me, I just don't do public comparison benchmarking anymore. It's a waste of time arguing with people about it. I thought you wrote a fine article, and the criticism about the Python code was unwarranted (especially since nobody suggested better code), because the article was about optimizing D code, not optimizing Python.

Thanks Walter, I appreciate your comments. And correct, as multiple people noted, a speed comparison with other languages was not at all a goal of the article. The real intent was to tell a story of how several of D's features play together to enable optimizations like this, without having to write low-level code or step outside the core language features and standard library.

--Jon
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 21:46:10 UTC, cym13 wrote: I am disappointed because there are so many good things to say about this, so many good questions or remarks to make when not familiar with the language, and yet all we get is "Meh, this benchmark shows nothing of D's speed against Python". Wouldn't be the first time https://news.ycombinator.com/item?id=10828450
Re: Faster Command Line Tools in D
On 5/24/2017 3:56 PM, Jon Degenhardt wrote: Its not easy writing an article that doesn't draw some form of criticism. FWIW, the reason I gave a Python example is because it is very commonly used for this type of problem and the language is well suited to it. A second reason is that I've seen several posts where someone has tried to rewrite a Python program like this in D, start with `split`, and wonder how to make it faster. My hope is that this will clarify how to achieve this. Another goal of the article was to describe how performance in the TSV Utilities had been achieved. The article is not about the TSV Utilities, but discussing both the benchmark results and how they had been achieved would be a very long article. Any time one writes an article comparing speed between languages X and Y, someone gets their ox gored and will bitterly complain about how unfair the article is (though I noticed that none of the complainers wrote a faster Python version). Even if you tried to optimize the Python program, you'll be inevitably accused of deliberately not doing it right. The nadir of this for me was when I compared Digital Mars C++ code with DMD. Both share the same optimizer and back end, yet I was accused of "sabotaging" my own C++ compiler in order to make D look better !! Me, I just don't do public comparison benchmarking anymore. It's a waste of time arguing with people about it. I thought you wrote a fine article, and the criticism about the Python code was unwarranted (especially since nobody suggested better code), because the article was about optimizing D code, not optimizing Python.
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 21:46:10 UTC, cym13 wrote: On Wednesday, 24 May 2017 at 21:34:08 UTC, Walter Bright wrote: It's now #4 on the front page of Hacker News: https://news.ycombinator.com/news The comments on HN are useless though, everybody went for the "D versus Python" thing and seem to complain that it's doing a D/Python benchmark while only talking about D optimization...even though optimizing D is the whole point of the article. In the same way they rant against the fact that many iterations on the D script are shown while it is obviously to give different tricks while being clear on what trick gives what. I am disappointed because there are so many good things to say about this, so many good questions or remarks to make when not familiar with the language, and yet all we get is "Meh, this benchmark shows nothing of D's speed against Python".

It's not easy writing an article that doesn't draw some form of criticism. FWIW, the reason I gave a Python example is because it is very commonly used for this type of problem and the language is well suited to it. A second reason is that I've seen several posts where someone has tried to rewrite a Python program like this in D, start with `split`, and wonder how to make it faster. My hope is that this will clarify how to achieve this.

Another goal of the article was to describe how performance in the TSV Utilities had been achieved. The article is not about the TSV Utilities, but discussing both the benchmark results and how they had been achieved would be a very long article.

--Jon
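For anyone who has not read the blog post yet, the "start with `split`, then make it faster" idea can be sketched roughly as follows. This is not the code from the article; the function name `sumByKey`, its parameters, and the hard-coded field indices in `main` are assumptions for illustration, and it only sums values per key rather than reproducing the article's full task. It shows the one change the thread keeps coming back to: iterating fields lazily with `splitter` instead of building an array with `split` for every line.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.stdio : stdin, writeln;

/// Hypothetical helper: sums the integer in field `valIdx` per key found in
/// field `keyIdx`. Fields are walked lazily, so no per-line array is built,
/// and the walk stops as soon as both fields have been seen.
long[string] sumByKey(Lines)(Lines lines, size_t keyIdx, size_t valIdx,
                             char delim = '\t')
{
    long[string] sums;
    foreach (line; lines)
    {
        string key;
        long value;
        bool haveKey, haveValue;
        size_t idx = 0;
        foreach (field; line.splitter(delim))
        {
            // to!string copies when `field` is a slice of a reused buffer
            // (as with File.byLine) and is a no-op for immutable strings.
            if (idx == keyIdx) { key = field.to!string; haveKey = true; }
            if (idx == valIdx) { value = field.to!long; haveValue = true; }
            if (haveKey && haveValue)
                break;
            ++idx;
        }
        if (haveKey && haveValue)
            sums[key] += value; // a missing key starts from long.init (0)
    }
    return sums;
}

void main()
{
    // Assumed layout for the sketch: key in column 0, value in column 1.
    foreach (key, total; sumByKey(stdin.byLine, 0, 1))
        writeln(key, '\t', total);
}
```

Lines that are too short to contain both fields are silently skipped here; a real tool would probably want to report them.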
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 21:34:08 UTC, Walter Bright wrote: It's now #4 on the front page of Hacker News: https://news.ycombinator.com/news The comments on HN are useless though, everybody went for the "D versus Python" thing and seem to complain that it's doing a D/Python benchmark while only talking about D optimization...even though optimizing D is the whole point of the article. In the same way they rant against the fact that many iterations on the D script are shown while it is obviously to give different tricks while being clear on what trick gives what. I am disappointed because there are so many good things to say about this, so many good questions or remarks to make when not familiar with the language, and yet all we get is "Meh, this benchmark shows nothing of D's speed against Python".
Re: Faster Command Line Tools in D
It's now #4 on the front page of Hacker News: https://news.ycombinator.com/news
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 17:36:29 UTC, cym13 wrote: On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: [...snip...] A bit off topic but I really like that we still get quality content such as this post on this blog. Sustained quality is a hard job and I thank everyone involved for that.

The compliment to the community is well deserved, thank you for including this post in the company. In this case, the post benefited from some really excellent review feedback and Mike made the publication side really easy.

--Jon
Re: Faster Command Line Tools in D
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote: Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance. The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/

A bit off topic but I really like that we still get quality content such as this post on this blog. Sustained quality is a hard job and I thank everyone involved for that.
Faster Command Line Tools in D
Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance.

The blog: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
Reddit: https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/