Re: to compose or hack?
On Wednesday, 7 July 2021 at 01:44:20 UTC, Steven Schveighoffer wrote: This is pretty minimal, but does what I want it to do. Is it ready for inclusion in Phobos? Not by a longshot! A truly generic interleave would properly forward everything else that the range supports (like `length`, `save`, etc). But it got me thinking, how often do people roll their own vs. trying to compose using existing Phobos nuggets? I found this pretty satisfying, even if I didn't test it to death and maybe I use it only in one place. Do you find it difficult to use Phobos in a lot of situations to compose your specialized ranges? I try to compose using existing Phobos facilities, but don't hesitate to write my own ranges. The reasons are usually along the lines you describe. For one, range creation is easy in D, consistent with the pro/con tradeoffs described in the thread/talk [Iterator and Ranges: Comparing C++ to D to Rust](https://forum.dlang.org/thread/diexjstekiyzgxlic...@forum.dlang.org). Another is that if application/task specific logic is involved, it is often simpler/faster to just incorporate it into the range rather than figure out how to factor it out of the more general range. Especially if the range is not going to be used much. --Jon
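For readers curious what a minimal hand-rolled range looks like, here is a hedged sketch (my own illustration, not Steve's code) of a lazy interleave that yields a separator between consecutive elements. Like the version discussed, it deliberately forwards only the input-range primitives, with no `length` or `save`:

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Minimal lazy interleave: yields a separator between consecutive elements.
// Intentionally forwards nothing beyond the input range primitives.
struct Interleave(R) if (isInputRange!R)
{
    R source;
    ElementType!R separator;
    bool atSeparator = false;

    bool empty() { return source.empty; }

    ElementType!R front()
    {
        return atSeparator ? separator : source.front;
    }

    void popFront()
    {
        if (atSeparator)
        {
            atSeparator = false;
        }
        else
        {
            source.popFront();
            if (!source.empty) atSeparator = true;
        }
    }
}

auto interleave(R)(R r, ElementType!R sep) { return Interleave!R(r, sep); }
```

For example, `interleave([1, 2, 3], 0)` lazily produces the sequence 1, 0, 2, 0, 3.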
Re: Need for speed
On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote: On Thu, Apr 01, 2021 at 07:25:53PM +, matheus via Digitalmars-d-learn wrote: [...] Since this is the "Learn" part of the Forum, be careful with "-boundscheck=off". I mean for this little snippet it's OK, but for other projects this may be wrong, and as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck "This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking." [...] It's interesting that whenever a question about D's performance pops up in the forums, people tend to reach for optimization flags. I wouldn't say it doesn't help; but I've found that significant performance improvements can usually be obtained by examining the code first, and catching common newbie mistakes. Those usually account for the majority of the observed performance degradation. Only after the code has been cleaned up and obvious mistakes fixed is it worth reaching for optimization flags, IMO. This is my experience as well, and not just for D. Pick good algorithms and pay attention to memory allocation. Don't go crazy on the latter. Many people try to avoid GC at all costs, but I don't usually find it necessary to go quite that far. Very often simply reusing already allocated memory does the trick. The blog post I wrote a few years ago focuses on these ideas: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ --Jon
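As an illustration of the memory reuse point (a sketch with made-up names, not code from the blog post): an `Appender` cleared and reused across records keeps one allocation alive instead of allocating fresh storage per record:

```d
import std.array : appender;

// Sketch: an Appender reused across iterations avoids a fresh GC
// allocation per record. clear() resets the length but keeps the
// underlying storage for reuse.
long sumOfSquares(const int[][] records)
{
    auto buf = appender!(long[])();
    long total = 0;
    foreach (rec; records)
    {
        buf.clear();                         // reuse the existing storage
        foreach (x; rec) buf ~= cast(long) x * x;
        foreach (v; buf[]) total += v;       // process the reused buffer
    }
    return total;
}
```

The alternative of `new long[](rec.length)` inside the loop would allocate once per record, which is exactly the kind of avoidable GC churn described above.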
Re: Silicon Valley D Meetup - March 18, 2021 - "Templates in the D Programming Language" by Ali Çehreli
On Friday, 19 March 2021 at 17:10:27 UTC, Ali Çehreli wrote: Jon mentioned how PR 7678 reduced the performance of std.regex.matchOnce. After analyzing the code we realized that the performance loss must be due to two delegate context allocations: https://github.com/dlang/phobos/pull/7678/files#diff-269abc020de3a951eaaa5b8eca5a0700ba8b298767c7a64f459e74e1531a80aeR825 One delegate is 'matchOnceImp' and the other one is the anonymous delegate created in the return expression. We understood that 'matchOnceImp' could not be a nested function because of an otherwise useful rule: the name of the nested function alone would *call* that function instead of being a symbol for it. That is not the case for a local delegate variable, so that's why 'matchOnceImp' exists as a delegate variable there. Then there is the addition of the 'pure' attribute to it. Fine... After tinkering with the code, we realized that the same effect can be achieved with a static member function of a static struct, which would not allocate any delegate context. I added @nogc to the following code to prove that point. The following code is even simpler than what Jon and I came up with yesterday. [... Code snippet removed ...] There: we injected @trusted code inside a @nogc @safe function. Question to others: Did we understand the reason for the convoluted code in that PR fully? Is the above method really a better solution? I submitted PR 7902 (https://github.com/dlang/phobos/pull/7902) to address this. I wasn't able to use the version Ali showed in the post, but the PR does use what is essentially the same idea identified at the D Meetup. It addresses the performance regression, though the fix is a bit more nuanced than would be ideal. Comments and review would be appreciated. --Jon
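Since the actual snippet was removed above, here is a hedged illustration of the general technique only (not the Phobos code, and with made-up names): a delegate capturing locals typically forces a GC-allocated closure context, while a static member function of a static struct takes what it needs as parameters and allocates nothing, so the enclosing function can be @nogc:

```d
// Illustrative sketch, not the std.regex code: a static member function
// of a static struct replaces a local delegate. It receives its inputs
// as parameters instead of capturing them, so no closure context is
// allocated and @nogc is satisfied.
@nogc @safe int twiceSum(int a, int b)
{
    static struct Impl
    {
        static int compute(int x, int y) { return 2 * (x + y); }
    }
    // A delegate such as `auto dg = () => 2 * (a + b);` would capture
    // a and b, typically requiring a GC-allocated context.
    return Impl.compute(a, b);
}
```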
Re: Trying to reduce memory usage
On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote: On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt wrote: It would be interesting to see how the performance compares to tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The prebuilt binaries turn on all the optimizations (https://github.com/eBay/tsv-utils/releases). My program (called line-dedup below) is modestly faster than yours, with the gap gradually widening as files get bigger. Similarly, when not using a memory-mapped scratch file, my program is modestly less memory hungry than yours, with the gap gradually widening as files get bigger. In neither case is the difference very exciting though; the real benefit of my algorithm is that it can process files too large for physical memory. It might also handle frequent hash collisions better, and could be upgraded to handle huge numbers of very short lines efficiently. Thanks for running the comparison! I appreciate seeing how other implementations compare. I'd characterize the results a bit differently, though. Based on the numbers, line-dedup is materially faster than tsv-uniq, at least on the tests run. To your point, it may not make much practical difference on data sets that fit in memory. tsv-uniq is fast enough for most needs. But it's still a material performance delta. Nice job! I agree also that the bigger pragmatic benefit is fast processing of files much larger than will fit in memory. There are other useful problems like this. One I often need is creating a random weighted ordering. Easy to do for data sets that fit in memory, but hard to do fast for data sets that do not. --Jon
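For the in-memory case mentioned, one standard approach (Efraimidis-Spirakis keys, shown here as my own sketch with hypothetical names, not tsv-utils code) assigns each item the key u^(1/w) for uniform u in (0,1) and sorts descending by key; items with heavier weights tend to land earlier:

```d
import std.algorithm : sort;
import std.math : pow;
import std.random : Random, uniform01;

// Weighted random ordering for in-memory data: each index gets the key
// u^(1/w) with u uniform in (0,1); sorting descending by key yields a
// random permutation biased toward heavier weights appearing first.
size_t[] weightedOrder(const double[] weights, ref Random rng)
{
    auto keyed = new double[](weights.length);
    auto idx = new size_t[](weights.length);
    foreach (i, w; weights)
    {
        idx[i] = i;
        keyed[i] = pow(uniform01(rng), 1.0 / w);
    }
    sort!((a, b) => keyed[a] > keyed[b])(idx);
    return idx;
}
```

The hard part alluded to above is doing this without holding all keys in memory; this sketch covers only the easy, in-memory case.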
Re: Trying to reduce memory usage
On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote: I spent some time experimenting with this problem, and here is the best solution I found, assuming that perfect de-duplication is required. (I'll put the code up on GitHub / dub if anyone wants to have a look.) It would be interesting to see how the performance compares to tsv-uniq (https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The prebuilt binaries turn on all the optimizations (https://github.com/eBay/tsv-utils/releases). tsv-uniq wasn't included in the different comparative benchmarks I published, but I did run my own benchmarks and it holds up well. However, it should not be hard to beat it. What might be more interesting is what the delta is. tsv-uniq uses the most straightforward approach of popping things into an associative array. No custom data structures. Enough memory is required to hold all the unique keys in memory, so it won't handle arbitrarily large data sets. It would be interesting to see how the straightforward approach compares with the more highly tuned approach. --Jon
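A sketch of the straightforward approach described (illustrative only, not the actual tsv-uniq code): an associative array of keys already seen, emitting only first occurrences. Memory grows with the number of unique keys, which is the limitation noted above:

```d
// Straightforward de-duplication: keep an associative array of lines
// already seen and emit each line only the first time it appears.
string[] firstOccurrences(string[] lines)
{
    bool[string] seen;
    string[] result;
    foreach (line; lines)
    {
        if (line !in seen)
        {
            seen[line] = true;   // remember the key
            result ~= line;      // first occurrence: keep it
        }
    }
    return result;
}
```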
Re: Article: Why I use the D programming language for scripting
On Sunday, 31 January 2021 at 20:36:43 UTC, aberba wrote: It's finally out! https://opensource.com/article/21/1/d-scripting Very nice! Clearly I'm not taking enough advantage of scripting capabilities! --Jon
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote: On 1/22/21 11:57 AM, Jon Degenhardt wrote: I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"} then that relationship hasn't held, and the results are unexpected. But that is possible with all 3 splitter variants. Why is one allowed to be bidirectional and the others are not? I'm not defending it, just explaining what I believe the thinking was based on the examination I did. It wasn't just looking at the code; there was a discussion somewhere: a forum discussion, PR discussion, bug report, or code comments. Something somewhere, but I don't remember exactly where. However, to answer your question: the relationship described is guaranteed if the basis for the split is a single element. If the range is a string, that's a single 'char'. If the range is composed of integers, then a single integer. Note that if the basis for the split is itself a range, then the relationship described is not guaranteed. Personally, I can see a good argument that bidirectionality should not be supported in any of these cases, and instead force the user to choose between eager splitting or reversing the range via retro. For the common case of strings, the further argument could be made that the distinction between char and dchar is another point of inconsistency. Regardless of whether the choices made were the best choices, there was some thinking that went into it, and it is worth understanding that thinking when considering changes. --Jon
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer wrote: On 1/22/21 12:55 AM, Jon Degenhardt wrote: On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote: On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote: auto sp1 = "a|b|c".splitter('|'); writeln(sp1.back); // ok auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v)); writeln(sp2.back); // error, not bidirectional Why? is it an oversight, or is there a good reason for it? I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. Its clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case. Note that the predicate might use a random number generator to pick the split points. Even for same sequence of random numbers, the split points would be different if run from the front than if run from the back. I think this isn't a good explanation. All forms of splitter accept a predicate (including the one which supports a bi-directional result). Many other phobos algorithms that accept a predicate provide bidirectional support. The splitter result is also a forward range (which makes no sense in the context of random splits). Finally, I'd suggest that even if you split based on a subrange that is also bidirectional, it doesn't make sense that you couldn't split backwards based on that. Common sense says a range split on substrings is the same whether you split it forwards or backwards. 
I can do this too (and in fact I will, because it works, even though it's horrifically ugly): auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?'); writeln(sp3.back); // ok Looking at the code, it looks like the first form of splitter uses a different result struct than the other two (which have a common implementation). It just needs cleanup. -Steve I think the idea is that if a construct like 'xyz.splitter(args)' produces a range with the sequence of elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' should produce "def". But, if finding the split points starting from the back results in something like {"f", "de", "abc"}, then that relationship hasn't held, and the results are unexpected. Note that in the above example, 'xyz.retro.splitter(args)' might produce {"f", "ed", "cba"}, so again not the same. Another way to look at it: if split (eager) took a predicate, then 'xyz.splitter(args).back' and 'xyz.split(args).back' should produce the same result. But they will not with the example given. I believe these consistency issues are the reason why the bidirectional support is limited. Note: I didn't design any of this, but I did redo the examples in the documentation at one point, which is why I looked at this. --Jon
Re: std.algorithm.splitter on a string not always bidirectional
On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote: On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote: auto sp1 = "a|b|c".splitter('|'); writeln(sp1.back); // ok auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v)); writeln(sp2.back); // error, not bidirectional Why? is it an oversight, or is there a good reason for it? -Steve I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. It's clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case. --Jon Note that the predicate might use a random number generator to pick the split points. Even for the same sequence of random numbers, the split points would be different if run from the front than if run from the back.
Re: std.algorithm.splitter on a string not always bidirectional
On Thursday, 21 January 2021 at 22:43:37 UTC, Steven Schveighoffer wrote: auto sp1 = "a|b|c".splitter('|'); writeln(sp1.back); // ok auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v)); writeln(sp2.back); // error, not bidirectional Why? is it an oversight, or is there a good reason for it? -Steve I believe the reason is two-fold. First, splitter is lazy. Second, the range splitting is defined in the forward direction, not the reverse direction. A bidirectional range is only supported if it is guaranteed that the splits will occur at the same points in the range when run in either direction. That's why the single element delimiter is supported. It's clearly the case for the predicate function in your example. If that's known to be always true then perhaps it would make sense to enhance splitter to generate bidirectional results in this case. --Jon
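A concrete illustration of why direction matters when the delimiter is itself a range (my own example, not from the thread): with an overlapping delimiter, a forward scan and a hypothetical backward scan choose different split points, so the pieces differ.

```d
import std.algorithm : splitter;
import std.algorithm.comparison : equal;

// Scanning "aaa" for the delimiter "aa" from the front matches at index 0,
// yielding ["", "a"]. A scan from the back would match at index 1 and
// yield ["a", ""] instead. A single-element delimiter cannot overlap
// itself this way, which is why it alone supports bidirectionality.
unittest
{
    assert(equal("aaa".splitter("aa"), ["", "a"]));  // forward split
    // A hypothetical back-to-front split would produce ["a", ""].
}
```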
Re: Github Actions now support D out of the box!
On Friday, 21 August 2020 at 02:03:40 UTC, Mathias LANG wrote: Hi everyone, Almost a year ago, Ernesto Castelloti (@ErnyTech) submitted a PR for Github's "starter-workflow" to add support for D out of the box (https://github.com/actions/starter-workflows/pull/74). It was in a grey area for a while, as Github was trying to come up with a policy for external actions. I ended up picking up the project, after working with actions extensively for my own projects and the dlang org, and my PR was finally merged yesterday (https://github.com/actions/starter-workflows/pull/546). A thank you to everyone who helped put this together. I just started using it, and it works quite well. It's a very valuable tool to have! --Jon
Re: Why is BOM required to use unicode in tokens?
On Tuesday, 15 September 2020 at 14:59:03 UTC, Steven Schveighoffer wrote: On 9/15/20 10:18 AM, James Blachly wrote: What will it take (i.e. order of difficulty) to get this fixed -- will merely a bug report (and PR, not sure if I can tackle or not) do it, or will this require more in-depth discussion with compiler maintainers? I'm thinking your issue will not be fixed (just like we don't allow $abc to be an identifier). But the spec can be fixed to refer to the correct standards. Looks like it has to do with the '∂' character. But non-ASCII alphabetic characters work generally.

# The 'Ш' and 'ä' characters are fine.
$ echo $'import std.stdio; void Шä() { writeln("Hello World!"); } void main() { Шä(); }' | dmd -run -
Hello World!

# But not '∂'
$ echo $'import std.stdio; void x∂() { writeln("Hello World!"); } void main() { x∂(); }' | dmd -run -
__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token
__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token

However, 'Ш' and 'ä' satisfy the definition of a Unicode letter and '∂' does not (using D's current Unicode definitions). I'll use tsv-filter (from tsv-utils) to show this rather than writing out the full D code. (But note, this uses std.regex.matchFirst().)

# The input
$ echo $'x\n∂\nШ\nä'
x
∂
Ш
ä

# The input filtered by Unicode letter '\p{L}'
$ echo $'x\n∂\nШ\nä' | tsv-filter --regex 1:'^\p{L}$'
x
Ш
ä

The spec can be made more clear and correct. But if a "universal alpha" is essentially a Unicode letter, then allowing the '∂' symbol would require a change to the spec itself, not just a lexer fix. --Jon
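The same letter/non-letter classification can be checked directly with std.uni, without going through a regex (a small illustration of the '\p{L}' distinction above):

```d
import std.uni : isAlpha;

// 'Ш' (U+0428) and 'ä' (U+00E4) are Unicode letters, while '∂' (U+2202)
// is a math symbol (general category Sm), so it is not alphabetic.
unittest
{
    assert( isAlpha('Ш'));
    assert( isAlpha('ä'));
    assert(!isAlpha('∂'));
}
```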
Re: Why is BOM required to use unicode in tokens?
On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote: On Tuesday, 15 September 2020 at 01:49:13 UTC, James Blachly wrote: I wish to write a function including ∂x and ∂y (these are trivial to type with appropriate keyboard shortcuts - alt+d on Mac), but without a unicode byte order mark at the beginning of the file, the lexer rejects the tokens. It is not apparently easy to insert such marks (AFAICT no common tool does this specifically), while other languages work fine (i.e., accept unicode in their source) without it. Is there a downside to at least presuming UTF-8? According to the spec [1] this should Just Work. I'd recommend filing a bug. [1] https://dlang.org/spec/lex.html#source_text Under the identifiers section (https://dlang.org/spec/lex.html#identifiers) it describes identifiers as: Identifiers start with a letter, _, or universal alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D of the C99 Standard. I was unable to find the definition of a "universal alpha", or whether it includes non-ASCII alphabetic characters.
Re: Install multiple executables with DUB
On Friday, 4 September 2020 at 07:27:33 UTC, glis-glis wrote: On Thursday, 3 September 2020 at 14:34:48 UTC, Jacob Carlborg wrote: Oh, multiple binaries, I missed that. You can try to add multiple configurations [1]. Or if you have executables depending on only one source file, you can use single-file packages [2]. Thanks, but this still means I would have to write an install-script running `dub build --single` on each script, right? I looked at tsv-utils [1] which seems to be a similar use-case as mine, and they declare each tool as a subpackage. The main package runs a d-file called `dub_build.d` which compiles all subpackages. Feels like overkill to me, I'll probably just stick to a makefile. [1] https://github.com/eBay/tsv-utils/blob/master/docs/AboutTheCode.md#building-and-makefile The `dub_build.d` is so that people can use `$ dub fetch` to download and build the tools with `$ dub run`, from code.dlang.org. dub fetch/run is the typical dub sequence. But it's awkward, and it's geared toward users that already have a D compiler plus dub installed. For building your own binaries you might as well use `make`. However, if you decide to add your tools to the public dub package registry you might consider the technique. My understanding is that the dub developers recognize that multiple binaries are inconvenient at present and have ideas on improvements. Having a few more concrete use cases might help nail down the requirements. The tsv-utils directory layout may be worth a look. It's been pretty successful for multiple binaries in a single repo with some shared code. (Different folks made suggestions leading to this structure.) It works for both make and dub, and works well with other tools, like dlpdocs (Adam Ruppe's doc generator). The tsv-utils `make` setup is quite messy at this point though; you can probably do quite a bit better. --Jon
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 15:02:14 UTC, FreeSlave wrote: On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt wrote: What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon Why not just use typeof(a[0])? It does not matter if the array is empty or not; typeof does not actually evaluate its expression, just its type. Wow, yet another way that should have been obvious! Thanks! --Jon
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 12:50:35 UTC, Steven Schveighoffer wrote: The situation is still confusing though. If only 'std.range.ElementType' is imported, a static array does not have a 'front' member, but ElementType still gets the correct type. (This is where the documentation says it'll return void.) You are maybe thinking of how C works? D imports are different, the code is defined the same no matter how it is imported. *your* module cannot see std.range.primitives.front, but the range module itself can see that UFCS function. This is a good characteristic. But the reason it surprised me was that I expected to be able to manually expand the ElementType (or ElementEncodingType) template and see the results of the expressions it uses.

template ElementType(R)
{
    static if (is(typeof(R.init.front.init) T))
        alias ElementType = T;
    else
        alias ElementType = void;
}

So, yes, I was expecting this to behave like an inline code expansion. Yesterday I was doing that for 'hasSlicing', which has a more complicated set of tests. I wanted to see exactly which expression in 'hasSlicing' was causing it to return false for a struct I wrote. (It turned out to be a test for 'length'.) I'll have to be more careful about this.
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 05:02:46 UTC, Basile B. wrote: On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt wrote: What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon I'm curious to know what are the array types that were not accepted by ElementType ( or ElementEncodingType ) ? Interesting. I need to test static arrays. In fact 'ElementType' does work with static arrays. Which is likely what you expected. I assumed ElementType would not work, because static arrays don't satisfy 'isInputRange', and the documentation for ElementType says: The element type is determined as the type yielded by r.front for an object r of type R. [...] If R doesn't have front, ElementType!R is void. But, if std.range is imported, a static array does indeed get a 'front' member. It doesn't satisfy isInputRange, but it does have a 'front' element. The situation is still confusing though. If only 'std.range.ElementType' is imported, a static array does not have a 'front' member, but ElementType still gets the correct type. (This is where the documentation says it'll return void.) 
--- Import std.range ---

@safe unittest
{
    import std.range;

    ubyte[10] staticArray;
    ubyte[] dynamicArray = new ubyte[](10);

    static assert(is(ElementType!(typeof(staticArray)) == ubyte));
    static assert(is(ElementType!(typeof(dynamicArray)) == ubyte));

    // front is available
    static assert(__traits(compiles, staticArray.front));
    static assert(__traits(compiles, dynamicArray.front));
    static assert(is(typeof(staticArray.front) == ubyte));
    static assert(is(typeof(dynamicArray.front) == ubyte));
}

--- Import std.range.ElementType ---

@safe unittest
{
    import std.range : ElementType;

    ubyte[10] staticArray;
    ubyte[] dynamicArray = new ubyte[](10);

    static assert(is(ElementType!(typeof(staticArray)) == ubyte));
    static assert(is(ElementType!(typeof(dynamicArray)) == ubyte));

    // front is not available
    static assert(!__traits(compiles, staticArray.front));
    static assert(!__traits(compiles, dynamicArray.front));
    static assert(!is(typeof(staticArray.front) == ubyte));
    static assert(!is(typeof(dynamicArray.front) == ubyte));
}

This suggests the documentation for ElementType is not quite correct.
Re: How to get the element type of an array?
On Tuesday, 25 August 2020 at 04:36:56 UTC, H. S. Teoh wrote: [...] Harry Gillanders, H.S. Teoh, Thank you both for the quick replies. Both methods address my needs. Very much appreciated, I was having trouble figuring this one out. --Jon
How to get the element type of an array?
What's the best way to get the element type of an array at compile time? Something like std.range.ElementType except that works on any array type. There is std.traits.ForeachType, but it wasn't clear if that was the right thing. --Jon
Re: Github Actions now support D out of the box!
On Friday, 21 August 2020 at 02:03:40 UTC, Mathias LANG wrote: [...] Thanks for the effort on this, I'll definitely be checking it out! --Jon
Re: getopt Basic usage
On Saturday, 15 August 2020 at 04:09:19 UTC, James Gray wrote: I am trying to use getopt and would not like the program to throw an unhandled exception when parsing command line options. Is the following, adapted from the first example in the getopt documentation, a reasonable approach? I use the approach you showed, except for writing errors to stderr and returning an exit status. This has worked fine. An example: https://github.com/eBay/tsv-utils/blob/master/number-lines/src/tsv_utils/number-lines.d#L48
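Here is a hedged sketch of that pattern (hypothetical program and option names, loosely modeled on the linked example): catch the exception getopt throws on bad input, write the message to stderr, and return a non-zero exit status:

```d
import std.getopt;
import std.stdio : stderr;

// Sketch: parse command line args, reporting errors to stderr instead of
// letting the exception escape. Returns a process exit status.
int parseAndRun(string[] args)
{
    size_t startNum = 1;   // hypothetical option
    try
    {
        auto r = getopt(args, "start|s", "Starting number.", &startNum);
        if (r.helpWanted)
        {
            defaultGetoptPrinter("Usage: myapp [options]", r.options);
            return 0;
        }
    }
    catch (Exception e)
    {
        stderr.writeln("[myapp] ", e.msg);   // error to stderr
        return 1;                            // non-zero exit status
    }
    // ... run the program using startNum ...
    return 0;
}
```

From main, this is just `int main(string[] args) { return parseAndRun(args); }`, so shell scripts can test the exit status.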
Re: Reading from stdin significantly slower than reading file directly?
On Thursday, 13 August 2020 at 14:41:02 UTC, Steven Schveighoffer wrote: But for sure, reading from stdin doesn't do anything different than reading from a file if you are using the File struct. A more appropriate test might be using the shell to feed the file into the D program: dprogram < FILE Which means the same code runs for both tests. Indeed, using the 'prog < file' approach rather than 'cat file | prog' removes any distinction for 'tsv-select'. 'tsv-select' uses File.rawRead rather than File.byLine.
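For reference, a minimal illustration of the rawRead style of input (my sketch, not the tsv-select buffering code): read fixed-size chunks into one reused buffer and scan them, here just counting newlines:

```d
import std.stdio : File;

// Read the file in 64 KiB chunks via rawRead, reusing one stack buffer,
// and count newline bytes. rawRead returns an empty slice at EOF.
ulong countLines(File f)
{
    ubyte[64 * 1024] buffer;
    ulong lines = 0;
    while (true)
    {
        auto chunk = f.rawRead(buffer[]);
        if (chunk.length == 0) break;   // EOF
        foreach (b; chunk)
            if (b == '\n') ++lines;
    }
    return lines;
}
```

The same function works whether `f` is a disk file or stdin, which is what makes the benchmark comparison above meaningful.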
Re: Reading from stdin significantly slower than reading file directly?
On Wednesday, 12 August 2020 at 22:44:44 UTC, methonash wrote: Hi, Relative beginner to D-lang here, and I'm very confused by the apparent performance disparity I've noticed between programs that do the following: 1) cat some-large-file | D-program-reading-stdin-byLine() 2) D-program-directly-reading-file-byLine() using File() struct The D-lang difference I've noticed from options (1) and (2) is somewhere in the range of 80% wall time taken (7.5s vs 4.1s), which seems pretty extreme. I don't know enough details of the implementation to really answer the question, and I expect it's a bit complicated. However, it's an interesting question, and I have relevant programs and data files, so I tried to get some actuals. The tests I ran don't directly answer the question posed, but may be a useful proxy. I used Unix 'cut' (latest GNU version) and 'tsv-select' from the tsv-utils package (https://github.com/eBay/tsv-utils). 'tsv-select' is written in D, and works like 'cut'. 'tsv-select' reads from stdin or a file via a 'File' struct. It's not using the built-in 'byLine' member though, it uses a version of 'byLine' that includes some additional buffering. Both stdin and a file system file are read this way. I used a file from the google ngram collection (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) and the file TREE_GRM_ESTN.csv from https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html, converted to a tsv file. The ngram file is a narrow file (21 bytes/line, 4 columns), the TREE file is wider (206 bytes/line, 49 columns). In both cases I cut the 2nd and 3rd columns. This tends to focus processing on input rather than processing and output. I also timed 'wc -l' for another data point. I ran the benchmarks 5 times each way and recorded the median time below. Machine used is a MacMini (so Mac OS) with 16 GB RAM and SSD drives. The numbers are very consistent for this test on this machine.
Differences in the reported times are real deltas, not system noise. The commands timed were:

* bash -c 'tsv-select -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | tsv-select -f 2,3 > /dev/null'
* bash -c 'gcut -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | gcut -f 2,3 > /dev/null'
* bash -c 'gwc -l FILE > /dev/null'
* bash -c 'cat FILE | gwc -l > /dev/null'

Note that 'gwc' and 'gcut' are the GNU versions of 'wc' and 'cut' installed by Homebrew.

Google ngram file (the 's' unigram file):

Test                            Elapsed   System    User
tsv-select -f 2,3 FILE            10.28     0.42    9.85
cat FILE | tsv-select -f 2,3      11.10     1.45   10.23
cut -f 2,3 FILE                   14.64     0.60   14.03
cat FILE | cut -f 2,3             14.36     1.03   14.19
wc -l FILE                         1.32     0.39    0.93
cat FILE | wc -l                   1.18     0.96    1.04

The TREE file:

Test                            Elapsed   System    User
tsv-select -f 2,3 FILE             3.77     0.95    2.81
cat FILE | tsv-select -f 2,3       4.54     2.65    3.28
cut -f 2,3 FILE                   17.78     1.53   16.24
cat FILE | cut -f 2,3             16.77     2.64   16.36
wc -l FILE                         1.38     0.91    0.46
cat FILE | wc -l                   2.02     2.63    0.77

What this shows is that 'tsv-select' (a D program) was faster when reading from a file than when reading from standard input. It doesn't indicate why, or whether the delta is due to the D library or to code in 'tsv-select'. Interestingly, 'cut' showed the opposite behavior: it was faster when reading from standard input than when reading from the file. For 'wc', which method was faster depended on line length. Again, I caution against reading too much into this regarding the performance of reading from standard input vs a disk file. Much more definitive tests can be done. However, it is an interesting comparison. Also, the D program is still fast in both cases. --Jon
Re: tsv-utils 2.0 release: Named field support
On Tuesday, 28 July 2020 at 15:57:57 UTC, bachmeier wrote: Thanks for your work. I've recommended tsv-utils to some students for their data analysis. It's a nice substitute for a database depending on what you're doing. It really helps that you can store your "database" in a repo like any other text file. I'm going to be checking out the new version soon. Thanks for the support and for checking out the tools! Much appreciated.
Re: tsv-utils 2.0 release: Named field support
On Monday, 27 July 2020 at 14:32:27 UTC, aberba wrote: On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote: I'm happy to announce a new major release of eBay's TSV Utilities. The 2.0 release supports named field selection in all of the tools, a significant usability enhancement. So I didn't checked it out until today and I'm really impressed about the documentation, presentation and just about everything. Thanks for the kind words, and for taking the time to check out the toolkit. Both are very much appreciated!
tsv-utils 2.0 release: Named field support
Hi all, I'm happy to announce a new major release of eBay's TSV Utilities. The 2.0 release supports named field selection in all of the tools, a significant usability enhancement. For those not familiar, tsv-utils is a set of command line tools for manipulating tabular data files of the type commonly found in machine learning and data mining environments. Filtering, statistics, sampling, joins, etc. The tools are patterned after traditional Unix command line tools like 'cut', 'grep', 'sort', etc., and are intended to work with these tools. Each tool is a standalone executable. Most people will only care about a subset of the tools. It is not necessary to learn the entire toolkit to get value from the tools. The tools are all written in D and are the fastest tools of their type available (benchmarks are on the GitHub repository). Previous versions of the tools referenced fields by field number, same as traditional Unix tools like 'cut'. In version 2.0, tsv-utils tools take fields either by field number or by field name, for files with header lines. A few examples using 'tsv-select', a tool similar to 'cut' that also supports field reordering and dropping fields:

$ # Field numbers: Output fields 2 and 1, in that order.
$ tsv-select -f 2,1 data.tsv

$ # Field names: Output the 'Name' and 'RecordNum' fields.
$ tsv-select -H -f Name,RecordNum data.tsv

$ # Drop the 'Color' field, keep everything else.
$ tsv-select -H --exclude Color file.tsv

$ # Drop all the fields ending in '_time'.
$ tsv-select -H -e '*_time' data.tsv

More information is available on the tsv-utils GitHub repository, including documentation and pre-built binaries: https://github.com/eBay/tsv-utils --Jon
Re: getopt: How does arraySep work?
On Thursday, 16 July 2020 at 17:40:25 UTC, Steven Schveighoffer wrote: On 7/16/20 1:13 PM, Andre Pany wrote: On Thursday, 16 July 2020 at 05:03:36 UTC, Jon Degenhardt wrote: On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote: [...] An enhancement is likely to hit some corner-cases involving list termination requiring choices that are not fully generic: any time a legal list value looks like a legal option. Perhaps the most important case is single digit numeric options like '-1', '-2'. These are legal short form options, and there are programs that use them. They are also somewhat common numeric values to include in command line inputs. [...] My naive implementation would be that any dash would stop the list of multiple values. If you want to have a value containing a space or a dash, you enclose it with double quotes in the terminal. Enclosing with double quotes in the terminal does nothing:

myapp --modelicalibs "file-a.mo" "file-b.mo"

will give you EXACTLY the same string[] args as:

myapp --modelicalibs file-a.mo file-b.mo

I think Jon's point is that it's difficult to distinguish where an array list ends if you get the parameters as separate items. Like:

myapp --numbers 1 2 3 -5 -6

Is that numbers => [1, 2, 3, -5, -6], or is it numbers => [1, 2, 3], 5 => true, 6 => true? This is probably why the code doesn't support that. -Steve

Yes, this is what I was getting at. Thanks for the clarification. Also, it's not always immediately obvious what part of the argument splitting is being done by the shell, and what is being done by the program/getopt. Taking inspiration from the recent one-liners, here's a way to see how the program gets the args from the shell for different command lines:

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers 1,2,3,-5,-6
["--numbers", "1,2,3,-5,-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers 1 2 3 -5 -6
["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers "1" "2" "3" "-5" "-6"
["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. $].writeln; }' | dmd -run - --numbers '1 2 3 -5 -6'
["--numbers", "1 2 3 -5 -6"]

The first case is what getopt supports now: all the values in a single string with a separator that getopt splits on. The 2nd and 3rd are identical from the program's perspective (Steve's point), but the values have already been split, so getopt would need a different approach, and one that deals with the ambiguity. The fourth form eliminates the ambiguity, but puts the burden on the user to use quotes.
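For reference, here is a minimal sketch of the single-string form getopt supports today, using the `arraySep` variable from std.getopt (the program name "prog" is just a placeholder):

```d
import std.getopt;

void main()
{
    // The separator-joined form getopt handles now: the values arrive as
    // one shell word, and getopt itself splits on arraySep.
    auto args = ["prog", "--numbers", "1,2,3,-5,-6"];
    int[] numbers;
    arraySep = ",";   // default is "", meaning one value per --numbers flag
    getopt(args, "numbers", &numbers);
    assert(numbers == [1, 2, 3, -5, -6]);
}
```

Note that negative values like "-5" are unambiguous here because getopt never sees them as separate arguments.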
Re: getopt: How does arraySep work?
On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote: On Tuesday, 14 July 2020 at 15:48:59 UTC, Andre Pany wrote: On Tuesday, 14 July 2020 at 14:33:47 UTC, Steven Schveighoffer wrote: On 7/14/20 10:22 AM, Steven Schveighoffer wrote: The documentation needs updating, it should say "parameters are added sequentially" or something like that, instead of "separation by whitespace". https://github.com/dlang/phobos/pull/7557 -Steve Thanks for the answer and the pr. Unfortunately my goal here is to simulate a partner tool written in C/C++ which supports this behavior. I will also create an enhancement issue for supporting this behavior. Kind regards Andre Enhancement issue: https://issues.dlang.org/show_bug.cgi?id=21045 Kind regards André An enhancement is likely to hit some corner-cases involving list termination requiring choices that are not fully generic: any time a legal list value looks like a legal option. Perhaps the most important case is single digit numeric options like '-1', '-2'. These are legal short form options, and there are programs that use them. They are also somewhat common numeric values to include in command line inputs. I ran into a couple of cases like this with a getopt cover I wrote. The cover supports runtime processing of command arguments in the order entered on the command line rather than the compile-time getopt() call order. Since it was only for my stuff, not Phobos, it was an easy choice: disallow single digit short options. But a Phobos enhancement might make other choices. IIRC, a characteristic of the current getopt implementation is that it does not have run-time knowledge of all the valid options, so the set of ambiguous entries is larger than just the limited set of options specified in the program: essentially, anything that looks syntactically like an option. That doesn't mean an enhancement can't be built, just that there may be some constraints to be aware of. --Jon
Re: Looking for a Code Review of a Bioinformatics POC
On Friday, 12 June 2020 at 06:20:59 UTC, H. S. Teoh wrote: I glanced over the implementation of byLine. It appears to be the unhappy compromise of trying to be 100% correct, cover all possible UTF encodings, and all possible types of input streams (on-disk file vs. interactive console). It does UTF decoding and resizing of arrays, and a lot of other frilly little squirrelly things. In fact I'm dismayed at how hairy it is, considering the conceptual simplicity of the task! Given this, it will definitely be much faster to load in large chunks of the file at a time into a buffer, and scanning in-memory for linebreaks. I wouldn't bother with decoding at all; I'd just precompute the byte sequence of the linebreaks for whatever encoding the file is expected to be in, and just scan for that byte pattern and return slices to the data. This is basically what bufferedByLine in tsv-utils does. See: https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L793. tsv-utils has the advantage of only needing to support utf-8 files with Unix newlines, so the code is simpler. (Windows newlines are detected, this occurs separately from bufferedByLine.) But as you describe, support for a wider variety of input cases could be done without sacrificing basic performance. iopipe provides much more generic support, and it is quite fast. Having said all of that, though: usually in non-trivial programs reading input is the least of your worries, so this kind of micro-optimization is probably unwarranted except for very niche cases and for micro-benchmarks and other such toy programs where the cost of I/O constitutes a significant chunk of running times. But knowing what byLine does under the hood is definitely interesting information for me to keep in mind, the next time I write an input-heavy program. tsv-utils tools saw performance gains of 10-40% by moving from File.byLine to bufferedByLine, depending on tool and type of file (narrow or wide). 
Gains of 5-20% were obtained by switching from File.write to BufferedOutputRange, with some special cases improving by 50%. tsv-utils tools aren't micro-benchmarks, but they are not typical apps either. Most of the tools go into a tight loop of some kind, running a transformation on the input and writing to the output. Performance is a real benefit to these tools, as they get run on reasonably large data sets.
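To make the chunk-and-scan idea concrete, here is a stripped-down sketch (this is not the actual bufferedByLine code; like tsv-utils it assumes UTF-8 input with Unix newlines, and the helper name is made up):

```d
// Read fixed-size chunks from any range of ubyte[] and hand out one line
// at a time, scanning only for the newline byte. No UTF decoding.
void eachLine(R)(R chunks, scope void delegate(const(char)[]) sink)
{
    char[] leftover;
    foreach (chunk; chunks)
    {
        char[] data = leftover ~ cast(char[]) chunk;
        size_t start = 0;
        foreach (i, c; data)
        {
            if (c == '\n')
            {
                sink(data[start .. i]);
                start = i + 1;
            }
        }
        leftover = data[start .. $].dup;  // partial line carried to next chunk
    }
    if (leftover.length) sink(leftover); // final line without trailing newline
}

void main()
{
    // Typical use would be: eachLine(stdin.byChunk(64 * 1024), (line) { ... });
    // Here, simulated chunks show lines spanning chunk boundaries.
    const(char)[][] lines;
    auto chunks = [cast(ubyte[]) "ab".dup, cast(ubyte[]) "c\nde\nf".dup];
    eachLine(chunks, (line) { lines ~= line.idup; });
    assert(lines == ["abc", "de", "f"]);
}
```

The real implementations add buffer reuse and avoid the per-chunk concatenation, which is where much of the speedup comes from.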
Re: Looking for a Code Review of a Bioinformatics POC
On Friday, 12 June 2020 at 00:58:34 UTC, duck_tape wrote: On Thursday, 11 June 2020 at 23:45:31 UTC, H. S. Teoh wrote: Hmm, looks like it's not so much input that's slow, but *output*. In fact, it looks pretty bad, taking almost as much time as overlap() does in total! [snip...] I'll play with that a bit tomorrow! I saw a nice implementation in eBay's tsv-utils that I may need to look closer at. Someone else suggested that stdout flushes per line by default. I dug around the stdlib but couldn't confirm that. I also played around with setvbuf but it didn't seem to change anything. Have you run into that before / know if stdout is flushing every newline? I'm not above opening '/dev/stdout' as a file if that writes faster.

I put some comparative benchmarks in https://github.com/jondegenhardt/dcat-perf. It compares input and output using standard Phobos facilities (File.byLine, File.write), iopipe (https://github.com/schveiguy/iopipe), and the tsv-utils buffered input and buffered output facilities. I haven't spent much time on results presentation; I know it's not that easy to read and interpret the results.

Brief summary - On files with short lines, buffering will result in dramatic throughput improvements over the standard Phobos facilities. This is true for both input and output, though likely for different reasons. For input, iopipe is the fastest available. tsv-utils buffered facilities are materially faster than Phobos for both input and output, but not as fast as iopipe for input. Combining iopipe for input with tsv-utils BufferedOutputRange for output works pretty well. For files with long lines, both iopipe and tsv-utils bufferedByLine are materially faster than Phobos File.byLine when reading. For writing there wasn't much difference from Phobos File.write.

A note on File.byLine - I've had many opportunities to compare Phobos File.byLine to facilities in other programming languages, and it is not bad at all. But it is beatable.
About Memory Mapped Files - The benchmarks don't include a comparison against mmfile. Memory mapped files certainly make sense as a comparison point. --Jon
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Got posted again to Hacker News earlier today. Currently at position 5.
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 05:06:21 UTC, anon wrote: To implement your option A you could simply use std.range.enumerate. Would something like this work?

import std.algorithm.iteration : map;
import std.algorithm.searching : until;
import std.range : tee;

size_t bytesConsumed;
auto result = input.map!(a => a.yourTransformation)
                   .until!(stringTerminator)
                   .tee!(a => bytesConsumed++);
// bytesConsumed is automatically updated as result is consumed

That's interesting. It wouldn't work quite like that, but something similar would. Still, I don't think it quite achieves what I want. One thing that's missing is that the initial input is simply a string; there's nothing to map over at that point. There is, however, a transformation step that transforms the string into a sequence of slices. Then there's a transformation on those slices. That would be a step prior to the 'map' step. Also, in my case 'map' cannot be used, because each slice may produce multiple outputs. The specifics are minor details, not really so important. The implementation can take a form along the lines described. However, structuring it like this exposes the details of these steps to all callers. That is, all callers would have to write the code above. My goal is to encapsulate the steps into a single range all callers can use. That is, encapsulate something like the steps you have above in a standalone range that takes the input string as an argument, produces all the output elements, and preserves the bytesConsumed in a way the caller can access it.
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 04:51:54 UTC, Steven Schveighoffer wrote: On 4/26/20 11:38 PM, Jon Degenhardt wrote: Is there a better way to write this? I had exactly the same problems. I created this to solve the problem, I've barely tested it, but I plan to use it with all my parsing utilities on iopipe: https://code.dlang.org/packages/bufref https://github.com/schveiguy/bufref/blob/master/source/bufref.d Thanks Steve, I'll definitely take a look at this. --Jon
Re: Idiomatic way to write a range that tracks how much it consumes
On Monday, 27 April 2020 at 04:41:58 UTC, drug wrote: On 27.04.2020 06:38, Jon Degenhardt wrote: Is there a better way to write this? --Jon I don't know a better way; I think you've listed all the possible ways - get the value using either `front` or a special range member. I prefer the second variant, and I don't think it is less consistent with range paradigms. Considering you need the number of consumed bytes only when the range is empty, the second way is more effective. Thanks. Of the two, I like the second better as well.
Idiomatic way to write a range that tracks how much it consumes
I have a string that contains a sequence of elements, then a terminator character, followed by a different sequence of elements (of a different type). I want to create an input range that traverses the initial sequence. This is easy enough. But after the initial sequence has been traversed, the caller will need to know where the next sequence starts. That is, the caller needs to know the index in the input string where the initial sequence ends and the next sequence begins. The values returned by the range are a transformation of the input, so the values by themselves are insufficient for the caller to determine how much of the string has been consumed. And, the caller cannot simply search for the terminator character. Tracking the number of bytes consumed is easy enough. I'd like to do it in a way that is consistent with D's normal range paradigm. Two candidate approaches:

a) Instead of having the range return the individual values, it could return a tuple containing the value and the number of bytes consumed.

b) Give the input range an extra member function which returns the number of bytes consumed. The caller could call this after 'empty()' returns true to find the amount of data consumed.

Both will work, but I'm not especially satisfied with either. Approach (a) seems more consistent with the typical range paradigms, but also more of a hassle for callers. Is there a better way to write this? --Jon
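For concreteness, approach (b) can be sketched like this (the struct name and the toy uppercasing transformation are made up; the real transformation is the application-specific one):

```d
import std.ascii : toUpper;

// Iterates the initial sequence up to a terminator, applying a toy
// transformation, and exposes how many bytes it has consumed.
struct InitialSequence
{
    string input;
    size_t pos;
    enum char terminator = ';';

    @property bool empty() const
    {
        return pos >= input.length || input[pos] == terminator;
    }
    @property dchar front() const { return toUpper(input[pos]); }
    void popFront() { ++pos; }

    // Approach (b): an extra member, typically read after empty() is true.
    @property size_t bytesConsumed() const { return pos; }
}

void main()
{
    auto r = InitialSequence("abc;1,2,3");
    char[] result;
    for (; !r.empty; r.popFront()) result ~= r.front;
    assert(result == "ABC");
    assert(r.bytesConsumed == 3); // the ';' is at input[3]; next sequence follows it
}
```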
Re: Integration tests
On Friday, 17 April 2020 at 16:56:57 UTC, Russel Winder wrote: Hi, Thinking of trying to do the next project in D rather than Rust, but… Rust has built in unit testing on a module basis. D has this so no problem. Rust allows for integration tests in the tests directory of a project. These are automatically built and run along with all unit tests as part of "cargo test". Does D have any integrated support for integration tests in the way Rust does? Automated testing is important; perhaps you could describe further what's needed? I haven't worked with Rust test frameworks, but I took a look at the description of the integration tests and unit tests. It wasn't immediately obvious what can be done with the Rust integration test framework that cannot be done with D's unittest framework. An important concept described was testing a module as an external caller. That would seem to be very doable using D's unittest framework. For example, one could create a set of tests against Phobos, put them in a separate location (e.g. a separate file), and arrange to have the unittests run as part of a CI process along with a build. My look was very superficial; perhaps you could explain more.
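One way to approximate Rust-style integration tests with D's built-in framework is a separate, test-only module that exercises only the public API, compiled and run with -unittest (the module name and file location here are hypothetical):

```d
// File: tests/integration_example.d (hypothetical location).
// Build and run with: dmd -unittest -main -run tests/integration_example.d
module tests.integration_example;

import std.algorithm : sort;
import std.array : array;
import std.range : iota, retro;

unittest
{
    // Drive the library strictly through its public interface,
    // the way an external caller would.
    auto data = iota(1, 6).retro.array;  // [5, 4, 3, 2, 1]
    sort(data);
    assert(data == [1, 2, 3, 4, 5]);
}
```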
Re: How to correctly import tsv-utilites functions?
On Tuesday, 14 April 2020 at 20:25:08 UTC, p.shkadzko wrote: On Tuesday, 14 April 2020 at 20:05:28 UTC, Steven Schveighoffer wrote: On 4/14/20 3:34 PM, p.shkadzko wrote: [...] What about using dependency tsv-utils:common ? Looks like tsv-utils is a collection of subpackages, and the main package just serves as a namespace. -Steve Yes, it works! Thank you. Glad that worked for you. (And thanks Steve!) I have a small app with an example of a dub.json file that pulls the tsv-utils common dependencies this way: https://github.com/jondegenhardt/dcat-perf/blob/master/dub.json --Jon
Re: Our HOPL IV submission has been accepted!
On Saturday, 29 February 2020 at 01:00:40 UTC, Andrei Alexandrescu wrote: Walter, Mike, and I are happy to announce that our paper submission "Origins of the D Programming Language" has been accepted at the HOPL IV (History of Programming Languages) conference. https://hopl4.sigplan.org/track/hopl-4-papers Getting a HOPL paper in is quite difficult, and an important milestone for the D language. We'd like to thank the D community which was instrumental in putting the D language on the map. The HOPL IV conference will take place in London right before DConf. With regard to travel, right now Covid-19 fears are on everybody's mind; however, we are hopeful that between now and then the situation will improve. Congrats! Indeed a meaningful accomplishment.
New graphs for tsv-utils performance benchmarks
A small thing - Many people who have seen the performance benchmarks for eBay's TSV Utilities find the text table format I've used in the past hard to read. Me too. So, I finally generated more traditional graphical representations for the 2018 benchmark results. The graphs are here: https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md#2018-benchmark-summary There are no new benchmarks, just new visualizations of the results. For folks who are not familiar with these benchmarks - This is part of a performance study comparing eBay's TSV Utilities with a number of command line tools providing similar functionality (e.g. awk). The results shown were presented at DConf 2018. * Details of the performance study - https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md * DConf 2018 talk slides - https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf
Re: Unexpected result with std.conv.to
On Friday, 15 November 2019 at 03:51:04 UTC, Joel wrote: I made a feature that converts, say, [9:59am] -> [10:00am] to 1 minute, but found '9'.to!int = 57 (not 9). Doesn't seem right... I'm guessing that's standard though, same with ldc. Use a string or char[] array, e.g. writeln("9".to!int) => 9. With a single 'char', what is produced is the ASCII value of the character.
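A short sketch of the difference, including character arithmetic as a third option for single digits:

```d
import std.conv : to;

void main()
{
    assert('9'.to!int == 57);  // numeric (ASCII) value of the char
    assert("9".to!int == 9);   // string parsed as a number
    assert('9' - '0' == 9);    // digit value via character arithmetic
}
```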
Re: csvReader & specifying separator problems...
On Thursday, 14 November 2019 at 12:25:30 UTC, Robert M. Münch wrote: Just trying a very simple thing and it's pretty hard: "Read a CSV file (raw_data) that has a ; separator so that I can iterate over the lines and access the fields." csv_data = raw_data.byLine.joiner("\n") From the docs, which I find extremely hard to understand: auto csvReader(Contents = string, Malformed ErrorLevel = Malformed.throwException, Range, Separator = char)(Range input, Separator delimiter = ',', Separator quote = '"') So, let's see if I can decipher this, step-by-step by trying out: csv_records = csv_data.csvReader(); Would split the CSV data into iterable CSV records using ',' char as separator using UFCS syntax. When running this I get: [...]

Side comment - This code looks like it was taken from the first example in the std.csv documentation. To me, the code in the std.csv example is doing something that might not be obvious at first glance and is potentially confusing. In particular, 'byLine' is not reading individual CSV records. CSV can have embedded newlines; these are identified by CSV escape syntax. 'byLine' doesn't know the escape syntax. If there are embedded newlines, 'byLine' will read partial records, which may not be obvious at first glance. The .joiner("\n") step puts the newline back, stitching fields and records back together again in the process. The effect is to create an input range of characters representing the entire file, using 'byLine' to do buffered reads. This input range is passed to csvReader.

This could also be done using 'byChunk' and 'joiner' (with no separator). This would use a fixed size buffer with no searching for newlines while reading, so it should be faster. An example:

csv_by_chunk.d:

import std.algorithm;
import std.csv;
import std.conv;
import std.stdio;
import std.typecons;
import std.utf;

void main()
{
    // Small buffer used to show it works. Normally would use a larger buffer.
    ubyte[16] buffer;
    auto stdinBytes = stdin.byChunk(buffer).joiner;
    auto stdinDChars = stdinBytes.map!((ubyte b) => cast(char) b).byDchar;
    writefln("--");
    foreach (record; stdinDChars.csvReader!(Tuple!(string, string, string)))
    {
        writefln("Field 0: |%s|", record[0]);
        writefln("Field 1: |%s|", record[1]);
        writefln("Field 2: |%s|", record[2]);
        writefln("--");
    }
}

Pass it csv data without embedded newlines:

$ echo $'abc,def,ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |def|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

Pass it csv data with embedded newlines:

$ echo $'abc,"LINE 1\nLINE 2",ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |LINE 1
LINE 2|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

An example like this may avoid the confusion about newlines. Unfortunately, the need to do the odd looking conversion from ubyte to char/dchar is undesirable in a code example. I haven't found a cleaner way to write that. If there's a nicer way I'd appreciate hearing about it. --Jon
Re: formatting a float or double in a string with all significant digits kept
On Thursday, 10 October 2019 at 17:12:25 UTC, dan wrote: Thanks also berni44 for the information about the dig attribute, Jon for the neat packaging into one line using the attribute on the type. Unfortunately, the version of gdc that comes with the version of debian that i am using does not have the dig attribute yet, but perhaps i can upgrade, and eventually i think gdc will have it. Glad these ideas helped. The value of the 'double.dig' property is not going to change between compilers/versions/etc. It's really a property of IEEE 754 floating point for 64 bit floats (D specifies double as 64 bits). So, if you are using double, then it's pretty safe to use 15 until the compiler you're using catches up. Declare an enum or const variable to give it a name so you can track it down later. Also, don't get thrown off by the fact that PI is a real, not a double. D supports 80 bit floats as real, so constants like PI are defined as real. But if you convert PI to a double, it'll then have 15 significant decimal digits of precision. --Jon
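Both points above are easy to check; a minimal sketch (the exact formatted string assumes the usual C-style "%g" rounding):

```d
import std.format : format;
import std.math : PI;

void main()
{
    // dig is a property of the type, fixed by the IEEE 754 format,
    // so double.dig is 15 wherever D runs.
    assert(float.dig == 6);
    assert(double.dig == 15);

    // PI is a real; converted to double it carries 15 significant
    // decimal digits.
    double d = PI;
    assert(format("%.*g", double.dig, d) == "3.14159265358979");
}
```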
Re: formatting a float or double in a string with all significant digits kept
On Wednesday, 9 October 2019 at 05:46:12 UTC, berni44 wrote: On Tuesday, 8 October 2019 at 20:37:03 UTC, dan wrote: But i would like to be able to do this without knowing the expansion of pi, or writing too much code, especially if there's some d function like writeAllDigits or something similar. You can use the property .dig to get the number of significant digits of a number: writeln(PI.dig); // => 18 You still need to account for the numbers before the dot. If you're happy with scientific notation you can do: auto t = format("%.*e", PI.dig, PI); writeln("PI = ", t); Using the '.dig' property is a really nice idea and looks very useful for this. A clarification though - it's the significant digits of the data type, not the value. (PI.dig is 18 because PI is a real, not a double.) So:

writeln(1.0f.dig, ", ", float.dig);  // => 6, 6
writeln(1.0.dig, ", ", double.dig);  // => 15, 15
writeln(1.0L.dig, ", ", real.dig);   // => 18, 18

Another possibility would be to combine the '.dig' property with the "%g" option, similar to the use of "%e" shown. For example, these lines:

writeln(format("%0.*g", PI.dig, PI));
writeln(format("%0.*g", double.dig, 1.0));
writeln(format("%0.*g", double.dig, 100.0));
writeln(format("%0.*g", double.dig, 1.0001));
writeln(format("%0.*g", double.dig, 0.00000001));

produce:

3.14159265358979324
1
100
1.0001
1e-08

Hopefully experimenting with the different formatting options available will yield one that works for your use case.
Re: LDC 1.17.0-beta1
On Saturday, 10 August 2019 at 15:51:28 UTC, kinke wrote: Glad to announce the first beta for LDC 1.17: ... Please help test, and thanks to all contributors! No changes in my standard performance tests (good). All functional tests pass as well.
Re: Help me decide D or C
On Wednesday, 31 July 2019 at 18:38:02 UTC, Alexandre wrote: Should I go for C and then when I become a better programmer change to D? Should I start with D right now? In my view, the most important thing is the decision you've already made - to pick a programming language and learn it in a reasonable bit of depth. Which programming language you choose is less important. No matter which choice you make you'll have the opportunity to learn skills that will transfer to other programming languages. As you can tell from the other responses, the pros and cons of a learning a specific language depend quite a bit on what you hope to get out of it, and are to a fair extent subjective. But both C and D provide meaningful opportunities to gain worthwhile experience. A couple reasons for considering learning D over C are its support for functional programming and templates. These were also mentioned by a few other people. These are not really "beginner" topics, but as one moves past the beginner stage they are really quite valuable techniques to start mastering. For both D is the far better option, and it's not necessary to use either when starting out. --Jon
Re: rdmd takes 2-3 seconds on a first-run of a simple .d script
On Saturday, 25 May 2019 at 22:18:16 UTC, Andre Pany wrote: On Saturday, 25 May 2019 at 08:32:08 UTC, BoQsc wrote: I have a simple standard .d script and I'm getting annoyed that it takes 2-3 seconds to run and see the results via rdmd. Also please keep in mind there could be other factors like slow disks, anti virus scanners,... which causes a slow down. I have seen similar behavior that I attribute to virus scan software. After compiling a program, the first run takes several seconds to run, after that it runs immediately. I'm assuming the first run of an unknown binary triggers a scan, though I cannot be completely sure. Try compiling a new binary in D or C++ and see if a similar effect is seen. --Jon
Re: bool (was DConf 2019 AGM Livestream)
On Sunday, 12 May 2019 at 17:08:49 UTC, Jonathan M Davis wrote: ... snip ... Fortunately, in the grand scheme of things, while this issue does matter, it's still much smaller than almost all of the issues that we have to worry about and consider having DIPs for. Personally, I'm not at all happy that this DIP was rejected, but I think that continued debate on it is a waste of everyone's time. Agreed. I too have never liked numeric values equated to true/false, in any programming language. However, it is very common. And, relative to the other big ticket items on the table, it is of minor importance. Changing the current behavior won't materially affect the usability of D or its future. This is a case where the best course is to make a decision and move on. --Jon
Re: eBay's TSV Utilities status update
On Friday, 3 May 2019 at 03:54:14 UTC, James Blachly wrote: On 4/29/19 11:23 AM, Jon Degenhardt wrote: An update on changes to this tool-set over the last year. ... Thank you for this, and thanks for your blog post of a couple of years ago, which I referred to many times while learning D and writing fast(er) CLI tools. Looking forward to trying Steve's iopipe as well as your bufferedByLineReader. James Thanks for the kind words James!
Re: Poor regex performance?
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote: On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote: If you need performance use ldc not dmd (assumed). LLVM has many factors better code optimizes than dmd does. Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE. Try: ldc2 -O3 -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining This will improve inlining and optimization across the runtime library boundaries. This can help in certain types of code.
Dub: A json/sdl equivalent to --combined command line option?
In Dub, is there a way to specify the equivalent of the --combined command line argument in the json/sdl package config file? What I'd like to be able to do is create a custom build type such that $ dub build --build=build-xyz builds in combined mode, without needing to add the --combined on the command line. Putting it on the command line as follows did what I intended: $ dub build --build=build-xyz --combined --Jon
Re: NEW Milestone: 1500 packages at code.dlang.org
On Thursday, 7 February 2019 at 18:02:21 UTC, H. S. Teoh wrote: On Thu, Feb 07, 2019 at 05:06:09PM +, Seb via Digitalmars-d-announce wrote: On Thursday, 7 February 2019 at 16:40:08 UTC, Anonymouse wrote: > What was the word on the autotester (or similar) testing popular packages as part of the test suite? This has been done for more than a year now for the ~50 most popular packages: https://buildkite.com/dlang In my opinion this is one of the main reasons why the last releases were so successful (= almost no regressions). That's awesome. This is the way to go. Congrats to everyone who helped pull this off. T Agreed! This is a really nice bit of work that's come out of the D ecosystem.
Re: D-lighted, I'm Sure
On Friday, 18 January 2019 at 14:29:14 UTC, Mike Parker wrote: Not long ago, in my retrospective on the D Blog in 2018, I invited folks to write about their first impressions of D. Ron Tarrant, who you may have seen in the Learn forum, answered the call. The result is the latest post on the blog, the first guest post of 2019. Thanks, Ron! As a reminder, I'm still looking for new-user impressions and guest posts on any D-related topic. Please contact me if you're interested. And don't forget, there's a bounty for guest posts, so you can make a bit of extra cash in the process. The blog: https://dlang.org/blog/2019/01/18/d-lighted-im-sure/ Reddit: https://www.reddit.com/r/programming/comments/ahawhz/dlighted_im_sure_the_first_two_months_with_d/ Nicely done. Very enjoyable, thanks for publishing this! --Jon
Re: My Meeting C++ Keynote video is now available
On Saturday, 12 January 2019 at 15:51:03 UTC, Andrei Alexandrescu wrote: https://youtube.com/watch?v=tcyb1lpEHm0 If nothing else please watch the opening story, it's true and quite funny :o). Now as to the talk, as you could imagine, it touches on another language as well... Andrei Very nice. I especially liked how design by introspection was contrasted with other approaches and how the constexpr discussion fit into the overall theme. --Jon
Re: DCD, D-Scanner and DFMT : new year edition
On Monday, 31 December 2018 at 07:56:00 UTC, Basile B. wrote: DCD [1] 0.10.2 comes with bugfixes and small API changes. DFMT [2] and D-Scanner [3] with bugfixes too and all of the three products are based on d-parse 0.10.z, making life easier and the libraries versions more consistent for the D IDE and D IDE plugins developers. [1] https://github.com/dlang-community/DCD/releases/tag/v0.10.2 [2] https://github.com/dlang-community/dfmt/releases/tag/v0.9.0 [3] https://github.com/dlang-community/D-Scanner/releases/tag/v0.6.0 Thanks for the ongoing work on DCD et al!
Re: Which Docker to use?
On Monday, 22 October 2018 at 18:44:01 UTC, Jacob Carlborg wrote: On 2018-10-21 20:45, Jon Degenhardt wrote: The issue that caused me to go to Ubuntu 16.04 had to do with uncaught exceptions when using LTO with the gold linker and LDC 1.5. Problem occurred with 14.04, but not 16.04. I should go back and retest on Ubuntu 14.04 with a more recent LDC, it may well have been corrected. The issue thread is here: https://github.com/ldc-developers/ldc/issues/2390. Ah, that might be the reason. I am not using LTO. You might want to try a newer version of LDC as well since 1.5 is quite old now. I switched to LDC 1.12.0. The problem remains with LTO and static builds on Ubuntu 14.04. Ubuntu 16.04 is required, at least with LTO of druntime/phobos. The good news on this front is that the regularly updated dlang2 docker images work fine with LTO on druntime/phobos (using the LTO build support available in LDC 1.9.0). Examples of travis-ci setups for both dlanguage and dlang2 docker images are available on the tsv-utils travis config: https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look for the DOCKERSPECIAL environment variables.
Re: d word counting approach performs well but has higher mem usage
On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote: Hi there, the task is simple: count word occurrences from stdin (around 150mb in this case) and print sorted results to stdout in a somewhat idiomatic fashion. Now, d is quite elegant while maintaining high performance compared to both c and c++, but I, as a complete beginner, can't identify where the 10x memory usage (~300mb, see results below) is coming from. Unicode overhead? Internal buffer? Is something slurping the whole file? Assoc array allocations? Couldn't find huge allocs with dmd -vgc and -profile=gc either. What did I do wrong?

Not exactly the same problem, but there is relevant discussion in the blog post I wrote a while ago: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ See in particular the section on Associative Array lookup optimization. This takes advantage of the fact that it's only necessary to create the immutable string the first time a key is entered into the hash. Subsequent occurrences do not need to take this step. As creating the string allocates new memory, even if it is only used temporarily, avoiding it is a meaningful savings. There have been additional APIs added to the AA interface since I wrote the blog post; I believe it is now possible to accomplish the same thing with more succinct code.

Other optimization possibilities:

* Avoid auto-decode: Not sure if your code is hitting this, but if so it's a significant performance hit. Unfortunately, it's not always obvious when this is happening. The task you are performing doesn't need auto-decode because it is splitting on single-byte utf-8 char boundaries (newline and space).
* LTO on druntime/phobos: This is easy and will have a material speedup. Simply add '-defaultlib=phobos2-ldc-lto,druntime-ldc-lto' to the 'ldc2' build line, after the '-flto=full' entry. This will be a win because it will enable a number of optimizations in the internal loop.
* Reading the whole file vs line by line - 'byLine' is really fast. It's also nice and general, as it allows reading arbitrary size files or standard input without changes to the code. However, it's not as fast as reading the file in a single shot.
* std.algorithm.joiner - Has improved dramatically, but is still slower than a foreach loop. See: https://github.com/dlang/phobos/pull/6492

--Jon
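For illustration, the AA lookup optimization described above might look like this (a minimal sketch, not the blog post's exact code; `wordCounts` is a made-up helper name):

```d
import std.algorithm : splitter;

// Only copy (idup) a word to an immutable string the first time it
// becomes a key; later occurrences hit the `in` lookup on the mutable
// slice and skip the allocation entirely.
size_t[string] wordCounts(const(char)[][] lines)
{
    size_t[string] counts;
    foreach (line; lines)
    {
        foreach (word; line.splitter(' '))
        {
            if (auto countPtr = word in counts)
                ++(*countPtr);            // existing key: no allocation
            else
                counts[word.idup] = 1;    // first occurrence: copy once
        }
    }
    return counts;
}
```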
Re: Which Docker to use?
On Sunday, 21 October 2018 at 18:11:37 UTC, Jacob Carlborg wrote: On 2018-10-18 01:15, Jon Degenhardt wrote: I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know. That's interesting. I've built static binaries for DStep using LDC on Travis CI without any problems. My comment painted with too broad a brush. I had forgotten how specific the issue I saw was. Apologies for the confusion. The issue that caused me to go to Ubuntu 16.04 had to do with uncaught exceptions when using LTO with the gold linker and LDC 1.5. Problem occurred with 14.04, but not 16.04. I should go back and retest on Ubuntu 14.04 with a more recent LDC, it may well have been corrected. The issue thread is here: https://github.com/ldc-developers/ldc/issues/2390.
Re: Which Docker to use?
On Friday, 19 October 2018 at 22:16:04 UTC, Ky-Anh Huynh wrote: On Wednesday, 17 October 2018 at 23:15:53 UTC, Jon Degenhardt wrote: I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know. Yes I'm also using Travis-CI and that's why I need some Docker support. I'm using dlanguage/ldc. The reason for that choice was because it was what was available when I put the travis build together. As you mentioned, it hasn't been updated in a while. I'm still producing this build with an older ldc version, but when I move to a more current version I'll have to switch to a different docker image. My travis config is here: https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look for the sections referencing the DOCKERSPECIAL environment variable.
Re: Which Docker to use?
On Wednesday, 17 October 2018 at 08:08:44 UTC, Gary Willoughby wrote: On Wednesday, 17 October 2018 at 03:37:21 UTC, Ky-Anh Huynh wrote: Hi, I need to build some static binaries with LDC. I also need to execute builds on both platform 32-bit and 64-bit. From Docker Hub there are two image groups: * language/ldc (last update 5 months ago) * dlang2/ldc-ubuntu (updated recently) Which one do you suggest? Thanks a lot. To be honest, you don't need docker for this. You can just download LDC in a self-contained folder and use it as is. https://github.com/ldc-developers/ldc/releases That's what I do on Linux. I need to use docker to build static linked Linux executables. My reason is specific, may be different than the OP's. I'm using Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but static linking fails on 14.04. The standard C library from Ubuntu 16.04 or later is needed. There may be other/better ways to do this, I don't know.
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 07:09:05 UTC, Vijay Nayar wrote: D has multiple compilers, but for the speed of the finished binary, LDC2 is generally recommended. I used version 1.11.0. https://github.com/ldc-developers/ldc/releases/tag/v1.11.0 I was using DUB to manage the project, but to build the stand-alone file from the gist link, use this command: $ ldc2 -release -O3 twinprimes_ssoz.d And to run it: $ echo "30" | ./twinprimes_ssoz It'd be interesting to see if LTO or PGO generated an improvement. It looks like it could in this case, as it might optimize some of the inner loops. LTO is easy, enable it with: -flto= -defaultlib=phobos2-ldc-lto,druntime-ldc-lto (see: https://github.com/ldc-developers/ldc/releases/tag/v1.9.0). I've been using 'thin' on OSX, 'full' on Linux. PGO is a bit more work, but not too bad. A good primer is here: https://johanengelen.github.io/ldc/2016/07/15/Profile-Guided-Optimization-with-LDC.html --Jon
Re: Iain Buclaw at GNU Tools Cauldron 2018
On Monday, 8 October 2018 at 05:12:03 UTC, Joakim wrote: On Sunday, 7 October 2018 at 15:41:43 UTC, greentea wrote: Date: September 7 to 9, 2018. Location: Manchester, UK GDC - D front-end GCC https://www.youtube.com/watch?v=iXRJJ_lrSxE Thanks for the link, just watched the whole video. The first half-hour sets the standard as an intro to the language, as only a compiler developer other than the main implementer could give, ie someone with fresh eyes. I loved that Iain started off with a list of real-world projects. That's a mistake a lot of tech talks make, ie not motivating _why_ anybody should care about their tech and simply diving into the tech itself. I hadn't heard some of that info either, great way to begin. I agree, a very nice talk, including the way the motivation part was handled. I especially liked the example of the group who typically used Python for rapid prototyping, then re-wrote in C++ for production, who upon trying D for a prototype, were pleasantly surprised it was performant enough for production.
Re: Error: variable 'xyz' has scoped destruction, cannot build closure
On Friday, 5 October 2018 at 16:34:32 UTC, Paul Backus wrote: On Friday, 5 October 2018 at 06:56:49 UTC, Nicholas Wilson wrote: On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson wrote: Alas is does not because each does not accept additional argument other than the range. Shouldn't be hard to fix though. https://issues.dlang.org/show_bug.cgi?id=19287 You can thread multiple arguments through to `each` using `std.range.zip`:

```
tenRandomNumbers
    .zip(repeat(output))
    .each!(unpack!((n, output) => output.appendln(n.to!string)));
```

Full code: https://run.dlang.io/is/Qe7uHt

Very interesting, thanks. It's a clever way to avoid the delegate capture issue. (Aside: A nested function that accesses 'output' from lexical context has the same issue as delegates wrt capturing the variable.)
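A stripped-down sketch of the pattern, using a plain tuple lambda rather than the `unpack` helper from the linked code (`Sink` and `demo` are hypothetical stand-ins for the real output struct and call site):

```d
import std.algorithm : each;
import std.conv : to;
import std.range : repeat, zip;

// Hypothetical stand-in for the BufferedStdout-style struct. The key
// point: the lambda below captures nothing, so no closure is built
// over a struct with scoped destruction.
struct Sink
{
    string[] lines;
    void appendln(string s) { lines ~= s; }
}

string[] demo()
{
    auto sink = Sink();
    // Pair each element with a pointer to the sink instead of
    // capturing the sink in a closure.
    [1, 2, 3].zip(repeat(&sink))
             .each!(t => t[1].appendln(t[0].to!string));
    return sink.lines;
}
```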
Re: Error: variable 'xyz' has scoped destruction, cannot build closure
On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson wrote: On Friday, 5 October 2018 at 06:22:57 UTC, Nicholas Wilson wrote: tenRandomNumbers.each!((n,o) => o.appendln(n.to!string))(output); or tenRandomNumbers.each!((n, ref o) => o.appendln(n.to!string))(output); should hopefully do the trick (run.dlang.io seems to be down atm). Alas is does not because each does not accept additional argument other than the range. Shouldn't be hard to fix though. Yeah, that's what I was seeing also. Thanks for taking a look. Is there perhaps a way to limit the scope of the delegate to the local function? Something that would tell the compiler the delegate has a lifetime shorter than the struct. One specific it points out is that this a place where the BufferedOutputRange I wrote cannot be used interchangeably with other output ranges. It's minor, but the intent was to be able to pass this anyplace an output range could be used.
Error: variable 'xyz' has scoped destruction, cannot build closure
I got the compilation error in the subject line when trying to create a range via std.range.generate. Turns out this was caused by trying to create a closure for 'generate' where the closure was accessing a struct containing a destructor. The fix was easy enough: write out the loop by hand rather than using 'generate' with a closure. What I'm wondering/asking is if there is an alternate way to do this that would enable the 'generate' approach. This is more curiosity/learning at this point. Below is a stripped down version of what I was doing. I have a struct for output buffering. The destructor writes any data left in the buffer to the output stream. This gets passed to routines performing output. It was in this context that I created a generator that wrote to it.

example.d:

```
struct BufferedStdout
{
    import std.array : appender;
    private auto _outputBuffer = appender!(char[]);

    ~this()
    {
        import std.stdio : write;
        write(_outputBuffer.data);
        _outputBuffer.clear;
    }

    void appendln(T)(T stuff)
    {
        import std.range : put;
        put(_outputBuffer, stuff);
        put(_outputBuffer, "\n");
    }
}

void foo(BufferedStdout output)
{
    import std.algorithm : each;
    import std.conv : to;
    import std.range : generate, takeExactly;
    import std.random : Random, uniform, unpredictableSeed;

    auto randomGenerator = Random(unpredictableSeed);
    auto randomNumbers = generate!(() => uniform(0, 1000, randomGenerator));
    auto tenRandomNumbers = randomNumbers.takeExactly(10);
    tenRandomNumbers.each!(n => output.appendln(n.to!string));
}

void main(string[] args)
{
    foo(BufferedStdout());
}
```

Compiling the above results in:

```
$ dmd example.d
example.d(22): Error: variable `example.foo.output` has scoped destruction, cannot build closure
```

As mentioned, using a loop rather than 'generate' works fine, but help with alternatives that would use generate would be appreciated. The actual buffered output struct has more behind it than shown above, but not too much.
For anyone interested it's here: https://github.com/eBay/tsv-utils/blob/master/common/src/tsvutil.d#L358
Re: More fun with autodecoding
On Saturday, 8 September 2018 at 15:36:25 UTC, Steven Schveighoffer wrote: On 8/9/18 2:44 AM, Walter Bright wrote: On 8/8/2018 2:01 PM, Steven Schveighoffer wrote: Here's where I'm struggling -- because a string provides indexing, slicing, length, etc. but Phobos ignores that. I can't make a new type that does the same thing. Not only that, but I'm finding the specializations of algorithms only work on the type "string", and nothing else. One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit. So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter. splitter doesn't deal with encodings of character ranges at all. This could partially explain why when I tried byCodeUnit and friends awhile ago I concluded it wasn't a reasonable approach: splitter is in the middle of much of what I've written. Even if splitter is changed I'll still be very doubtful about the byCodeUnit approach as a work-around. An automated way to validate that it is engaged only when necessary would be very helpful (@noautodecode perhaps? :)) --Jon
Re: This is why I don't use D.
On Wednesday, 5 September 2018 at 16:26:14 UTC, rikki cattermole wrote: On 06/09/2018 4:19 AM, H. S. Teoh wrote: On Wed, Sep 05, 2018 at 09:34:14AM -0600, Jonathan M Davis via Digitalmars-d wrote: On Wednesday, September 5, 2018 9:28:38 AM MDT H. S. Teoh via Digitalmars-d wrote: [...] Also, if the last working compiler version is prominently displayed e.g. in the search results, it will inform people about the maintenance state of that package, which could factor in their decision to use that package or find an alternative. It will also inform people about potential breakages before they upgrade their compiler. It doesn't solve all the problems, but it does seem like a good initial low-hanging fruit that shouldn't be too hard to implement. Alternatively we can let dub call home for usage with CI systems and register it having been tested for a given compiler on a specific tag. A possibility might be to let package owners specify one of the build status badges commonly added to README files when registering the DUB package. Then display the badge in the code.dlang.org pages (home page, search result page). It would of course be better to display the latest compiler version tested, but repurposing existing badges might be simpler and provide some value until a more sophisticated scheme can be implemented. --Jon
Re: tupleof function parameters?
On Tuesday, 28 August 2018 at 06:20:37 UTC, Sebastiaan Koppe wrote: On Tuesday, 28 August 2018 at 06:11:35 UTC, Jon Degenhardt wrote: The goal is to write the argument list once and use it to create both the function and the Tuple alias. That way I could create a large number of these function / arglist tuple pairs with less brittleness. --Jon I would probably use a combination of std.traits.Parameters and std.traits.ParameterIdentifierTuple. Parameters returns a tuple of types and ParameterIdentifierTuple returns a tuple of strings. Maybe you'll need to implement a staticZip to interleave both tuples to get the result you want. (although I remember seeing one somewhere). Alex, Sebastiaan - Thanks much, this looks like it should get me what I'm looking for. --Jon
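For reference, a sketch of how `Parameters` and `ParameterIdentifierTuple` might be combined, with a hand-rolled interleave standing in for a staticZip (`ArgsTuple` is a made-up name, and this is lightly tested at best):

```d
import std.meta : AliasSeq;
import std.traits : Parameters, ParameterIdentifierTuple;
import std.typecons : Tuple;

// Build Tuple!(string, "op", int, "v1", ...) by interleaving the
// parameter types with the parameter names.
template ArgsTuple(alias fn)
{
    template Interleave(size_t i)
    {
        static if (i < Parameters!fn.length)
            alias Interleave = AliasSeq!(Parameters!fn[i],
                                         ParameterIdentifierTuple!fn[i],
                                         Interleave!(i + 1));
        else
            alias Interleave = AliasSeq!();
    }
    alias ArgsTuple = Tuple!(Interleave!0);
}

bool fn(string op, int v1, int v2)
{
    switch (op)
    {
        default: return false;
        case "<": return v1 < v2;
        case ">": return v1 > v2;
    }
}

alias fnArgs = ArgsTuple!fn;  // like Tuple!(string, "op", int, "v1", int, "v2")
```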
tupleof function parameters?
I'd like to create a Tuple alias representing a function's parameter list. Is there a way to do this? Here's an example creating a Tuple alias for a function's parameters by hand:

```
import std.typecons : Tuple;

bool fn(string op, int v1, int v2)
{
    switch (op)
    {
        default: return false;
        case "<": return v1 < v2;
        case ">": return v1 > v2;
    }
}

alias fnArgs = Tuple!(string, "op", int, "v1", int, "v2");

unittest
{
    auto args = fnArgs("<", 3, 5);
    assert(fn(args[]));
}
```

This is quite useful. I'm wondering if there is a way to create the 'fnArgs' alias from the definition of 'fn' without needing to manually write out the '(string, "op", int, "v1", int, "v2")' sequence by hand. Something like a 'tupleof' operation on the function parameter list. Or conversely, define the tuple and use it when defining the function. The goal is to write the argument list once and use it to create both the function and the Tuple alias. That way I could create a large number of these function / arglist tuple pairs with less brittleness. --Jon
Re: Dicebot on leaving D: It is anarchy driven development in all its glory.
On Sunday, 26 August 2018 at 05:55:47 UTC, Pjotr Prins wrote: Artem wrote Sambamba as a student https://github.com/biod/sambamba and it is now running around the world in sequencing centers. Many many CPU hours and a resulting huge carbon foot print. The large competing C++ samtools project has been trying for 8 years to catch up with an almost unchanged student project and they are still slower in many cases. [snip] Note that Artem used the GC and only took GC out for critical sections in parallel code. I don't buy these complaints about GC. The complaints about breaking code I don't see that much either. Sambamba pretty much kept compiling over the years and with LDC/LLVM latest we see a 20% perfomance increase. For free (at least from our perspective). Kudos to LDC/LLVM efforts!! This sounds very similar to my experiences with the tsv utilities, on most of the same points (development simplicity, comparative performance, GC use, LDC). Data processing apps may well be a sweet spot. See my DConf talk for an overview (https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf). Though not mentioned in the talk, I also haven't had any significant issues with new compiler releases. That may be related to the type of code being written. Regarding the GC - The throughput oriented nature of data processing tools like the tsv utilities looks like a very good fit for the current GC. Applications where low GC latency is needed may have different results. It'd be great to hear an experience report from development of an application where GC was used and low GC latency was a priority. --Jon
Re: D is dead (was: Dicebot on leaving D: It is anarchy driven development in all its glory.)
On Friday, 24 August 2018 at 00:46:14 UTC, Mike Franklin wrote: It seems, from someone without much historical perspective, that Phobos was intended to be something like the .Net Framework for D. Perhaps there are a few fundamentals (std.algorithm, std.allocator, etc.) to keep, but for the others... move 'em to Dub and let the "free market" sort it out. That might work for some use cases, but not for others. For my use cases, a rock solid standard library is a basic requirement (think STL, Boost, etc). These don't normally come out of a loose knit community of individuals, there needs to be some sort of organizational presence involved to ensure quality, consistency, completeness, etc. If Phobos or an equivalent wasn't available at its present level of quality then D wouldn't be in the consideration set. On the other hand, my use-cases don't have the requirements that drive other folks towards removing dependence on druntime and similar. An individual or organization's prioritization preferences will depend on their goals. --Jon
Re: More fun with autodecoding
On Wednesday, 8 August 2018 at 21:01:18 UTC, Steven Schveighoffer wrote: Not trying to give too much away about the library I'm writing, but the problem I'm trying to solve is parsing out tokens from a buffer. I want to delineate the whole, as well as the parts, but it's difficult to get back to the original buffer once you split and slice up the buffer using phobos functions. I wonder if there are some parallels in the tsv utilities I wrote. The tsv parser is extremely simple, byLine and splitter on a char buffer. Most of the tools just iterate the split result in order, but a couple do things like operate on a subset of fields, potentially reordered. For these a separate structure is created that maps back to the original buffer to avoid copying. Likely quite simple compared to what you are doing. The csv2tsv tool may be more interesting. Parsing is relatively simple, mostly identifying field values in the context of CSV escape syntax. It's modeled as reading an infinite stream of utf-8 characters, byte-by-byte. Occasionally the bytes forming the value need to be modified due to the escape syntax, but most of the time the characters in the original buffer remain untouched and parsing is identifying the start and end positions. The infinite stream is constructed by reading fixed size blocks from the input stream and concatenating them with joiner. This eliminates the need to worry about utf-8 characters spanning block boundaries, but it comes at a cost: either write byte-at-a-time, or make an extra copy (also byte-at-a-time). Making an extra copy is faster; that's what the code does. But, as a practical matter, large blocks could often be written directly from the original input buffer. If I wanted to make it faster than it is currently, I'd do this. But I don't see an easy way to do this with phobos ranges. At minimum I'd have to be able to run code when the joiner operation hits block boundaries. And it'd also be necessary to create a mapping back to the original input buffer. Autodecoding comes into play of course. Basically, splitter on char arrays is fine, but in a number of cases it's necessary to work using ubyte to avoid the performance penalty. --Jon
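The block-concatenation model can be illustrated in miniature (a sketch; the real code reads blocks from the input stream rather than chunking an array, and `blocksRejoinCleanly` is a made-up helper):

```d
import std.algorithm : equal, joiner;
import std.range : chunks;

// Fixed-size blocks concatenated with joiner form one continuous byte
// stream. A multi-byte utf-8 sequence split across a block boundary is
// invisible to a byte-at-a-time consumer of the joined stream.
bool blocksRejoinCleanly(const(ubyte)[] data, size_t blockSize)
{
    auto stream = data.chunks(blockSize).joiner;
    return stream.equal(data);
}
```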
Re: std.experimental.collections.rcstring and its integration in Phobos
On Tuesday, 17 July 2018 at 15:21:30 UTC, Seb wrote: So we managed to revive the rcstring project and it's already a PR for Phobos: https://github.com/dlang/phobos/pull/6631 (still WIP though) The current approach in short: - uses the new @nogc, @safe and nothrow Array from the collections library (check Eduardo's DConf18 talk) - uses reference counting - _no_ range by default (it needs an explicit `.by!{d,w,}char`) (as in no auto-decoding by default) [snip] What do you think about this approach? Do you have a better idea? I don't know the goals/role rcstring is expected to play, especially wrt existing string/character facilities. Perhaps you could describe more? Strings are central to many applications, so I'm wondering about things like whether rcstring is intended as a replacement for string that would be used by most new programs, and whether applications would use arrays and ranges of char together with rcstring, or rcstring would be used for everything. Perhaps it's too early for these questions, and the current goal is simpler. For example, adding a meaningful collection class that is @nogc, @safe and ref-counted that can be used as a proving ground for the newer memory management facilities being developed. Such simpler goals would be quite reasonable. What's got me wondering about the larger questions are the comments about ranges and autodecoding. If rcstring is intended as a vehicle for general @nogc handling of character data and/or for reducing the impact of autodecoding, then it makes sense to consider it from those perspectives. --Jon
eBay's TSV Utilities repository renamed
I've renamed the TSV Utilities Github repository from eBay/tsv-utils-dlang to eBay/tsv-utils. This is to better reflect the functional nature of the tools. Links pointing to the old github repo will be redirected to the new repo. This includes git operations like clone, etc., so Project Tester should not be affected. Let me know if any issues surface. --Jon
Re: Driving Continuous Improvement in D
On Saturday, 2 June 2018 at 07:23:42 UTC, Mike Parker wrote: In this post for the D Blog, Jack Stouffer details how dscanner is used in the Phobos development process to help improve code quality and fight entropy. The blog: https://dlang.org/blog/2018/06/02/driving-continuous-improvement-in-d/ reddit: https://www.reddit.com/r/programming/comments/8nyzmk/driving_continuous_improvement_in_d/ Nice post. I haven't tried dscanner on my code, but I plan to now. It looks like the documentation on the dscanner repo is pretty good. If you think it's ready for wider adoption, consider adding a couple lines to the blog post indicating that folks who want to try it will find instructions in the repo.
Re: Splitting up large dirty file
On Monday, 21 May 2018 at 15:00:09 UTC, Dennis wrote: I want to be convinced that Range programming works like a charm, but the procedural approaches remain more flexible (and faster too) it seems. Thanks for the example. On Monday, 21 May 2018 at 22:11:42 UTC, Dennis wrote: In this case I used drop to drop lines, not characters. The exception was thrown by the joiner it turns out. ... From the benchmarking I did, I found that ranges are easily an order of magnitude slower even with compiler optimizations: My general experience is that range programming works quite well. It's especially useful when used to do lazy processing and as a result minimize memory allocations. I've gotten quite good performance with these techniques (see my DConf talk slides: https://dconf.org/2018/talks/degenhardt.html). Your benchmarks are not against the file split case, but if you benchmarked that you may have also seen it as slow. In that case you may be hitting specific areas where there are opportunities for performance improvement in the standard library. One is that joiner is slow (PR: https://github.com/dlang/phobos/pull/6492). Another is that the write[fln] routines are much faster when operating on a single large object than many small objects. e.g. It's faster to call write[fln] with an array of 100 characters than: (a) calling it 100 times with one character; (b) calling it once, with 100 characters as individual arguments (template form); (c) calling it once with a range of 100 characters, each processed one at a time. When joiner is used as in your example, you not only hit the joiner performance issue, but the write[fln] issue. This is due to something that may not be obvious at first: When joiner is used to concatenate arrays or ranges, it flattens out the array/range into a single range of elements. So, rather than writing a line at a time, your example is effectively passing a character at a time to write[fln]. 
So, in the file split case, using byLine in an imperative fashion as in my example will have the effect of passing a full line at a time to write[fln], rather than individual characters. Mine will be faster, but not because it's imperative. The same thing could be achieved procedurally. Regarding the benchmark programs you showed - This is very interesting. It would certainly be worth additional looks into this. One thing I wonder is if the performance penalty may be due to a lack of inlining due to crossing library boundaries. The imperative versions aren't crossing these boundaries. If you're willing, you could try adding LDC's LTO options and see what happens. There are some instructions in the release notes for LDC 1.9.0 (https://github.com/ldc-developers/ldc/releases). Make sure you use the form that includes druntime and phobos. --Jon
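The flattening behavior described above is easy to see in isolation (a minimal sketch; `flatten` is a made-up helper):

```d
import std.algorithm : joiner;
import std.array : array;
import std.conv : to;

// joiner flattens the outer range: the result iterates single
// characters, not whole lines. A sink fed from it therefore receives
// one character per call rather than one full line per call.
string flatten(string[] lines)
{
    auto flat = lines.joiner;
    static assert(is(typeof(flat.front) == dchar));  // element is a char
    return flat.array.to!string;                     // dchar[] -> string
}
```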
Re: Splitting up large dirty file
On Thursday, 17 May 2018 at 20:08:09 UTC, Dennis wrote: On Wednesday, 16 May 2018 at 15:47:29 UTC, Jon Degenhardt wrote: If you write it in the style of my earlier example and use counters and if-tests it will work. byLine by itself won't try to interpret the characters (won't auto-decode them), so it won't trigger an exception if there are invalid utf-8 characters. When printing to stdout it seems to skip any validation, but writing to a file does give an exception:

```
auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;
auto outputFile = new File("output.txt");
foreach (line; inputStream.byLine(KeepTerminator.yes))
    outputFile.write(line);
```

std.exception.ErrnoException@C:\D\dmd2\windows\bin\..\..\src\phobos\std\stdio.d(2877): (No error)

According to the documentation, byLine can throw an UTFException so relying on the fact that it doesn't in some cases doesn't seem like a good idea.

Instead of:

```
auto outputFile = new File("output.txt");
```

try:

```
auto outputFile = File("output.txt", "w");
```

That works for me. The second arg ("w") opens the file for write. When I omit it, I also get an exception, as the default open mode is for read:

* If the file does not exist: Cannot open file `output.txt' in mode `rb' (No such file or directory)
* If the file does exist: (Bad file descriptor)

The second error presumably occurs when writing. As an aside - I agree with one of your bigger picture observations: It would be preferable to have more control over utf-8 error handling behavior at the application level.
Re: Splitting up large dirty file
On Wednesday, 16 May 2018 at 07:06:45 UTC, Dennis wrote: On Wednesday, 16 May 2018 at 02:47:50 UTC, Jon Degenhardt wrote: Can you show the program you are using that throws when using byLine? Here's a version that only outputs the first chunk:

```
import std.stdio;
import std.range;
import std.algorithm;
import std.file;
import std.exception;

void main(string[] args)
{
    enforce(args.length == 2, "Pass one filename as argument");
    auto lineChunks = File(args[1], "r").byLine.drop(4).chunks(10_000_000/10);
    new File("output.txt", "w").write(lineChunks.front.joiner);
}
```

If you write it in the style of my earlier example and use counters and if-tests it will work. byLine by itself won't try to interpret the characters (won't auto-decode them), so it won't trigger an exception if there are invalid utf-8 characters.
Re: Splitting up large dirty file
On Tuesday, 15 May 2018 at 20:36:21 UTC, Dennis wrote: I have a file with two problems:

- It's too big to fit in memory (apparently, I thought 1.5 Gb would fit but I get an out of memory error when using std.file.read)
- It is dirty (contains invalid Unicode characters, null bytes in the middle of lines)

I want to write a program that splits it up into multiple files, with the splits happening every n lines. I keep encountering roadblocks though:

- You can't give Yes.useReplacementChar to `byLine`, and `byLine` (or `readln`) throws an Exception upon encountering an invalid character.

Can you show the program you are using that throws when using byLine? I tried a very simple program that reads and outputs line-by-line, then fed it a file that contained invalid utf-8. I did not see an exception. The invalid utf-8 was created by taking part of this file: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt (a commonly used file with utf-8 edge cases), plus adding a number of random hex characters, including null. I don't see exceptions thrown. The program I used:

```
int main(string[] args)
{
    import std.stdio;
    import std.conv : to;

    try
    {
        auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;
        foreach (line; inputStream.byLine(KeepTerminator.yes))
            write(line);
    }
    catch (Exception e)
    {
        stderr.writefln("Error [%s]: %s", args[0], e.msg);
        return 1;
    }
    return 0;
}
```
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 15:44:04 UTC, Steven Schveighoffer wrote: On 5/10/18 7:22 PM, Steven Schveighoffer wrote: Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers). Yeah, the MacOS default versions of the Unix text processing tools are really slow. It's worth installing the GNU versions if doing performance comparisons on MacOS, or because you work with large files. Homebrew and MacPorts both have the GNU versions. Some relevant packages: coreutils, grep, gsed (sed), gawk (awk). Most tools are in coreutils. Many will be installed with a 'g' prefix by default, leaving the existing tools in place. e.g. 'cut' will be installed as 'gcut' unless specified otherwise. --Jon
Re: Things to do in Munich
On Monday, 30 April 2018 at 19:57:10 UTC, Seb wrote: As I live in Munich and there have been a few threads about things to do in Munich, I thought I quickly share a few selected activities + current events. - over 80 museums (best ones: Museum Brandhost, Pinakothek der Moderne, Haus der Kunst, Deutsches Museum, Glyptothek, potato museum, NS- Most of the museums are closed today (public holiday). Check before you go. However, the surfers are out! —Jon
Re: Am I reading this wrong, or is std.getopt *really* this stupid?
On Saturday, 24 March 2018 at 16:11:18 UTC, Andrei Alexandrescu wrote: Anyhow. Right now the order of processing is the same as the lexical order in which flags are passed to getopt. There may be use cases for which that's the more desirable way to go about things, so if you author a PR to change the order you'd need to build an argument on why command-line order is better. FWIW the traditional POSIX doctrine makes behavior of flags independent of their order, which would imply the current choice is more natural. Several of the TSV tools I built rely on command-line order. There is an enhancement request here: https://issues.dlang.org/show_bug.cgi?id=16539. A few of the tools use a paradigm where the user is entering a series of instructions on the command line, and there are times when the user entered order matters. Two general cases: * Display/output order - The tool produces delimited output, and the user wants to control the order. The order of command line options determines the order. * Short-circuiting - tsv-filter in particular allows numeric tests like less-than, but also allows the user to short-circuit the test by testing if the data contains a valid number prior to making the numeric test. This is done by evaluating the command line arguments in left-to-right order. Short-circuiting is supported by the Unix `find` utility. I have used this approach for CLI tools I've written in other languages. Perl's Getopt::Long processes args in command-line order, so it supports this. I considered submitting a PR to getopt to change this, but decided against it. The approach used looks like it is central to the design, and changing it in a backward compatible way would be a meaningful undertaking. Instead I wrote a cover to getopt that processes arguments in command-line order. It is here: https://github.com/eBay/tsv-utils-dlang/blob/master/common/src/getopt_inorder.d. It handles most of what std.getopt handles. 
The TSV utilities documentation should help illustrate these cases. tsv-filter uses short-circuiting: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ToolReference.md#tsv-filter-reference. Look for "Short-circuiting expressions" toward the bottom of the section. tsv-summarize obeys the command-line order for output/display. See: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/ToolReference.md#tsv-summarize-reference. There's one other general limitation I encountered with the current compile-time approach to command-line argument processing: I couldn't find a clean way to allow it to be extended in a plug-in manner. In particular, the original goal for the tsv-summarize tool was to allow users to create custom operators. The tool has a fair number of built-in operators, like median, sum, min, max, etc. Each of these operators has a getopt arg invoking it, e.g. '--median', '--sum', etc. However, it is common for people to have custom analysis needs, so allowing extension of the set would be quite useful. The code is set up to allow this. People would clone the repo, write their own operator, place it in a separate file they maintain, and rebuild. However, I couldn't figure out a clean way to allow additions to the command-line argument set. There may be a reasonable way and I just couldn't find it, but my current thinking is that I need to write my own command-line argument handler to support this idea. I think handling command-line argument processing at run time would make this simpler, at the cost of losing some compile-time validation. --Jon
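The in-order idea can be sketched in a few lines. This is my own illustration rather than the getopt_inorder code itself; it assumes value-less flags and uses hypothetical '--sum' and '--median' operators:

```d
import std.getopt;

// Sketch of in-order option handling: feed getopt one argument at a
// time so the handler delegates fire in command-line order rather
// than in the order the flags were registered. Value-less flags only.
string[] parseInOrder(string[] args)
{
    string[] ops;   // operations, in the order the user typed them
    void onSum(string) { ops ~= "sum"; }
    void onMedian(string) { ops ~= "median"; }

    foreach (arg; args[1 .. $])
    {
        // getopt expects the program name in slot zero.
        auto pair = [args[0], arg];
        getopt(pair, "sum", &onSum, "median", &onMedian);
    }
    return ops;
}

void main()
{
    assert(parseInOrder(["tool", "--median", "--sum"]) == ["median", "sum"]);
}
```

A real wrapper has to handle options that take values (which may span two arguments), which is where most of getopt_inorder's extra work lies.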
Re: Why not flag away the mistakes of the past?
On Wednesday, 7 March 2018 at 16:33:25 UTC, Seb wrote: On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote: On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote: [...] Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos. Well you can use byCodeUnit, which disables auto-decoding. Though it's not well-known and rather annoying to explicitly add it almost everywhere. I looked at this once. It didn't appear to be a viable solution, though I forget the details. I can probably resurrect them if that would be helpful.
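For readers unfamiliar with byCodeUnit, a minimal sketch of the difference it makes (the string contents are just illustrative):

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "héllo";  // 5 code points, 6 UTF-8 code units (é is 2 bytes)

    // Phobos range primitives auto-decode strings: each element is a
    // dchar code point, decoded on the fly.
    assert(s.walkLength == 5);

    // byCodeUnit wraps the string so elements are the raw char code
    // units, bypassing auto-decoding and its cost.
    assert(s.byCodeUnit.walkLength == 6);

    // .length always counts code units; no decoding is involved.
    assert(s.length == 6);
}
```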
Re: Why not flag away the mistakes of the past?
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote: So I've seen on the forum over the years arguments about auto-decoding (mostly) and some other things. Things that have been considered mistakes, and cannot be corrected because of the breaking changes it would create. And I always wonder why not make a solution to the tune of a flag that makes things work as they used to, and make the new behavior default. dmd --UseAutoDecoding That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance? Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos. --Jon
Re: Project Highlight: The D Community Hub
On Saturday, 17 February 2018 at 12:56:34 UTC, Mike Parker wrote: In case you aren't aware of the dlang-community organization at GitHub, it's an umbrella group of contributors working to keep certain D projects alive and updated. Sebastian Wilzbach filled me in on some details for the latest Project Highlight on the blog. blog: https://dlang.org/blog/2018/02/17/project-highlight-the-d-community-hub/ reddit: https://www.reddit.com/r/programming/comments/7y6gw1/the_d_community_hub_an_umbrella_group_for_d/ Very nice article. There are more projects there than I had realized!
Re: OT: Photo of a single atom by David Nadlinger wins top prize
On Tuesday, 13 February 2018 at 23:09:07 UTC, Ali Çehreli wrote: David (aka klickverbot) is a longtime D contributor. https://www.epsrc.ac.uk/newsevents/news/single-trapped-atom-captures-science-photography-competitions-top-prize/ Ali More than cool!! Congrats David!
Re: Which language features make D overcomplicated?
On Friday, 9 February 2018 at 07:54:49 UTC, Suliman wrote: Which language features, in your opinion, make D harder? For me, one of the attractive qualities of D is its relative simplicity. Key comparison points are C++, Scala, and Python. Python being the simplest, then D, not far off, with Scala and C++ being more complex. Entirely subjective, not measured in any empirical way. That said, a few D constructs that I personally find increase friction: * Static arrays aren't ranges. I continually forget to slice them when I want to use them as ranges. The compiler errors are often complex template instantiation failure messages. * Template instantiation failures - It takes longer than I'd like to figure out why a template failed to instantiate. This is especially true when there are multiple overloads, each with multiple template constraints. * Auto-decoding - Mentioned by multiple people. It's mainly an issue after you've decided you need to avoid it. Figuring out how to utilize Phobos routines without having them engage auto-decoding on your behalf is challenging. --Jon
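The static-array point looks like this in practice (a minimal sketch):

```d
import std.algorithm : sum;

void main()
{
    int[4] fixed = [1, 2, 3, 4];

    // A static array is not a range: `fixed.sum` fails to compile
    // because int[4] has no range primitives and template argument
    // deduction won't convert it to a slice. Slicing with [] yields
    // an int[] dynamic array, which is a range.
    assert(fixed[].sum == 10);
}
```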
OT: Index reordering in the eBay Search Engine
If anyone is interested in the type of work that goes on in my group at eBay, take a look at this blog post by one of my colleagues: https://www.ebayinc.com/stories/blogs/tech/making-e-commerce-search-faster/ It describes a 25% efficiency gain via a technique called index reordering. This is the engineering side of the work; I also work on recall and ranking. --Jon
Re: I closed a very old bug!
On Thursday, 18 January 2018 at 07:46:03 UTC, Andrei Alexandrescu wrote: There's been some discussion about what to do with issues that propose enhancements like this. We want to make them available and searchable just in case someone working on a related proposal is looking for precedent and inspiration. I was thinking of closing with REMIND or LATER. Seb is experimenting with moving the entire bug database to github issues, which may offer us more options for classification. It would make sense to separate bugs from enhancements in this regard. It's worth recording and maintaining useful enhancement ideas even if they don't fit the current priorities. There are multiple ways to implement this, but it'd be most useful if the distinction between "bugs" and "enhancements" is obvious and easy to discover. --Jon
Re: TSV Utilities release with LTO and PGO enabled
On Wednesday, 17 January 2018 at 21:49:52 UTC, Johan Engelen wrote: On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt wrote: Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded. Great, thanks for the details, I agree. Hope it's useful for others to see these details. Thanks Johan, much appreciated. :) (btw, did you also check the performance gains when using the profile of the benchmark itself, to learn about the upper-bound of PGO for your program?) I'll merge the IR PGO addition into LDC master soon. Don't know what difference it'll make. No, I didn't do an upper-bounds check, that's a good idea. I plan to test the IR based PGO when it's available, I'll run an upper-bounds check as part of it.
Re: TSV Utilities release with LTO and PGO enabled
On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote: Because PGO optimizes for the given profile, it would help a lot if you clarified how you do your PGO benchmarking. What kind of test load profile you used for optimization and what test load you use for the time measurement. The profiling used is checked into the repo and run as part of a PGO build, so it is available for inspection. The benchmarks used for deltas are also documented; they are the ones used in the benchmark comparison to similar tools done in March 2017. This report is in the repo (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md). However, it's hard to imagine anyone perusing the repo for this stuff, so I'll try to summarize what I did below. Benchmarks - Six different tests of rather different but common operations run on large data files. The six tests were chosen because for each I was able to find at least three other tools, written in native compiled languages, with similar functionality. There are other valuable benchmarks, but I haven't published them. Profiling - Profiling was developed separately for each tool. For each I generated several data files with data representative of typical use cases. Generally numeric or text data in several forms and distributions. The data was unrelated to the data used in benchmarks, which is from publicly available machine learning data sets. However, personal judgement was used in the generation of the data sets, so it's not free from bias. After generating the data, I generated a set of run options specific to each tool. As an example, tsv-filter selects data file lines based on various numeric and text criteria (e.g. less-than). There are a bit over 50 comparison operations, plus a few meta operations. The profiling runs ensure all the operations are run at least once, with the most important ones overweighted.
The ldc.profile.resetAll call was used to exclude all the initial setup code (command line argument processing). This was nice because it meant the data files could be small relative to real-world sets, and it runs fast enough to do as part of the build step (i.e. on Travis-CI). Look at https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to see a concrete example (tsv-filter). In that directory are five data files and a shell script that runs the commands and collects the data. This was done for four of the tools, covering five of the benchmarks. I skipped one of the tools (tsv-join), as it's harder to come up with a concise set of profile operations for it. I then ran the standard benchmarks I usually report on in various D venues. Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded. --Jon
Re: TSV Utilities release with LTO and PGO enabled
On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote: On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt wrote: Combined, LTO and PGO resulted in performance improvements greater than 25% on three of my standard six benchmarks, and five of the six improved at least 8%. Yay, I'm usually seeing double digit improvements for PGO alone, and single digit improvements for LTO. Meaning PGO has more effect even though LTO seems to be the more hyped one. Have you bothered benchmarking them separately? Last spring I made a few quick tests of both separately. That was just against the app code, without druntime/phobos. Saw some benefit from LTO, mainly on one of the tools, and not much from PGO. More recently I tried LTO standalone and LTO plus PGO, both against app code and druntime/phobos, but not PGO standalone. The LTO benchmarks are here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf. I haven't published the LTO + PGO benchmarks. The takeaway from my tests is that LTO and PGO will benefit different apps differently, perhaps in ways not easily predicted. One of my tools benefited primarily from PGO, two primarily from LTO, and one materially from both. So, it is worth trying both. For both, the big win was from optimizing across app code and libs (druntime/phobos in my case). It'd be interesting to see if other apps see similar behavior, either with phobos/druntime or other libraries, perhaps libraries from dub dependencies.
TSV Utilities release with LTO and PGO enabled
I just released a new version of eBay's TSV Utilities. The cool thing about the release is not the changes in the toolkit, but that it was possible to build everything using LDC's support for Link Time Optimization (LTO) and Profile Guided Optimization (PGO). This includes running the optimizations on both the application code and the D standard libraries (druntime and phobos). Further, it was all doable on Travis-CI (Linux and MacOS), including building release binaries available from the GitHub release page. Combined, LTO and PGO resulted in performance improvements greater than 25% on three of my standard six benchmarks, and five of the six improved at least 8%. Release info: https://github.com/eBay/tsv-utils-dlang/releases/tag/v1.1.16
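For anyone wanting to try this, the application-side portion of an LDC LTO+PGO build can be sketched roughly as below. Flag names are from the LDC documentation; file and program names are hypothetical, and covering druntime/phobos additionally involves the ldc-build-runtime tool:

```shell
# 1. Build an instrumented binary (LTO plus profile generation).
ldc2 -O3 -release -flto=thin -fprofile-instr-generate=profile.raw app.d -of=app-instr

# 2. Run it on representative workloads to collect a profile.
./app-instr typical-input.tsv > /dev/null

# 3. Merge the raw profile data into the format the compiler consumes.
ldc-profdata merge profile.raw -o profile.data

# 4. Rebuild, letting the optimizer use the profile.
ldc2 -O3 -release -flto=thin -fprofile-instr-use=profile.data app.d -of=app
```

The quality of the result depends heavily on how representative the step-2 workload is, which is why the profiling data discussed elsewhere in this thread matters.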
Re: DLang docker images for CircleCi 2.0
On Wednesday, 3 January 2018 at 13:12:48 UTC, Seb wrote: tl;dr: you can now use special D docker images for CircleCi 2.0 [snip] PS: I'm aware of Stefan Rohe's great D Docker images [1], but this Docker image is built on top of the specialized CircleCi image (e.g. for their SSH login). One useful characteristic of Stefan's images is that the Dockerhub pages include the Dockerfile and github repository links. I don't know what it takes to include them. It does make it easier to see exactly what the configuration is, find the repo, and even create PRs against them. Would be useful if they can be added to the CircleCI image pages. My interest in this case - I use Stefan's LDC image in Travis-CI builds. Building the runtime libraries with LTO/PGO requires the ldc-build-runtime tool, which in turn requires a few additional things in the docker image, like cmake or ninja. I was interested in whether they had been included in the CircleCI images as well. (Doesn't appear so.)
Re: Article: Finding memory bugs in D code with AddressSanitizer
On Monday, 25 December 2017 at 17:03:37 UTC, Johan Engelen wrote: I've been writing this article since August, and finally found some time to finish it: http://johanengelen.github.io/ldc/2017/12/25/LDC-and-AddressSanitizer.html "LDC comes with improved support for Address Sanitizer since the 1.4.0 release. Address Sanitizer (ASan) is a runtime memory write/read checker that helps discover and locate memory access bugs. ASan is part of the official LDC release binaries; to use it you must build with -fsanitize=address. In this article, I’ll explain how to use ASan, what kind of bugs it can find, and what bugs it will be able to find in the (hopefully near) future." Nice article. Main question / comment is about the need for blacklisting D standard libraries (druntime/phobos). If someone wants to try ASan out on their own code, can they start by ignoring the D standard libraries? And, for programs that use druntime/phobos, will this be effective? If I understand the post, the answer is "yes", but I think it could be more explicit. Second comment is related - If the reader were to try instrumenting druntime/phobos along with their own code, how much effort should be expected to correctly blacklist druntime/phobos code? Would many programs have smooth sailing if they took the blacklist published in the post? Or is this early stage enough that some real effort should be expected? Also, if the blacklist file in the post represents a meaningful starting point, perhaps it makes sense to check it in and distribute it. This would provide a place for contributors to start making improvements.
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen wrote: Clearly very interested in what your PGO testing will show. :-) Early returns on adding PGO on top of LTO (first five benchmarks in the slide deck, tsv-join not tested): * Two meaningful improvements: - csv2tsv: Linux: 8%; macOS: 33% - tsv-summarize: Linux: 6%; macOS: 11% * Minor improvements on the other three benchmarks (< 5%) Overall, for LDC 1.5, the improvements going from a normal optimized build to one combining LTO and PGO ranged from 8-45% on Linux and 6-57% on macOS. (First five benchmarks, excluding tsv-join). Impressive! --Jon
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen wrote: On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote: This should be live now: http://youtu.be/e05QvoKy_8k Great! I've added some comments there, pasted here: Fantastic feedback! Fills in some really important details. Can't wait to see the results of LTO on Weka.io's (LARGE) applications. Work in progress...! Agreed. It'd be great to see the experience of a few more apps. Could you add the reference links in the comment section there too? (can't click on blue links in the video ;-) Done. Thanks for pointing this out. I also updated the posted slide deck so that the hyperlinks work after downloading it. (They still aren't clickable in the GitHub inline viewer.) Clearly very interested in what your PGO testing will show. :-) Yes, should be interesting. Promising results in one benchmark. And sigh, I forgot to mention the opportunity you mentioned for someone to participate: Adding LLVM's IR-level PGO to the LDC compiler. Sounds pretty cool.
Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt
On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote: This should be live now: http://youtu.be/e05QvoKy_8k Ali On 11/21/2017 11:58 AM, Ali Çehreli wrote: Meetup page: https://www.meetup.com/D-Lang-Silicon-Valley/events/245288287/ LDC[1], the LLVM-based D compiler, has been adding Link Time Optimization capabilities over the last several releases. [...] This talk will look at the results of applying LTO to one set of applications, eBay's TSV utilities[2]. [...] Jon Degenhardt is a member of eBay's Search Science team. [...] D quickly became his favorite programming language, one he uses whenever he can. Ali [1] https://github.com/ldc-developers/ldc#ldc--the-llvm-based-d-compiler [2] https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ Slides from the talk: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf
Re: What's the proper way to use std.getopt?
On Monday, 11 December 2017 at 20:58:25 UTC, Jordi Gutiérrez Hermoso wrote: What's the proper style, then? Can someone show me a good example of how to use getopt and the docstring it automatically generates? The command line tools I published use the approach described in a number of the replies, but with a tad more structure. It's hardly perfect, but may be useful if you want more examples. See: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d. See the main() routine and the TsvSampleOptions struct. Most of the tools have a similar pattern. --Jon
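For reference, the basic pattern those tools follow (in simplified form, with hypothetical option names) is the standard std.getopt idiom:

```d
import std.getopt;
import std.stdio;

void main(string[] args)
{
    size_t sampleSize = 0;
    bool verbose;

    // Each option gets a help string; getopt adds --help automatically
    // and reports it via helpWanted.
    auto result = getopt(args,
        "n|num", "Number of lines to sample.", &sampleSize,
        "v|verbose", "Print extra diagnostics.", &verbose);

    if (result.helpWanted)
    {
        // Prints the banner plus a formatted list of the options above.
        defaultGetoptPrinter("Synopsis: tool [options] [file...]",
                             result.options);
        return;
    }

    writeln("sample size: ", sampleSize);
}
```

The TsvSampleOptions struct in the linked code wraps this same pattern in a struct method so each tool can validate and post-process its arguments in one place.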
Re: Thoughts about D
On Wednesday, 29 November 2017 at 16:57:36 UTC, H. S. Teoh wrote: While generally I would still use fullblown D rather than BetterC for my projects, the bloat from druntime/phobos does still bother me at the back of my mind. IIRC, the Phobos docs used to state that the philosophy for Phobos is pay-as-you-go. As in, if you don't use feature X, the code and associated data that implements feature X shouldn't even appear in the executable. It seems that we have fallen away from that for a while now. Perhaps it's time to move D back in that direction. If there are specific apps where druntime and/or phobos bloat is thought to be too high, it might be worth trying the new LDC support for building a binary with druntime and phobos compiled with LTO (Link Time Optimization). I saw reduced binary sizes on my apps; it'd be interesting to hear other experiences.
Re: Thoughts about D
On Monday, 27 November 2017 at 00:14:40 UTC, IM wrote: I'm a full-time C++ software engineer in Silicon Valley. I've been learning D and using it in a couple of personal side projects for a few months now. First of all, I must start by saying that I like D, and wish to use it everyday. I'm even considering to donate to the D foundation. However, some of D features and design decisions frustrates me a lot, and sometimes urges me to look for an alternative. I'm here not to criticize, but to channel my frustrations to whom it may concern. I want D to become better and more widely used. I'm sure many others might share with me some of the following points: Forum discussions are a valuable venue. Since you are in Silicon Valley, you might also consider attending one of the Silicon Valley D meetups (https://www.meetup.com/D-Lang-Silicon-Valley). It's hard to beat face-to-face conversations with other developers to get a variety of perspectives. The ultimate would be DConf, if you can manage to attend.