Re: to compose or hack?

2021-07-06 Thread Jon Degenhardt via Digitalmars-d-learn
On Wednesday, 7 July 2021 at 01:44:20 UTC, Steven Schveighoffer 
wrote:
This is pretty minimal, but does what I want it to do. Is it 
ready for inclusion in Phobos? Not by a longshot! A truly 
generic interleave would properly forward everything else that 
the range supports (like `length`, `save`, etc).


But it got me thinking, how often do people roll their own vs. 
trying to compose using existing Phobos nuggets? I found this 
pretty satisfying, even if I didn't test it to death and maybe 
I use it only in one place. Do you find it difficult to use 
Phobos in a lot of situations to compose your specialized 
ranges?


I try to compose using existing Phobos facilities, but don't 
hesitate to write my own ranges. The reasons are usually along 
the lines you describe.


For one, range creation is easy in D, consistent with the pro/con 
tradeoffs described in the thread/talk [Iterator and Ranges: 
Comparing C++ to D to 
Rust](https://forum.dlang.org/thread/diexjstekiyzgxlic...@forum.dlang.org). Another is that if application/task specific logic is involved, it is often simpler/faster to just incorporate it into the range rather than figure out how to factor it out of the more general range. Especially if the range is not going to be used much.
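
As a concrete illustration of composing from existing Phobos 
nuggets (a minimal sketch, not from the original post): 
`std.range.roundRobin` already provides one form of interleaving.

```d
import std.algorithm.comparison : equal;
import std.range : roundRobin;

void main()
{
    // roundRobin alternates elements from its argument ranges --
    // an interleave composed entirely from existing Phobos pieces.
    auto r = roundRobin([1, 3, 5], [2, 4, 6]);
    assert(r.equal([1, 2, 3, 4, 5, 6]));
}
```

When the needed logic is close to what a Phobos primitive already 
does, composition like this is often the shortest path; the 
hand-rolled route wins when app-specific logic dominates.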


--Jon



Re: Need for speed

2021-04-01 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote:
On Thu, Apr 01, 2021 at 07:25:53PM +, matheus via 
Digitalmars-d-learn wrote: [...]
Since this is the "Learn" part of the forum, be careful with 
"-boundscheck=off".


I mean, for this little snippet it's OK, but for other projects 
this may be wrong, and as it says here: 
https://dlang.org/dmd-windows.html#switch-boundscheck


"This option should be used with caution and as a last resort 
to improve performance. Confirm turning off @safe bounds 
checks is worthwhile by benchmarking."

[...]

It's interesting that whenever a question about D's performance 
pops up in the forums, people tend to reach for optimization 
flags.  I wouldn't say it doesn't help; but I've found that 
significant performance improvements can usually be obtained by 
examining the code first, and catching common newbie mistakes.  
Those usually account for the majority of the observed 
performance degradation.


Only after the code has been cleaned up and obvious mistakes 
fixed, is it worth reaching for optimization flags, IMO.


This is my experience as well, and not just for D. Pick good 
algorithms and pay attention to memory allocation. Don't go crazy 
on the latter. Many people try to avoid GC at all costs, but I 
don't usually find it necessary to go quite that far. Very often 
simply reusing already allocated memory does the trick. The blog 
post I wrote a few years ago focuses on these ideas: 
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
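
A minimal sketch of the buffer-reuse idea (a hypothetical loop 
body, not taken from the blog post): `std.array.Appender` keeps 
its allocation across `clear` calls, so one buffer serves many 
iterations.

```d
import std.array : appender;
import std.format : formattedWrite;

void main()
{
    auto buf = appender!(char[])();
    foreach (i; 0 .. 1_000)
    {
        buf.clear();  // resets length but keeps allocated capacity
        buf.formattedWrite("record %d", i);
        // ... use buf[] for this iteration's output ...
    }
}
```

The GC still owns the memory; it simply isn't asked for new 
allocations on every iteration.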


--Jon




Re: Trying to reduce memory usage

2021-02-22 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 23 February 2021 at 00:08:40 UTC, tsbockman wrote:
On Friday, 19 February 2021 at 00:13:19 UTC, Jon Degenhardt 
wrote:
It would be interesting to see how the performance compares to 
tsv-uniq 
(https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The 
prebuilt binaries turn on all the optimizations 
(https://github.com/eBay/tsv-utils/releases).


My program (called line-dedup below) is modestly faster than 
yours, with the gap gradually widening as files get bigger. 
Similarly, when not using a memory-mapped scratch file, my 
program is modestly less memory hungry than yours, with the gap 
gradually widening as files get bigger.


In neither case is the difference very exciting though; the 
real benefit of my algorithm is that it can process files too 
large for physical memory. It might also handle frequent hash 
collisions better, and could be upgraded to handle huge numbers 
of very short lines efficiently.


Thanks for running the comparison! I appreciate seeing how other 
implementations compare.


I'd characterize the results a bit differently, though. Based on the 
numbers, line-dedup is materially faster than tsv-uniq, at least 
on the tests run. To your point, it may not make much practical 
difference on data sets that fit in memory. tsv-uniq is fast 
enough for most needs. But it's still a material performance 
delta. Nice job!


I agree also that the bigger pragmatic benefit is fast processing 
of files much larger than will fit in memory. There are other 
useful problems like this. One I often need is creating a random 
weighted ordering. Easy to do for data sets that fit in memory, 
but hard to do fast for data sets that do not.
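
For the in-memory case, one standard technique (a sketch of the 
Efraimidis-Spirakis weighted-ordering approach; the `Entry` type 
here is hypothetical) is to key each item with u^(1/w), where u is 
uniform in (0,1) and w is the item's weight, then sort on the key:

```d
import std.algorithm : map, sort;
import std.array : array;
import std.math : pow;
import std.random : uniform01;
import std.typecons : tuple;

struct Entry { string line; double weight; }

// Random weighted ordering: higher-weight items tend to sort
// earlier. Assign each entry the key u^(1/w), sort descending.
Entry[] weightedOrder(Entry[] entries)
{
    auto keyed = entries
        .map!(e => tuple(pow(uniform01(), 1.0 / e.weight), e))
        .array;
    keyed.sort!((a, b) => a[0] > b[0]);
    return keyed.map!(t => t[1]).array;
}
```

The hard part alluded to above is doing this without holding all 
the keyed entries in memory at once.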


--Jon




Re: Trying to reduce memory usage

2021-02-18 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote:
I spent some time experimenting with this problem, and here is 
the best solution I found, assuming that perfect de-duplication 
is required. (I'll put the code up on GitHub / dub if anyone 
wants to have a look.)


It would be interesting to see how the performance compares to 
tsv-uniq 
(https://github.com/eBay/tsv-utils/tree/master/tsv-uniq). The 
prebuilt binaries turn on all the optimizations 
(https://github.com/eBay/tsv-utils/releases).


tsv-uniq wasn't included in the different comparative benchmarks 
I published, but I did run my own benchmarks and it holds up 
well. However, it should not be hard to beat it. What might be 
more interesting is what the delta is.


tsv-uniq is using the most straightforward approach of popping 
things into an associative array. No custom data structures. Enough 
memory is required to hold all the unique keys in memory, so it 
won't handle arbitrarily large data sets. It would be interesting 
to see how the straightforward approach compares with the more 
highly tuned approach.
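
The straightforward approach described above can be sketched in a 
few lines (a minimal version, not the actual tsv-uniq code):

```d
import std.stdio;

void main()
{
    bool[string] seen;  // all unique keys are held in memory
    foreach (line; stdin.byLine)
    {
        if (line !in seen)
        {
            seen[line.idup] = true;  // idup: byLine reuses its buffer
            writeln(line);
        }
    }
}
```

Memory use grows with the number of unique lines, which is exactly 
the limitation on arbitrarily large data sets mentioned above.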


--Jon



Re: std.algorithm.splitter on a string not always bidirectional

2021-01-22 Thread Jon Degenhardt via Digitalmars-d-learn
On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer 
wrote:

On 1/22/21 11:57 AM, Jon Degenhardt wrote:


I think the idea is that if a construct like 
'xyz.splitter(args)' produces a range with the sequence of 
elements {"a", "bc", "def"}, then 'xyz.splitter(args).back' 
should produce "def". But, if finding the split points 
starting from the back results in something like {"f", "de", 
"abc"} then that relationship hasn't held, and the results are 
unexpected.


But that is possible with all 3 splitter variants. Why is one 
allowed to be bidirectional and the others are not?


I'm not defending it, just explaining what I believe the thinking 
was based on the examination I did. It wasn't just looking at the 
code, there was a discussion somewhere. A forum discussion, PR 
discussion, bug or code comments. Something somewhere, but I 
don't remember exactly.


However, to answer your question - The relationship described is 
guaranteed if the basis for the split is a single element. If the 
range is a string, that's a single 'char'. If the range is 
composed of integers, then a single integer. Note that if the 
basis for the split is itself a range, then the relationship 
described is not guaranteed.
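
The distinction can be checked directly (a minimal sketch built 
from the examples in this thread):

```d
import std.algorithm : splitter;
import std.ascii : isAlphaNum;
import std.range : isBidirectionalRange;

void main()
{
    // Single-element delimiter: split points are the same from
    // either direction, so the result is bidirectional.
    auto sp1 = "a|b|c".splitter('|');
    static assert(isBidirectionalRange!(typeof(sp1)));

    // Predicate form: not bidirectional, even though this
    // particular predicate would split at the same points.
    auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
    static assert(!isBidirectionalRange!(typeof(sp2)));
}
```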


Personally, I can see a good argument that bidirectionality 
should not be supported in any of these cases, and instead force 
the user to choose between eager splitting or reversing the range 
via retro. For the common case of strings, the further argument 
could be made that the distinction between char and dchar is 
another point of inconsistency.


Regardless whether the choices made were the best choices, there 
was some thinking that went into it, and it is worth 
understanding the thinking when considering changes.


--Jon



Re: std.algorithm.splitter on a string not always bidirectional

2021-01-22 Thread Jon Degenhardt via Digitalmars-d-learn
On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer 
wrote:

On 1/22/21 12:55 AM, Jon Degenhardt wrote:
On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt 
wrote:
On Thursday, 21 January 2021 at 22:43:37 UTC, Steven 
Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');

writeln(sp1.back); // ok

auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));

writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it?



I believe the reason is two-fold. First, splitter is lazy. 
Second, the range splitting is defined in the forward 
direction, not the reverse direction. A bidirectional range 
is only supported if it is guaranteed that the splits will 
occur at the same points in the range when run in either 
direction. That's why the single element delimiter is 
supported. It's clearly the case for the predicate function in 
your example. If that's known to be always true then perhaps 
it would make sense to enhance splitter to generate 
bidirectional results in this case.




Note that the predicate might use a random number generator to 
pick the split points. Even for same sequence of random 
numbers, the split points would be different if run from the 
front than if run from the back.


I think this isn't a good explanation.

All forms of splitter accept a predicate (including the one 
which supports a bi-directional result). Many other phobos 
algorithms that accept a predicate provide bidirectional 
support. The splitter result is also a forward range (which 
makes no sense in the context of random splits).


Finally, I'd suggest that even if you split based on a subrange 
that is also bidirectional, it doesn't make sense that you 
couldn't split backwards based on that. Common sense says a 
range split on substrings is the same whether you split it 
forwards or backwards.


I can do this too (and in fact I will, because it works, even 
though it's horrifically ugly):


auto sp3 = "a.b|c".splitter!((c, unused) => 
!isAlphaNum(c))('?');


writeln(sp3.back); // ok

Looking at the code, it looks like the first form of splitter 
uses a different result struct than the other two (which have a 
common implementation). It just needs cleanup.


-Steve


I think the idea is that if a construct like 'xyz.splitter(args)' 
produces a range with the sequence of elements {"a", "bc", 
"def"}, then 'xyz.splitter(args).back' should produce "def". But, 
if finding the split points starting from the back results in 
something like {"f", "de", "abc"} then that relationship hasn't 
held, and the results are unexpected.


Note that in the above example, 'xyz.retro.splitter(args)' might 
produce {"f", "ed", "cba"}, so again not the same.


Another way to look at it: If split (eager) took a predicate, 
then 'xyz.splitter(args).back' and 'xyz.split(args).back' should 
produce the same result. But they will not with the example given.


I believe these consistency issues are the reason why the 
bidirectional support is limited.


Note: I didn't design any of this, but I did redo the examples in 
the documentation at one point, which is why I looked at this.


--Jon


Re: std.algorithm.splitter on a string not always bidirectional

2021-01-21 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt wrote:
On Thursday, 21 January 2021 at 22:43:37 UTC, Steven 
Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');

writeln(sp1.back); // ok

auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));

writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it?

-Steve


I believe the reason is two-fold. First, splitter is lazy. 
Second, the range splitting is defined in the forward 
direction, not the reverse direction. A bidirectional range is 
only supported if it is guaranteed that the splits will occur 
at the same points in the range when run in either direction. 
That's why the single element delimiter is supported. It's 
clearly the case for the predicate function in your example. If 
that's known to be always true then perhaps it would make sense 
to enhance splitter to generate bidirectional results in this 
case.


--Jon


Note that the predicate might use a random number generator to 
pick the split points. Even for same sequence of random numbers, 
the split points would be different if run from the front than if 
run from the back.


Re: std.algorithm.splitter on a string not always bidirectional

2021-01-21 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 21 January 2021 at 22:43:37 UTC, Steven 
Schveighoffer wrote:

auto sp1 = "a|b|c".splitter('|');

writeln(sp1.back); // ok

auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));

writeln(sp2.back); // error, not bidirectional

Why? is it an oversight, or is there a good reason for it?

-Steve


I believe the reason is two-fold. First, splitter is lazy. 
Second, the range splitting is defined in the forward direction, 
not the reverse direction. A bidirectional range is only 
supported if it is guaranteed that the splits will occur at the 
same points in the range when run in either direction. That's why 
the single element delimiter is supported. It's clearly the case 
for the predicate function in your example. If that's known to be 
always true then perhaps it would make sense to enhance splitter 
to generate bidirectional results in this case.


--Jon


Re: Why is BOM required to use unicode in tokens?

2020-09-15 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 15 September 2020 at 14:59:03 UTC, Steven 
Schveighoffer wrote:

On 9/15/20 10:18 AM, James Blachly wrote:
What will it take (i.e. order of difficulty) to get this fixed 
-- will merely a bug report (and PR, not sure if I can tackle 
or not) do it, or will this require more in-depth discussion 
with compiler maintainers?


I'm thinking your issue will not be fixed (just like we don't 
allow $abc to be an identifier). But the spec can be fixed to 
refer to the correct standards.


Looks like it has to do with the '∂' character. But non-ascii 
alphabetic characters work generally.


# The 'Ш' and 'ä' characters are fine.
$ echo $'import std.stdio; void Шä() { writeln("Hello World!"); } 
void main() { Шä(); }' | dmd -run -

Hello World!

# But not '∂'
$ echo $'import std.stdio; void x∂() { writeln("Hello World!"); } 
void main() { x∂(); }' | dmd -run -

__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token
__stdin.d(1): Error: char 0x2202 not allowed in identifier
__stdin.d(1): Error: character 0x2202 is not a valid token

However, 'Ш' and 'ä' satisfy the definition of a Unicode letter, 
while '∂' does not (using D's current Unicode definitions). I'll use 
tsv-filter (from tsv-utils) to show this rather than writing out 
the full D code. But, this uses std.regex.matchFirst().


# The input
$ echo $'x\n∂\nШ\nä'
x
∂
Ш
ä

# The input filtered by Unicode letter '\p{L}'
$ echo $'x\n∂\nШ\nä' | tsv-filter --regex 1:'^\p{L}$'
x
Ш
ä
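
The same classification can be checked directly with 
std.regex.matchFirst (a minimal sketch):

```d
import std.regex : matchFirst, regex;

void main()
{
    auto letter = regex(`^\p{L}$`);  // matches one Unicode letter

    assert(matchFirst("Ш", letter));   // Cyrillic letter: matches
    assert(matchFirst("ä", letter));   // Latin letter: matches
    assert(!matchFirst("∂", letter));  // math symbol, not a letter
}
```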

The spec can be made clearer and more correct. But if a "universal 
alpha" is essentially a Unicode letter, allowing a symbol like '∂' 
would require a change to the spec itself.


--Jon


Re: Why is BOM required to use unicode in tokens?

2020-09-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote:
On Tuesday, 15 September 2020 at 01:49:13 UTC, James Blachly 
wrote:
I wish to write a function including ∂x and ∂y (these are 
trivial to type with appropriate keyboard shortcuts - alt+d on 
Mac), but without a unicode byte order mark at the beginning 
of the file, the lexer rejects the tokens.


It is not apparently easy to insert such marks (AFAICT no 
common tool does this specifically), while other languages 
work fine (i.e., accept unicode in their source) without it.


Is there a downside to at least presuming UTF-8?


According to the spec [1] this should Just Work. I'd recommend 
filing a bug.


[1] https://dlang.org/spec/lex.html#source_text


Under the identifiers section 
(https://dlang.org/spec/lex.html#identifiers) it describes 
identifiers as:


Identifiers start with a letter, _, or universal alpha, and are 
followed by any number of letters, _, digits, or universal 
alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) 
Appendix D of the C99 Standard.


I was unable to find the definition of a "universal alpha", or 
whether that includes non-ascii alphabetic characters.


Re: Install multiple executables with DUB

2020-09-04 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 4 September 2020 at 07:27:33 UTC, glis-glis wrote:
On Thursday, 3 September 2020 at 14:34:48 UTC, Jacob Carlborg 
wrote:
Oh, multiple binaries, I missed that. You can try to add 
multiple configurations [1]. Or if you have executables 
depending on only one source file, you can use single-file 
packages [2].


Thanks, but this still means I would have to write an 
install-script running


`dub build --single`

on each script, right?
I looked at tsv-utils [1] which seems to be a similar use-case 
as mine, and they declare each tool as a subpackage. The main 
package runs a d-file called `dub_build.d` which compiles all 
subpackages. Feels like overkill to me; I'll probably just 
stick to a makefile.



[1] 
https://github.com/eBay/tsv-utils/blob/master/docs/AboutTheCode.md#building-and-makefile


The `dub_build.d` is so that people can use `$ dub fetch` to 
download and build the tools with `$ dub run`, from 
code.dlang.org. dub fetch/run is the typical dub sequence. But 
it's awkward. And it's geared toward users who have a D compiler 
plus dub already installed. For building your own binaries you 
might as well use `make`. However, if you decide to add your 
tools to the public dub package registry you might consider the 
technique.


My understanding is that the dub developers recognize that 
multiple binaries are inconvenient at present and have ideas on 
improvements. Having a few more concrete use cases might help 
nail down the requirements.


The tsv-utils directory layout may be worth a look. It's been 
pretty successful for multiple binaries in a single repo with 
some shared code. (Different folks made suggestions leading to 
this structure.) It works for both make and dub, and works well 
with other tools, like dlpdocs (Adam Ruppe's doc generator). The 
tsv-utils `make` setup is quite messy at this point; you can 
probably do quite a bit better.


--Jon


Re: How to get the element type of an array?

2020-08-25 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 25 August 2020 at 15:02:14 UTC, FreeSlave wrote:
On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt 
wrote:
What's the best way to get the element type of an array at 
compile time?


Something like std.range.ElementType except that works on any 
array type. There is std.traits.ForeachType, but it wasn't 
clear if that was the right thing.


--Jon


Why not just use typeof(a[0])?

It does not matter whether the array is empty or not. typeof does 
not actually evaluate its expression; it just yields the type.


Wow, yet another way that should have been obvious! Thanks!

--Jon


Re: How to get the element type of an array?

2020-08-25 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 25 August 2020 at 12:50:35 UTC, Steven Schveighoffer 
wrote:
The situation is still confusing though. If only 
'std.range.ElementType' is imported, a static array does not 
have a 'front' member, but ElementType still gets the correct 
type. (This is where the documentation says it'll return void.)


You are maybe thinking of how C works? D imports are different, 
the code is defined the same no matter how it is imported. 
*your* module cannot see std.range.primitives.front, but the 
range module itself can see that UFCS function.


This is a good characteristic. But the reason it surprised me was 
that I expected to be able to manually expand the ElementType (or 
ElementEncodingType) template and see the results of the expressions 
it uses.


   template ElementType(R)
   {
   static if (is(typeof(R.init.front.init) T))
   alias ElementType = T;
   else
   alias ElementType = void;
   }

So, yes, I was expecting this to behave like an inline code 
expansion.


Yesterday I was doing that for 'hasSlicing', which has a more 
complicated set of tests. I wanted to see exactly which 
expression in 'hasSlicing' was causing it to return false for a 
struct I wrote. (Turned out to be a test for 'length'.)


I'll have to be more careful about this.


Re: How to get the element type of an array?

2020-08-25 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 25 August 2020 at 05:02:46 UTC, Basile B. wrote:
On Tuesday, 25 August 2020 at 03:41:06 UTC, Jon Degenhardt 
wrote:
What's the best way to get the element type of an array at 
compile time?


Something like std.range.ElementType except that works on any 
array type. There is std.traits.ForeachType, but it wasn't 
clear if that was the right thing.


--Jon


I'm curious to know which array types were not accepted by 
ElementType (or ElementEncodingType)?


Interesting. I hadn't tested static arrays. In fact, 'ElementType' 
does work with static arrays. Which is likely what you expected.


I assumed ElementType would not work, because static arrays don't 
satisfy 'isInputRange', and the documentation for ElementType 
says:


The element type is determined as the type yielded by r.front 
for an object r of type R. [...] If R doesn't have front, 
ElementType!R is void.


But, if std.range is imported, a static array does indeed get a 
'front' member. It doesn't satisfy isInputRange, but it does have 
a 'front' element.


The situation is still confusing though. If only 
'std.range.ElementType' is imported, a static array does not have 
a 'front' member, but ElementType still gets the correct type. 
(This is where the documentation says it'll return void.)


--- Import std.range ---
@safe unittest
{
import std.range;

ubyte[10] staticArray;
ubyte[] dynamicArray = new ubyte[](10);

static assert(is(ElementType!(typeof(staticArray)) == ubyte));
static assert(is(ElementType!(typeof(dynamicArray)) == 
ubyte));


// front is available
static assert(__traits(compiles, staticArray.front));
static assert(__traits(compiles, dynamicArray.front));

static assert(is(typeof(staticArray.front) == ubyte));
static assert(is(typeof(dynamicArray.front) == ubyte));
}

--- Import std.range.ElementType ---
@safe unittest
{
import std.range : ElementType;

ubyte[10] staticArray;
ubyte[] dynamicArray = new ubyte[](10);

static assert(is(ElementType!(typeof(staticArray)) == ubyte));
static assert(is(ElementType!(typeof(dynamicArray)) == 
ubyte));


// front is not available
static assert(!__traits(compiles, staticArray.front));
static assert(!__traits(compiles, dynamicArray.front));

static assert(!is(typeof(staticArray.front) == ubyte));
static assert(!is(typeof(dynamicArray.front) == ubyte));
}

This suggests the documentation for ElementType is not quite correct.



Re: How to get the element type of an array?

2020-08-24 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 25 August 2020 at 04:36:56 UTC, H. S. Teoh wrote:

[...]


Harry Gillanders, H.S. Teoh,

Thank you both for the quick replies. Both methods address my 
needs. Very much appreciated, I was having trouble figuring this 
one out.


--Jon



How to get the element type of an array?

2020-08-24 Thread Jon Degenhardt via Digitalmars-d-learn
What's the best way to get the element type of an array at 
compile time?


Something like std.range.ElementType except that works on any 
array type. There is std.traits.ForeachType, but it wasn't clear 
if that was the right thing.


--Jon


Re: getopt Basic usage

2020-08-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 15 August 2020 at 04:09:19 UTC, James Gray wrote:
I am trying to use getopt and would not like the program to 
throw an unhandled exception when parsing command line options. 
Is the following, adapted from the first example in the getopt 
documentation, a reasonable approach?


I use the approach you showed, except for writing errors to 
stderr and returning an exit status. This has worked fine. An 
example: 
https://github.com/eBay/tsv-utils/blob/master/number-lines/src/tsv_utils/number-lines.d#L48
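
In outline, the approach looks like this (a hedged sketch; the 
option name and messages are made up, not taken from the linked 
code):

```d
import std.getopt;
import std.stdio : stderr;

int main(string[] args)
{
    bool verbose;  // hypothetical option
    try
    {
        auto result = getopt(args, "verbose", &verbose);
        if (result.helpWanted)
        {
            defaultGetoptPrinter("Usage: myapp [options]", result.options);
            return 0;
        }
    }
    catch (Exception e)
    {
        stderr.writeln("[myapp] ", e.msg);  // error to stderr ...
        return 1;                           // ... and non-zero exit status
    }
    return 0;
}
```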


Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 13 August 2020 at 14:41:02 UTC, Steven Schveighoffer 
wrote:
But for sure, reading from stdin doesn't do anything different 
than reading from a file if you are using the File struct.


A more appropriate test might be using the shell to feed the 
file into the D program:


dprogram < FILE

Which means the same code runs for both tests.


Indeed, using the 'prog < file' approach rather than 'cat file | 
prog' removes any distinction for 'tsv-select'. 
'tsv-select' uses File.rawRead rather than File.byLine.




Re: Reading from stdin significantly slower than reading file directly?

2020-08-13 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 12 August 2020 at 22:44:44 UTC, methonash wrote:

Hi,

Relative beginner to D-lang here, and I'm very confused by the 
apparent performance disparity I've noticed between programs 
that do the following:


1) cat some-large-file | D-program-reading-stdin-byLine()

2) D-program-directly-reading-file-byLine() using File() struct

The D-lang difference I've noticed from options (1) and (2) is 
somewhere in the range of 80% wall time taken (7.5s vs 4.1s), 
which seems pretty extreme.


I don't know enough details of the implementation to really 
answer the question, and I expect it's a bit complicated.


However, it's an interesting question, and I have relevant 
programs and data files, so I tried to get some actuals.


The tests I ran don't directly answer the question posed, but may 
be a useful proxy. I used Unix 'cut' (latest GNU version) and 
'tsv-select' from the tsv-utils package 
(https://github.com/eBay/tsv-utils). 'tsv-select' is written in 
D, and works like 'cut'. 'tsv-select' reads from stdin or a file 
via a 'File' struct. It's not using the built-in 'byLine' member 
though; it uses a version of 'byLine' that includes some 
additional buffering. Both stdin and a file system file are read 
this way.
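
A minimal sketch of reading either stdin or a named file through 
the same 'File' interface with 'rawRead' (not the actual tsv-utils 
buffering code):

```d
import std.stdio;

void main(string[] args)
{
    // Same code path whether input is stdin or a named file.
    auto input = (args.length > 1) ? File(args[1], "r") : stdin;

    ubyte[64 * 1024] buffer;  // fixed 64K read buffer
    size_t total = 0;
    while (true)
    {
        auto chunk = input.rawRead(buffer[]);
        if (chunk.length == 0) break;  // end of input
        total += chunk.length;         // ... process chunk here ...
    }
    writeln(total, " bytes read");
}
```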


I used a file from the google ngram collection 
(http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) and the file TREE_GRM_ESTN.csv from https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html, converted to a tsv file.


The ngram file is a narrow file (21 bytes/line, 4 columns), the 
TREE file is wider (206 bytes/line, 49 columns). In both cases I 
cut the 2nd and 3rd columns. This tends to focus processing on 
input rather than processing and output. I also timed 'wc -l' for 
another data point.


I ran the benchmarks 5 times each way and recorded the median 
time below. Machine used is a MacMini (so Mac OS) with 16 GB RAM 
and SSD drives. The numbers are very consistent for this test on 
this machine. Differences in the reported times are real deltas, 
not system noise. The commands timed were:


* bash -c 'tsv-select -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | tsv-select -f 2,3 > /dev/null'
* bash -c 'gcut -f 2,3 FILE > /dev/null'
* bash -c 'cat FILE | gcut -f 2,3 > /dev/null'
* bash -c 'gwc -l FILE > /dev/null'
* bash -c 'cat FILE | gwc -l > /dev/null'

Note that 'gwc' and 'gcut' are the GNU versions of 'wc' and 'cut' 
installed by Homebrew.


Google ngram file (the 's' unigram file):

Test                          Elapsed  System   User
----                          -------  ------   ----
tsv-select -f 2,3 FILE          10.28    0.42   9.85
cat FILE | tsv-select -f 2,3    11.10    1.45  10.23
cut -f 2,3 FILE                 14.64    0.60  14.03
cat FILE | cut -f 2,3           14.36    1.03  14.19
wc -l FILE                       1.32    0.39   0.93
cat FILE | wc -l                 1.18    0.96   1.04


The TREE file:

Test                          Elapsed  System   User
----                          -------  ------   ----
tsv-select -f 2,3 FILE           3.77    0.95   2.81
cat FILE | tsv-select -f 2,3     4.54    2.65   3.28
cut -f 2,3 FILE                 17.78    1.53  16.24
cat FILE | cut -f 2,3           16.77    2.64  16.36
wc -l FILE                       1.38    0.91   0.46
cat FILE | wc -l                 2.02    2.63   0.77


What this shows is that 'tsv-select' (D program) was faster when 
reading from a file than when reading from a standard input. It 
doesn't indicate why, or whether the delta is due to code in the D 
library or code in 'tsv-select'.


Interestingly, 'cut' showed the opposite behavior. It was faster 
when reading from standard input than when reading from the file. 
For 'wc', which method was faster was dependent on line length.


Again, I caution against reading too much into this regarding 
performance of reading from standard input vs a disk file. Much 
more definitive tests can be done. However, it is an interesting 
comparison.


Also, the D program is still fast in both cases.

--Jon


Re: getopt: How does arraySep work?

2020-07-16 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 16 July 2020 at 17:40:25 UTC, Steven Schveighoffer 
wrote:

On 7/16/20 1:13 PM, Andre Pany wrote:
On Thursday, 16 July 2020 at 05:03:36 UTC, Jon Degenhardt 
wrote:

On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote:

[...]


An enhancement is likely to hit some corner-cases involving 
list termination requiring choices that are not fully 
generic. Any time a legal list value looks like a legal 
option. Perhaps the most important case is single digit 
numeric options like '-1', '-2'. These are legal short form 
options, and there are programs that use them. They are also 
somewhat common numeric values to include in command line 
inputs.


[...]


My naive implementation would be that any dash would stop the 
list of multiple values. If you want to have a value 
containing a space or a dash, you enclose it with double 
quotes in the terminal.


Enclosing in double quotes in the terminal does nothing:

myapp --modelicalibs "file-a.mo" "file-b.mo"

will give you EXACTLY the same string[] args as:

myapp --modelicalibs file-a.mo file-b.mo

I think Jon's point is that it's difficult to distinguish where 
an array list ends if you get the parameters as separate items.


Like:

myapp --numbers 1 2 3 -5 -6

Is that numbers=> [1, 2, 3, -5, -6]

or is it numbers=> [1, 2, 3], 5 => true, 6 => true

This is probably why the code doesn't support that.

-Steve


Yes, this is what I was getting at. Thanks for the clarification.

Also, it's not always immediately obvious what part of the 
argument splitting is being done by the shell, and what is being 
done by the program/getopt. Taking inspiration from the recent 
one-liners, here's a way to see how the program gets the args from 
the shell for different command lines:


$ echo 'import std.stdio; void main(string[] args) { args[1 .. 
$].writeln; }' | dmd -run - --numbers 1,2,3,-5,-6

["--numbers", "1,2,3,-5,-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. 
$].writeln; }' | dmd -run - --numbers 1 2 3 -5 -6

["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. 
$].writeln; }' | dmd -run - --numbers "1" "2" "3" "-5" "-6"

["--numbers", "1", "2", "3", "-5", "-6"]

$ echo 'import std.stdio; void main(string[] args) { args[1 .. 
$].writeln; }' | dmd -run - --numbers '1 2 3 -5 -6'

["--numbers", "1 2 3 -5 -6"]

The first case is what getopt supports now: all the values in a 
single string with a separator that getopt splits on. The 2nd and 
3rd are identical from the program's perspective (Steve's point), 
but they've already been split, so getopt would need a different 
approach. And requires dealing with ambiguity. The fourth form 
eliminates the ambiguity, but puts the burden on the user to use 
quotes.
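
For reference, the first (currently supported) case corresponds to 
setting std.getopt's arraySep (a minimal sketch):

```d
import std.getopt;
import std.stdio;

void main(string[] args)
{
    int[] numbers;
    arraySep = ",";  // split array option values on commas
    getopt(args, "numbers", &numbers);
    writeln(numbers);  // e.g. --numbers 1,2,3,-5,-6 => [1, 2, 3, -5, -6]
}
```

Because the values arrive as one shell word, negative numbers in 
the list never look like options to getopt.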


Re: getopt: How does arraySep work?

2020-07-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 15 July 2020 at 07:12:35 UTC, Andre Pany wrote:

On Tuesday, 14 July 2020 at 15:48:59 UTC, Andre Pany wrote:
On Tuesday, 14 July 2020 at 14:33:47 UTC, Steven Schveighoffer 
wrote:

On 7/14/20 10:22 AM, Steven Schveighoffer wrote:
The documentation needs updating, it should say "parameters 
are added sequentially" or something like that, instead of 
"separation by whitespace".


https://github.com/dlang/phobos/pull/7557

-Steve


Thanks for the answer and the pr. Unfortunately my goal here 
is to simulate a partner tool written  in C/C++ which supports 
this behavior. I will also create an enhancement issue for 
supporting this behavior.


Kind regards
Anste


Enhancement issue:
https://issues.dlang.org/show_bug.cgi?id=21045

Kind regards
André


An enhancement is likely to hit some corner-cases involving list 
termination, requiring choices that are not fully generic: any 
case where a legal list value also looks like a legal option. 
Perhaps the most important case is single digit numeric options 
like '-1', '-2'. These are legal short-form options, and there 
are programs that use them. They are also somewhat common numeric 
values to include in command line inputs.
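For contrast, the separator-based list support getopt already has can be sketched with the module-level arraySep setting (standard std.getopt; the option name 'numbers' is just for illustration):

```d
import std.getopt;
import std.stdio;

void main(string[] args)
{
    int[] numbers;

    // Split a single argument value on commas, e.g.: ./app --numbers 1,2,3,-5,-6
    // Keeping the list in one shell token avoids the ambiguity above:
    // negative values never appear as separate tokens that look like options.
    arraySep = ",";
    getopt(args, "numbers", &numbers);

    writeln(numbers);  // --numbers 1,2,3,-5,-6 prints [1, 2, 3, -5, -6]
}
```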


I ran into a couple cases like this with a getopt cover I wrote. 
The cover supports runtime processing of command arguments in the 
order entered on the command line rather than the compile-time 
getopt() call order. Since it was only for my stuff, not Phobos, 
it was an easy choice: Disallow single digit short options. But a 
Phobos enhancement might make other choices.


IIRC, a characteristic of the current getopt implementation is 
that it does not have run-time knowledge of all the valid 
options, so the set of ambiguous entries is larger than just the 
limited set of options specified in the program. Essentially, 
anything that looks syntactically like an option.


Doesn't mean an enhancement can't be built, just that there might 
be some constraints to be aware of.


--Jon




Re: Looking for a Code Review of a Bioinformatics POC

2020-06-12 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 12 June 2020 at 06:20:59 UTC, H. S. Teoh wrote:
I glanced over the implementation of byLine.  It appears to be 
the unhappy compromise of trying to be 100% correct, cover all 
possible UTF encodings, and all possible types of input streams 
(on-disk file vs. interactive console).  It does UTF decoding 
and resizing of arrays, and a lot of other frilly little 
squirrelly things.  In fact I'm dismayed at how hairy it is, 
considering the conceptual simplicity of the task!


Given this, it will definitely be much faster to load in large 
chunks of the file at a time into a buffer, and scanning 
in-memory for linebreaks. I wouldn't bother with decoding at 
all; I'd just precompute the byte sequence of the linebreaks 
for whatever encoding the file is expected to be in, and just 
scan for that byte pattern and return slices to the data.


This is basically what bufferedByLine in tsv-utils does. See: 
https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L793.


tsv-utils has the advantage of only needing to support utf-8 
files with Unix newlines, so the code is simpler. (Windows 
newlines are detected, this occurs separately from 
bufferedByLine.) But as you describe, support for a wider variety 
of input cases could be done without sacrificing basic 
performance. iopipe provides much more generic support, and it is 
quite fast.
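The core idea - read large chunks, scan for newline bytes, and hand out slices - can be sketched in a few lines. This is a simplified illustration of the approach, not the tsv-utils implementation; among other things, real code reuses a growable buffer rather than concatenating arrays:

```d
import std.stdio;

void processLine(const(char)[] line)
{
    // Stand-in for real per-line work.
    writeln(line.length);
}

void main()
{
    ubyte[] pending;  // partial line carried over from the previous chunk

    foreach (ubyte[] chunk; stdin.byChunk(64 * 1024))
    {
        ubyte[] data = pending ~ chunk;  // simple, but copies; real code reuses a buffer
        size_t start = 0;

        foreach (i, b; data)
        {
            if (b == '\n')
            {
                processLine(cast(const(char)[]) data[start .. i]);
                start = i + 1;
            }
        }
        pending = data[start .. $].dup;  // dup needed: byChunk reuses its buffer
    }

    if (pending.length > 0)
        processLine(cast(const(char)[]) pending);  // final line without a newline
}
```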


Having said all of that, though: usually in non-trivial 
programs reading input is the least of your worries, so this 
kind of micro-optimization is probably unwarranted except for 
very niche cases and for micro-benchmarks and other such toy 
programs where the cost of I/O constitutes a significant chunk 
of running times.  But knowing what byLine does under the hood 
is definitely interesting information for me to keep in mind, 
the next time I write an input-heavy program.


tsv-utils tools saw performance gains of 10-40% by moving from 
File.byLine to bufferedByLine, depending on tool and type of file 
(narrow or wide). Gains of 5-20% were obtained by switching from 
File.write to BufferedOutputRange, with some special cases 
improving by 50%. tsv-utils tools aren't micro-benchmarks, but 
they are not typical apps either. Most of the tools go into a 
tight loop of some kind, running a transformation on the input 
and writing to the output. Performance is a real benefit to these 
tools, as they get run on reasonably large data sets.




Re: Looking for a Code Review of a Bioinformatics POC

2020-06-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 12 June 2020 at 00:58:34 UTC, duck_tape wrote:

On Thursday, 11 June 2020 at 23:45:31 UTC, H. S. Teoh wrote:


Hmm, looks like it's not so much input that's slow, but 
*output*. In fact, it looks pretty bad, taking almost as much 
time as overlap() does in total!


[snip...]


I'll play with that a bit tomorrow! I saw a nice implementation 
on eBay's tsvutils that I may need to look closer at.


Someone else suggested that stdout flushes per line by default. 
I dug around the stdlib but couldn't confirm that. I also played 
around with setvbuf but it didn't seem to change anything.


Have you run into that before / know if stdout is flushing 
every newline? I'm not above opening '/dev/stdout' as a file if 
that writes faster.


I put some comparative benchmarks in 
https://github.com/jondegenhardt/dcat-perf. It  compares input 
and output using standard Phobos facilities (File.byLine, 
File.write), iopipe (https://github.com/schveiguy/iopipe), and 
the tsv-utils buffered input and buffered output facilities.


I haven't spent much time on results presentation; I know it's 
not that easy to read and interpret the results. Brief summary - 
On files with short lines, buffering results in dramatic 
throughput improvements over the standard Phobos facilities. This 
is true for both input and output, though likely for different 
reasons. For input, iopipe is the fastest available. tsv-utils 
buffered facilities are materially faster than Phobos for both 
input and output, but not as fast as iopipe for input. Combining 
iopipe for input with tsv-utils BufferedOutputRange for output 
works pretty well.


For files with long lines both iopipe and tsv-utils 
BufferedByLine are materially faster than Phobos File.byLine when 
reading. For writing there wasn't much difference from Phobos 
File.write.


A note on File.byLine - I've had many opportunities to compare 
Phobos File.byLine to facilities in other programming languages, 
and it is not bad at all. But it is beatable.


About Memory Mapped Files - The benchmarks don't include 
comparisons against mmfile. They certainly make sense as a 
comparison point.


--Jon


Re: Idiomatic way to write a range that tracks how much it consumes

2020-04-27 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 27 April 2020 at 05:06:21 UTC, anon wrote:
To implement your option A you could simply use 
std.range.enumerate.


Would something like this work?

import std.algorithm.iteration : map;
import std.algorithm.searching : until;
import std.range : tee;

size_t bytesConsumed;
auto result = input.map!(a => a.yourTransformation )
   .until!(stringTerminator)
   .tee!(a => bytesConsumed++);
// bytesConsumed is automatically updated as result is consumed


That's interesting. It wouldn't work quite like that, but 
something similar would. Still, I don't think it quite achieves 
what I want.


One thing that's missing is that the initial input is simply a 
string, there's nothing to map over at that point. There is 
however a transformation step that transforms the string into a 
sequence of slices. Then there's a transformation on those 
slices. That would be a step prior to the 'map' step. Also, in my 
case 'map' cannot be used, because each slice may produce 
multiple outputs.


The specifics are minor details, not really so important. The 
implementation can take a form along the lines described. 
However, structuring like this exposes the details of these steps 
to all callers. That is, all callers would have to write the code 
above.


My goal is encapsulate the steps into a single range all callers 
can use. That is, encapsulate something like the steps you have 
above in a standalone range that takes the input string as an 
argument, produces all the output elements, and preserves the 
bytesConsumed in a way the caller can access it.


Re: Idiomatic way to write a range that tracks how much it consumes

2020-04-26 Thread Jon Degenhardt via Digitalmars-d-learn
On Monday, 27 April 2020 at 04:51:54 UTC, Steven Schveighoffer 
wrote:

On 4/26/20 11:38 PM, Jon Degenhardt wrote:

Is there a better way to write this?


I had exactly the same problems. I created this to solve the 
problem, I've barely tested it, but I plan to use it with all 
my parsing utilities on iopipe:


https://code.dlang.org/packages/bufref
https://github.com/schveiguy/bufref/blob/master/source/bufref.d


Thanks Steve, I'll definitely take a look at this.  --Jon



Re: Idiomatic way to write a range that tracks how much it consumes

2020-04-26 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 27 April 2020 at 04:41:58 UTC, drug wrote:

27.04.2020 06:38, Jon Degenhardt пишет:


Is there a better way to write this?

--Jon


I don't know a better way; I think you listed all the possible 
ways - get the value using either `front` or a special range 
member. I prefer the second variant, and I don't think it is less 
consistent with range paradigms. Considering you need the amount 
of consumed bytes only when the range is empty, the second way is 
more effective.


Thanks. Of the two, I like the second better as well.


Idiomatic way to write a range that tracks how much it consumes

2020-04-26 Thread Jon Degenhardt via Digitalmars-d-learn
I have a string that contains a sequence of elements, then a 
terminator character, followed by a different sequence of 
elements (of a different type).


I want to create an input range that traverses the initial 
sequence. This is easy enough. But after the initial sequence has 
been traversed, the caller will need to know where the next 
sequence starts. That is, the caller needs to know the index in 
the input string where the initial sequence ends and the next 
sequence begins.


The values returned by the range are a transformation of the 
input, so the values by themselves are insufficient for the 
caller to determined how much of the string has been consumed. 
And, the caller cannot simply search for the terminator character.


Tracking the number of bytes consumed is easy enough. I'd like to 
do it in a way that is consistent with D's normal range paradigm.


Two candidate approaches:
a) Instead of having the range return the individual values, it 
could return a tuple containing the value and the number of bytes 
consumed.


b) Give the input range an extra member function which returns 
the number of bytes consumed. The caller could call this after 
'empty()' returns true to find the amount of data consumed.


Both will work, but I'm not especially satisfied with either. 
Approach (a) seems more consistent with the typical range 
paradigms, but also more of a hassle for callers.


Is there a better way to write this?

--Jon
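To make option (b) concrete, here is a minimal sketch. The names are made up and the real transformation is elided; the range just yields characters up to a terminator while counting what it consumed:

```d
import std.stdio : writeln;

struct UntilTerminator
{
    private string _input;
    private size_t _pos;
    private char _term;

    this(string input, char term) { _input = input; _term = term; }

    @property bool empty() const
    {
        return _pos >= _input.length || _input[_pos] == _term;
    }

    @property char front() const { return _input[_pos]; }

    void popFront() { ++_pos; }

    // The extra member of option (b): meaningful once 'empty' is true.
    @property size_t bytesConsumed() const { return _pos; }
}

void main()
{
    auto r = UntilTerminator("abc;def", ';');

    // Note: iterate the named variable directly; 'foreach (c; r)' would
    // operate on a copy and leave r.bytesConsumed at zero.
    while (!r.empty) r.popFront;

    writeln(r.bytesConsumed);                      // 3
    writeln("abc;def"[r.bytesConsumed + 1 .. $]);  // "def", skipping the ';'
}
```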


Re: Integration tests

2020-04-17 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 17 April 2020 at 16:56:57 UTC, Russel Winder wrote:

Hi,

Thinking of trying to do the next project in D rather than 
Rust, but…


Rust has built in unit testing on a module basis. D has this so 
no problem.


Rust allows for integration tests in the tests directory of a 
project. These are automatically build and run along with all 
unit tests as part of "cargo test".


Does D have any integrated support for integration tests in the 
way

Rust does?


Automated testing is important; perhaps you can describe further 
what's needed? I haven't worked with Rust test frameworks, but I 
took a look at the description of the integration tests and unit 
tests. It wasn't immediately obvious what can be done with the 
Rust integration test framework that cannot be done with D's 
unittest framework.


An important concept described was testing a module as an 
external caller. That would seem very doable using D's 
unittest framework. For example, one could create a set of tests 
against Phobos, put them in a separate location (e.g. a separate 
file), and arrange to have the unittests run as part of a CI 
process along with a build.


My look was very superficial, perhaps you could explain more.


Re: How to correctly import tsv-utilites functions?

2020-04-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 14 April 2020 at 20:25:08 UTC, p.shkadzko wrote:
On Tuesday, 14 April 2020 at 20:05:28 UTC, Steven Schveighoffer 
wrote:

On 4/14/20 3:34 PM, p.shkadzko wrote:

[...]



What about using dependency tsv-utils:common ?

Looks like tsv-utils is a collection of subpackages, and the 
main package just serves as a namespace.


-Steve


Yes, it works! Thank you.


Glad that worked for you. (And thanks Steve!) I have a small app 
with an example of a dub.json file that pulls the tsv-utils 
common dependencies this way: 
https://github.com/jondegenhardt/dcat-perf/blob/master/dub.json


--Jon


Re: Unexpected result with std.conv.to

2019-11-14 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 15 November 2019 at 03:51:04 UTC, Joel wrote:
I made a feature that converts, say, [9:59am] -> [10:00am] to 1 
minute, but found '9'.to!int = 57 (not 9).


Doesn't seem right... I'm guessing that's standard though, same 
with ldc.


Use a string or char[] array, e.g. writeln("9".to!int) => 9.

With a single 'char', what is being produced is the ASCII value 
of the character.
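A quick illustration of the difference:

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    writeln('9'.to!int);  // 57 - a char converts to its code point (ASCII value)
    writeln("9".to!int);  // 9  - a string is parsed as a number
    writeln('9' - '0');   // 9  - or compute the digit value of a single char
}
```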


Re: csvReader & specifying separator problems...

2019-11-14 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 14 November 2019 at 12:25:30 UTC, Robert M. Münch 
wrote:
Just trying a very simple thing and it's pretty hard: "Read a 
CSV file (raw_data) that has a ; separator so that I can 
iterate over the lines and access the fields."


csv_data = raw_data.byLine.joiner("\n")

From the docs, which I find extremely hard to understand:

auto csvReader(Contents = string, Malformed ErrorLevel = 
Malformed.throwException, Range, Separator = char)(Range input, 
Separator delimiter = ',', Separator quote = '"')


So, let's see if I can decipher this, step-by-step, by trying it 
out:


csv_records = csv_data.csvReader();

Would split the CSV data into iterable CSV records using ',' 
char as separator using UFCS syntax. When running this I get:


[...]


Side comment - This code looks like it was taken from the first 
example in the std.csv documentation. To me, the code in the 
std.csv example is doing something that might not be obvious at 
first glance and is potentially confusing.


In particular, 'byLine' is not reading individual CSV records. 
CSV can have embedded newlines, these are identified by CSV 
escape syntax. 'byLine' doesn't know the escape syntax. If there 
are embedded newlines, 'byLine' will read partial records, which 
may not be obvious at first glance. The .joiner("\n") step puts 
the newline back, stitching fields and records back together 
again in the process.


The effect is to create an input range of characters representing 
the entire file, using 'byLine' to do buffered reads. This input 
range is passed to CSVReader.


This could also be done using 'byChunk' and 'joiner' (with no 
separator). This would use a fixed size buffer, no searching for 
newlines while reading, so it should be faster.


An example:

---- csv_by_chunk.d ----
import std.algorithm;
import std.csv;
import std.conv;
import std.stdio;
import std.typecons;
import std.utf;

void main()
{
    // Small buffer used to show it works. Normally would use a larger buffer.
    ubyte[16] buffer;
    auto stdinBytes = stdin.byChunk(buffer).joiner;
    auto stdinDChars = stdinBytes.map!((ubyte b) => cast(char) b).byDchar;

    writefln("--");
    foreach (record; stdinDChars.csvReader!(Tuple!(string, string, string)))
    {
        writefln("Field 0: |%s|", record[0]);
        writefln("Field 1: |%s|", record[1]);
        writefln("Field 2: |%s|", record[2]);
        writefln("--");
    }
}

Pass it csv data without embedded newlines:

$ echo $'abc,def,ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |def|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

Pass it csv data with embedded newlines:

$ echo $'abc,"LINE 1\nLINE 2",ghi\njkl,mno,pqr' | ./csv_by_chunk
--
Field 0: |abc|
Field 1: |LINE 1
LINE 2|
Field 2: |ghi|
--
Field 0: |jkl|
Field 1: |mno|
Field 2: |pqr|
--

An example like this may avoid the confusion about newlines. 
Unfortunately, the need to do the odd looking conversion from 
ubyte to char/dchar is undesirable in a code example. I haven't 
found a cleaner way to write that. If there's a nicer way I'd 
appreciate hearing about it.


--Jon



Re: formatting a float or double in a string with all significant digits kept

2019-10-10 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 10 October 2019 at 17:12:25 UTC, dan wrote:

Thanks also berni44 for the information about the dig attribute,
Jon for the neat packaging into one line using the attribute on 
the type.
Unfortunately, the version of gdc that comes with the version 
of debian
that i am using does not have the dig attribute yet, but 
perhaps i can

upgrade, and eventually i think gdc will have it.


Glad these ideas helped. The value of the 'double.dig' property 
is not going to change between compilers/versions/etc. It's 
really a property of IEEE 754 floating point for 64-bit floats 
(D specifies the size of double as 64 bits). So, if you are using 
double, it's pretty safe to use 15 until the compiler you're 
using is further along on versions. Declare an enum or const 
variable to give it a name so you can track it down later.


Also, don't get thrown off by the fact that PI is a real, not a 
double. D supports 80-bit floats as real, so constants like PI 
are defined as real. But if you convert PI to a double, it'll 
then have 15 significant digits of precision.


--Jon


Re: formatting a float or double in a string with all significant digits kept

2019-10-09 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 9 October 2019 at 05:46:12 UTC, berni44 wrote:

On Tuesday, 8 October 2019 at 20:37:03 UTC, dan wrote:
But i would like to be able to do this without knowing the 
expansion of pi, or writing too much code, especially if 
there's some d function like writeAllDigits or something 
similar.


You can use the property .dig to get the number of significant 
digits of a number:


writeln(PI.dig); // => 18

You still need to account for the numbers before the dot. If 
you're happy with scientific notation you can do:


auto t = format("%.*e", PI.dig, PI);
writeln("PI = ",t);


Using the '.dig' property is a really nice idea and looks very 
useful for this. A clarification though - It's the significant 
digits in the data type, not the value. (PI is 18 because it's a 
real, not a double.) So:


writeln(1.0f.dig, ", ", float.dig);  =>  6, 6
writeln(1.0.dig, ", ", double.dig);  => 15, 15
writeln(1.0L.dig, ", ", real.dig);   => 18, 18

Another possibility would be to combine the '.dig' property with 
the "%g" option, similar to the use "%e" shown. For example, 
these lines:


writeln(format("%0.*g", PI.dig, PI));
writeln(format("%0.*g", double.dig, 1.0));
writeln(format("%0.*g", double.dig, 100.0));
writeln(format("%0.*g", double.dig, 1.0001));
writeln(format("%0.*g", double.dig, 0.0001));

produce:

3.14159265358979324
1
100
1.0001
1e-08

Hopefully experimenting with the different formatting options 
available will yield one that works for your use case.


Re: Help me decide D or C

2019-08-02 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 31 July 2019 at 18:38:02 UTC, Alexandre wrote:
Should I go for C and then when I become a better programmer 
change to D?

Should I start with D right now?


In my view, the most important thing is the decision you've 
already made - to pick a programming language and learn it in a 
reasonable bit of depth. Which programming language you choose is 
less important. No matter which choice you make you'll have the 
opportunity to learn skills that will transfer to other 
programming languages.


As you can tell from the other responses, the pros and cons of a 
learning a specific language depend quite a bit on what you hope 
to get out of it, and are to a fair extent subjective. But both C 
and D provide meaningful opportunities to gain worthwhile 
experience.


A couple reasons for considering learning D over C are its 
support for functional programming and templates. These were also 
mentioned by a few other people. These are not really "beginner" 
topics, but as one moves past the beginner stage they are really 
quite valuable techniques to start mastering. For both D is the 
far better option, and it's not necessary to use either when 
starting out.


--Jon


Re: rdmd takes 2-3 seconds on a first-run of a simple .d script

2019-05-26 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 25 May 2019 at 22:18:16 UTC, Andre Pany wrote:

On Saturday, 25 May 2019 at 08:32:08 UTC, BoQsc wrote:
I have a simple standard .d script and I'm getting annoyed 
that it takes 2-3 seconds to run and see the results via rdmd.


Also please keep in mind there could be other factors like slow 
disks, anti virus scanners,... which causes a slow down.


I have seen similar behavior that I attribute to virus scan 
software. After compiling a program, the first run takes several 
seconds to run, after that it runs immediately. I'm assuming the 
first run of an unknown binary triggers a scan, though I cannot 
be completely sure.


Try compiling a new binary in D or C++ and see if a similar 
effect is seen.


--Jon



Re: Poor regex performance?

2019-04-04 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole 
wrote:

If you need performance use ldc not dmd (assumed).

LLVM has many factors better code optimizes than dmd does.


Thanks! I already had dmd installed from a brief look at D a 
long
time ago, so I missed the details at 
https://dlang.org/download.html


ldc2 -O3 does a lot better, but the result is still 30x slower
without PCRE.


Try:
ldc2 -O3 -release -flto=thin 
-defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining


This will improve inlining and optimization across the runtime 
library boundaries. This can help in certain types of code.


Dub: A json/sdl equivalent to --combined command line option?

2019-04-01 Thread Jon Degenhardt via Digitalmars-d-learn
In Dub, is there a way to specify the equivalent of the 
--combined command line argument in the json/sdl package config 
file?


What I'd like to be able to do is create a custom build type such 
that


$ dub build --build=build-xyz

builds in combined mode, without needing to add the --combined on 
the command line. Putting it on the command line as follows did 
what I intended:


   $ dub build --build=build-xyz --combined

--Jon


Re: Which Docker to use?

2018-11-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 22 October 2018 at 18:44:01 UTC, Jacob Carlborg wrote:

On 2018-10-21 20:45, Jon Degenhardt wrote:

The issue that caused me to go to Ubuntu 16.04 had to do with 
uncaught exceptions when using LTO with the gold linker and 
LDC 1.5. Problem occurred with 14.04, but not 16.04. I should 
go back and retest on Ubuntu 14.04 with a more recent LDC, it 
may well have been corrected. The issue thread is here: 
https://github.com/ldc-developers/ldc/issues/2390.


Ah, that might be the reason. I am not using LTO. You might 
want to try a newer version of LDC as well since 1.5 is quite 
old now.


I switched to LDC 1.12.0. The problem remains with LTO and static 
builds on Ubuntu 14.04. Ubuntu 16.04 is required, at least with 
LTO of druntime/phobos. The good news on this front is that the 
regularly updated dlang2 docker images work fine with LTO on 
druntime/phobos (using the LTO build support available in LDC 
1.9.0). Examples of travis-ci setups for both dlanguage and 
dlang2 docker images are available on the tsv-utils travis 
config: 
https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look 
for the DOCKERSPECIAL environment variables.


Re: d word counting approach performs well but has higher mem usage

2018-11-04 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote:

Hi there,

the task is simple: count word occurrences from stdin (around 
150mb in this case) and print sorted results to stdout in a 
somewhat idiomatic fashion.


Now, d is quite elegant while maintaining high performance 
compared to both c and c++, but I, as a complete beginner, 
can't identify where the 10x memory usage (~300mb, see results 
below) is coming from.


Unicode overhead? Internal buffer? Is something slurping the 
whole file? Assoc array allocations? Couldn't find huge allocs 
with dmd -vgc and -profile=gc either. What did I do wrong?


Not exactly the same problem, but there is relevant discussion in 
the blog post I wrote a while ago:  
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/


See in particular the section on Associative Array lookup 
optimization. This takes advantage of the fact that it's only 
necessary to create the immutable string the first time a key is 
entered into the hash. Subsequent occurrences do not need to take 
this step. As creating the string allocates new memory, even if 
only used temporarily, avoiding it is a meaningful savings.


There have been additional APIs added to the AA interface since I 
wrote the blog post; I believe it is now possible to accomplish 
the same thing with more succinct code.
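In sketch form, the pattern from the blog post looks like the following (the word and buffer here are stand-ins; in real code the char[] would be a slice of a reused byLine buffer):

```d
import std.stdio : writeln;

void main()
{
    size_t[string] counts;

    // Stand-in for a word sliced out of a reused, mutable input buffer.
    char[] buf = "apple".dup;

    foreach (_; 0 .. 3)
    {
        // Lookup with the mutable slice is allowed and allocates nothing.
        if (auto countPtr = buf in counts)
            (*countPtr)++;          // key already present: no allocation
        else
            counts[buf.idup] = 1;   // idup only on the first occurrence
    }

    writeln(counts["apple"]);  // 3
}
```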


Other optimization possibilities:
* Avoid auto-decode: Not sure if your code is hitting this, but 
if so it's a significant performance hit. Unfortunately, it's not 
always obvious when this is happening. The task you are 
performing doesn't need auto-decode because it is splitting on 
single-byte utf-8 char boundaries (newline and space).


* LTO on druntime/phobos: This is easy and will have a material 
speedup. Simply add 
'-defaultlib=phobos2-ldc-lto,druntime-ldc-lto' to the 'ldc2' 
build line, after the '-flto=full' entry. This will be a win 
because it will enable a number of optimizations in the internal 
loop.


* Reading the whole file vs line by line - 'byLine' is really 
fast. It's also nice and general, as it allows reading arbitrary 
size files or standard input without changes to the code. 
However, it's not as fast as reading the file in a single shot.


* std.algorithm.joiner - Has improved dramatically, but is still 
slower than a foreach loop. See: 
https://github.com/dlang/phobos/pull/6492


--Jon




Re: Which Docker to use?

2018-10-21 Thread Jon Degenhardt via Digitalmars-d-learn

On Sunday, 21 October 2018 at 18:11:37 UTC, Jacob Carlborg wrote:

On 2018-10-18 01:15, Jon Degenhardt wrote:

I need to use docker to build static linked Linux executables. 
My reason
is specific, may be different than the OP's. I'm using 
Travis-CI to
build executables. Travis-CI uses Ubuntu 14.04, but static 
linking fails
on 14.04. The standard C library from Ubuntu 16.04 or later is 
needed.

There may be other/better ways to do this, I don't know.


That's interesting. I've built static binaries for DStep using 
LDC on Travis CI without any problems.


My comment painted too broad a brush. I had forgotten how 
specific the issue I saw was. Apologies for the confusion.


The issue that caused me to go to Ubuntu 16.04 had to do with 
uncaught exceptions when using LTO with the gold linker and LDC 
1.5. Problem occurred with 14.04, but not 16.04. I should go back 
and retest on Ubuntu 14.04 with a more recent LDC, it may well 
have been corrected. The issue thread is here: 
https://github.com/ldc-developers/ldc/issues/2390.


Re: Which Docker to use?

2018-10-20 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 19 October 2018 at 22:16:04 UTC, Ky-Anh Huynh wrote:
On Wednesday, 17 October 2018 at 23:15:53 UTC, Jon Degenhardt 
wrote:


I need to use docker to build static linked Linux executables. 
My reason is specific, may be different than the OP's. I'm 
using Travis-CI to build executables. Travis-CI uses Ubuntu 
14.04, but static linking fails on 14.04. The standard C 
library from Ubuntu 16.04 or later is needed. There may be 
other/better ways to do this, I don't know.


Yes I'm also using Travis-CI and that's why I need some Docker 
support.


I'm using dlanguage/ldc. The reason for that choice was because 
it was what was available when I put the travis build together. 
As you mentioned, it hasn't been updated in a while. I'm still 
producing this build with an older ldc version, but when I move 
to a more current version I'll have to switch to a different 
docker image.


My travis config is here: 
https://github.com/eBay/tsv-utils/blob/master/.travis.yml. Look 
for the sections referencing the DOCKERSPECIAL environment 
variable.


Re: Which Docker to use?

2018-10-17 Thread Jon Degenhardt via Digitalmars-d-learn
On Wednesday, 17 October 2018 at 08:08:44 UTC, Gary Willoughby 
wrote:
On Wednesday, 17 October 2018 at 03:37:21 UTC, Ky-Anh Huynh 
wrote:

Hi,

I need to build some static binaries with LDC. I also need to 
execute builds on both platform 32-bit and 64-bit.



From Docker Hub there are two image groups:

* language/ldc (last update 5 months ago)
* dlang2/ldc-ubuntu (updated recently)


Which one do you suggest?

Thanks a lot.


To be honest, you don't need docker for this. You can just 
download LDC in a self-contained folder and use it as is.


https://github.com/ldc-developers/ldc/releases

That's what I do on Linux.


I need to use docker to build static linked Linux executables. My 
reason is specific, may be different than the OP's. I'm using 
Travis-CI to build executables. Travis-CI uses Ubuntu 14.04, but 
static linking fails on 14.04. The standard C library from Ubuntu 
16.04 or later is needed. There may be other/better ways to do 
this, I don't know.


Re: Error: variable 'xyz' has scoped destruction, cannot build closure

2018-10-05 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 5 October 2018 at 16:34:32 UTC, Paul Backus wrote:
On Friday, 5 October 2018 at 06:56:49 UTC, Nicholas Wilson 
wrote:
On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson 
wrote:
Alas it does not, because each does not accept additional 
arguments other than the range. Shouldn't be hard to fix 
though.


https://issues.dlang.org/show_bug.cgi?id=19287


You can thread multiple arguments through to `each` using 
`std.range.zip`:


tenRandomNumbers
.zip(repeat(output))
.each!(unpack!((n, output) => 
output.appendln(n.to!string)));


Full code: https://run.dlang.io/is/Qe7uHt


Very interesting, thanks. It's a clever way to avoid the delegate 
capture issue.


(Aside: A nested function that accesses 'output' from the lexical 
context has the same issue as delegates with respect to capturing 
the variable.)


Re: Error: variable 'xyz' has scoped destruction, cannot build closure

2018-10-05 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 5 October 2018 at 06:44:08 UTC, Nicholas Wilson wrote:
On Friday, 5 October 2018 at 06:22:57 UTC, Nicholas Wilson 
wrote:
tenRandomNumbers.each!((n,o) => 
o.appendln(n.to!string))(output);


or

tenRandomNumbers.each!((n, ref o) => 
o.appendln(n.to!string))(output);


should hopefully do the trick (run.dlang.io seems to be down 
atm).




Alas it does not, because each does not accept additional 
arguments other than the range. Shouldn't be hard to fix though.


Yeah, that's what I was seeing also. Thanks for taking a look. Is 
there perhaps a way to limit the scope of the delegate to the 
local function? Something that would tell the compiler the 
delegate has a lifetime shorter than the struct.


One specific issue it points out is that this is a place where 
the BufferedOutputRange I wrote cannot be used interchangeably 
with other output ranges. It's minor, but the intent was to be 
able to pass it anyplace an output range could be used.


Error: variable 'xyz' has scoped destruction, cannot build closure

2018-10-04 Thread Jon Degenhardt via Digitalmars-d-learn
I got the compilation error in the subject line when trying to 
create a range via std.range.generate. Turns out this was caused 
by trying to create a closure for 'generate' where the closure 
was accessing a struct containing a destructor.


The fix was easy enough: write out the loop by hand rather than 
using 'generate' with a closure. What I'm wondering/asking is if 
there is an alternate way to do this that would enable the 
'generate' approach. This is more curiosity/learning at this 
point.


Below is a stripped down version of what I was doing. I have a 
struct for output buffering. The destructor writes any data left 
in the buffer to the output stream. This gets passed to routines 
performing output. It was in this context that I created a 
generator that wrote to it.


example.d-
struct BufferedStdout
{
    import std.array : appender;

    private auto _outputBuffer = appender!(char[]);

    ~this()
    {
        import std.stdio : write;
        write(_outputBuffer.data);
        _outputBuffer.clear;
    }

    void appendln(T)(T stuff)
    {
        import std.range : put;
        put(_outputBuffer, stuff);
        put(_outputBuffer, "\n");
    }
}

void foo(BufferedStdout output)
{
    import std.algorithm : each;
    import std.conv : to;
    import std.range : generate, takeExactly;
    import std.random : Random, uniform, unpredictableSeed;

    auto randomGenerator = Random(unpredictableSeed);
    auto randomNumbers = generate!(() => uniform(0, 1000, randomGenerator));

    auto tenRandomNumbers = randomNumbers.takeExactly(10);
    tenRandomNumbers.each!(n => output.appendln(n.to!string));
}

void main(string[] args)
{
    foo(BufferedStdout());
}
End of example.d-

Compiling the above results in:

   $ dmd example.d
   example.d(22): Error: variable `example.foo.output` has scoped 
destruction, cannot build closure


As mentioned, using a loop rather than 'generate' works fine, 
but suggestions for alternatives that would use 'generate' would 
be appreciated.


The actual buffered output struct has more behind it than shown 
above, but not too much. For anyone interested it's here:  
https://github.com/eBay/tsv-utils/blob/master/common/src/tsvutil.d#L358


Re: tupleof function parameters?

2018-08-28 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 28 August 2018 at 06:20:37 UTC, Sebastiaan Koppe 
wrote:
On Tuesday, 28 August 2018 at 06:11:35 UTC, Jon Degenhardt 
wrote:
The goal is to write the argument list once and use it to 
create both the function and the Tuple alias. That way I could 
create a large number of these function / arglist tuple pairs 
with less brittleness.


--Jon


I would probably use a combination of std.traits.Parameters and 
std.traits.ParameterIdentifierTuple.


Parameters returns a tuple of types and 
ParameterIdentifierTuple returns a tuple of strings. Maybe 
you'll need to implement a staticZip to interleave both tuples 
to get the result you want. (although I remember seeing one 
somewhere).


Alex, Sebastiaan - Thanks much, this looks like it should get me 
what I'm looking for. --Jon


tupleof function parameters?

2018-08-28 Thread Jon Degenhardt via Digitalmars-d-learn
I'd like to create a Tuple alias representing a function's 
parameter list. Is there a way to do this?


Here's an example creating a Tuple alias for a function's 
parameters by hand:


import std.typecons: Tuple;

bool fn(string op, int v1, int v2)
{
switch (op)
{
default: return false;
case "<": return v1 < v2;
case ">": return v1 > v2;
}
}

alias fnArgs = Tuple!(string, "op", int, "v1", int, "v2");

unittest
{
auto args = fnArgs("<", 3, 5);
assert(fn(args[]));
}

This is quite useful. I'm wondering if there is a way to create 
the 'fnArgs' alias from the definition of 'fn' without needing to 
manually write out the '(string, "op", int, "v1", int, "v2")' 
sequence by hand. Something like a 'tupleof' operation on the 
function parameter list. Or conversely, define the tuple and use 
it when defining the function.


The goal is to write the argument list once and use it to create 
both the function and the Tuple alias. That way I could create a 
large number of these function / arglist tuple pairs with less 
brittleness.


--Jon



Re: Splitting up large dirty file

2018-05-21 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 21 May 2018 at 15:00:09 UTC, Dennis wrote:
I want to be convinced that Range programming works like a 
charm, but the procedural approaches remain more flexible (and 
faster too) it seems. Thanks for the example.



On Monday, 21 May 2018 at 22:11:42 UTC, Dennis wrote:
In this case I used drop to drop lines, not characters. The 
exception was thrown by the joiner it turns out.

 ...
From the benchmarking I did, I found that ranges are easily an 
order of magnitude slower even with compiler optimizations:


My general experience is that range programming works quite well. 
It's especially useful when used to do lazy processing and as a 
result minimize memory allocations. I've gotten quite good 
performance with these techniques (see my DConf talk slides: 
https://dconf.org/2018/talks/degenhardt.html).


Your benchmarks are not against the file split case, but if you 
benchmarked that you may also have seen it as slow. In that case 
you may be hitting specific areas where there are opportunities 
for performance improvement in the standard library. One is that 
joiner is slow (PR: https://github.com/dlang/phobos/pull/6492). 
Another is that the write[fln] routines are much faster when 
operating on a single large object than many small objects. e.g. 
It's faster to call write[fln] with an array of 100 characters 
than: (a) calling it 100 times with one character; (b) calling it 
once, with 100 characters as individual arguments (template 
form); (c) calling it once with a range of 100 characters, each 
processed one at a time.


When joiner is used as in your example, you not only hit the 
joiner performance issue, but the write[fln] issue. This is due 
to something that may not be obvious at first: when joiner is 
used to concatenate arrays or ranges, it flattens the 
array/range into a single range of elements. So, rather than 
writing a line at a time, your example is effectively passing a 
character at a time to write[fln].


So, in the file split case, using byLine in an imperative fashion 
as in my example will have the effect of passing a full line at a 
time to write[fln], rather than individual characters. Mine will 
be faster, but not because it's imperative. The same thing could 
be achieved procedurally.


Regarding the benchmark programs you showed - This is very 
interesting. It would certainly be worth additional looks into 
this. One thing I wonder is if the performance penalty may be due 
to a lack of inlining due to crossing library boundaries. The 
imperative versions aren't crossing these boundaries. If you're 
willing, you could try adding LDC's LTO options and see what 
happens. There are some instructions in the release notes for LDC 
1.9.0 (https://github.com/ldc-developers/ldc/releases). Make sure 
you use the form that includes druntime and phobos.


--Jon
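For concreteness, the imperative byLine split being described might look like this (an untested sketch; the chunk size and output file naming are placeholders):

```d
import std.conv : to;
import std.stdio;

// Write every `linesPerFile` lines to a new numbered output file.
// byLine does not decode, so invalid UTF-8 passes through untouched,
// and each write call receives a full line rather than single characters.
void splitByLines(string inputPath, size_t linesPerFile)
{
    File outputFile;
    size_t lineCount = 0;
    size_t fileCount = 0;

    foreach (line; File(inputPath).byLine(KeepTerminator.yes))
    {
        if (lineCount % linesPerFile == 0)
            outputFile = File("output_" ~ (fileCount++).to!string ~ ".txt", "w");
        outputFile.write(line);
        ++lineCount;
    }
}
```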


Re: Splitting up large dirty file

2018-05-17 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 17 May 2018 at 20:08:09 UTC, Dennis wrote:

On Wednesday, 16 May 2018 at 15:47:29 UTC, Jon Degenhardt wrote:
If you write it in the style of my earlier example and use 
counters and if-tests it will work. byLine by itself won't try 
to interpret the characters (won't auto-decode them), so it 
won't trigger an exception if there are invalid utf-8 
characters.


When printing to stdout it seems to skip any validation, but 
writing to a file does give an exception:


```
auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;

auto outputFile = new File("output.txt");
foreach (line; inputStream.byLine(KeepTerminator.yes))
    outputFile.write(line);
```
std.exception.ErrnoException@C:\D\dmd2\windows\bin\..\..\src\phobos\std\stdio.d(2877):
  (No error)

According to the documentation, byLine can throw an 
UTFException so relying on the fact that it doesn't in some 
cases doesn't seem like a good idea.


Instead of:

 auto outputFile = new File("output.txt");

try:

auto outputFile = File("output.txt", "w");

That works for me. The second arg ("w") opens the file for write. 
When I omit it, I also get an exception, as the default open mode 
is for read:


 * If file does not exist:  Cannot open file `output.txt' in mode 
`rb' (No such file or directory)

 * If file does exist:   (Bad file descriptor)

The second error presumably occurs when writing.

As an aside - I agree with one of your bigger picture 
observations: It would be preferable to have more control over 
utf-8 error handling behavior at the application level.


Re: Splitting up large dirty file

2018-05-16 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 16 May 2018 at 07:06:45 UTC, Dennis wrote:

On Wednesday, 16 May 2018 at 02:47:50 UTC, Jon Degenhardt wrote:
Can you show the program you are using that throws when using 
byLine?


Here's a version that only outputs the first chunk:
```
import std.stdio;
import std.range;
import std.algorithm;
import std.file;
import std.exception;

void main(string[] args) {
    enforce(args.length == 2, "Pass one filename as argument");
    auto lineChunks = File(args[1], "r").byLine.drop(4).chunks(10_000_000/10);

    new File("output.txt", "w").write(lineChunks.front.joiner);
}
```


If you write it in the style of my earlier example and use 
counters and if-tests it will work. byLine by itself won't try to 
interpret the characters (won't auto-decode them), so it won't 
trigger an exception if there are invalid utf-8 characters.




Re: Splitting up large dirty file

2018-05-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 15 May 2018 at 20:36:21 UTC, Dennis wrote:

I have a file with two problems:
- It's too big to fit in memory (apparently, I thought 1.5 Gb 
would fit but I get an out of memory error when using 
std.file.read)
- It is dirty (contains invalid Unicode characters, null bytes 
in the middle of lines)


I want to write a program that splits it up into multiple 
files, with the splits happening every n lines. I keep 
encountering roadblocks though:


- You can't give Yes.useReplacementChar to `byLine` and 
`byLine` (or `readln`) throws an Exception upon encountering an 
invalid character.


Can you show the program you are using that throws when using 
byLine? I tried a very simple program that reads and outputs 
line-by-line, then fed it a file that contained invalid utf-8. I 
did not see an exception. The invalid utf-8 was created by taking 
part of this file: 
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt (a 
commonly used file with utf-8 edge cases), plus adding a number 
of random hex characters, including null. I don't see exceptions 
thrown.


The program I used:

int main(string[] args)
{
    import std.stdio;
    import std.conv : to;
    try
    {
        auto inputStream = (args.length < 2 || args[1] == "-") ? stdin : args[1].File;
        foreach (line; inputStream.byLine(KeepTerminator.yes))
            write(line);
    }
    catch (Exception e)
    {
        stderr.writefln("Error [%s]: %s", args[0], e.msg);
        return 1;
    }
    return 0;
}





Re: What's the proper way to use std.getopt?

2017-12-12 Thread Jon Degenhardt via Digitalmars-d-learn
On Monday, 11 December 2017 at 20:58:25 UTC, Jordi Gutiérrez 
Hermoso wrote:
What's the proper style, then? Can someone show me a good 
example of how to use getopt and the docstring it automatically 
generates?


The command line tools I published use the approach described in 
a number of the replies, but with a tad more structure. It's 
hardly perfect, but may be useful if you want more examples. See: 
 
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d. See the main() routine and the TsvSampleOptions struct. Most of the tools have a similar pattern.


--Jon


Re: splitter string/char different behavior

2017-09-30 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 30 September 2017 at 17:17:17 UTC, SrMordred wrote:

writeln( "a.b.c".splitter('.').dropBack(1) ); //compiles ok
writeln( "a.b.c".splitter(".").dropBack(1) );

//error:
Error: template std.range.dropBack cannot deduce function from 
argument types !()(Result, int), candidates are:

(...)

Hm.. can someone explain whats going on?


Let's try again. I'm not sure of the full explanation, but it 
likely involves two separate template overloads being 
instantiated, each with a separate definition of the return type.


* "a.b.c".splitter('.') - This overload: 
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L3696-L3703


* "a.b.c".splitter(".") - This overload: 
https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L3973-L3982


But why one supports dropBack and the other doesn't I don't know.


Re: splitter string/char different behavior

2017-09-30 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 30 September 2017 at 19:26:14 UTC, SrMordred wrote:
For "a.b.c"splitter(x), Range r is a string, r.front is a 
char. The template can only be instantiated if the predicate 
function is valid. The predicate function is "a == b". Since 
r.front is a char, then s must be a type that can be compared 
with '=='. A string and a char cannot be compared with '==', 
which is why a valid template instantiation could not be 
found.


Would it be correct to just update the documentation to say 
"Lazily splits a range using a char as a separator"? Or what 
is it; wchar and dchar too?


I notice the example that is there has ' '  as the element.


But this works:
writeln("a.b.c".splitter(".") );


Geez, my mistake. I'm sorry about that. It's dropBack that's 
failing, not splitter.


Re: splitter string/char different behavior

2017-09-30 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 30 September 2017 at 17:17:17 UTC, SrMordred wrote:

writeln( "a.b.c".splitter('.').dropBack(1) ); //compiles ok
writeln( "a.b.c".splitter(".").dropBack(1) );

//error:
Error: template std.range.dropBack cannot deduce function from 
argument types !()(Result, int), candidates are:

(...)

Hm.. can someone explain whats going on?


It's easy to overlook, but documentation for splitter starts out:

 Lazily splits a range using an element as a separator.

An element of a string is a char, not a string. It needs to be 
read somewhat literally, but it is correct.


It's also part of the template constraint, useful once you've 
become accustomed to reading them:


auto splitter(alias pred = "a == b", Range, Separator)(Range 
r, Separator s)

if (is(typeof(binaryFun!pred(r.front, s)) : bool) && 

For "a.b.c".splitter(x), Range r is a string, r.front is a char. 
The template can only be instantiated if the predicate function 
is valid. The predicate function is "a == b". Since r.front is a 
char, s must be a type that can be compared with '=='. A 
string and a char cannot be compared with '==', which is why a 
valid template instantiation could not be found.
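A practical consequence, sketched (untested): dropBack compiles on the result of the char-separator overload, and converting to an array first makes the string-separator overload usable with it too:

```d
import std.algorithm : splitter;
import std.array : array;
import std.range : dropBack;

// Drop the last field of a dot-separated string, both ways.
auto dropLast(string s)
{
    auto viaChar = s.splitter('.').dropBack(1).array;   // compiles directly
    auto viaString = s.splitter(".").array.dropBack(1); // needs .array first
    assert(viaChar == viaString);
    return viaString;
}
```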




Re: Region-based memory management and GC?

2017-09-30 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 30 September 2017 at 07:41:21 UTC, Igor wrote:
On Friday, 29 September 2017 at 22:13:01 UTC, Jon Degenhardt 
wrote:
Have there been any investigations into using region-based 
memory management (aka memory arenas) in D, possibly in 
conjunction with GC allocated memory?


Sounds like just want to use 
https://dlang.org/phobos/std_experimental_allocator_building_blocks_region.html.


Wow, thanks, I did not know about this. Will check it out.
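A minimal sketch of that building block in the request-response setting described (assuming a malloc-backed Region; untested):

```d
import std.experimental.allocator : make;
import std.experimental.allocator.building_blocks.region : Region;
import std.experimental.allocator.mallocator : Mallocator;

struct Request { int id; }

void handleRequest(int id)
{
    // One region per request; everything allocated from it is released
    // in a single shot when the region goes out of scope, without
    // running per-object destructors.
    auto region = Region!Mallocator(1024 * 1024);
    auto req = region.make!Request(id);
    // ... per-request work using region-allocated objects ...
}   // region's backing memory is released here
```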


Region-based memory management and GC?

2017-09-29 Thread Jon Degenhardt via Digitalmars-d-learn
Have there been any investigations into using region-based memory 
management (aka memory arenas) in D, possibly in conjunction with 
GC allocated memory? This would be a very speculative idea, but 
it'd be interesting to know if there has been any work in this 
area.


My own interest is request-response applications, where memory 
allocated as part of a specific request can be discarded as a 
single block when the processing of that request completes, 
without running destructors. I've also seen some papers 
describing GC systems targeting big data platforms that 
incorporate this idea. eg. 
http://www.ics.uci.edu/~khanhtn1/papers/osdi16.pdf


--Jon


Re: DUB and LTO?

2017-09-05 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 5 September 2017 at 11:36:06 UTC, Sönke Ludwig wrote:

Am 24.01.2017 um 17:02 schrieb Las:

How do I enable LTO in DUB in a sane way?
I could add it to dflags, but I only want it on release builds.



You can put a "buildTypes" section in your package recipe and 
override default dflags or lflags there just for the "release" 
build type. See 
https://code.dlang.org/package-format?lang=json#build-types


There are examples in my dub.json files. One here: 
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/dub.json#L24-L28. All the dub.json files in the repo are setup this way. Turns on LTO (thin) for LDC on OS X, not used for other builds. Works in Travis-CI for the combos of os x and linux with ldc and dmd.


--Jon


Re: Help Required on Getopt

2017-09-01 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 1 September 2017 at 19:04:39 UTC, Daniel Kozak wrote:
I have same issue.  How this help you?  Catching exception does 
not help. How do I catch exception and still print help message?


You are correct, sorry about that. What my response showed is 
how to avoid printing the full stack trace and instead print a 
more nicely formatted error message. And separately, how to print 
formatted help. But, you are correct in that you can't directly 
print the formatted help text from the catch block as shown.


In particular, the GetoptResult returned by getopt is not 
available. I don't have any examples that try to work around 
this. Presumably one could call getopt again to get the options 
list, then generate the formatted help. It'd be an annoyance, 
though perhaps judicious use of AliasSeq might make the code 
structure reasonable.


--Jon


Re: Help Required on Getopt

2017-09-01 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 1 September 2017 at 13:13:39 UTC, Vino.B wrote:

Hi All,

When I run the below program without any arguments, "D1.d -r", 
it throws an error, but I need it to show the help menu


[snip...]


Hi Vino,

To get good error message behavior you need to put the construct 
in a try-catch block. Then you can choose how to respond. An 
example here: 
https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-append/src/tsv-append.d#L138-L194. This code prints outs the error message from the exception. In your case: "Missing value for argument -r.". But, you could also print out the help text as well. There is an example of that as well in the above code block, look for the 'if (r.helpWanted)' test.


--Jon
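The pattern in the linked code, reduced to a sketch (the option name and message strings are placeholders):

```d
import std.getopt;
import std.stdio;

// Parse args inside try-catch; print a formatted error instead of a
// stack trace, and handle --help via GetoptResult.helpWanted.
int run(string[] args)
{
    string requiredValue;
    try
    {
        auto r = getopt(args, "r", "A value for -r.", &requiredValue);
        if (r.helpWanted)
        {
            defaultGetoptPrinter("Usage: program [options]", r.options);
            return 0;
        }
    }
    catch (Exception e)
    {
        // e.g. "Missing value for argument -r."
        stderr.writefln("[%s] %s", args[0], e.msg);
        return 1;
    }
    return 0;
}

int main(string[] args)
{
    return run(args);
}
```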


Re: General performance tip about possibly using the GC or not

2017-08-28 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote:
I am vacillating - considering breaking a lifetime's C habits 
and letting the D garbage collector make life wonderful by just 
cleaning up after me, and ruining my future C discipline by not 
deleting stuff myself.


The tsv command line tools I open-sourced haven't any problems 
with GC. They are only one type of app, perhaps better suited to 
GC than other apps, but still, it is a reasonable data point. 
I've done rather extensive benchmarking against similar tools 
written in native languages, mostly C. The D tools were faster, 
often by significant margins. The important part is not that they 
were faster on any particular benchmark, but that they did well 
against a fair variety of tools written by a fair number of 
different programmers, including several standard unix tools. The 
tools were programmed using the standard library where possible, 
without resorting to low-level optimizations.


I don't know if the exercise says anything about GC vs manual 
memory management from the perspective of maximum possible code 
optimization. But, I do think it is suggestive of benefits that 
may occur in more regular programming, in that GC allows you to 
spend more time on other aspects of your program, and less time 
on memory management details.


That said, all the caveats, suggestions, etc. given by others in 
this thread apply to my programs too. GC is hardly a free lunch.


Benchmarks on the tsv utilities: 
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md


Blog post describing some of the techniques used: 
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/


--Jon


Re: std.range.put vs R.put: Best practices?

2017-08-21 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 21 August 2017 at 05:58:01 UTC, Jonathan M Davis wrote:
On Monday, August 21, 2017 02:34:23 Mike Parker via 
Digitalmars-d-learn wrote:
On Sunday, 20 August 2017 at 18:08:27 UTC, Jon Degenhardt 
wrote:

> Documentation for std.range.put
> (https://dlang.org/phobos/std_range_primitives.html#.put) has
>
> the intriguing line:
>> put should not be used "UFCS-style", e.g. r.put(e). Doing 
>> this
>> may call R.put directly, by-passing any transformation 
>> feature

>> provided by Range.put. put(r, e) is prefered.
>
> This raises the question of whether std.range.put is always 
> preferred over calling an output range's 'put' method, or if 
> there are times when calling an output range's 'put' method 
> directly is preferred. Also, it seems an easy oversight to 
> unintentionally call the wrong one.

>
> Does anyone have recommendations or best practice 
> suggestions for which form to use and when?


It's recommended to always use the utility function in 
std.range unless you are working with an output range that has 
a well known put implementation. The issue is that put can be 
implemented to take any number or type of arguments, but as 
long as it has an implementation with one parameter of the 
range's element type, then the utility function will do the 
right thing internally whether you pass multiple elements, a 
single element, an array... It's particularly useful in 
generic code where most ranges are used. But again, if you are 
working with a specific range type then you can do as you 
like. Also, when the output range is a dynamic array, UFCS 
with the utility function is fine.


As for mitigating the risk of calling the wrong one, when you 
do so you'll either get a compile-time error because of a 
parameter mismatch or it will do the right thing. If there's 
another likely outcome, I'm unaware of it.


To add to that, the free function put handles putting different 
character types to a range of characters (IIRC, it also handles 
putting entire strings as well), whereas a particular 
implementation of put probably doesn't. In principle, a 
specific range type could do everything that the free function 
does, but it's highly unlikely that it will.


In general, it's really just better to use the free function 
put, and arguably, we should have used a different function 
name for the output ranges themselves with the idea that the 
free function would always be the one called, and it would call 
the special function that the output ranges defined. 
Unfortunately, however, that's not how it works. In general, 
IMHO, output ranges really weren't thought out well enough. 
It's more like they were added as a counterpart to input ranges 
because Andrei felt like they needed to be there rather than 
having them be fully fleshed out on their own. The result is a 
basic idea that's very powerful but that suffers in the details 
and probably needs at least a minor redesign (e.g. the output 
API has no concept of an output range that's full).


In any case, I'd just suggest that you never use put with UFCS. 
Unfortunately, if you're using UFCS enough, it becomes habit to 
just call the function as if it were a member function, which 
is then a problem when using output ranges, but we're kind of 
stuck at this point. On the bright side, it's really only 
likely to cause issues in generic code where the member 
function might work with your tests but not everything that's 
passed to it. In other cases, if what you're doing doesn't work 
with the member function, then the code won't compile, and 
you'll know to switch to using the free function.




Mike, Jonathan - Thanks for the detailed responses!

Yes, by habit I use UFCS; that is where the potential for the 
wrong call comes from. I agree also that output ranges are very 
powerful in concept, but the details are not fully fleshed out at 
this point. A few enhancements could make them much more 
compelling.


--Jon
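A small illustration of the transformation features the free function provides, per the std.range.primitives documentation (sketch):

```d
import std.array : appender;
import std.range : put;

// The free function put accepts whole strings, single characters, and
// character types of other widths, transcoding as needed; a particular
// range's member put typically accepts only its own element type.
string buildGreeting()
{
    auto app = appender!string;
    put(app, "hello");  // a whole string
    put(app, ' ');      // a single char
    put(app, "world"d); // a dstring, transcoded to char
    return app.data;
}
```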


std.range.put vs R.put: Best practices?

2017-08-20 Thread Jon Degenhardt via Digitalmars-d-learn
Documentation for std.range.put 
(https://dlang.org/phobos/std_range_primitives.html#.put) has the 
intriguing line:


put should not be used "UFCS-style", e.g. r.put(e). Doing this 
may call R.put directly, by-passing any transformation feature 
provided by Range.put. put(r, e) is prefered.


This raises the question of whether std.range.put is always 
preferred over calling an output range's 'put' method, or if 
there are times when calling an output range's 'put' method 
directly is preferred. Also, it seems an easy oversight to 
unintentionally call the wrong one.


Does anyone have recommendations or best practice suggestions for 
which form to use and when?


--Jon


Re: Efficiently streaming data to associative array

2017-08-10 Thread Jon Degenhardt via Digitalmars-d-learn
On Wednesday, 9 August 2017 at 13:36:46 UTC, Steven Schveighoffer 
wrote:

On 8/8/17 3:43 PM, Anonymouse wrote:
On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven 
Schveighoffer wrote:
I wouldn't use formattedRead, as I think this is going to 
allocate temporaries for a and b.


What would you suggest to use in its stead? My use-case is 
similar to the OP's in that I have a string of tokens that I 
want split into variables.


using splitter(","), and then parsing each field using 
appropriate function (e.g. to!)


For example, the OP's code, I would do:

auto r = line.splitter(",");
a = r.front;
r.popFront;
b = r.front;
r.popFront;
c = r.front.to!int;

It would be nice if formattedRead didn't use appender, and 
instead sliced, but I'm not sure it can be fixed.


Note, one could make a template that does this automatically in 
one line.


-Steve


The blog post Steve referred to has examples of this type of 
processing while iterating over lines in a file. A couple of 
different ways to access the elements are shown. AA access is 
addressed also: 
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/


--Jon


Re: Too slow readln

2017-07-16 Thread Jon Degenhardt via Digitalmars-d-learn

On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote:

[snip]

How to write in D grep not slower than GNU grep?


GNU grep is pretty fast, it's tough to beat it reading one line 
at a time. That's because it can play a bit of a trick and do the 
initial match ignoring line boundaries and correct line 
boundaries later. There's a good discussion in this thread ("Why 
GNU grep is fast" by Mike Haertel): 
https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


--Jon
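The chunk-first idea can be sketched in D like this (untested; matches spanning a chunk boundary are deliberately ignored here, and a real implementation would handle them):

```d
import std.algorithm.searching : canFind;
import std.stdio;

// Scan fixed-size raw chunks for the pattern bytes, deferring any
// line-boundary work until a chunk actually contains a match.
bool fileContains(string path, const(ubyte)[] pattern)
{
    foreach (chunk; File(path).byChunk(64 * 1024))
        if (chunk.canFind(pattern))
            return true;  // locate line boundaries around the match here
    return false;
}
```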


Re: Getopt default int init and zero

2017-05-20 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 19 May 2017 at 12:09:38 UTC, Suliman wrote:
I would like to check if the user specified `0` as a getopt 
parameter. But the problem is that `int`s default to `0`. So 
if the user did not specify anything, `int x` will be zero, and 
all other code will work as if it's zero.


One way to do this is the use a callback function or delegate. 
Have the callback set both the main variable and a boolean 
tracking whether the option was entered.


--Jon
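A sketch of the callback approach (hypothetical option name; the delegate form shown is the `void(string option, string value)` overload std.getopt supports):

```d
import std.conv : to;
import std.getopt;
import std.typecons : Tuple, tuple;

// Returns the parsed value and whether -x was explicitly given,
// distinguishing an explicit 0 from the int default.
Tuple!(int, bool) parseX(string[] args)
{
    int x;
    bool xWasSet = false;

    getopt(args,
        "x", delegate(string option, string value)
        {
            x = value.to!int;
            xWasSet = true;
        });

    return tuple(x, xWasSet);
}
```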


Re: Processing a gzipped csv-file by line-by-line

2017-05-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 10 May 2017 at 22:20:52 UTC, Nordlöw wrote:
What's the fastest way to on-the-fly-decompress and process a 
gzipped csv-file line by line?


Is it possible to combine

http://dlang.org/phobos/std_zlib.html

with some stream variant of

File(path).byLineFast

?


I was curious what byLineFast was, I'm guessing it's from here: 
https://github.com/biod/BioD/blob/master/bio/core/utils/bylinefast.d.


I didn't test it, but it appears it may pre-date the speed 
improvements made to std.stdio.byLine perhaps a year and a half 
ago. If so, it might be worth comparing it to the current Phobos 
version, and of course iopipe.


As mentioned in one of the other replies, byLine and variants 
aren't appropriate for CSV with escapes. For that, a real CSV 
parser is needed. As an alternative, run a converter that 
converts from csv to another format.


--Jon
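For the decompression half of the question, a sketch combining std.zlib's UnCompress with byChunk (untested; splitting lines that span chunk boundaries is left to the caller):

```d
import std.stdio;
import std.zlib : HeaderFormat, UnCompress;

// Stream-decompress a gzip file chunk by chunk and write the text out.
void decompressGzip(string path)
{
    auto decompressor = new UnCompress(HeaderFormat.gzip);
    foreach (chunk; File(path).byChunk(64 * 1024))
        write(cast(const(char)[]) decompressor.uncompress(chunk));
    write(cast(const(char)[]) decompressor.flush());
}
```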


Re: Command Line Parsing

2017-04-15 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 12 April 2017 at 09:51:34 UTC, Russel Winder wrote:
Are Argon https://github.com/markuslaker/Argon or darg  
https://github. com/jasonwhite/darg getting traction as the 
default command line handling system for D or are they just 
peripheral and everyone just uses std.getopt 
https://dlang.org/phobos/std_getopt.html ?


I use std.getopt in my tools. Overall it's pretty good, and the 
reliability of a package in the standard library has value. That 
said, I've bumped up against its limits, and looking at the 
code, it's not clear how to extend it to more advanced use cases. 
There may be a case for introducing a next generation package.


--Jon


Re: length = 0 clears reserve

2017-04-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 11 April 2017 at 20:00:48 UTC, Jethro wrote:

On Tuesday, 11 April 2017 at 03:00:29 UTC, Jon Degenhardt wrote:
On Tuesday, 11 April 2017 at 01:59:57 UTC, Jonathan M Davis 
wrote:
On Tuesday, April 11, 2017 01:42:32 Jethro via 
Digitalmars-d-learn wrote:

[...]


You can't reuse the memory of a dynamic array by simply 
setting its length to 0. If that were allowed, it would risk 
allowing dynamic arrays to stomp on each other's memory (since 
there is no guarantee that there are no other dynamic arrays 
referring to the same memory). However, if you know that 
there are no other dynamic arrays referring to the same 
memory, then you can call assumeSafeAppend on the dynamic 
array, and then the runtime will assume that there are no 
other dynamic arrays referring to the same memory.


[snip]


Another technique that works for many cases is to use an 
Appender (std.array). Appender supports reserve and clear, the 
latter setting the length to zero without reallocating. A 
typical use case is an algorithm doing a series of appends, 
then setting the length to zero and starts appending again.


--Jon


Appender supports clear? Are you sure?

Seems appender is no different than string, maybe worse? string 
has assumeSafeAppend, reserve and clear (although clear 
necessarily reallocates). They should have a function called 
empty, which resets the length to zero but doesn't reallocate.


See the Appender.clear documentation 
(https://dlang.org/phobos/std_array.html#.Appender.clear), the 
key piece being:


Clears the managed array. This allows the elements of the 
array to be reused for appending.


I've tried using both a dynamic array and an Appender in this 
way, setting the length of the dynamic array to zero vs using 
Appender.clear, in a cycle of filling the array by appending, 
operating on the array, clearing, and repeating. Appender is 
dramatically faster. And, if you look at GC reports you find that 
setting a dynamic array's length to zero creates garbage to 
collect, while Appender.clear does not. (Use the 
--DRT-gcopt=profile:1 command line option to see GC reports, 
described here: 
https://dlang.org/spec/garbage.html#gc_config).
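A minimal sketch of the fill/operate/clear cycle described above (sizes and element counts are illustrative):

```d
import std.array : appender;

void main()
{
    auto buf = appender!(int[])();
    buf.reserve(1000);   // allocate capacity up front

    foreach (cycle; 0 .. 3)
    {
        // Fill the buffer by appending.
        foreach (i; 0 .. 100)
            buf.put(i);
        assert(buf.data.length == 100);

        // Reset the length to zero; the underlying storage is
        // kept for reuse, so no new garbage is created.
        buf.clear();
        assert(buf.data.length == 0);
    }
}
```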




Re: length = 0 clears reserve

2017-04-10 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 11 April 2017 at 01:59:57 UTC, Jonathan M Davis wrote:
On Tuesday, April 11, 2017 01:42:32 Jethro via 
Digitalmars-d-learn wrote:
arrays have the ability to reserve but when setting the length 
to 0, it removes the reserve!! ;/


char[] buf;
buf.reserve = 1000;
buf.length = 0;
assert(buf.capacity == 0);

But I simply want to clear the buffer, not change its 
reserve/capacity.


I've tried to hack by setting the length to 0 through a 
pointer, but that still clears the capacity!


I want to do this because I want to be able to reuse the array 
without ever reallocating(I'll set the capacity to the max 
that will ever be used, I don't have to worry about conflicts 
since it will always be ran serially).


[snip]


You can't reuse the memory of a dynamic array by simply setting 
its length to 0. If that were allowed, it would risk allowing 
dynamic arrays to stomp on each other's memory (since there is 
no guarantee that there are no other dynamic arrays referring 
to the same memory). However, if you know that there are no 
other dynamic arrays referring to the same memory, then you can 
call assumeSafeAppend on the dynamic array, and then the 
runtime will assume that there are no other dynamic arrays 
referring to the same memory.


[snip]


Another technique that works for many cases is to use an Appender 
(std.array). Appender supports reserve and clear, the latter 
setting the length to zero without reallocating. A typical use 
case is an algorithm doing a series of appends, then setting the 
length to zero and starting to append again.


--Jon
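A minimal sketch of the assumeSafeAppend pattern described above, assuming no other slices refer to the array (assumeSafeAppend is defined in druntime's object module, so no import is needed):

```d
void main()
{
    int[] buf;
    buf.reserve(1000);            // allocate capacity up front
    auto cap = buf.capacity;

    foreach (i; 0 .. 100)
        buf ~= i;

    // Setting length to zero on its own drops the capacity.
    // assumeSafeAppend tells the runtime no other slice
    // references the memory, so appending can reuse the block.
    buf.length = 0;
    buf.assumeSafeAppend();
    assert(buf.capacity >= cap);

    buf ~= 42;                    // appends in place, no reallocation
}
```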


Re: pointer not aligned

2017-04-02 Thread Jon Degenhardt via Digitalmars-d-learn

On Friday, 31 March 2017 at 04:41:10 UTC, Joel wrote:

Linking...
ld: warning: pointer not aligned at address 0x10017A4C9 
(_D30TypeInfo_AxS3std4file8DirEntry6__initZ + 16 from 
.dub/build/application-debug-posix.osx-x86_64-dmd_2072-EFDCDF4D45F944F7A9B1AEA5C32F81ED/spellit.o)

...

and this goes on forever!


Issue: https://issues.dlang.org/show_bug.cgi?id=17289


Re: Output range and writeln style functions

2017-01-23 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 23 January 2017 at 22:20:59 UTC, Ali Çehreli wrote:

On 01/23/2017 12:48 PM, Jon Degenhardt wrote:
[snip]
> So, what I'm really wondering is if there is a built-in way
> to get closer to:
>
>   outputStream.writefln(...);



If it's about formatted output then perhaps formattedWrite?

  https://dlang.org/phobos/std_format.html#.formattedWrite

The same function is used with stdout and an Appender:

[snip]

Ali


Oh, that is better, thanks!

--Jon
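A minimal sketch of formattedWrite used against both an Appender and stdout's lockingTextWriter (the greeting text is illustrative):

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : stdout;

void main()
{
    // The same call works against any output range of characters.
    auto app = appender!string();
    app.formattedWrite("Hello %s, your lucky number is %d\n", "Alice", 7);
    assert(app.data == "Hello Alice, your lucky number is 7\n");

    // stdout.lockingTextWriter is also an output range.
    auto writer = stdout.lockingTextWriter;
    writer.formattedWrite("Hello %s, your lucky number is %d\n", "Bob", 11);
}
```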



Re: Output range and writeln style functions

2017-01-23 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 23 January 2017 at 08:03:14 UTC, Ali Çehreli wrote:

On 01/22/2017 01:54 PM, Jon Degenhardt wrote:
I've been increasingly using output ranges in my code (the 
"component
programming" model described in several articles on the D 
site). It
works very well, except that it would often be more convenient 
to use
writeln style functions rather than 'put'. Especially when you 
start by
drafting a sketch of code using writeln functions, then 
converting it to an output range.

Seems an obvious thing, I'm wondering if I missed something. 
Are there

ways to use writeln style functions with output ranges?

--Jon


I don't think I understand the question. :)

If you need a variadic put(), then I've come up with the 
following mildly tested AllAppender. Just as a reminder, I've 
also used std.range.tee that allows tapping into the stream to 
see what's flying through:


[snip]

Ali


So I guess the answer is "no" :)

It's mainly about consistency of the output primitives. Includes 
variadic args, formatting, and names of the primitives. I keep 
finding myself starting with something like:


void writeLuckyNumber(string name, int luckyNumber)
{
    writefln("Hello %s, your lucky number is %d", name, luckyNumber);
}

and then re-factoring it as:

void writeLuckyNumber(OutputRange)
    (OutputRange outputStream, string name, int luckyNumber)
if (isOutputRange!(OutputRange, char))
{
    import std.format;
    outputStream.put(
        format("Hello %s, your lucky number is %d\n", name, luckyNumber));
}

Not bad, but the actual output statements are a bit harder to 
read, especially if people reading your code are not familiar 
with output ranges. So, what I'm really wondering is if there is 
a built-in way to get closer to:

  outputStream.writefln(...);

that I've overlooked.


--Jon


Output range and writeln style functions

2017-01-22 Thread Jon Degenhardt via Digitalmars-d-learn
I've been increasingly using output ranges in my code (the 
"component programming" model described in several articles on 
the D site). It works very well, except that it would often be 
more convenient to use writeln style functions rather than 'put'. 
Especially when you start by drafting a sketch of code using 
writeln functions, then converting it to an output range.


Seems an obvious thing, I'm wondering if I missed something. Are 
there ways to use writeln style functions with output ranges?


--Jon


Re: compile-time test against dmd/phobos version number

2017-01-06 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 7 January 2017 at 02:41:54 UTC, ketmar wrote:
On Saturday, 7 January 2017 at 02:30:53 UTC, Jon Degenhardt 
wrote:
Is there a way to make a compile time check against the 
dmd/phobos version number? Functionally, what I'd like to 
achieve would be equivalent to:


version(dmdVersion >= 2.070.1)
{

}
else
{
...
}


static if (__VERSION__ == 2072) { wow, it's dmd 2.072! }


Perfect, thank you!
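A minimal sketch of gating code on the compiler version with static if; __VERSION__ is the compiler version encoded as a single integer (e.g. 2072 for dmd 2.072):

```d
void main()
{
    // Branches are resolved at compile time, so code for the
    // other branch is never even semantically analyzed.
    static if (__VERSION__ >= 2072)
    {
        pragma(msg, "Compiling with dmd 2.072 or later");
    }
    else
    {
        pragma(msg, "Compiling with a pre-2.072 compiler");
    }
}
```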


compile-time test against dmd/phobos version number

2017-01-06 Thread Jon Degenhardt via Digitalmars-d-learn
Is there a way to make a compile time check against the 
dmd/phobos version number? Functionally, what I'd like to achieve 
would be equivalent to:


version(dmdVersion >= 2.070.1)
{

}
else
{
...
}

I think I've seen something like this, probably using 'static 
if', but can't find it now. What I'm really trying to do is test 
for the existence of a specific enhancement in Phobos: if it's 
present, use it, otherwise don't. Testing for a particular Phobos 
release number seems the obvious thing to do.


--Jon


Re: Constructing a variadic template parameter with source in two files

2016-12-22 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 22 December 2016 at 07:33:42 UTC, Ali Çehreli wrote:

On 12/21/2016 07:59 PM, Jon Degenhardt wrote:

> construct the 'opts' parameter from
> definitions stored in two or more files. The reason for doing
this is to
> create a customization mechanism where-by there are a number
of default
> capabilities built-in to the main code base, but someone can
customize
> their copy of the code, putting definitions in a separate
file, and have
> it added in at compile time, including modifying command line
arguments.

I'm not sure this is any better than your mixin solution but 
getopt can be called multiple times on the same arguments. So, 
for example common code can parse them for its arguments and 
special code can parse them for its arguments. [...]


Yes, that might work, thanks. I'll need to work on the code 
structure a bit (there are a couple other nuances to account 
for), but might be able to make it work. The mixin approach feels 
a bit brittle.


--Jon



Constructing a variadic template parameter with source in two files

2016-12-21 Thread Jon Degenhardt via Digitalmars-d-learn
I'd like to find a way to define programming constructs in one 
file and reference them in a getopt call defined in another file. 
getopt uses a variadic template argument, so the argument list 
must be known at compile time. The std.getopt.getopt signature:


 GetoptResult getopt(T...)(ref string[] args, T opts)

So, what I'm trying to do is construct the 'opts' parameter from 
definitions stored in two or more files. The reason for doing 
this is to create a customization mechanism where-by there are a 
number of default capabilities built-in to the main code base, 
but someone can customize their copy of the code, putting 
definitions in a separate file, and have it added in at compile 
time, including modifying command line arguments.


I found a way to do this with a mixin template, shown below. 
However, it doesn't strike me as a particularly modular design. 
My question - Is there a better approach?


The solution I identified is below. The '--say-hello' option is 
built-in (defined in app.d), the '--say-hello-world' command is 
defined in custom_commands.d. Running:


$ ./app --say-hello --say-hello-world

will print:

 Hello
 Hello World

Which is the goal. But, is there a better way? Help appreciated.

--Jon

=== command_base.d ===
/* API for defining "commands". */
interface Command
{
string exec();
}

class BaseCommand : Command
{
private string _result;
this (string result) { _result = result; }
final string exec() { return _result; }
}

=== custom_commands.d ===
/* Defines custom commands and a mixin for generating the getopt argument.
 * Note that 'commandArgHandler' is defined in app.d, not visible in this file.
 */
import command_base;

class HelloWorldCommand : BaseCommand
{
this() { super("Hello World"); }
}

mixin template CustomCommandDeclarations()
{
import std.meta;

    auto pHelloWorldHandler = &commandArgHandler!HelloWorldCommand;

    alias CustomCommandOptions = AliasSeq!(
        "say-hello-world", "Print 'hello world'.", pHelloWorldHandler,
    );
}

=== app.d ===
/* This puts it all together. It creates built-in commands and uses the mixin
 * from custom_commands.d to declare commands and construct the getopt argument.
 */
import std.stdio;
import command_base;

class HelloCommand : BaseCommand
{
this() { super("Hello"); }
}

struct CmdOptions
{
import std.meta;
Command[] commands;

void commandArgHandler(DerivedCommand : BaseCommand)()
{
commands ~= new DerivedCommand();
}

bool processArgs (ref string[] cmdArgs)
{
import std.getopt;
import custom_commands;

        auto pHelloHandler = &commandArgHandler!HelloCommand;

alias BuiltinCommandOptions = AliasSeq!(
"say-hello",  "Print 'hello'.", pHelloHandler,
);

mixin CustomCommandDeclarations;
        auto CommandOptions = AliasSeq!(BuiltinCommandOptions, CustomCommandOptions);

        auto r = getopt(cmdArgs, CommandOptions);
        if (r.helpWanted) defaultGetoptPrinter("Options:", r.options);
        return !r.helpWanted;  // Return true if execution should continue
    }
}

void main(string[] cmdArgs)
{
CmdOptions cmdopt;

if (cmdopt.processArgs(cmdArgs))
foreach (cmd; cmdopt.commands)
writeln(cmd.exec());
}



Re: [Semi-OT] I don't want to leave this language!

2016-12-07 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 7 December 2016 at 16:33:03 UTC, bachmeier wrote:
On Wednesday, 7 December 2016 at 12:12:56 UTC, Ilya Yaroshenko 
wrote:


R, Matlab, Python, Mathematica, Gauss, and Julia use C libs. 
--Ilya


You can call into those same C libs using D. Only if you want a 
pure D solution do you need to be able to rewrite those 
libraries and get the same performance. D is a fine solution 
for the academic or the working statistician that is doing 
day-to-day analysis. The GC and runtime are not going to be an 
obstacle for most of them (and most won't even know anything 
about them).


A cycle I think is common is for a researcher (industry or 
academic) to write functionality in native R code, then when 
trying to scale it, finds native R code is too slow, and switches 
to C/C++ to create a library used in R. C/C++ is chosen not 
because it is the preferred choice, but because it is the common 
choice.


In such situations, the performance need is often to be quite a 
bit faster than native R code, not that it reach zero overhead. 
My personal opinion, but I do think D would be a very good choice 
here, run-time, phobos, gc, etc., included. The larger barrier to 
entry is more about ease of getting started, community (are 
others using this approach), etc., and less about having the 
absolutely most optimal performance. (There are obviously areas 
where the most optimal performance is critical, Mir seems to be 
targeting a number of them.)


For D to compete directly with R, Python, Julia, in these 
communities then some additional capabilities are probably 
needed, like a repl, standard scientific packages, etc.


Re: Impressed with Appender - Is there design/implementation description?

2016-12-06 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 6 December 2016 at 15:29:59 UTC, Jonathan M Davis 
wrote:
On Tuesday, December 06, 2016 13:19:22 Anonymouse via 
Digitalmars-d-learn wrote:

On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:

[...]

> 2. Up until 4 KB it reallocates when growing, but after 4 KB 
> the array lives in a larger pool of memory where it can 
> often grow a lot without reallocating, so in many scenarios 
> where other allocations do not interfere, the data array of 
> appender grows in place without copying any data, thanks to 
> GC.extend() method.


I always assumed it kept its own manually allocated array on a 
malloc heap :O


No. The main thing that Appender does is reduce the number of 
checks required for whether there's room for the array to 
append in place, because that check is a good chunk of why ~= 
is expensive for arrays.

[...]


Thanks everyone for the explanations. I should probably look into 
my data and see how often I'm reaching the 4kb size triggering 
GC.extend() use.


--Jon



Impressed with Appender - Is there design/implementation description?

2016-12-04 Thread Jon Degenhardt via Digitalmars-d-learn
I've been using Appender quite a bit recently, typically when I 
need append-only arrays with variable and unknown final sizes. I 
had been prepared to write a custom data structure when I started 
using it with large amounts of data, but very nicely this has not 
surfaced as a need. Appender has held up quite well.


I haven't actually benchmarked it against competing data 
structures, nor have I studied the implementation. I'd be very 
interested in understanding the design and how it compares to 
other data structures. Are there any write-ups or articles 
describing it?


--Jon


Re: passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]

2016-10-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 11 October 2016 at 19:46:31 UTC, Jon Degenhardt wrote:

On Tuesday, 11 October 2016 at 18:18:41 UTC, ag0aep6g wrote:

On 10/11/2016 06:24 AM, Jon Degenhardt wrote:
The example I gave uses ref parameters. On the surface it 
would seem
reasonable that passing a static array by ref would allow 
it to be

modified, without having to slice it first.


Your ref parameters are only for the per-element operations. 
You're not passing the array as a whole by reference. And you 
can't, because `each` itself takes the whole range by copy.


So, the by-ref increments themselves do work, but they're 
applied to a copy of your original static array.


I see. Thanks for the explanation. I wasn't thinking it through 
properly. Also, I guess I had assumed that the intent was that 
each! be able to modify the elements, and therefore that the 
whole array would be passed by reference, but I didn't consider 
it properly.


Another perspective where the current behavior could be confusing 
is that it is somewhat natural to assume that 'each' is the 
functional equivalent of foreach, and that they can be used 
interchangeably. However, for static arrays they cannot be.





Re: passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]

2016-10-11 Thread Jon Degenhardt via Digitalmars-d-learn

On Tuesday, 11 October 2016 at 18:18:41 UTC, ag0aep6g wrote:

On 10/11/2016 06:24 AM, Jon Degenhardt wrote:
The example I gave uses ref parameters. On the surface it 
would seem
reasonable that passing a static array by ref would allow 
it to be

modified, without having to slice it first.


Your ref parameters are only for the per-element operations. 
You're not passing the array as a whole by reference. And you 
can't, because `each` itself takes the whole range by copy.


So, the by-ref increments themselves do work, but they're 
applied to a copy of your original static array.


I see. Thanks for the explanation. I wasn't thinking it through 
properly. Also, I guess I had assumed that the intent was that 
each! be able to modify the elements, and therefore that the 
whole array would be passed by reference, but I didn't consider 
it properly.


I'm not going to make any suggestions about whether the behavior 
should be changed. At some point when I get a bit of time I'll 
try to submit a documentation change to make the current behavior 
clearer.


--Jon


passing static arrays to each! with a ref param [Re: Why can't static arrays be sorted?]

2016-10-10 Thread Jon Degenhardt via Digitalmars-d-learn
On Monday, 10 October 2016 at 16:46:55 UTC, Jonathan M Davis 
wrote:
On Monday, October 10, 2016 16:29:41 TheGag96 via 
Digitalmars-d-learn wrote:
On Saturday, 8 October 2016 at 21:14:43 UTC, Jon Degenhardt 
wrote:
> This distinction is a bit on the nuanced side. Is it 
> behaving as it should?

>
> --Jon

I think so? It's not being modified in the second case because 
the array is being passed by value... "x" there is a reference 
to an element of the copy created to be passed to each(). I 
assume there's a good reason why ranges in general are passed 
by value into these functions -- except in this one case, the 
stuff inside range types copied when passed by value won't be 
whole arrays, I'm guessing.


Whether it's by value depends entirely on the type of the 
range. They're passed around, and copying them has whatever 
semantics it has. In most cases, it copies the state of the 
range but doesn't copy all of the elements (e.g. that's what 
happens with a dynamic array, since it gets sliced). But if a 
range is a class, then it's definitely a reference type.  The 
only way to properly save the state of a range is to call save.


But passing by ref would make no sense at all with input 
ranges. It would completely kill chaining them. Almost all 
range-based functions return rvalues.


- Jonathan M Davis


The example I gave uses ref parameters. On the surface it would 
seem reasonable that passing a static array by ref would allow 
it to be modified, without having to slice it first. The 
documentation says:


// If the range supports it, the value can be mutated in place
   arr.each!((ref n) => n++);
   assert(arr == [1, 2, 3, 4, 5]);

but, 'arr' is a dynamic array, so technically it's not describing 
a static array (the opApply case).


Expanding the example, using foreach with ref parameters will 
modify the static array in place, without slicing it. I would 
have expected each! with a ref parameter to behave the same.


At a minimum this could be better documented, but it may also be 
a bug.


Example:

T increment(T)(ref T x) { return x++; }

void main()
{
    import std.algorithm : each;

    int[] dynamicArray = [1, 2, 3, 4, 5];
    int[5] staticArray = [1, 2, 3, 4, 5];

    dynamicArray.each!(x => x++);             // Dynamic array, by value
    assert(dynamicArray == [1, 2, 3, 4, 5]);  // ==> Not modified

    dynamicArray.each!((ref x) => x++);       // Dynamic array, by ref
    assert(dynamicArray == [2, 3, 4, 5, 6]);  // ==> Modified

    staticArray[].each!((ref x) => x++);      // Slice of static array, by ref
    assert(staticArray == [2, 3, 4, 5, 6]);   // ==> Modified

    staticArray.each!((ref x) => x++);        // Static array, by ref
    assert(staticArray == [2, 3, 4, 5, 6]);   // ==> Not modified

    /* Similar to above, using foreach and ref params. */
    foreach (ref x; dynamicArray) x.increment;
    assert(dynamicArray == [3, 4, 5, 6, 7]);  // Dynamic array => Modified

    foreach (ref x; staticArray[]) x.increment;
    assert(staticArray == [3, 4, 5, 6, 7]);   // Static array slice => Modified

    foreach (ref x; staticArray) x.increment;
    assert(staticArray == [4, 5, 6, 7, 8]);   // Static array => Modified
}



Re: Why can't static arrays be sorted?

2016-10-08 Thread Jon Degenhardt via Digitalmars-d-learn

On Thursday, 6 October 2016 at 20:11:17 UTC, ag0aep6g wrote:

On 10/06/2016 09:54 PM, TheGag96 wrote:
Interestingly enough, I found that using .each() actually 
compiles

without the []

[...]

why can the compiler consider it a range here but not
.sort()?


each is not restricted to ranges. It accepts other 
`foreach`-ables, too. The documentation says that it "also 
supports opApply-based iterators", but it's really anything 
that foreach accepts.

  [snip]

Thanks! Explains some things. I knew each! was callable in 
different circumstances than other functional operations, but 
hadn't quite figured it out. Looks like reduce! and fold! also 
take iterables.


There also appears to be a distinction between the iterator and 
range cases when a ref parameter is used. As an iterator, each! 
won't modify the elements through the ref parameter. Example:


void main()
{
import std.algorithm : each;

int[] dynamicArray = [1, 2, 3, 4, 5];
int[5] staticArray = [1, 2, 3, 4, 5];

dynamicArray.each!((ref x) => x++);
assert(dynamicArray == [2, 3, 4, 5, 6]); // modified

staticArray.each!((ref x) => x++);
assert(staticArray == [1, 2, 3, 4, 5]);  // not modified

staticArray[].each!((ref x) => x++);
assert(staticArray == [2, 3, 4, 5, 6]);  // modified
}

This distinction is a bit on the nuanced side. Is it behaving as 
it should?


--Jon


Re: Iterate over two arguments at once

2016-09-19 Thread Jon Degenhardt via Digitalmars-d-learn

On Monday, 19 September 2016 at 18:10:22 UTC, bachmeier wrote:

Suppose I want to iterate over two arrays at once:

foreach(v1, v2; [1.5, 2.5, 3.5], [4.5, 5.5, 6.5]) {
  ...
}

I have seen a way to do this but cannot remember what it is and 
cannot find it.


range.lockstep:  https://dlang.org/phobos/std_range.html#lockstep
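A minimal sketch of lockstep (std.range.zip is an alternative when a range result, rather than foreach iteration, is needed):

```d
import std.range : lockstep;
import std.stdio : writefln;

void main()
{
    auto a = [1.5, 2.5, 3.5];
    auto b = [4.5, 5.5, 6.5];

    // Iterate both arrays in parallel.
    foreach (v1, v2; lockstep(a, b))
        writefln("%s %s", v1, v2);

    // ref parameters allow modifying the underlying arrays.
    foreach (ref v1, v2; lockstep(a, b))
        v1 += v2;
    assert(a == [6.0, 8.0, 10.0]);
}
```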


Re: Instantiating a class with range template parameter

2016-09-08 Thread Jon Degenhardt via Digitalmars-d-learn
On Thursday, 8 September 2016 at 08:44:54 UTC, Lodovico Giaretta 
wrote:
On Thursday, 8 September 2016 at 08:20:49 UTC, Jon Degenhardt 
wrote:


[snip]



I think that

auto x = new Derived!(typeof(stdout.lockingTextWriter()))(); // note the parentheses


should work.



But usually, you save the writer inside the object and make a 
free function called `derived` (same as the class, but with 
lowercase first). You define it this way:


auto derived(OutputRange)(auto ref OutputRange writer)
{
auto result = new Derived!OutputRange();
    result.writer = writer; // save the writer in a field of the object
    return result;
}

void main()
{
auto x = derived(stdout.lockingTextWriter);
    x.writeString("Hello world");   // the writer is saved in the object, no need to pass it
}


Yes, the form you suggested works, thanks! And thanks for the 
class structuring suggestion, it has some nice properties.


Instantiating a class with range template parameter

2016-09-08 Thread Jon Degenhardt via Digitalmars-d-learn
I've been generalizing output routines by passing an OutputRange 
as an argument. This gets interesting when the output routine is 
an virtual function. Virtual functions cannot be templates, so 
instead the template parameters need to be part of class 
definition and specified when instantiating the class.


An example is below. It works fine. One thing I can't figure out: 
how to provide the range parameter without first declaring a 
variable of the appropriate type. What works is something like:


auto writer = stdout.lockingTextWriter;
auto x = new Derived!(typeof(writer));

Other forms I've tried fail to compile. For example, this fails:

auto x = new Derived!(typeof(stdout.lockingTextWriter));

I'm curious if this can be done without declaring the variable 
first. Anyone happen to know?


--Jon

Full example:

import std.stdio;
import std.range;

class Base(OutputRange)
{
abstract void writeString(OutputRange r, string s);
}

class Derived(OutputRange) : Base!OutputRange
{
override void writeString(OutputRange r, string s)
{
put(r, s);
put(r, '\n');
}
}

void main()
{
auto writer = stdout.lockingTextWriter;
auto x = new Derived!(typeof(writer));
x.writeString(writer, "Hello World");
}



Re: Template constraints for reference/value types?

2016-09-06 Thread Jon Degenhardt via Digitalmars-d-learn
On Wednesday, 7 September 2016 at 00:40:27 UTC, Jonathan M Davis 
wrote:
On Tuesday, September 06, 2016 21:16:05 Jon Degenhardt via 
Digitalmars-d-learn wrote:

On Tuesday, 6 September 2016 at 21:00:53 UTC, Lodovico Giaretta

wrote:
> On Tuesday, 6 September 2016 at 20:46:54 UTC, Jon Degenhardt
>
> wrote:
>> Is there a way to constrain template arguments to reference 
>> or value types? I'd like to do something like:

>>
>> T foo(T)(T x)
>>
>> if (isReferenceType!T)
>>
>> { ... }
>>
>> --Jon
>
> You can use `if(is(T : class) || is(T : interface))`.
>
> If you also need other types, std.traits contains a bunch of 
> useful templates: isArray, isAssociativeArray, isPointer, ...


Thanks. This looks like a practical approach.


It'll get you most of the way there, but I don't think that 
it's actually possible to test for reference types in the 
general case


[snip]

- Jonathan M Davis


Thanks, very helpful. I've concluded that what I wanted to do 
isn't worth pursuing at the moment (see the thread on associative 
arrays in the General forum). However, your description is 
helpful to understand the details involved.


Re: Template constraints for reference/value types?

2016-09-06 Thread Jon Degenhardt via Digitalmars-d-learn
On Tuesday, 6 September 2016 at 21:00:53 UTC, Lodovico Giaretta 
wrote:
On Tuesday, 6 September 2016 at 20:46:54 UTC, Jon Degenhardt 
wrote:
Is there a way to constrain template arguments to reference or 
value types? I'd like to do something like:


T foo(T)(T x)
if (isReferenceType!T)
{ ... }

--Jon


You can use `if(is(T : class) || is(T : interface))`.

If you also need other types, std.traits contains a bunch of 
useful templates: isArray, isAssociativeArray, isPointer, ...


Thanks. This looks like a practical approach.


Template constraints for reference/value types?

2016-09-06 Thread Jon Degenhardt via Digitalmars-d-learn
Is there a way to constrain template arguments to reference or 
value types? I'd like to do something like:


T foo(T)(T x)
if (isReferenceType!T)
{ ... }

--Jon
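A minimal sketch of one way to express such a constraint; isReferenceType here is a hand-rolled helper built from std.traits, not a Phobos symbol:

```d
import std.traits : isAssociativeArray, isDynamicArray, isPointer;

// Hand-rolled trait: true for types with reference semantics.
enum bool isReferenceType(T) =
    is(T == class) || is(T == interface)
    || isPointer!T || isDynamicArray!T || isAssociativeArray!T;

T foo(T)(T x)
if (isReferenceType!T)
{
    return x;
}

class C {}

void main()
{
    auto c = foo(new C);                          // compiles: class
    auto arr = foo([1, 2, 3]);                    // compiles: dynamic array
    static assert(!__traits(compiles, foo(42)));  // int is a value type
}
```

As the rest of the thread notes, this only gets you most of the way there; structs can contain references, so a fully general test is not this simple.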


Re: Why D isn't the next "big thing" already

2016-07-31 Thread Jon Degenhardt via Digitalmars-d-learn

On Saturday, 30 July 2016 at 22:52:23 UTC, bachmeier wrote:

On Saturday, 30 July 2016 at 12:30:55 UTC, LaTeigne wrote:

On Saturday, 30 July 2016 at 12:24:55 UTC, ketmar wrote:

On Saturday, 30 July 2016 at 12:18:08 UTC, LaTeigne wrote:

it you think that you know the things better than somebody 
who actually *lived* there in those times... well, keep 
thinking that. also, don't forget to teach physics to 
physicians, medicine to medics, and so on. i'm pretty sure 
that you will have a great success as a stupidiest comic they 
ever seen in their life.


also, don't bother answering me, i won't see it anyway.


Fucking schyzo ;)
Have you took your little pills today ?


Well this is beautiful marketing for the language. At some 
point, the leadership will need to put away ideology and get 
realistic about what belongs on this site.


I agree with this sentiment. One of D's strengths is the helpful 
responses on the Learn forum. It is something the D community can 
be proud of. Participants in such personal attacks may view them 
primarily as a 1-1 interchange, but they do take away from this 
strength. Better would be to move personal conflicts to some 
other venue.


Re: Is there a way to clear an OutBuffer?

2016-05-25 Thread Jon Degenhardt via Digitalmars-d-learn

On Wednesday, 25 May 2016 at 19:42:43 UTC, Gary Willoughby wrote:

On Monday, 23 May 2016 at 03:03:12 UTC, Jon Degenhardt wrote:

Currently not possible. Enhancement request perhaps?

Looking at the implementation, setting its 'offset' member 
seems to work. Based on example from documentation:


import std.outbuffer;

void main() {
OutBuffer b = new OutBuffer();
b.writefln("a%sb", 16);
assert(b.toString() == "a16b\n");

b.offset = 0;
b.writefln("a%sb", 16);
assert(b.toString() == "a16b\n");
}

Bug report perhaps? :)

Ali


Thanks. Enhancement request: 
https://issues.dlang.org/show_bug.cgi?id=16062


Is there a consensus on this? Does this really need a clear 
method seeing as though you can reset the offset directly?


As an end-user, I'd have more confidence using a documented 
mechanism. If it's setting a public member variable, fine; if 
it's a method, also fine.


The 'offset' member is not part of the publicly documented API.  
Looking at the implementation, it doesn't appear 'offset' is 
intended to be part of the API. Personally, I'd add a method to 
keep 'offset' out of the public API. However, simply documenting 
it is an option as well.




Re: Is there a way to clear an OutBuffer?

2016-05-22 Thread Jon Degenhardt via Digitalmars-d-learn

On Sunday, 22 May 2016 at 23:01:07 UTC, Ali Çehreli wrote:

On 05/22/2016 11:59 AM, Jon Degenhardt wrote:
Is there a way to clear an OutBuffer, but without freeing the 
internally
managed buffer? Something similar to std.array.appender.clear 
method.
Intent would be to reuse the OutBuffer, but without 
reallocating memory

for the buffer.

--Jon


Currently not possible. Enhancement request perhaps?

Looking at the implementation, setting its 'offset' member 
seems to work. Based on example from documentation:


import std.outbuffer;

void main() {
OutBuffer b = new OutBuffer();
b.writefln("a%sb", 16);
assert(b.toString() == "a16b\n");

b.offset = 0;
b.writefln("a%sb", 16);
assert(b.toString() == "a16b\n");
}

Bug report perhaps? :)

Ali


Thanks. Enhancement request: 
https://issues.dlang.org/show_bug.cgi?id=16062


Is there a way to clear an OutBuffer?

2016-05-22 Thread Jon Degenhardt via Digitalmars-d-learn
Is there a way to clear an OutBuffer, but without freeing the 
internally managed buffer? Something similar to 
std.array.appender.clear method. Intent would be to reuse the 
OutBuffer, but without reallocating memory for the buffer.


--Jon