Re: Ranges, constantly frustrating
This turned into a bit of a full spec, so I would understand if you TL;DR, but it would be nice to get some feedback if you have the time..

On Fri, 14 Feb 2014 17:34:46 -, bearophile bearophileh...@lycos.com wrote:
> Regan Heath:
>> In my case I didn't need any of these.
>
> I don't understand.

What I meant here is that I don't need the advantages provided by enumerate, like the starting index.

One thing I am unclear about from your response is what you mean by implicit in this context? Do you mean the process of inferring things (like the types in foreach)?

(taken from subsequent reply)
> Isn't this discussion about adding an index to a range?

No, it's not. The counter I want would only be an index if the range was indexable, otherwise it's a count of foreach iterations (starting from 0). This counter is (if you like) an index into the result set, which is not necessarily also an index into the source range (which may not be indexable).

What we currently have with foreach is an index, and only for indexable things. I want to instead generalise this to be a counter which is an index when the thing being enumerated is indexable, otherwise it is a count or index into the result set.

Let's call this change scheme #0. It solves my issue, and interestingly also would have meant we didn't need to add byKey or byValue to AA's, instead we could have simply made keys/values indexable ranges and not broken any existing code.

Further details of scheme #0 below.

(taken from subsequent reply)
> If you want all those schemes built in a language (and to use them without adding .enumerate) you risk making a mess. In this case explicit is better than implicit.

Have a read of what I have below and let me know if you think it's a mess. Scheme #2 has more rules, and might be called a mess perhaps. But scheme #1 is fairly clean and simple, and I think better overall. The one downside is that without some additional syntax it cannot put tuple components nicely in context with descriptive variable names, so there is that.

To be fair to all 3 schemes below, they mostly just work for simple cases and/or cases where different types are used for key/values in AA's and tuples. The more complicated rules only kick in to deal with the cases where there is ambiguity (AA's with the same type for key and value, and tuples with multiple components of the same type).

Anyway, on to the details..

***

Scheme 0) So, what I want is for foreach to simply increment a counter after each call to the body of the foreach, giving me a counter from 0 to N (or infinity/wrap). It would do this when prompted to do so by a variable being supplied in the foreach statement in the usual way (for arrays/opApply).

This counter would not necessarily be defined/understood to be an index into the object being enumerated (as it currently is); instead, if the object is indexable then it would indeed be an index, otherwise it's a count (index into the result set).

I had not been considering associative arrays until now; given current support (without built in tuples) they do not seem to be a special case to me. Foreach over byKey() should look/function identically to foreach over keys, likewise for byValue(). The only difference is that in the byKey()/byValue() case the counter is not necessarily an index into anything, though it would be if the underlying byKey() range was indexable.

The syntax for this is the same as we have for arrays/classes with opApply today. In other words, it just works and my example would compile and run as one might expect.

This seems to me to be intuitive, useful and easy to implement. Further, I believe it leaves the door open to having built in tuples (or using library extensions like enumerate()), with similarly clean syntax and no mess.

***

So, what if we had built in tuples? Well, seems to me we could do foreach over AAs/tuples in one of 2 ways, or even a combination of both:

Scheme 1) for AA's/tuples the value given to the foreach body is a voldemort (unnamed) type with a public property member for each component of the AA/tuple. In the case of AA's this would then be key and value, for tuples it might be a, b, .., z, aa, bb, .. and so on.

  foreach(x; AA) {}        // Use x.key and x.value
  foreach(i, x; AA) {}     // Use i, x.key and x.value
  foreach(int i, x; AA) {} // Use i, x.key and x.value

Extra/better: For non-AA tuples we could allow the members to be named using some sort of syntax, i.e.

  foreach(i, (x.bob, x.fred); AA) {} // Use i, x.bob and x.fred
or
  foreach(i, x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred
or
  foreach(i, new x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred

Let's look at your examples re-written for scheme #1

  foreach (v; AA) {}
  foreach (x; AA) { .. use x.value .. } // better? worse?

  foreach (k, v; AA) {}
  foreach (x;
Re: Ranges, constantly frustrating
On Fri, 14 Feb 2014 02:48:51 -, Jesse Phillips jesse.k.phillip...@gmail.com wrote:
> On Thursday, 13 February 2014 at 14:30:41 UTC, Regan Heath wrote:
>>> Don't get me wrong, counting the elements as you iterate over them is useful, but it isn't the index into the range you're likely after.
>>
>> Nope, not what I am after. If I was, I'd iterate over the original range instead or keep a line count manually.
>
> Maybe a better way to phrase this is, while counting may be what your implementation needs, it is not immediately obvious what 'i' should be. Someone who desires an index into the original array will expect 'i' to be that; even though it can be explained that .take() is not the same range as the original.
>
> Thus it is better to be explicit with the .enumerate function.

FWIW I disagree. I think it's immediately and intuitively obvious what 'i' should be when you're foreaching over X items taken from another range, even if you do not know take returns another range. Compare it to calling a function on a range and foreaching on the result: you would intuitively and immediately expect 'i' to relate to the result, not the input.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Re: Ranges, constantly frustrating
On Friday, 14 February 2014 at 12:10:51 UTC, Regan Heath wrote:
> FWIW I disagree. I think it's immediately and intuitively obvious what 'i' should be when you're foreaching over X items taken from another range, even if you do not know take returns another range. Compare it to calling a function on a range and foreaching on the result: you would intuitively and immediately expect 'i' to relate to the result, not the input.
>
> R

How should it behave on ranges without length, such as infinite ranges?

Also, `enumerate` has the advantage of the `start` parameter, whose usefulness is demonstrated in `enumerate`'s example as well as in an additional example in the bug report.

I'm not yet sure whether I think it should be implemented at the language or library level, but I think the library approach has some advantages.
Re: Ranges, constantly frustrating
Regan Heath:

> FWIW I disagree. I think it's immediately and intuitively obvious what 'i' should be when you're foreaching over X items taken from another range, even if you do not know take returns another range. Compare it to calling a function on a range and foreaching on the result: you would intuitively and immediately expect 'i' to relate to the result, not the input.

Using enumerate has several advantages. It gives a bit longer code, but it keeps as much complexity as possible out of the language. So the language gets simpler to implement and its compiler is smaller and simpler to debug.

Also, using enumerate is more explicit. If you have an associative array you can iterate it in many ways:

  foreach (v; AA) {}
  foreach (k, v; AA) {}
  foreach (k; AA.byKeys) {}
  foreach (i, k; AA.byKeys.enumerate) {}
  foreach (i, v; AA.byValues.enumerate) {}
  foreach (k, v; AA.byPairs) {}
  foreach (i, k, v; AA.byPairs.enumerate) {}

If you want all those schemes built in a language (and to use them without adding .enumerate) you risk making a mess. In this case explicit is better than implicit.

Python does the same with its enumerate function and keeps the for loop simple:

  for k in my_dict: pass
  for i, v in enumerate(my_dict.itervalues()): pass
  etc.

In D we have a mess because tuples are not built-in. Instead of having a built-in functionality similar to what enumerate does, it's WAY better to have built-in tuples. Finding what's important and what is not important to have as built-ins in a language is an essential and subtle design problem.

Bye,
bearophile
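A minimal sketch of the explicit style argued for above, assuming a Phobos release that ships `std.range.enumerate` (at the time of this thread it was only a proposal). It uses the `byKey`/`byValue` spellings that associative arrays actually provide rather than the `byKeys`/`byValues` written in the post. `numbered` is a hypothetical helper introduced only for illustration.

```d
import std.conv : text;
import std.range : enumerate;
import std.stdio : writeln;

/// Hypothetical helper: pairs each element with a counter starting at `start`.
string[] numbered(R)(R items, size_t start = 0)
{
    string[] result;
    // enumerate yields (counter, element) tuples, which foreach expands.
    foreach (i, item; items.enumerate(start))
        result ~= text(i, ": ", item);
    return result;
}

void main()
{
    auto aa = ["one": 1, "two": 2];

    // AA iteration order is unspecified, so only the counters are predictable.
    foreach (line; numbered(aa.byKey))
        writeln(line);

    // The start argument gives 1-based numbering.
    foreach (line; numbered(aa.byValue, 1))
        writeln(line);
}
```

The counter here belongs to the enumerated range, not to the associative array, which is the distinction the thread keeps circling around.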
Re: Ranges, constantly frustrating
On Fri, 14 Feb 2014 13:14:51 -, bearophile bearophileh...@lycos.com wrote:
> Regan Heath:
>
>> FWIW I disagree. I think it's immediately and intuitively obvious what 'i' should be when you're foreaching over X items taken from another range, even if you do not know take returns another range. Compare it to calling a function on a range and foreaching on the result: you would intuitively and immediately expect 'i' to relate to the result, not the input.
>
> Using enumerate has several advantages.

In my case I didn't need any of these.

Simple things should be simple and intuitive to write. Yes, we want enumerate *as well*, especially for the more complex cases, but we also want the basics to be simple, intuitive and easy.

That's all I'm saying here. This seems to me to be very low hanging fruit.

R
Re: Ranges, constantly frustrating
On Fri, 14 Feb 2014 12:29:49 -, Jakob Ovrum jakobov...@gmail.com wrote:
> On Friday, 14 February 2014 at 12:10:51 UTC, Regan Heath wrote:
>> FWIW I disagree. I think it's immediately and intuitively obvious what 'i' should be when you're foreaching over X items taken from another range, even if you do not know take returns another range. Compare it to calling a function on a range and foreaching on the result: you would intuitively and immediately expect 'i' to relate to the result, not the input.
>>
>> R
>
> How should it behave on ranges without length, such as infinite ranges?

In exactly the same way. It just counts up until you break out of the foreach, or the 'i' value wraps around. In fact the behaviour I want is so trivial I think it could be provided by foreach itself, for iterations of anything. In which case whether 'i' was conceptually an index or simply a count would depend on whether the range passed to foreach (after all skip, take, etc) was itself indexable.

> Also, `enumerate` has the advantage of the `start` parameter, whose usefulness is demonstrated in `enumerate`'s example as well as in an additional example in the bug report.

Sure, if you need more functionality reach for enumerate. We can have both; sensible default behaviour AND enumerate for more complicated cases. In my case, enumerate w/ start wouldn't have helped (my file was blocks of 6 lines, where I wanted to skip lines 1, 3, and 6 *of each block*).

> I'm not yet sure whether I think it should be implemented at the language or library level, but I think the library approach has some advantages.

Certainly, for the more complex usage. But I reckon we want both enumerate and a simple language solution which would do what I've been trying to describe.

R
Re: Ranges, constantly frustrating
Regan Heath:

> In my case I didn't need any of these.

I don't understand.

Bye,
bearophile
Re: Ranges, constantly frustrating
Isn't this discussion about adding an index to a range? If it is, then I have shown why adding it in the language is a bad idea. Bye, bearophile
Re: Ranges, constantly frustrating
On Friday, 14 February 2014 at 17:42:53 UTC, bearophile wrote:
> Isn't this discussion about adding an index to a range? If it is, then I have shown why adding it in the language is a bad idea.

As far as I understand it, it's about adding an index to _foreach_, as is already supported for arrays:

  foreach(v; [1,2,3,4])
      writeln(v);
  foreach(i, v; [1,2,3,4])
      writeln(i, " = ", v);

But for ranges, the second form is not possible:

  foreach(v; iota(4))    // ok
      writeln(v);
  foreach(i, v; iota(4)) // Error: cannot infer argument types
      writeln(i, " = ", v);
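To make the difference concrete, here is a rough sketch of what a foreach over a range amounts to: only `empty`, `front` and `popFront` are involved, so there is nowhere for an index to come from unless a counter is kept alongside. `withCounter` is a hypothetical helper, written for ranges of `int` only to keep it short.

```d
import std.range : iota;
import std.typecons : Tuple, tuple;

/// Hypothetical helper: a hand-written `foreach (i, v; r)` for a range of ints.
Tuple!(size_t, int)[] withCounter(R)(R r)
{
    Tuple!(size_t, int)[] result;
    size_t i = 0;                   // the counter foreach does not supply for ranges
    for (; !r.empty; r.popFront())  // foreach over a range uses only these primitives
    {
        result ~= tuple(i, r.front);
        ++i;
    }
    return result;
}

void main()
{
    import std.stdio : writeln;

    // The counter starts at 0 even though the values start at 10.
    foreach (pair; withCounter(iota(10, 13)))
        writeln(pair[0], " = ", pair[1]);
}
```

Whether the compiler or a library function should maintain that counter is exactly the disagreement in this thread.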
Re: Ranges, constantly frustrating
Marc Schütz:

> As far as I understand it, it's about adding an index to _foreach_, as is already supported for arrays:
>
>   foreach(v; [1,2,3,4])
>       writeln(v);
>   foreach(i, v; [1,2,3,4])
>       writeln(i, " = ", v);
>
> But for ranges, the second form is not possible:
>
>   foreach(v; iota(4))    // ok
>       writeln(v);
>   foreach(i, v; iota(4)) // Error: cannot infer argument types
>       writeln(i, " = ", v);

I see. In my post I have explained why this is a bad idea (it's not explicit so it gives confusion, and it complicates the language/compiler).

A better design is to remove the auto-indexing feature for arrays too, and use .enumerate in all cases, as in Python.

Bye,
bearophile
Re: Ranges, constantly frustrating
On Wed, 12 Feb 2014 11:08:57 -, Jakob Ovrum jakobov...@gmail.com wrote:
> On Wednesday, 12 February 2014 at 10:44:57 UTC, Regan Heath wrote:
>> Ahh.. so this is a limitation of the range interface. Any plans to fix this?
>>
>> R
>
> Did my original reply not arrive? It is the first reply in the thread...

It did, thanks. It would be better if this was part of the language and just worked as expected, but this is just about as good.

R
Re: Ranges, constantly frustrating
On Wed, 12 Feb 2014 21:01:58 -, Jesse Phillips jesse.k.phillip...@gmail.com wrote:
> On Wednesday, 12 February 2014 at 10:52:13 UTC, Regan Heath wrote:
>> On Tue, 11 Feb 2014 19:48:40 -, Jesse Phillips jesse.k.phillip...@gmail.com wrote:
>>> On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
>>>> Things like this should just work..
>>>>
>>>> File input ...
>>>>
>>>> auto range = input.byLine();
>>>> while(!range.empty)
>>>> {
>>>>   range.popFront();
>>>>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>>>>   {
>>
>> It isn't *required* to (input/forward), but it could (random access). I think we even have a template to test if it's indexable as we can optimise some algorithms based on this.

You chopped off your own comment prompting this response, in which I am responding to a minor side-point, which I think has confused the actual issue. All I was saying above was that a range might well have an index, and we can test for that, but it's not relevant to the foreach issue below.

>>> What do you expect 'i' to be? Is it the line number? Is it the index within the line where 'take' begins? Where 'take' stops?
>>
>> If I say take(5) I expect 0,1,2,3,4. The index into the take range itself.
>
> I don't see how these two replies can coexist. 'range.take(5)' is a different range from 'range.'

Yes, exactly, meaning that it can trivially count the items it returns, starting from 0, and give those to me as 'i'. *That's all I want*

> 'range' may not traverse in index order (personally haven't seen such a range). But more importantly you're not dealing with random access ranges. The index you're receiving from take(5) can't be used on the range.

A forward range can do what I am describing above, it's trivial.

> Don't get me wrong, counting the elements as you iterate over them is useful, but it isn't the index into the range you're likely after.

Nope, not what I am after. If I was, I'd iterate over the original range instead or keep a line count manually.

> Maybe the number is needed to correspond to a line number.

Nope. The file contains records of 5 lines plus a blank line. I want 0, 1, 2, 3, 4, 5 so I can skip lines 0, 2, and 5 *of each record*.

R
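The record-skipping task described above can be sketched with a single running counter and a modulo, again assuming a Phobos release that provides `std.range.enumerate`. `keptLines` is a hypothetical name, and the `.idup` copy is there because `byLine` is known to reuse its buffer; with an in-memory `string[]` it is merely a harmless copy.

```d
import std.range : enumerate;

/// Hypothetical helper: keeps positions 1, 3 and 4 of every 6-line record
/// and drops positions 0, 2 and 5.
string[] keptLines(R)(R lines)
{
    string[] kept;
    foreach (i, line; lines.enumerate)
    {
        immutable pos = i % 6;   // position within the current record
        if (pos == 0 || pos == 2 || pos == 5)
            continue;
        kept ~= line.idup;       // copy, in case the source reuses its buffer
    }
    return kept;
}

void main()
{
    import std.stdio : writeln;

    auto lines = ["r0", "r1", "r2", "r3", "r4", "",
                  "s0", "s1", "s2", "s3", "s4", ""];
    foreach (line; keptLines(lines))
        writeln(line);
}
```

This trades the nested while/foreach for one flat loop; it should also accept `input.byLine()` directly, though that is untested here.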
Re: Ranges, constantly frustrating
On Thursday, 13 February 2014 at 14:30:41 UTC, Regan Heath wrote:
>> Don't get me wrong, counting the elements as you iterate over them is useful, but it isn't the index into the range you're likely after.
>
> Nope, not what I am after. If I was, I'd iterate over the original range instead or keep a line count manually.

Maybe a better way to phrase this is, while counting may be what your implementation needs, it is not immediately obvious what 'i' should be. Someone who desires an index into the original array will expect 'i' to be that; even though it can be explained that .take() is not the same range as the original.

Thus it is better to be explicit with the .enumerate function.
Re: Ranges, constantly frustrating
On Tue, 11 Feb 2014 17:11:46 -, Ali Çehreli acehr...@yahoo.com wrote:
> On 02/11/2014 06:25 AM, Rene Zwanenburg wrote:
>> On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
>>>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>>>   {
>>>     ..etc..
>>>   }
>>
>> foreach (i, line; iota(size_t.max).zip(range.take(4)))
>> {
>> }
>
> There is also the following, relying on tuples' automatic expansion in foreach:
>
>   foreach (i, element; zip(sequence!"n", range.take(4))) {
>     // ...
>   }

Thanks for the workarounds. :) Both seem needlessly opaque, but I realise you're not suggesting these are better than the original, just that they actually work today.

R
Re: Ranges, constantly frustrating
On Tue, 11 Feb 2014 19:48:40 -, Jesse Phillips jesse.k.phillip...@gmail.com wrote:
> On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
>> Things like this should just work..
>>
>> File input ...
>>
>> auto range = input.byLine();
>> while(!range.empty)
>> {
>>   range.popFront();
>>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>>   {
>>     ..etc..
>>   }
>>   range.popFront();
>> }
>>
>> Tried adding 'int' and 'char[]' or 'auto' .. no dice.
>>
>> Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.
>>
>> R
>
> In case the other replies weren't clear enough. A range does not have an index.

It isn't *required* to (input/forward), but it could (random access). I think we even have a template to test if it's indexable as we can optimise some algorithms based on this.

> What do you expect 'i' to be? Is it the line number? Is it the index within the line where 'take' begins? Where 'take' stops?

If I say take(5) I expect 0,1,2,3,4. The index into the take range itself.

The reason I wanted it was I was parsing blocks of data over 6 lines - I wanted to ignore the first and last and process the middle 4. In fact I wanted to skip the 2nd of those 4 as well, but there was no single function (I could find) which would do all that, so I coded the while above.

> There is a feature of foreach and tuple() which results in the tuple getting expanded automatically.

And also the opApply overload taking a delegate with both parameters.

> byLine has its own issues with reuse of the buffer, it isn't inherent to ranges. I haven't really used it (needed it from std.process), when I wanted to read a large file I went with wrapping std.mmap:
>
> https://github.com/JesseKPhillips/libosm/blob/master/source/util/filerange.d

Cool, thanks.

R
Re: Ranges, constantly frustrating
On Wednesday, 12 February 2014 at 10:44:57 UTC, Regan Heath wrote:
> Ahh.. so this is a limitation of the range interface. Any plans to fix this?
>
> R

Did my original reply not arrive? It is the first reply in the thread...

Reproduced:

> See this pull request[1] and the linked enhancement report.
>
> Also note that calling `r.popFront()` without checking `r.empty` is a program error (so it's recommended to at least put in an assert).
>
> [1] https://github.com/D-Programming-Language/phobos/pull/1866
Ranges, constantly frustrating
Things like this should just work..

File input ...

auto range = input.byLine();
while(!range.empty)
{
  range.popFront();
  foreach (i, line; range.take(4))  //Error: cannot infer argument types
  {
    ..etc..
  }
  range.popFront();
}

Tried adding 'int' and 'char[]' or 'auto' .. no dice.

Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.

R
Re: Ranges, constantly frustrating
On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
> Things like this should just work..
>
> File input ...
>
> auto range = input.byLine();
> while(!range.empty)
> {
>   range.popFront();
>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>   {
>     ..etc..
>   }
>   range.popFront();
> }
>
> Tried adding 'int' and 'char[]' or 'auto' .. no dice.
>
> Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.
>
> R

Is foreach(i, val; aggregate) even defined if the aggregate is not an array or associative array? It is not in the docs: http://dlang.org/statement#ForeachStatement
Re: Ranges, constantly frustrating
On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
> Things like this should just work..
>
> File input ...
>
> auto range = input.byLine();
> while(!range.empty)
> {
>   range.popFront();
>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>   {
>     ..etc..
>   }
>   range.popFront();
> }
>
> Tried adding 'int' and 'char[]' or 'auto' .. no dice.
>
> Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.
>
> R

See this pull request[1] and the linked enhancement report.

Also note that calling `r.popFront()` without checking `r.empty` is a program error (so it's recommended to at least put in an assert).

[1] https://github.com/D-Programming-Language/phobos/pull/1866
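One possible way to act on the `popFront` remark is to wrap the check in a small function, so that skipping a line at the end of a record cannot run past the end of the input. `tryPopFront` is a hypothetical helper, not something Phobos provides.

```d
import std.range;

/// Hypothetical helper: drops one element if there is one.
/// Returns false when the range was already empty.
bool tryPopFront(R)(ref R r)
{
    if (r.empty)
        return false;
    r.popFront();
    return true;
}

void main()
{
    import std.stdio : writeln;

    auto lines = ["header", "payload"];
    writeln(tryPopFront(lines)); // skips "header"
    writeln(lines);
}
```

An `assert(!r.empty)` before each `popFront()` would be the lighter option mentioned in the post; the helper is for cases where a short final record is expected input rather than a bug.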
Re: Ranges, constantly frustrating
On Tue, 11 Feb 2014 10:52:39 -, Tobias Pankrath tob...@pankrath.net wrote:
>> Further, the naive solution of adding .array gets you in all sorts of trouble :p (The whole byLine buffer re-use issue).
>>
>> This should be simple and easy, dare I say it trivial.. or am I just being dense here.
>>
>> R
>
> The second naive solution would be to use readText and splitLines.

The file is huge in my case :)

R
Re: Ranges, constantly frustrating
On Tue, 11 Feb 2014 10:58:17 -, Tobias Pankrath tob...@pankrath.net wrote:
> On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
>> Things like this should just work..
>>
>> File input ...
>>
>> auto range = input.byLine();
>> while(!range.empty)
>> {
>>   range.popFront();
>>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>>   {
>>     ..etc..
>>   }
>>   range.popFront();
>> }
>>
>> Tried adding 'int' and 'char[]' or 'auto' .. no dice.
>>
>> Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.
>>
>> R
>
> Is foreach(i, val; aggregate) even defined if the aggregate is not an array or associative array? It is not in the docs: http://dlang.org/statement#ForeachStatement

import std.stdio;

struct S1
{
   private int[] elements = [9,8,7];

   // size_t matches the array index type on both 32- and 64-bit targets
   int opApply (int delegate (ref size_t, ref int) block)
   {
       foreach (i, ref n; this.elements)
       {
           // a non-zero result means the loop body hit break/return
           if (auto result = block(i, n))
               return result;
       }
       return 0;
   }
}

void main()
{
    S1 range;
    foreach (size_t i, int x; range)
    {
        writefln("%d is %d", i, x);
    }
}

R
Re: Ranges, constantly frustrating
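On Tuesday, 11 February 2014 at 13:00:19 UTC, Regan Heath wrote:
> import std.stdio;
>
> struct S1
> {
>    private int[] elements = [9,8,7];
>
>    // size_t matches the array index type on both 32- and 64-bit targets
>    int opApply (int delegate (ref size_t, ref int) block)
>    {
>        foreach (i, ref n; this.elements)
>        {
>            // a non-zero result means the loop body hit break/return
>            if (auto result = block(i, n))
>                return result;
>        }
>        return 0;
>    }
> }
>
> void main()
> {
>     S1 range;
>     foreach (size_t i, int x; range)
>     {
>         writefln("%d is %d", i, x);
>     }
> }
>
> R

byLine does not use opApply:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1389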
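Since the two-variable foreach form does work for types with `opApply`, one way to get the counter for any range is a thin adapter that implements `opApply` on top of the range primitives. This is only a sketch: `Counted` and `counted` are hypothetical names, not part of Phobos, and the adapter consumes the range it wraps.

```d
import std.range;

/// Hypothetical adapter: gives any input range a 0-based counter in foreach.
struct Counted(R)
{
    R source;

    int opApply(scope int delegate(size_t, ElementType!R) dg)
    {
        size_t i = 0;
        for (; !source.empty; source.popFront())
        {
            // A non-zero result means the loop body used break or return.
            if (auto result = dg(i++, source.front))
                return result;
        }
        return 0;
    }
}

/// Convenience function so the element type is inferred.
auto counted(R)(R r)
{
    return Counted!R(r);
}

void main()
{
    import std.stdio : writeln;

    foreach (i, x; counted(iota(10, 14)))
        writeln(i, " = ", x);
}
```

The cost of this route is that the result is no longer a range itself, so it cannot be passed on to other range algorithms, which is one argument for the tuple-yielding approach instead.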
Re: Ranges, constantly frustrating
On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
> Things like this should just work..
>
> File input ...
>
> auto range = input.byLine();
> while(!range.empty)
> {
>   range.popFront();
>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>   {
>     ..etc..
>   }
>   range.popFront();
> }
>
> Tried adding 'int' and 'char[]' or 'auto' .. no dice.
>
> Can someone explain why this fails, and if this is a permanent or temporary limitation of D/DMD.
>
> R

foreach (i, line; iota(size_t.max).zip(range.take(4)))
{
}
Re: Ranges, constantly frustrating
On 02/11/2014 06:25 AM, Rene Zwanenburg wrote:
> On Tuesday, 11 February 2014 at 10:10:27 UTC, Regan Heath wrote:
>>   foreach (i, line; range.take(4))  //Error: cannot infer argument types
>>   {
>>     ..etc..
>>   }
>
> foreach (i, line; iota(size_t.max).zip(range.take(4)))
> {
> }

There is also the following, relying on tuples' automatic expansion in foreach:

  foreach (i, element; zip(sequence!"n", range.take(4))) {
    // ...
  }

Ali
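For completeness, a self-contained version of the zip workaround, using the `iota(size_t.max)` form quoted above. `withIndex` is a hypothetical wrapper name; the only thing it relies on is that `zip` stops at its shortest input and that foreach expands the resulting tuples.

```d
import std.range : iota, take, zip;

/// Hypothetical wrapper: pairs a 0-based counter with the first `n` elements of `r`.
auto withIndex(R)(R r, size_t n)
{
    // zip ends when the shorter input (the take) is exhausted.
    return zip(iota(size_t.max), r.take(n));
}

void main()
{
    import std.stdio : writeln;

    // Each zip element is a tuple, which foreach expands into (i, element).
    foreach (i, element; withIndex(["a", "b", "c", "d", "e"], 4))
        writeln(i, " ", element);
}
```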
Re: Ranges, constantly frustrating
On Tuesday, 11 February 2014 at 10:52:40 UTC, Tobias Pankrath wrote:
> The second naive solution would be to use readText and splitLines.

That's the sort of thing I always do, because then I understand what's going on, and when there's a bug I can find it easily!

But then I'm not writing libraries.

Steve
Re: Ranges, constantly frustrating
On Tuesday, 11 February 2014 at 19:48:41 UTC, Jesse Phillips wrote:
> In case the other replies weren't clear enough. A range does not have an index.
>
> What do you expect 'i' to be?

In the case of foreach(i, x; range) I would expect it to be the iteration number of this particular foreach. I miss it sometimes, and have to create another variable and increment it. I didn't know about automatic tuple expansion though, that looks better.