Re: Reading files using delimiters/terminators
On Tuesday, 29 December 2020 at 14:50:41 UTC, Steven Schveighoffer wrote:
> Are you on Windows? If so, your double newlines might be \r\n\r\n, depending on what editor you used to create the input. Use a hexdump program to see what the newlines are in your input file.

I've tried \r\n\r\n as well, which sadly did not work either. Using VS Code I have also switched between CRLF and LF, which did not do the trick. I'm getting the sense the implementation might have a specific workaround for \r\n (CRLF) line endings, though I haven't checked the source code yet.

Note that this is not really a problem for me specifically, as I've long since used a different approach; however, it seemed like a design issue. I'll try replicating this in isolation later; maybe something was wrong last time I tried.
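A minimal sketch of one way to sidestep the CRLF question when the whole input is read into memory first: normalize "\r\n" to "\n" and only then split on the blank line. The helper name splitGroups and the sample strings are made up for illustration, and this does not explain what byLine itself did with the original input.

```d
import std.algorithm : splitter;
import std.array : array, replace;
import std.stdio : writeln;

// Hypothetical helper: normalizes Windows line endings, then splits on blank lines.
string[] splitGroups(string text)
{
    return text.replace("\r\n", "\n").splitter("\n\n").array;
}

void main()
{
    // The same three groups, once with LF and once with CRLF line endings.
    auto lf = "abc\n\na\nb\nc\n\nab\nac";
    auto crlf = "abc\r\n\r\na\r\nb\r\nc\r\n\r\nab\r\nac";

    // Both calls yield the same three groups.
    writeln(splitGroups(lf));
    writeln(splitGroups(crlf));
}
```

With a real file, the text would come from readText("input") instead of a literal.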
Re: Reading files using delimiters/terminators
On 12/26/20 7:13 PM, Rekel wrote:
> I'm trying to read a file with entries separated by '\n\n' (empty line), with entries containing '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.
> I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file in one char[], so it's not delimited. Should this work, or is there a reason one cannot use byLine like this?
> For context, I'm trying this with the puzzle input of day 6 of this year's Advent of Code. (https://adventofcode.com/)

Are you on Windows? If so, your double newlines might be \r\n\r\n, depending on what editor you used to create the input. Use a hexdump program to see what the newlines are in your input file.

Now, you would think that the underlying C stream would do this for you. I'm not sure how it works exactly, as I don't use Windows.

-Steve
Re: Reading files using delimiters/terminators
> http://ddili.org/ders/d.en/index.html

This seems very promising :)
I doubt I'd still be considering D if it weren't for this awesome learning forum. Thanks for all the help!
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 23:18:37 UTC, Rekel wrote:
> Update: any clue why there's both "std.file" and "std.stdio.File"? I was mostly unaware of the former.

The very first paragraph at the top of the `std.file` documentation explains it:

"Functions in this module handle files as a unit, e.g., read or write one file at a time. For opening files and manipulating them via handles refer to module std.stdio."

https://dlang.org/phobos/std_file.html
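To make the distinction concrete, here is a small sketch: std.file works on a whole file per call, while std.stdio.File gives a handle that is consumed piece by piece. The scratch file name and the countLines helper are made up for illustration.

```d
import std.file : readText, remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: counts lines by iterating over a std.stdio.File handle.
size_t countLines(string path)
{
    auto f = File(path);
    size_t n;
    foreach (line; f.byLine)
        ++n;
    return n;
}

void main()
{
    // Hypothetical scratch file used only for this demonstration.
    enum path = "std_file_demo.txt";

    // std.file: one call writes the whole file, one call reads it back.
    write(path, "first\nsecond\n");
    string whole = readText(path);

    // std.stdio: open a handle and walk through the file line by line.
    writeln("characters: ", whole.length, ", lines: ", countLines(path));

    remove(path);
}
```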
Re: Reading files using delimiters/terminators
On 28.12.20 00:12, Rekel via Digitalmars-d-learn wrote:
> is there a reason to use either 'splitter' or 'split'?

split gives you a newly allocated array with the results; splitter is the lazy equivalent and doesn't allocate. Feel free to use either; it doesn't matter much with these small puzzle inputs.

> Sidetangent, don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there. Especially since the documentation tends to trip me up, with stuff like 'isSomeString' mentioning 'built in string types', while I haven't been able to find that concept elsewhere, let alone functionality one can expect in this case (like .length and the like), and stuff like 'countUntil' not being called 'indexOf', although it also exists and does basically the same thing. Also assumeUnique seems to be a thing?

Might be worth discussing that in a new topic. The stdlib is vast and has tons of useful utilities, not all of which can be explained in detail in a series of overview posts. Ali's "Programming in D" [1], which has a free online version, functions as an excellent in-depth introduction to the language, going over all the important topics.

Regarding function names and docs: Yes, some might seem slightly off coming from other languages (e.g. find vs. dropWhile, until vs. takeWhile, cumulativeFold vs. scan/accumulate, etc.), but it's all in there somewhere, implemented with the utmost care not to waste precious cycles. That might make it harder to grok when going over the implementation or docs for the very first time, but it gets easier after a while. Furthermore, alternative names are often mentioned in the docs, so a quick Google search should bring you to the right place.

[1] http://ddili.org/ders/d.en/index.html
Re: Reading files using delimiters/terminators
On 12/27/20 3:12 PM, Rekel wrote:

> is there a reason to use
> either 'splitter' or 'split'? I'm not sure I see why the difference
> would matter in the end.

splitter() is a lazy range algorithm. split() is a range algorithm as well, but it is eager: it puts the results in an array that it grows. The string elements are not copies of the original range; each is still just a pair of .ptr and .length, but building the array can be expensive if there are a lot of parts. Further, if you want to process just a small number of the initial parts, then being eager would be wasteful.

Like all lazy range algorithms, splitter() is just an iteration object waiting to be used. It does not allocate any array but serves the parts one by one. You can filter the parts as you iterate, or you can stop at any point. For example, the following takes the first 3 non-empty lines:

  import std.stdio;
  import std.range;
  import std.algorithm;

  void main() {
    auto s = "hello\n\nworld\n\n\nand\nmoon";
    writefln!"%(%s, %)"(s.splitter('\n').filter!(part => !part.empty).take(3));
  }

> Sidetangent, don't mean to bash the learning tour, as it's been really
> useful for getting started, but I'm surprised stuff like tuples and
> files aren't mentioned there.

Alternative place to search: :)

  http://ddili.org/ders/d.en/ix.html

> Especially since the documentation tends to trip me up, with stuff like
> 'isSomeString' mentioning 'built in string types', while I haven't been
> able to find that concept elsewhere,

Built-in strings are just arrays of character types: char[], wchar[], and dchar[]. They are commonly used through their respective immutable aliases: string, wstring, and dstring.

> 'countUntil' not being called 'indexOf'

countUntil() is more general because it works with any range, while indexOf requires a string.

> assumeUnique seems to be a thing?

That appears in the index I posted above as well. ;)

Ali
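The two contrasts discussed above (eager split vs. lazy splitter, and indexOf vs. countUntil) can be sketched in a few lines. The sample values are made up; the asserts state what each call is expected to return.

```d
import std.algorithm : countUntil, equal, splitter;
import std.array : split;
import std.stdio : writeln;
import std.string : indexOf;

void main()
{
    auto s = "a,b,c";

    // split is eager: the result is an array, so it has .length and indexing.
    string[] eager = s.split(",");
    assert(eager.length == 3 && eager[1] == "b");

    // splitter is lazy: parts are produced only while iterating.
    auto lazyParts = s.splitter(",");
    assert(lazyParts.equal(eager));

    // indexOf works on strings; countUntil works on any range, e.g. an int array.
    assert("hello".indexOf('l') == 2);
    assert([10, 20, 30].countUntil(30) == 2);

    writeln(eager);
}
```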
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 23:12:46 UTC, Rekel wrote:
> Sidetangent, don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there.

Update: any clue why there's both "std.file" and "std.stdio.File"? I was mostly unaware of the former.
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 13:27:49 UTC, oddp wrote:
> foreach (group; readText("input").splitter("\n\n")) { ... }
>
> Also, on other days, when the input is more uniform, there's always https://dlang.org/library/std/file/slurp.html which makes reading it in even easier, e.g. day02:
>
> alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
> auto input = slurp!Record("input", "%d-%d %s: %s");
>
> P.S.: would've loved to have had multiwayIntersection in the stdlib for day06 part2, especially when there's already multiwayUnion in setops. fold!setIntersection felt a bit clunky.

Oh my, all these things are new to me, haha, thanks a lot! I'll be looking into those (slurp & tuple). By the way, is there a reason to use either 'splitter' or 'split'? I'm not sure I see why the difference would matter in the end.

Sidetangent, don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there. Especially since the documentation tends to trip me up, with stuff like 'isSomeString' mentioning 'built in string types', while I haven't been able to find that concept elsewhere, let alone functionality one can expect in this case (like .length and the like), and stuff like 'countUntil' not being called 'indexOf', although it also exists and does basically the same thing. Also assumeUnique seems to be a thing?
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 13:21:44 UTC, Rekel wrote:
> On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips wrote:
>> Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader
>> But your use case sounds like splitter is more aligned with your needs. https://dlang.org/phobos/std_algorithm_iteration.html#.splitter
>
> But I'm not using csv, right? Additionally, shouldn't byLine also work with "\r\n"?

Right, you weren't using csv. I'm not familiar enough with the file terminator to know why it didn't work.

byLine would allow \r\n as well as \n.
Re: Reading files using delimiters/terminators
On 27.12.20 01:13, Rekel via Digitalmars-d-learn wrote:
> For context, I'm trying this with the puzzle input of day 6 of this year's advent of code. (https://adventofcode.com/)

For that specific puzzle I simply did:

  foreach (group; readText("input").splitter("\n\n")) { ... }

Since the input is never that big, I prefer reading in the whole thing and then doing the processing.

Also, on other days, when the input is more uniform, there's always https://dlang.org/library/std/file/slurp.html which makes reading it in even easier, e.g. day02:

  alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
  auto input = slurp!Record("input", "%d-%d %s: %s");

P.S.: would've loved to have had multiwayIntersection in the stdlib for day06 part2, especially when there's already multiwayUnion in setops. fold!setIntersection felt a bit clunky.
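As a rough sketch of the "read everything, then split" approach, the following counts the distinct letters in each blank-line-delimited group. The helper names (distinctLetters, groupCounts) and the inline sample are made up for illustration; the text is passed in as a string so the sketch runs without an input file, whereas the post above obtains it with readText("input").

```d
import std.algorithm : map, splitter;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: number of distinct letters in one group, ignoring newlines.
size_t distinctLetters(const(char)[] group)
{
    bool[char] seen;
    foreach (char c; group)
        if (c != '\n')
            seen[c] = true;
    return seen.length;
}

// Hypothetical helper: splits the text on empty lines and counts per group.
size_t[] groupCounts(string text)
{
    return text.splitter("\n\n").map!distinctLetters.array;
}

void main()
{
    // Made-up sample with three groups; the trailing newline is ignored by the count.
    auto text = "abc\n\na\nb\n\nab\nab\n";
    writeln(groupCounts(text)); // prints [3, 2, 2]
}
```

Because splitter is lazy, the groups are only materialized by the final .array call; for inputs of this size the difference from split is negligible.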
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips wrote:
> Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader
> But your use case sounds like splitter is more aligned with your needs. https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

But I'm not using csv, right? Additionally, shouldn't byLine also work with "\r\n"?
Re: Reading files using delimiters/terminators
On 12/26/20 4:13 PM, Rekel wrote:
> I'm trying to read a file with entries separated by '\n\n' (empty line), with entries containing '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.
> I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file in one char[], so it's not delimited. Should this work, or is there a reason one cannot use byLine like this?
> For context, I'm trying this with the puzzle input of day 6 of this year's Advent of Code. (https://adventofcode.com/)

byLine should work:

  import std.stdio;

  void main() {
    auto f = File("deneme.d");

    // Warning: byLine reuses an internal buffer. Use byLineCopy
    // if slices of the line need to persist beyond the current iteration.
    foreach (line; f.byLine) {
      if (line.length == 0) {
        writeln("EMPTY LINE");

      } else {
        writeln(line);
      }
    }
  }

Ali
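Building on the empty-line check above, a possible sketch of collecting the lines into blank-line-delimited groups. The helper groupLines is hypothetical; it accepts any range of lines, so an in-memory array stands in for the file here.

```d
import std.stdio : writeln;

// Hypothetical helper: collects consecutive non-empty lines into groups,
// starting a new group at every empty line.
string[][] groupLines(R)(R lines)
{
    string[][] groups;
    string[] current;
    foreach (line; lines)
    {
        if (line.length == 0)
        {
            if (current.length)
                groups ~= current;
            current = null;
        }
        else
        {
            // .idup copies the line, which matters when the source is byLine,
            // because byLine reuses its internal buffer.
            current ~= line.idup;
        }
    }
    if (current.length)
        groups ~= current;
    return groups;
}

void main()
{
    // An in-memory array of lines stands in for a file's byLine range here.
    auto lines = ["abc", "", "a", "b", "", "ab"];
    writeln(groupLines(lines)); // prints [["abc"], ["a", "b"], ["ab"]]
}
```

With a real file, something like groupLines(File("input").byLine) should fit the same helper (File comes from std.stdio); note that a line from a CRLF file may still carry a trailing '\r' and then would not count as empty.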
Re: Reading files using delimiters/terminators
On Sunday, 27 December 2020 at 00:13:30 UTC, Rekel wrote:
> I'm trying to read a file with entries separated by '\n\n' (empty line), with entries containing '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.
> I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file in one char[], so it's not delimited. Should this work, or is there a reason one cannot use byLine like this?
> For context, I'm trying this with the puzzle input of day 6 of this year's Advent of Code. (https://adventofcode.com/)

Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader

But your use case sounds like splitter is more aligned with your needs. https://dlang.org/phobos/std_algorithm_iteration.html#.splitter
Reading files using delimiters/terminators
I'm trying to read a file with entries separated by '\n\n' (empty line), with entries containing '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.

I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file in one char[], so it's not delimited. Should this work, or is there a reason one cannot use byLine like this?

For context, I'm trying this with the puzzle input of day 6 of this year's Advent of Code. (https://adventofcode.com/)