Re: Reading a file of words line by line

2020-01-16 Thread mark via Digitalmars-d-learn

On Thursday, 16 January 2020 at 10:10:02 UTC, dwdv wrote:
On 2020-01-16 04:54, Jesse Phillips via Digitalmars-d-learn 
wrote:

[...]

[...]

isn't far off, but could also be (sans imports):

return File(filename).byLine
    .map!(line => line.until!(not!isAlpha))
    .filter!(word => word.count == wordsize)
    .map!(word => word.to!string.toUpper)
    .assocArray(0.repeat);


That's what I'm now using -- thanks!
(Now I can try the next bit.)


Re: Reading a file of words line by line

2020-01-16 Thread dwdv via Digitalmars-d-learn

On 2020-01-16 04:54, Jesse Phillips via Digitalmars-d-learn wrote:

[...]
.map!(word => word.to!string.toUpper)
.array
.sort
.uniq
.map!(x => tuple (x, 0))
.assocArray ;



.each!(word => words[word.to!string.toUpper] = 0);

isn't far off, but could also be (sans imports):

return File(filename).byLine
    .map!(line => line.until!(not!isAlpha))
    .filter!(word => word.count == wordsize)
    .map!(word => word.to!string.toUpper)
    .assocArray(0.repeat);
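Below is a self-contained sketch of the same pipeline. It runs over an in-memory `string[]` instead of a file, so it can be tried without a word list. The function name `wordSet` is hypothetical. `assocArray(keys, values)` pairs each key with the next element of `values`, and `0.repeat` supplies an endless stream of zeros, so the result is an `int[string]` used as a set.

```d
import std.algorithm.iteration : filter, map;
import std.algorithm.searching : count, until;
import std.array : assocArray;
import std.conv : to;
import std.functional : not;
import std.range : repeat;
import std.uni : isAlpha, toUpper;

// Hypothetical helper: builds an int[string] "set" from the uppercased leading
// alphabetic run of each line, keeping only words of exactly `wordsize` code points.
int[string] wordSet(string[] lines, size_t wordsize) {
    return lines
        .map!(line => line.until!(not!isAlpha))    // leading letters only
        .filter!(word => word.count == wordsize)   // counts code points, not bytes
        .map!(word => word.to!string.toUpper)      // materialize and uppercase
        .assocArray(0.repeat);                     // every key maps to 0
}
```

For example, `wordSet(["word one", "my word", "word"], 4)` should contain only the key `"WORD"`.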


Re: Reading a file of words line by line

2020-01-15 Thread Jesse Phillips via Digitalmars-d-learn

On Wednesday, 15 January 2020 at 19:50:31 UTC, mark wrote:
I really do need a set for the next part of the program, but 
taking your code and ideas I have now reduced the function to 
this:


WordSet getWords(string filename, int wordsize) {
    WordSet words;
    File(filename).byLine
        .map!(line => line.until!(not!isAlpha))
        .filter!(word => word.count == wordsize)
        .each!(word => words[word.to!string.toUpper] = 0);
    return words;
}

This is also 4x faster than my version that used a regex -- 
thanks!


Why did you use string.count rather than string.length?


Your solution is fine, but also



import std.stdio : writeln;

void main() {
    auto file = ["word one", "my word", "word"];
    writeln(uniqueWords(file, 4));
}

auto uniqueWords(string[] file, uint wordsize) {
    // std.typecons is needed for tuple()
    import std.algorithm, std.array, std.conv, std.functional,
        std.typecons, std.uni;

    return file
        .map!(line => line.until!(not!isAlpha))
        .filter!(word => word.count == wordsize)
        .map!(word => word.to!string.toUpper)
        .array
        .sort
        .uniq
        .map!(x => tuple(x, 0))
        .assocArray;
}




Re: Reading a file of words line by line

2020-01-15 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 15, 2020 at 07:50:31PM +, mark via Digitalmars-d-learn wrote:
[...]
> Why did you use string.count rather than string.length?

The .length of a `string` is the number of UTF-8 code units (bytes) it
occupies, which is not necessarily the same as the number of characters
in the string. E.g., if the string contains non-ASCII text, some
characters will take more than one byte. `count`, by contrast, walks the
auto-decoded range and counts code points (`dchar`s).
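Below is a minimal sketch of the three counts one might mean by "length" for a D `string`. The helper name `lengths` is hypothetical. `.length` counts UTF-8 code units, and `count` and `walkLength` count the code points produced by auto-decoding. `byGrapheme` counts user-perceived characters, which differs from the code-point count when combining accents are used.

```d
import std.algorithm.searching : count;
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper returning [code units, code points, graphemes] for s.
size_t[3] lengths(string s) {
    return [
        s.length,                 // UTF-8 code units (bytes)
        s.count,                  // code points (a string auto-decodes to dchar)
        s.byGrapheme.walkLength,  // user-perceived characters
    ];
}
```

With a precomposed "é" (`"caf\u00E9"`) the counts are 5, 4 and 4. With "e" plus a combining acute accent (`"cafe\u0301"`) they are 6, 5 and 4.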


T

-- 
A computer doesn't mind if its programs are put to purposes that don't match 
their names. -- D. Knuth


Re: Reading a file of words line by line

2020-01-15 Thread mark via Digitalmars-d-learn
I really do need a set for the next part of the program, but 
taking your code and ideas I have now reduced the function to 
this:


WordSet getWords(string filename, int wordsize) {
    WordSet words;
    File(filename).byLine
        .map!(line => line.until!(not!isAlpha))
        .filter!(word => word.count == wordsize)
        .each!(word => words[word.to!string.toUpper] = 0);
    return words;
}

This is also 4x faster than my version that used a regex -- 
thanks!


Why did you use string.count rather than string.length?



Re: Reading a file of words line by line

2020-01-15 Thread dwdv via Digitalmars-d-learn

On 2020-01-15 16:34, mark via Digitalmars-d-learn wrote:

Is this as compact as it _reasonably_ can be?


How about this?

auto uniqueWords(string filename, uint wordsize) {
    import std.algorithm, std.array, std.conv, std.functional, std.uni;

    return File(filename).byLine
        .map!(line => line.until!(not!isAlpha))
        .filter!(word => word.count == wordsize)
        .map!(word => word.to!string.toUpper)
        .array
        .sort
        .uniq;
}
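Below is an in-memory variant of this pipeline, useful for checking its behaviour without a dictionary file. The name `uniqueWordsOf` is hypothetical. Unlike the associative-array versions, it returns a lazy range of sorted, de-duplicated words rather than a set.

```d
import std.algorithm.iteration : filter, map, uniq;
import std.algorithm.searching : count, until;
import std.algorithm.sorting : sort;
import std.array : array;
import std.conv : to;
import std.functional : not;
import std.uni : isAlpha, toUpper;

// Hypothetical in-memory variant: same pipeline, but over string[] lines.
auto uniqueWordsOf(string[] lines, size_t wordsize) {
    return lines
        .map!(line => line.until!(not!isAlpha))
        .filter!(word => word.count == wordsize)
        .map!(word => word.to!string.toUpper)
        .array    // sort needs random access, so materialize first
        .sort()   // equal words become adjacent...
        .uniq;    // ...so uniq can drop the repeats
}
```

Sorting first is what makes `uniq` work here, because `uniq` only removes *adjacent* duplicates.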


Re: Reading a file of words line by line

2020-01-15 Thread mark via Digitalmars-d-learn
Thanks for the ideas, I've now reduced the size of the getWords() 
function (even allowing for moving the imports to the top of the 
file) to this:


WordSet getWords(string filename, int wordsize) {
    string bareWord(string line) {
        auto rx = ctRegex!(r"^([a-z]+)", "i");
        auto match = matchFirst(line, rx);
        return match.empty ? "" : match.hit.to!string;
    }
    WordSet words;
    slurp!string(filename, "%s")
        .map!(line => bareWord(line))
        .filter!(word => word.length == wordsize)
        .each!(word => words[word.toUpper] = 0);
    return words;
}

Is this as compact as it _reasonably_ can be?


Re: Reading a file of words line by line

2020-01-14 Thread mipri via Digitalmars-d-learn

On Tuesday, 14 January 2020 at 16:39:16 UTC, mark wrote:
I can't help feeling that the foreach loop's block is rather 
more verbose than it could be?





    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    auto file = File(filename);
    foreach (line; file.byLine) {
        auto match = matchFirst(line, rx);
        if (!match.empty()) {
            auto word = match.hit().to!string; // I hope this assumes UTF-8?
            if (word.length == wordsize) {
                words[word.toUpper] = 0;
            }
        }
    }
    return words;
}



One thing I picked up during Advent of Code last year was
std.file.slurp, which was great for reading 90% of the input
files from that contest. With that, I'd do this more like

  int[string] words;
  slurp!string("input.txt", "%s").each!(w => words[w] = 0);

Where "%s" is what slurp() expects to find on each line, and
'string' is the type it returns from that. With just a list of
words this isn't very interesting. Some of my uses from the
contest are:

  auto input = slurp!(int, int, int)(args[1], "<x=%d, y=%d, z=%d>")
      .map!(p => Moon([p[0], p[1], p[2]])).array;

  Tuple!(string, string)[] input =
  slurp!(string, string)("input.txt", "%s)%s");

Of course if you want to validate the input as you're reading
it, you still have to do extra work, but it could be in a
.filter!



Re: Reading a file of words line by line

2020-01-14 Thread mark via Digitalmars-d-learn

Should I have closed the file, i.e.:

auto file = File(filename);
scope(exit) file.close(); // Add this?
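`std.stdio.File` is reference counted and closes its handle when the last copy is destroyed, so the explicit `close` is optional. `scope(exit)` just makes the release point visible, and calling `close` on an already-closed `File` is harmless. Below is a minimal sketch; the function `countLines` is hypothetical.

```d
import std.file : remove, tempDir, write;   // used by the usage example
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: counts the lines in a file.
size_t countLines(string path) {
    auto file = File(path);
    // Optional: File's destructor would also close the handle
    // once the last reference goes out of scope.
    scope(exit) file.close();
    size_t n;
    foreach (line; file.byLine)
        ++n;
    return n;
}
```

Usage: write `"alpha\nbeta\ngamma\n"` to a temporary file and `countLines` on it should return 3.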



Reading a file of words line by line

2020-01-14 Thread mark via Digitalmars-d-learn
As part of learning D I want to read a file that contains one
word per line (plus optional junk after the word) and create a
set of all the unique words of a particular length (uppercased).


D doesn't appear to have a set type so I'm faking using an 
associative array whose values are always 0.
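The `int[string]` approach works well; `bool[string]` is a common variant of the same idea. If sorted iteration is useful, `std.container.rbtree.RedBlackTree` is another option, and it rejects duplicates by default. Below is a minimal sketch of the latter; the helper `makeSet` is hypothetical.

```d
import std.container.rbtree : redBlackTree;

// Hypothetical helper: builds an ordered set of words. Duplicates are
// dropped because redBlackTree's allowDuplicates parameter defaults to false.
auto makeSet(string[] words) {
    return redBlackTree(words);
}
```

Membership then reads naturally as `"WORD" in set`, and new words can be added with `set.insert(...)`.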


I can't help feeling that the foreach loop's block is rather more 
verbose than it could be?



#!/usr/bin/env rdmd
import std.stdio;

immutable WORDFILE = "/usr/share/hunspell/en_GB.dic";
immutable WORDSIZE = 4; // Should be even

alias WordSet = int[string]; // key = word; value = 0

void main() {
    import core.time;

    auto start = MonoTime.currTime;
    auto words = getWords(WORDFILE, WORDSIZE);
    // TODO
    writeln(words.length, " words");
    writeln(MonoTime.currTime - start);
}

WordSet getWords(string filename, int wordsize) {
    import std.conv;
    import std.regex;
    import std.uni;

    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    auto file = File(filename);
    foreach (line; file.byLine) {
        auto match = matchFirst(line, rx);
        if (!match.empty()) {
            auto word = match.hit().to!string; // I hope this assumes UTF-8?
            if (word.length == wordsize) {
                words[word.toUpper] = 0;
            }
        }
    }
    return words;
}
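About the `// I hope this assumes UTF-8?` comment: D's `char[]` and `string` are UTF-8 by definition. `to!string` on a `char[]` returns a fresh immutable copy. That copy matters here because `byLine` reuses its buffer for each line, so storing the raw slice would not be safe. Below is a small sketch; the helper `keep` is hypothetical.

```d
import std.conv : to;
import std.utf : validate;

// Hypothetical helper: turns a mutable buffer slice (like the char[] that
// byLine yields) into an independent string, after checking it is valid UTF-8.
string keep(char[] buffer) {
    validate(buffer);         // throws a UTFException on malformed UTF-8
    return buffer.to!string;  // allocates an immutable copy of the characters
}
```

Overwriting the buffer afterwards, as `byLine` does on the next line, leaves the returned string unchanged.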


PS I'm using ldc on Linux and think that rdmd is excellent. For the
many small Python programs I have, I'm wondering how many would
be faster using D and rdmd (which I think caches binaries). Also,
I've now got Mike Parker's "Learning D" on order.