I finally got a threaded version that works, and it is a lot cleaner than using send/receive. (But performance is dismal; see the end.)

Here's the heart of the solution:

    void readPackages() {
        import std.array: array;
        import std.parallelism: taskPool;
        import std.file: dirEntries, FileException, SpanMode;

        try {
            auto filenames = dirEntries(PACKAGE_DIR, PACKAGE_PATTERN,
                                        SpanMode.shallow).array;
            foreach (debs; taskPool.map!readPackageFile(filenames))
                foreach (deb; debs)
                    debForName[deb.name] = deb.dup;
        } catch (FileException err) {
            import std.stdio: stderr;
            stderr.writeln("failed to read packages: ", err);
        }
    }
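
For anyone who hasn't used taskPool.map before, here is a minimal self-contained sketch of the same pattern. The function wordLengths and the sample inputs are made up for illustration and simply stand in for readPackageFile and the list of filenames; only taskPool.map itself comes from std.parallelism.

```d
import std.parallelism : taskPool;
import std.stdio : writeln;

// Hypothetical stand-in for readPackageFile: takes one input and
// returns an array of results for it.
size_t[] wordLengths(string text) {
    import std.algorithm : map;
    import std.array : array, split;
    return text.split.map!(w => w.length).array;
}

void main() {
    auto inputs = ["alpha beta", "gamma", "delta epsilon zeta"];
    size_t[] all;
    // taskPool.map runs wordLengths on the pool's worker threads but
    // yields the results in input order, so this loop body only ever
    // runs in the calling thread and needs no locking.
    foreach (lengths; taskPool.map!wordLengths(inputs))
        all ~= lengths;
    writeln(all);
}
```

Because the merging happens only in the consuming loop, the shared container (debForName in the real code) is never touched by a worker thread.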

I had to change readPackageFile (and the functions it calls), for example:

Deb[] readPackageFile(string filename) {
    import std.file: FileException;
    import std.range: enumerate;
    import std.stdio: File, stderr;

    Deb[] debs;
    Deb deb;
    try {
        bool inDescription = false; // Descriptions can be multi-line
        bool inContinuation = false; // Other things can be multi-line
        auto file = File(filename);
        foreach(lino, line; file.byLine.enumerate(1))
            readPackageLine(debs, deb, filename, lino, line, inDescription,
                            inContinuation);
        if (deb.valid)
            debs ~= deb.dup;
    } catch (FileException err) {
        stderr.writefln("error: %s: failed to read packages: %s",
                        filename, err);
    }
    return debs;
}

I also changed main() to do some timings and to let me compare outputs:

void main(const string[] args) {
    import std.datetime.stopwatch: AutoStart, StopWatch;
    import std.stdio: stderr, writeln;

    auto model = Model();
    auto timer = StopWatch(AutoStart.yes);
    model.readPackages();
    stderr.writefln("read %,d packages in %s", model.length, timer.peek);
    if (args.length > 1)
        foreach (deb; model.debForName)
            writeln(deb);
}


This produces the same output as the single-threaded version.

Here's the output of a typical run of the single-threaded version:

read 65,480 packages in 1 sec, 314 ms, 798 μs, and 7 hnsecs

And here's the output of a typical run of the task-based multi-threaded version:

read 65,480 packages in 1 sec, 377 ms, 605 μs, and 3 hnsecs

In fact, the multi-threaded version has never yet been as fast as the single-threaded version!
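
One experiment that might be worth trying, though I have not measured it on the package data, is taskPool.amap. It is the eager sibling of taskPool.map and accepts an explicit work unit size, so each input can be made its own unit of work. The sketch below only shows how such a comparison could be set up: sumTo is a hypothetical CPU-bound stand-in for parsing one file, and the input sizes are arbitrary. It is a harness for experimenting, not a diagnosis of the slowdown.

```d
import std.algorithm : map;
import std.array : array;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical CPU-bound stand-in for reading and parsing one file.
ulong sumTo(ulong n) {
    ulong total = 0;
    foreach (i; 0 .. n + 1)
        total += i;
    return total;
}

void main() {
    // 64 arbitrary workloads of increasing size.
    auto inputs = iota(1UL, 65UL).map!(i => i * 100_000).array;

    auto timer = StopWatch(AutoStart.yes);
    auto serial = inputs.map!sumTo.array;
    writefln("serial:   %s", timer.peek);

    timer.reset();
    // The second argument is the work unit size; 1 means every
    // element is handed to the pool as a separate unit of work.
    auto parallel = taskPool.amap!sumTo(inputs, 1);
    writefln("parallel: %s", timer.peek);

    // Both approaches must produce identical results.
    assert(serial == parallel);
}
```

If the parallel timing is no better even for a purely CPU-bound toy like this, the problem is probably in the setup; if it is clearly better here but not for the real files, the time is probably going somewhere other than the parsing itself.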

I've put both versions on GitHub in case anyone's interested:
https://github.com/mark-summerfield/d-debtest-experiment
