Jesse,

I was unaware of your efforts. At first glance, your lib looks pretty good. I definitely think Phobos needs a real CSV parser, as I seem to write ad-hoc ones all the time. Since your module looks a little further along and better engineered than mine (mine was really just a prototype that I spent about half a day on), maybe we should focus on getting yours up to Phobos quality. The one major feature yours is missing, though, is the ability for csvText() to extract a subset of the available columns by header. I also like selecting columns by header rather than hard-coding the column order, because it's less brittle if the file layout changes.
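
To make the header-based selection concrete, here is a minimal sketch of the idea. It is not code from either module: columnIndices and selectCells are hypothetical helper names, and the sketch assumes the header row and each record have already been split into cells.

```d
import std.algorithm.searching : countUntil;
import std.exception : enforce;

// Hypothetical helper: map each wanted header name to its column index,
// so callers never depend on the physical column order in the file.
size_t[] columnIndices(string[] header, string[] wanted)
{
    size_t[] indices;
    foreach (name; wanted)
    {
        auto i = header.countUntil(name);
        enforce(i >= 0, "Missing column: " ~ name);
        indices ~= cast(size_t) i;
    }
    return indices;
}

// Hypothetical helper: pick the selected cells out of one already-split row,
// in the order the caller asked for them.
string[] selectCells(string[] row, size_t[] indices)
{
    string[] cells;
    foreach (i; indices)
        cells ~= row[i];
    return cells;
}
```

If a column is later added or moved in the file, only the header lookup changes; the calling code keeps asking for the same names.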

--David Simcha

On 1/29/2011 10:47 PM, Jesse Phillips wrote:
That is about the same as what I have, though I was attempting to
handle custom delimiters for fields, records, and quotes.

https://github.com/he-the-great/JPDLibs/tree/csv

But about your code: I was getting a Range Violation with your
unittests active. Also, you don't handle a quoted empty field
correctly. Otherwise you pass the unittest I ported from mine:

https://gist.github.com/802502

On Sat, Jan 29, 2011 at 3:44 PM, David Simcha <[email protected]> wrote:
I've written a small module for reading CSV and similar delimited files.
I've been meaning to do this for a while. Basically, it allows reading a
CSV file with O(1) memory usage (i.e. it can be parsed one character at a
time) to a range of ranges of cells. Quotes, escaped quotes, etc. are
handled properly. I tested it on a nasty CSV file produced by Affymetrix,
and it works rather well.
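
As an illustration of the quote handling described above (including the quoted empty field case), here is a simplified sketch. It is not the csvRange code: splitRecord is a hypothetical name, it assumes a doubled quote inside a quoted field means an escaped quote, and it eagerly builds an array for a single record rather than exposing a lazy O(1)-memory range.

```d
// Hypothetical helper: split one CSV record into fields, examining one
// character (code unit) at a time.
string[] splitRecord(string record, char delim = ',')
{
    string[] fields;
    string field;
    bool inQuotes = false;

    for (size_t i = 0; i < record.length; ++i)
    {
        immutable c = record[i];
        if (inQuotes)
        {
            if (c == '"')
            {
                if (i + 1 < record.length && record[i + 1] == '"')
                {
                    field ~= '"';     // doubled quote: emit one literal quote
                    ++i;              // and skip the second one
                }
                else
                    inQuotes = false; // closing quote
            }
            else
                field ~= c;           // delimiters inside quotes are data
        }
        else if (c == '"')
            inQuotes = true;          // opening quote
        else if (c == delim)
        {
            fields ~= field;          // end of field, possibly empty
            field = null;
        }
        else
            field ~= c;
    }
    fields ~= field;                  // last field has no trailing delimiter
    return fields;
}
```

With this sketch, a quoted empty field such as the middle cell of `a,"",b` comes back as an empty string rather than being dropped or misread as an escaped quote.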

CSVRange also allows for iteration over rows as a range of structs.  For
example, let's say you had a file:

Height,Weight,Shoe Size
6.5,210,13
...

You could read this file lazily into a range of structs with something like:

struct Person
{
    float height;
    uint weight;
    uint shoeSize;
}

auto csvRange = csvFile(someCharacterRange, ',');
auto structs = csvStructRange(csvRange, ["Height", "Weight", "Shoe Size"]);

// Iterate lazily through the rows.
foreach(s; structs) {
    // Do stuff.
}

Note that this still works even if you have tons of columns you don't care
about in the file.

Code:

http://dsource.org/projects/scrapple/browser/trunk/csvRange/csvRange.d

Docs:

http://cis.jhu.edu/~dsimcha/csvRange.html


_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos



