Without having studied the code closely, I could say that asking for an input range with slicing is quite a tall order that virtually restricts you to random-access ranges.

An input range only allows you to move one character forward and never save your position or go back. A range with slicing in this context means that we can confidently calculate how much of the range we need to take, and that automatically requires the range to be able to go forward and then restart from a previous position.

Regarding overall design and user-level API, it may be reasonable to assume that:

1. CSV readers are usually often for reading an entire file through the end, so optimizations that are mostly applicable to reading one single line are unnecessary. At the same time, optimizations for repeated use of empty/front/popFront are likely to be beneficial.

2. An entire line's representation as strings must fit in memory as a requirement.

As such, David's implementation that works on a character stream is the most general and the theoretical perfect one because one character of lookahead is all CSV needs. At the same time, if an implementation assuming (1) and (2) above has considerable advantages (speed, convenience) then it might trump the theoretically perfect one.

David, LockingTextReader in std.stdio implements character level input straight from a file. It does so very slowly by means of getc/ungetc, which is the only portable way. I know how to make it faster on Linux and OSX and Walter knows how to make it faster on Windows, but we never got around to it.


Andrei

On 01/30/2011 02:21 PM, Jesse Phillips wrote:
In some way your comments apply to my implementation, I'll answer
assuming the latest design I have.

https://github.com/he-the-great/JPDLibs/tree/separator

And as David mentioned, it probably would be best to go with mine, but
it too is not ready for inclusion.

On Sun, Jan 30, 2011 at 11:11 AM, Andrei Alexandrescu<[email protected]>  wrote:
Code review:

#50 RefRange is really ForceInputRange. What do you need it for? It's
unusual to want to reduce the capabilities of a range.

#78 isCharRange is incorrect. Correct version:

enum bool isCharRange = isInputRange!R&&  isSomeChar!(ElementType!R);

#100 Why not struct?

Answering for all of the above as mine doesn't use any of these.

I make use of two ranges and a token function. Most of the work is
done by csvNextToken and has the most unittests against it. I then
build a range which iterates each token until the end of the record,
this is struct Record. On top of that I have a range which iterates
over each Record, this is RecordList.

I avoid the need to create reference semantics by having both Record
and csvNextToken take a ref parameter of the Range.

#102 private Appender!(char[]) _front;

As my result may not be an array, I use contactination. I do not have
a constraint on this yet...

#306 This is CsvRange not a CsvFile as it builds on another range (that may
or may not be backed up by a file)

As said, mine are named Record and RecordList, and I believe these are
good names as we aren't dealing with lines or files. The only concern
I have is that Record has many means, though I believe I am using it
correctly for CSV row data.

CsvRecord? DataRecord?

#387 The name is confusing - it's a class with struct in its name.

I believe this is an implementation detail. I use csvText!CastTo(...)
where CastTo can be a struct. It will be interesting to see what I do
when I get heading support put in.

#582 We also need a way to read CSV files into string arrays in case the
user just wants to do the parsing and decide on typing later. Seemingly the
current design forces choice of type before parsing.

Mine defaults to a slice of the original range, with the exception of
quoted entries. A helper function to return a Range[][] could be made.
But I think just leaving standard Range semantics is best.

Documentation review:

* O(1) is a bit inaccurate - memory consumed is proportional to that of one
element. What you might have meant is that it does not depend on the number
of lines in a file or on the number of CSV elements in a line.

I believe my new method has a better footprint then David's, but it no
longer operates on an InputRange, but a range with slicing and
appending. If there is no quoted data it just returns the slice of the
needed data, otherwise it appends slices of the data to a new Range.

It may be useful to have an implementation which doesn't make use of these.

* A few artifacts have no examples.

I haven't done much for examples either.

* The example should compile. getCharRange() does not exist. FWIW your
design should work with byLine().

As David said, you really can't do this. You can make it work, but it
is more trouble then it is worth. I built my design off of the ideas
found in Splitter.

Feedback and suggestion are welcome, but it really hasn't seen the
comment or code standard which it needs for easy understanding.
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos

Reply via email to