Re: [phobos] CSVRange: RFC

Andrei Alexandrescu Sun, 30 Jan 2011 22:53:23 -0800

Without having studied the code closely, I could say that asking for aninput range with slicing is quite a tall order that virtually restrictsyou to random-access ranges.

An input range only allows you to move one character forward and neversave your position or go back. A range with slicing in this contextmeans that we can confidently calculate how much of the range we need totake, and that automatically requires the range to be able to go forwardand then restart from a previous position.

Regarding overall design and user-level API, it may be reasonable toassume that:

1. CSV readers are usually often for reading an entire file through theend, so optimizations that are mostly applicable to reading one singleline are unnecessary. At the same time, optimizations for repeated useof empty/front/popFront are likely to be beneficial.

2. An entire line's representation as strings must fit in memory as arequirement.

As such, David's implementation that works on a character stream is themost general and the theoretical perfect one because one character oflookahead is all CSV needs. At the same time, if an implementationassuming (1) and (2) above has considerable advantages (speed,convenience) then it might trump the theoretically perfect one.

David, LockingTextReader in std.stdio implements character level inputstraight from a file. It does so very slowly by means of getc/ungetc,which is the only portable way. I know how to make it faster on Linuxand OSX and Walter knows how to make it faster on Windows, but we nevergot around to it.



Andrei

On 01/30/2011 02:21 PM, Jesse Phillips wrote:

In some way your comments apply to my implementation, I'll answer
assuming the latest design I have.

https://github.com/he-the-great/JPDLibs/tree/separator

And as David mentioned, it probably would be best to go with mine, but
it too is not ready for inclusion.

On Sun, Jan 30, 2011 at 11:11 AM, Andrei Alexandrescu<[email protected]>  wrote:

Code review:

#50 RefRange is really ForceInputRange. What do you need it for? It's
unusual to want to reduce the capabilities of a range.

#78 isCharRange is incorrect. Correct version:

enum bool isCharRange = isInputRange!R&&  isSomeChar!(ElementType!R);

#100 Why not struct?


Answering for all of the above as mine doesn't use any of these.

I make use of two ranges and a token function. Most of the work is
done by csvNextToken and has the most unittests against it. I then
build a range which iterates each token until the end of the record,
this is struct Record. On top of that I have a range which iterates
over each Record, this is RecordList.

I avoid the need to create reference semantics by having both Record
and csvNextToken take a ref parameter of the Range.

#102 private Appender!(char[]) _front;


As my result may not be an array, I use contactination. I do not have
a constraint on this yet...

#306 This is CsvRange not a CsvFile as it builds on another range (that may
or may not be backed up by a file)


As said, mine are named Record and RecordList, and I believe these are
good names as we aren't dealing with lines or files. The only concern
I have is that Record has many means, though I believe I am using it
correctly for CSV row data.

CsvRecord? DataRecord?

#387 The name is confusing - it's a class with struct in its name.


I believe this is an implementation detail. I use csvText!CastTo(...)
where CastTo can be a struct. It will be interesting to see what I do
when I get heading support put in.

#582 We also need a way to read CSV files into string arrays in case the
user just wants to do the parsing and decide on typing later. Seemingly the
current design forces choice of type before parsing.


Mine defaults to a slice of the original range, with the exception of
quoted entries. A helper function to return a Range[][] could be made.
But I think just leaving standard Range semantics is best.

Documentation review:

* O(1) is a bit inaccurate - memory consumed is proportional to that of one
element. What you might have meant is that it does not depend on the number
of lines in a file or on the number of CSV elements in a line.


I believe my new method has a better footprint then David's, but it no
longer operates on an InputRange, but a range with slicing and
appending. If there is no quoted data it just returns the slice of the
needed data, otherwise it appends slices of the data to a new Range.

It may be useful to have an implementation which doesn't make use of these.

* A few artifacts have no examples.


I haven't done much for examples either.

* The example should compile. getCharRange() does not exist. FWIW your
design should work with byLine().


As David said, you really can't do this. You can make it work, but it
is more trouble then it is worth. I built my design off of the ideas
found in Splitter.

Feedback and suggestion are welcome, but it really hasn't seen the
comment or code standard which it needs for easy understanding.
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos

_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos

Re: [phobos] CSVRange: RFC

Reply via email to