On Mar 31, 2007, at 11:48 AM, Yonik Seeley wrote:
On 3/31/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On a tab-delimited file I just got from a client, I got this error:
SEVERE: java.io.IOException: (line 119986) invalid char between
encapsulated token end delimiter
at org.apache.commons.csv.CSVParser.encapsulatedTokenLexer
(CSVParser.java:499)
This may just be a problem with the file,
It sounds like there is a field that looks like it's encapsulated, but
then has some other non-whitespace characters after that.
I was able to reproduce your exception via:
curl 'http://localhost:8983/solr/update/csv?stream.body=id,name%0A"10"oops,wow'
Notice the oops after the quoted 10.
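The same failure mode can be reproduced outside Solr. As a rough analogy (Python's csv module in strict mode, not the commons-csv code itself), a parser that enforces encapsulation rejects stray characters between a closing quote and the next delimiter:

```python
import csv

# A strict CSV parser rejects characters between a closing quote and the
# next delimiter -- the same condition commons-csv reports as
# "invalid char between encapsulated token end delimiter".
bad_line = '"10"oops,wow'
try:
    list(csv.reader([bad_line], strict=True))
except csv.Error as e:
    print("parse error:", e)  # strict mode raises on the stray "oops"
```

With strict=False (Python's default) the stray characters are silently appended to the field instead, which is why lenient tools may accept a file that commons-csv rejects.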
Is your file a "real" CSV file?
It's tab-delimited, no encapsulation stuff going on either, just
simply tabs separating fields.
If there is no escaping at all (no tabs in field values, no newlines,
etc), perhaps try setting the encapsulator to something that won't
occur in the file.
voila!
I used &encapsulator=%1f and a few minutes later ~1.8M records were
indexed!
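Why this works: %1f is the ASCII "unit separator" (0x1F), a control byte that essentially never appears in real text, so declaring it as the encapsulator effectively disables quote handling and lets literal double quotes pass through as data. A minimal sketch of the same idea using Python's csv module (an analogy for the Solr loader's behavior, not its implementation):

```python
import csv

# Sketch: set the quote character to a byte that never occurs in the data
# (0x1F, the "unit separator" -- the same idea as &encapsulator=%1f).
# Quoting is effectively disabled, so embedded double quotes are kept as-is.
line = 'id\tHe said "hi"\t5'
fields = next(csv.reader([line], delimiter='\t', quotechar='\x1f'))
print(fields)  # ['id', 'He said "hi"', '5']
```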
I have another tab-delimited file to bring in, but only some of the
columns should be imported. Is it possible with this loader to skip
over columns in the data file not desired in Solr? Certainly I can
transform the file before loading, so it's not a problem, just
curious.
LOL... I did implement that originally, and then forgot about it.
The "skip" param originally skipped particular fields, but then I went
and added code that reads "skip" as skipLines. I'll fix that.
The other way to skip fields is to give them a zero length name.
So if you wanted to skip the second column, use
fieldnames=id,,title_text,qty_display,etc
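The semantics of an empty field name can be sketched as a simple name-to-value pairing where unnamed columns are dropped (hypothetical helper for illustration, not the Solr loader's code):

```python
def map_row(fieldnames, values):
    # Pair each column value with its declared field name;
    # columns with an empty name are skipped entirely.
    return {name: value for name, value in zip(fieldnames, values) if name}

row = map_row(["id", "", "title_text", "qty_display"],
              ["10", "ignored", "Widget", "3"])
print(row)  # {'id': '10', 'title_text': 'Widget', 'qty_display': '3'}
```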
Very cool!
This CSV importer will prove very handy. Already has.
Erik