On Mar 31, 2007, at 11:48 AM, Yonik Seeley wrote:
On 3/31/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On a tab-delimited file I just got from a client, I got this error:
SEVERE: java.io.IOException: (line 119986) invalid char between
encapsulated token end delimiter
at org.apache.commons.csv.CSVParser.encapsulatedTokenLexer
(CSVParser.java:499)
This may just be a problem with the file,
It sounds like there is a field that looks like it's encapsulated, but
then has some other non-whitespace characters after that.
I was able to reproduce your exception via:
curl 'http://localhost:8983/solr/update/csv?stream.body=id,name%0A"10"oops,wow'
Notice the oops after the quoted 10.
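The same failure mode can be reproduced outside Solr. As a rough analogy (Python's csv module in strict mode, not the commons-csv code itself), a parser that enforces encapsulation rejects stray characters between a closing quote and the next delimiter:

```python
import csv

# A strict CSV parser rejects characters between a closing quote and the
# next delimiter -- the same condition commons-csv reports as
# "invalid char between encapsulated token end delimiter".
bad_line = '"10"oops,wow'
try:
    list(csv.reader([bad_line], strict=True))
except csv.Error as e:
    print("parse error:", e)  # strict mode raises on the stray "oops"
```

With strict=False (Python's default) the stray characters are silently appended to the field instead, which is why lenient tools may accept a file that commons-csv rejects.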
Is your file a "real" CSV file?
It's tab-delimited, no encapsulation stuff going on either, just
simply tabs separating fields.
If there is no escaping at all (no tabs in field values, no newlines,
etc), perhaps try setting the encapsulator to something that won't
occur in the file.
voila!
I used &encapsulator=%1f and a few minutes later ~1.8M records were
indexed!
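Why this works: %1f is the ASCII "unit separator" (0x1F), a control byte that essentially never appears in real text, so declaring it as the encapsulator effectively disables quote handling and lets literal double quotes pass through as data. A minimal sketch of the same idea using Python's csv module (an analogy for the Solr loader's behavior, not its implementation):

```python
import csv

# Sketch: set the quote character to a byte that never occurs in the data
# (0x1F, the "unit separator" -- the same idea as &encapsulator=%1f).
# Quoting is effectively disabled, so embedded double quotes are kept as-is.
line = 'id\tHe said "hi"\t5'
fields = next(csv.reader([line], delimiter='\t', quotechar='\x1f'))
print(fields)  # ['id', 'He said "hi"', '5']
```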
I have another tab-delimited file to bring in, but only some of the
columns should be imported. Is it possible with this loader to skip
over columns in the data file not desired in Solr? Certainly I can
transform the file before loading, so it's not a problem, just
curious.
LOL... I did implement that originally, and then forgot about it.
The "skip" param originally skipped particular fields, but then I went
and added code that reads "skip" as skipLines. I'll fix that.
The other way to skip fields is to give them a zero length name.
So if you wanted to skip the second column, use
fieldnames=id,,title_text,qty_display,etc
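The semantics of an empty field name can be sketched as a simple name-to-value pairing where unnamed columns are dropped (hypothetical helper for illustration, not the Solr loader's code):

```python
def map_row(fieldnames, values):
    # Pair each column value with its declared field name;
    # columns with an empty name are skipped entirely.
    return {name: value for name, value in zip(fieldnames, values) if name}

row = map_row(["id", "", "title_text", "qty_display"],
              ["10", "ignored", "Widget", "3"])
print(row)  # {'id': '10', 'title_text': 'Widget', 'qty_display': '3'}
```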
Very cool!
This CSV importer will prove very handy. Already has.
Erik