On 3/31/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On a tab-delimited file I just got from a client, I got this error:

SEVERE: java.io.IOException: (line 119986) invalid char between encapsualted token end delimiter
        at org.apache.commons.csv.CSVParser.encapsulatedTokenLexer(CSVParser.java:499)

This may just be a problem with the file,

It sounds like there is a field that looks like it's encapsulated, but
then has some other non-whitespace characters after that.

I was able to reproduce your exception via:
curl 'http://localhost:8983/solr/update/csv?stream.body=id,name%0A"10"oops,wow'

Notice the oops after the quoted 10.
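The same failure mode can be sketched with Python's csv module in strict mode (an analogy only -- Solr uses Apache commons-csv, not this parser):

```python
import csv

# A quoted field followed by stray characters: "10"oops
# In strict mode Python's csv reader rejects this, much like
# commons-csv's "invalid char between encapsulated token and delimiter".
bad_line = ['"10"oops,wow']
try:
    list(csv.reader(bad_line, strict=True))
except csv.Error as e:
    print("parse error:", e)
```

Without strict=True, Python silently glues the stray characters onto the field instead of raising, which is why malformed files like this can slip through some tools but not others.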

Is your file a "real" CSV file?  How is escaping handled?
If there is no escaping at all (no tabs in field values, no newlines,
etc), perhaps try setting the encapsulator to something that won't
occur in the file.
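Disabling quote handling entirely has roughly the effect described above. A minimal Python sketch (again an analogy for Solr's encapsulator setting, not the actual commons-csv code path):

```python
import csv

# With quoting disabled, the quote character is treated as ordinary data,
# so the previously fatal line parses cleanly on a tab separator.
row = next(csv.reader(['"10"oops\twow'], delimiter='\t',
                      quoting=csv.QUOTE_NONE))
print(row)  # ['"10"oops', 'wow']
```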

or perhaps I need to
specify an encoding (not quite sure what it is on that file, but it
doesn't appear to be UTF8 as TextEdit complained about it).  The file
is brand new to me, and fairly large (~150MB).  The command I'm using
to import is:

        curl "http://localhost:8983/solr/update/csv?stream.file=/Users/erik/Desktop/data.txt&separator=%09&fieldnames=id,name_text,title_text,qty_display,price_display,config_display,category_facet"
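(For reference, the %09 in that URL is just the percent-encoded tab character, which is how a tab separator has to be passed in a query string:)

```python
from urllib.parse import quote, unquote

# Tab (0x09) percent-encodes to %09.
print(quote("\t"))             # %09
print(unquote("%09") == "\t")  # True
```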

I have another tab-delimited file to bring in, but only some of the
columns should be imported.  Is it possible with this loader to skip
over columns in the data file not desired in Solr?  Certainly I can
transform the file before loading, so it's not a problem, just curious.

LOL... I did implement that originally, and then forgot about it.
The "skip" param originally skipped particular fields, but then I
went and added code that reads "skip" as skipLines instead.  I'll fix
that.

The other way to skip fields is to give them a zero length name.
So if you wanted to skip the second column, use
fieldnames=id,,title_text,qty_display,etc
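The skip-by-empty-name convention can be sketched in a few lines of Python (a hypothetical helper to illustrate the idea, not Solr's actual loader code):

```python
# Columns whose field name is empty are simply dropped, mirroring
# fieldnames=id,,title_text,... where the blank entry skips column two.
def map_row(fieldnames, values):
    return {name: v for name, v in zip(fieldnames, values) if name}

print(map_row(["id", "", "title_text"], ["10", "ignored", "Hello"]))
# {'id': '10', 'title_text': 'Hello'}
```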

I'll document that.

Thanks for refreshing my memory :-)

-Yonik
