On Tue, Jun 21, 2011 at 2:15 AM, Rafał Kuć <r....@solr.pl> wrote:
> Hello!
>
> Once again thanks for the response ;) So the solution is to generate
> the data files once again and either adding the space after doubled
> encapsulator

Maybe... I can't tell if the file is encoded correctly or not, since I
don't know what the decoded values are supposed to be from your
example.

-Yonik
http://www.lucidimagination.com

> or changing the encapsulator to a character that does
> not occur in the field values (of course the one that will be
> split).
>
> --
> Regards,
> Rafał Kuć
> http://solr.pl
>
>> Multi-valued CSV fields are double encoded.
>>
>> We start with: "aaa ""bbb""ccc"
>> Then decoding one level, we get: aaa "bbb"ccc
>> Decoding again to get individual values results in a decode error
>> because the encapsulator appears unescaped in the middle of the second
>> value (i.e. invalid CSV).
>>
>> One easier way to fix this is to use a different encapsulator for the
>> sub-values of a multi-valued field by adding f.title.encapsulator=%27
>> (a single quote char)
>>
>> But I can't really tell you exactly how to encode or specify options
>> to the CSV loader when I don't know what actual values you want
>> after "aaa ""bbb""ccc" is decoded.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć <r....@solr.pl> wrote:
>>> Hi!
>>>
>>> Yonik, thanks for the reply. I just realized that the example I gave
>>> was not full - the error is returned by Solr only when the field is
>>> multivalued and the values in the fields are split.
>>> For example, the
>>> following curl command gives me the mentioned error:
>>>
>>> curl
>>> 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20'
>>> -H 'Content-type:text/plain' -d '"1","aaa ""bbb""ccc"'
>>>
>>> while the following is executed without any problem:
>>> curl
>>> 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20'
>>> -H 'Content-type:text/plain' -d '"1","aaa ""bbb"" ccc"'
>>>
>>> The only difference between those two is the additional space
>>> character in between bbb"" and ccc in the second example.
>>>
>>> Am I doing something wrong? ;)
>>>
>>> --
>>> Regards,
>>> Rafał Kuć
>>> http://solr.pl
>>>
>>>> This works fine for me:
>>>>
>>>> curl http://localhost:8983/solr/update/csv -H
>>>> 'Content-type:text/plain' -d 'id,name
>>>> "1","aaa ""bbb"" ccc"'
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>> On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć <r....@solr.pl> wrote:
>>>>> Hello!
>>>>>
>>>>> I have a question about the CSV update handler. Let's say I have the
>>>>> following file sent to the CSV update handler using curl:
>>>>>
>>>>> id,name
>>>>> "1","aaa ""bbb""ccc"
>>>>>
>>>>> It throws an error, saying that:
>>>>> Error 400 java.io.IOException: (line 0) invalid char between encapsulated
>>>>> token end delimiter
>>>>>
>>>>> If I change the contents of the file to:
>>>>>
>>>>> id,name
>>>>> "1","aaa ""bbb"" ccc"
>>>>>
>>>>> it works without a problem. Has anyone encountered this? Is it known
>>>>> behavior?
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Rafał Kuć
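
For anyone reading this thread later, the double decoding Yonik describes can be reproduced outside Solr. The sketch below is only an approximation: it uses Python's standard csv module in strict mode as a stand-in for Solr's CSV loader (the two parsers are not the same code and may differ in other edge cases), and the helper name decode is made up for this illustration. It decodes the row once with the " encapsulator, then decodes the title value a second time with a space separator, which is roughly what f.title.split=true and f.title.separator=%20 ask for.

```python
import csv


def decode(text, separator, encapsulator):
    """Decode one line of CSV into a list of values.

    Hypothetical helper for this illustration. strict=True makes Python's
    parser reject malformed input instead of guessing, which is assumed
    here to resemble the 400 error reported in the thread.
    """
    reader = csv.reader([text], delimiter=separator,
                        quotechar=encapsulator, strict=True)
    return next(reader)


# First decoding level: the whole row, comma separated, " as encapsulator.
failing_title = decode('"1","aaa ""bbb""ccc"', ",", '"')[1]
working_title = decode('"1","aaa ""bbb"" ccc"', ",", '"')[1]

# Second decoding level: split the field on spaces, still using ".
try:
    decode(failing_title, " ", '"')
    second_level_failed = False
except csv.Error:
    # The quote closing "bbb" is followed by "c", not by a separator.
    second_level_failed = True

# With the extra space, every sub-value is well-formed CSV.
working_values = decode(working_title, " ", '"')

# A different encapsulator for the sub-values (like f.title.encapsulator=%27)
# turns the inner double quotes into ordinary characters.
single_quote_values = decode(failing_title, " ", "'")
```

Whether the values produced with the single-quote encapsulator are the ones actually wanted is the open question in the thread: the inner double quotes are kept as literal characters rather than stripped.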