On Tue, Jun 21, 2011 at 2:15 AM, Rafał Kuć <r....@solr.pl> wrote:
> Hello!
>
> Once again, thanks for the response ;) So the solution is to generate
> the data files once again, either adding a space after the doubled
> encapsulator

Maybe...
I can't tell if the file is encoded correctly or not since I don't
know what the decoded values are supposed to be from your example.

-Yonik
http://www.lucidimagination.com

> or changing the encapsulator to a character that does
> not occur in the field values (of course, in the field that will be
> split).
>
>
> --
> Regards,
>  Rafał Kuć
>  http://solr.pl
>
>> Multi-valued CSV fields are double encoded.
>
>> We start with: "aaa ""bbb""ccc"
>> Then, decoding one level, we get: aaa "bbb"ccc
>> Decoding again to get individual values results in a decode error
>> because the encapsulator appears unescaped in the middle of the second
>> value (i.e. invalid CSV).
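
To illustrate the two decoding passes described above, here is a minimal
sketch that uses Python's csv module in strict mode as a stand-in for
Solr's CSV parser. That substitution is an assumption: the two parsers are
different implementations and may disagree in details. The helper names
decode and decode_multivalued are hypothetical.

```python
import csv
import io


def decode(text, delimiter, quotechar):
    """Decode one CSV record; strict=True rejects text right after a closing quote."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        doublequote=True,
        strict=True,
    )
    return next(reader)


def decode_multivalued(line):
    """First pass splits the record on ','; second pass splits the title on ' '."""
    doc_id, title = decode(line, ",", '"')
    return doc_id, title, decode(title, " ", '"')


# Works: after the first pass the title is  aaa "bbb" ccc
print(decode_multivalued('"1","aaa ""bbb"" ccc"'))

# Fails: after the first pass the title is  aaa "bbb"ccc
try:
    decode_multivalued('"1","aaa ""bbb""ccc"')
except csv.Error as exc:
    print("second pass failed:", exc)
```

In this sketch the first pass always succeeds; only the second pass, which
reuses the double quote as the encapsulator for the sub-values, rejects the
input where ccc directly follows the closing quote.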
>
>> One easy way to fix this is to use a different encapsulator for the
>> sub-values of a multi-valued field by adding f.title.encapsulator=%27
>> (a single quote character).
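
As a rough illustration of the f.title.encapsulator=%27 suggestion, the
sketch below rebuilds the request URL from the failing curl command with
that one extra parameter, then mimics the second decoding pass with a
single quote as the sub-value encapsulator. Python's csv module is only a
stand-in for Solr's parser (an assumption), and the names params, url and
split_title are hypothetical.

```python
import csv
import io
from urllib.parse import quote, urlencode

# Parameters from the failing request, plus the per-field encapsulator override.
params = {
    "fieldnames": "id,title",
    "commit": "true",
    "encapsulator": '"',
    "f.title.split": "true",
    "f.title.separator": " ",
    "f.title.encapsulator": "'",
}
# quote_via=quote encodes the space as %20 rather than '+'; safe="," keeps the comma readable.
url = "http://localhost:8983/solr/update/csv?" + urlencode(params, quote_via=quote, safe=",")
print(url)


def split_title(title, encapsulator):
    """Split an already once-decoded title on spaces using the given encapsulator."""
    reader = csv.reader(
        io.StringIO(title), delimiter=" ", quotechar=encapsulator, strict=True
    )
    return next(reader)


# With a single quote as encapsulator, the double quotes are ordinary characters.
print(split_title('aaa "bbb"ccc', "'"))
```

Note that with this setting the double quotes remain literal characters in
the second value, which may or may not be what is wanted, and values that
contain single quotes would then need the same care.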
>
>> But I can't really tell you exactly how to encode the data or which
>> options to pass to the CSV loader when I don't know what actual values
>> you want after "aaa ""bbb""ccc" is decoded.
>
>> -Yonik
>> http://www.lucidimagination.com
>
>
>
>> On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć <r....@solr.pl> wrote:
>>> Hi!
>>>
>>>  Yonik, thanks for the reply. I just realized that the example I gave
>>> was incomplete: the error is returned by Solr only when the field is
>>> multivalued and its values are split. For example, the
>>> following curl command gives me the mentioned error:
>>>
>>> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' \
>>>   -H 'Content-type:text/plain' -d '"1","aaa ""bbb""ccc"'
>>>
>>> while the following is executed without any problem:
>>> curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' \
>>>   -H 'Content-type:text/plain' -d '"1","aaa ""bbb"" ccc"'
>>>
>>> The only difference between those two is the additional space
>>> character in between bbb"" and ccc in the second example.
>>>
>>> Am I doing something wrong? ;)
>>>
>>> --
>>> Regards,
>>>  Rafał Kuć
>>>  http://solr.pl
>>>
>>>> This works fine for me:
>>>
>>>> curl http://localhost:8983/solr/update/csv -H
>>>> 'Content-type:text/plain' -d 'id,name
>>>> "1","aaa ""bbb"" ccc"'
>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>
>>>
>>>> On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć <r....@solr.pl> wrote:
>>>>> Hello!
>>>>>
>>>>>  I have a question about the CSV update handler. Let's say I have the
>>>>> following file sent to the CSV update handler using curl:
>>>>>
>>>>> id,name
>>>>> "1","aaa ""bbb""ccc"
>>>>>
>>>>> It throws an error, saying that:
>>>>> Error 400 java.io.IOException: (line 0) invalid char between encapsulated 
>>>>> token end delimiter
>>>>>
>>>>> If I change the contents of the file to:
>>>>>
>>>>> id,name
>>>>> "1","aaa ""bbb"" ccc"
>>>>>
>>>>> it works without a problem. Has anyone encountered this? Is it known
>>>>> behavior?
>>>>>
>>>>> --
>>>>> Regards,
>>>>>  Rafał Kuć
