thank you for the hints - i use haskell and assume that the strings which
i see are converted before they are sent 'on the wire'. i am
not familiar with the difference you mention between utf8
encoding and utf8 on the wire. in the material that you pointed to i do
not see such a conversion mentioned. can you give me another pointer?
i will read more about what haskell does in encoding utf8. what i
understand is that a umlaut (U+00E4) is encoded in two bytes (c3 a4)...
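
a minimal sketch of how the bytes could be checked on the haskell side, assuming the text and bytestring packages are available. the names utf8Hex and invalidUtf8 are made up for illustration and are not part of any library; encodeUtf8 and decodeUtf8' are from Data.Text.Encoding. it prints the bytes which a literal with U+00E4 becomes and tests whether a byte sequence decodes as utf8:

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Either (isLeft)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)
import Numeric (showHex)

-- | Hex dump of the UTF-8 encoding of a text value (hypothetical helper),
-- useful for comparing with what a wire capture shows.
utf8Hex :: T.Text -> String
utf8Hex = unwords . map byteHex . B.unpack . encodeUtf8
  where
    -- pad single-digit hex values to two digits
    byteHex w =
      let h = showHex w ""
       in if length h < 2 then '0' : h else h

-- | True when the bytes are not valid UTF-8 (hypothetical helper).
invalidUtf8 :: B.ByteString -> Bool
invalidUtf8 = isLeft . decodeUtf8'

main :: IO ()
main = do
  putStrLn (utf8Hex (T.pack "\x00E4"))       -- prints: c3 a4
  print (invalidUtf8 (B.pack [0xE4]))        -- True: the code point sent as one raw byte
  print (invalidUtf8 (B.pack [0xC3, 0xA4]))  -- False: the proper two-byte encoding
```

if the request body is built from a String or Text, it may be worth checking that the http library receives the result of encodeUtf8 and not a byte-per-character (latin1 style) packing of the string.
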
i assume you will fix the differences in the decoders to ensure that the
return code and the store action correspond.
thank you for the help!
andrew
--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
+43 1 58801 12710 direct
Geoinformation, TU Wien +43 1 58801 12700 office
Gusshausstr. 27-29 +43 1 55801 12799 fax
1040 Wien Austria +43 676 419 25 72 mobil
On 03/28/2017 10:57 PM, Andy Seaborne wrote:
>
>
> On 28/03/17 21:35, Andrew U Frank wrote:
>> the problem/bug is not related to the BOM character but seemingly to
>> many UTF-8 characters.
>>
>> i get (consistently) a return code of 204 when the fuseki server is
>> running without -v and 500 when running with -v if any of the literals
>> contains a "strange" (non-ASCII?) UTF-8 character. the current problem is the
>> character ä (code point 228 - character a with diaeresis, german umlaut).
>> if i remove the character, the triples (all of the request) are stored,
>> if it is in the literal, none is stored.
>
> (can we stick to hex please?)
>
> 228 = U+00E4
>
> I suspect that codepoints are not being encoded into UTF-8 correctly.
> That is what the java-based decoder that you hit via "-v" is saying.
>
> For example, U+00E4 is 2 bytes : c3 a4 : in UTF-8 on the wire.
>
> What is definitely wrong is sending the codepoint as a byte directly :
> xE4 or two bytes 00 E4.
>
>>
>> i understand that a request encoded as application/sparql-update must be
>> coded as UTF8 which my literal is - or is there some special encoding
>> necessary for the german a umlaut? i do not think that the triples
>> should be encoded as latin1 or similar??
>
> Can you confirm that on the wire it is c3 a4?
>
>>
>> i tried to POST with curl or wget, but did not succeed (i do not have much
>> experience with these outside of the simplest cases).
>>
>> in any case, it is likely a bug when starting fuseki with or without -v
>> makes a difference in the response?
>
> Hitting different decoders.
>
> Strictly, it is an error and it should be 500. javacc
> bytes-to-character seems to be too lax.
>
>>
>> thank you for the help!
>>
>> andrew
>>
>>