On 28/03/17 21:35, Andrew U Frank wrote:
the problem/bug is not related to the BOM character but seemingly to
many UTF-8 characters.

i get (consistently) a return code of 204 when the fuseki server is
running without -v and 500 when running with -v if any of the literals
contains a "strange" (non-ASCII?) UTF-8 character. the current problem is the
character ä (code point 228 - character a with diaeresis, german umlaut).
if i remove the character, the triples (all of the request) are stored,
if it is in the literal, none is stored.

(can we stick to hex please?)

228 = U+00E4

I suspect that codepoints are not being encoded into UTF-8 correctly. That is what the java-based decoder that you hit via "-v" is saying.

For example, U+00E4 is 2 bytes : c3 a4 : in UTF-8 on the wire (a trailing 0a in a hex dump is a newline, not part of the character).

What is definitely wrong is sending the codepoint directly, either as a single byte (E4) or as two bytes (00 E4).
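To make the difference concrete, here is a minimal sketch in Python (not part of the original exchange) that prints the bytes each encoding produces for U+00E4. It only uses the standard `str.encode` and `bytes.hex` calls (the separator argument to `hex` needs Python 3.8 or later).

```python
# U+00E4 is LATIN SMALL LETTER A WITH DIAERESIS.
ch = "\u00e4"

utf8_bytes = ch.encode("utf-8")          # the correct on-the-wire form
latin1_bytes = ch.encode("latin-1")      # the codepoint sent as one raw byte
utf16be_bytes = ch.encode("utf-16-be")   # the codepoint sent as two bytes

print(utf8_bytes.hex(" "))     # c3 a4
print(latin1_bytes.hex(" "))   # e4
print(utf16be_bytes.hex(" "))  # 00 e4

# A trailing newline adds 0a, which is why a hex dump may show "c3 a4 0a".
with_newline = (ch + "\n").encode("utf-8")
print(with_newline.hex(" "))   # c3 a4 0a
```

Only the first form is valid UTF-8 for this character; the other two are what a client would send if it wrote the codepoint out without encoding it.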


i understand that a request encoded as application/sparql-update must be
coded as UTF8 which my literal is - or is there some special encoding
necessary for the german a umlaut? i do not think that the triples
should be encoded as latin1 or similar??

Can you confirm that on the wire it is c3 a4 (followed by 0a if a newline comes next)?


i tried to POST with curl or wget, but did not succeed (i do not have much
experience with these outside of the simplest cases).
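As an alternative to curl or wget, the request can be built with the Python standard library, which also makes it easy to inspect the exact bytes before anything is sent. This is only a sketch: the endpoint URL, the dataset name `ds`, the example triple and the helper name `build_update_request` are all assumptions for illustration, not taken from the setup discussed here.

```python
import urllib.request

# Hypothetical endpoint: adjust host, port and dataset name to your server.
ENDPOINT = "http://localhost:3030/ds/update"


def build_update_request(update, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Update POST with a UTF-8 body."""
    body = update.encode("utf-8")  # explicit UTF-8 encoding of the update text
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


# Example literal containing U+00E4 (illustrative data only).
update = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "K\u00e4se" }'
req = build_update_request(update)

# Check what would go on the wire: the umlaut must appear as c3 a4.
print(b"\xc3\xa4" in req.data)

# To actually send it (needs a running server):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

If the check prints `False`, or a lone `e4` byte shows up in `req.data`, the body was not encoded as UTF-8 before sending.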

in any case, is it not likely a bug when starting fuseki with or without -v
makes a difference to the response?

Hitting different decoders.

Strictly, it is an error and it should be 500. The javacc bytes-to-character conversion seems to be too lax.
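The effect of hitting a strict versus a lax decoder can be illustrated with Python's own UTF-8 decoder. This is an analogy only, under the assumption that the two server code paths differ in how they treat malformed input; it is not the Fuseki or javacc code, and the function names are made up for the example.

```python
bad = b'"K\xe4se"'       # E4 sent as a raw byte: not valid UTF-8
good = b'"K\xc3\xa4se"'  # the same literal correctly encoded


def strict_decode(data):
    """Reject malformed input, as a strict decoder does."""
    return data.decode("utf-8")  # raises UnicodeDecodeError on bad bytes


def lax_decode(data):
    """Accept malformed input, replacing bad bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


print(strict_decode(good))     # "Käse"
print(ascii(lax_decode(bad)))  # '"K\ufffdse"'

try:
    strict_decode(bad)
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

The same malformed bytes are reported as an error on one path and silently accepted on the other, which matches seeing different status codes for the same request.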


thank you for the help!

andrew

