Hello,

Hoping to get some insight on where to troubleshoot this issue further. The
scenario: we have a web application that accepts URL-encoded UTF-8
characters (Cyrillic text in this instance) and sends this data to a Flume
agent via the HTTPSource with the JSONHandler. That agent in turn forwards
the event via an Avro sink to a second Flume agent, which writes it to HDFS
using the HDFS sink.
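
For reference, the relevant parts of our configuration look roughly like the
sketch below (hostnames, ports, and the HDFS path are placeholders rather
than our exact settings):

  # Agent 1: accepts HTTP POSTs and forwards events via Avro
  agent1.sources = http-src
  agent1.channels = ch1
  agent1.sinks = avro-out

  agent1.sources.http-src.type = http
  agent1.sources.http-src.port = 5140
  agent1.sources.http-src.handler = org.apache.flume.source.http.JSONHandler
  agent1.sources.http-src.channels = ch1

  agent1.sinks.avro-out.type = avro
  agent1.sinks.avro-out.hostname = flume-agent-2
  agent1.sinks.avro-out.port = 4141
  agent1.sinks.avro-out.channel = ch1

  agent1.channels.ch1.type = memory

  # Agent 2: receives Avro events and writes them to HDFS
  agent2.sources = avro-in
  agent2.channels = ch2
  agent2.sinks = hdfs-out

  agent2.sources.avro-in.type = avro
  agent2.sources.avro-in.bind = 0.0.0.0
  agent2.sources.avro-in.port = 4141
  agent2.sources.avro-in.channels = ch2

  agent2.sinks.hdfs-out.type = hdfs
  agent2.sinks.hdfs-out.hdfs.path = /flume/events
  agent2.sinks.hdfs-out.hdfs.fileType = DataStream
  agent2.sinks.hdfs-out.channel = ch2

  agent2.channels.ch2.type = memory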

We initially noticed that the data in the HDFS files was no longer valid,
and after investigating we have found the following:


- The initial POST is correct, verified via a network trace and inspection
of the binary data on the wire.

- The Avro event sent from the first Flume agent is mangled, again verified
via a network trace and inspection of the binary payload.

We do not explicitly set the Content-Type header on the POST from our
application, since the documentation states that UTF-8 will be assumed if it
is not set.
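
If it turns out we need to declare the charset explicitly, a minimal sketch
of what that test would look like from Java is below (the class name,
endpoint, port, and payload are placeholders, not our actual application
code):

  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.nio.charset.StandardCharsets;

  public class FlumePostTest {
      public static void main(String[] args) throws Exception {
          // JSONHandler expects a JSON array of events with headers and body;
          // the body here is sample Cyrillic text.
          String payload = "[{\"headers\":{},\"body\":\"привет мир\"}]";

          // Placeholder endpoint for the HTTPSource on the first agent.
          URL url = new URL("http://flume-agent-1:5140");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestMethod("POST");
          // Explicitly declare the charset instead of relying on the default.
          conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
          conn.setDoOutput(true);

          try (OutputStream out = conn.getOutputStream()) {
              out.write(payload.getBytes(StandardCharsets.UTF_8));
          }
          System.out.println("Response code: " + conn.getResponseCode());
      }
  }

If the data arrives intact on the second agent when the charset is declared
explicitly, that would at least narrow the problem down to how the missing
charset is handled on the HTTPSource side.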

Can anyone elaborate on when/why this data is being corrupted?

Thanks,
Paul Chavez
