This sounds like the Ruby implementation is not correctly using UTF-8 to encode strings on your platform. It may be a bug, but I don't know the Ruby implementation well enough to say for sure.
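One way to narrow this down is to compare the bytes the Ruby client actually writes against a hand-rolled encoding of the same string. The sketch below is only an illustration, not code from the Avro Ruby library: `encode_avro_string` is a hypothetical helper that writes the byte length as a zigzag varint long followed by the UTF-8 bytes, and it assumes Ruby 2.0 or later (for `String#b`).

```ruby
# Hypothetical helper, NOT part of the avro gem: encodes a string as a
# zigzag varint byte length followed by the UTF-8 bytes of the string.
def encode_avro_string(str)
  # Transcode to UTF-8 (a no-op if the string is already UTF-8), then take a
  # binary copy so the length is counted in bytes, not characters.
  # Raises if the input is tagged as binary and contains non-ASCII bytes.
  bytes = str.encode("UTF-8").b
  n = bytes.bytesize

  # Zigzag-encode the length, then emit it 7 bits at a time, low bits first.
  zigzag = (n << 1) ^ (n >> 63)
  out = "".b
  loop do
    low7 = zigzag & 0x7F
    zigzag >>= 7
    if zigzag.zero?
      out << [low7].pack("C")          # last byte: high bit clear
      break
    end
    out << [low7 | 0x80].pack("C")     # more bytes follow: high bit set
  end

  out << bytes
end

# "\u00e9" (e with acute accent) is one character but two UTF-8 bytes.
p encode_avro_string("\u00e9").bytes   # => [4, 195, 169]
```

If the client emits `[2, 233]` for the same input (length 1, one Latin-1 byte), the string is being written as Latin-1 rather than UTF-8.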
The Avro specification states that "a string is encoded as a long followed by that many bytes of UTF-8 encoded character data." (http://avro.apache.org/docs/current/spec.html#binary_encode_primitive). If you think that the Ruby implementation does not adhere to the spec, please file a bug in JIRA.

Thanks!

-Scott

On 1/4/12 3:59 AM, "kafka0102 kafka0102" <[email protected]> wrote:

> Hi.
> I use avro's java and ruby clients. When they comunite, the ruby client always
> encode(decode) the multi-byte chars(utf-8) to latin1. For now, when the data
> is multi-byte chars,I first encode Iconv.conv("UTF8", "LATIN1",data) in the
> ruby client, and then decoded it Utils.conv(data, "ISO-8859-1","UTF-8"); in
> the java server.It works,but too ugly. I see the avro ruby client using
> StringIO to pack the data, but I cannot find ways to make it support
> multi-byte chars.
> Can anyone help me?
