Thanks a lot, Doug.
On Thu, Aug 11, 2011 at 5:02 PM, Doug Cutting <[email protected]> wrote:
> This is for performance.
>
> A Utf8 may be efficiently compared to other Utf8's, e.g., when sorting,
> without decoding the UTF-8 bytes into characters. A Utf8 may also be
> reused, so when iterating through a large number of values (e.g., in a
> MapReduce job) only a single instance need be allocated, while String
> would require an allocation per iteration.
>
> Note that String may be used when writing data, but that data is
> generally read as Utf8. The toString() method may be called whenever a
> String is required. If only equality or ordering is needed, and not
> substring operations, then leaving values as Utf8 is generally faster
> than converting to String.
>
> Doug
>
> On 08/11/2011 04:36 PM, Yang wrote:
>> if I declare a field to be "string", the generated java implementation
>> uses avro......Utf8 for that,
>>
>> I was wondering what is the thinking behind this, and what is the
>> proper way to use the Utf8 value -----
>> oftentimes in my logic, I need to compare the value against other
>> String's, or store them into other databases , which
>> of course do not know about Utf8, so that I'd have to transform them
>> into String's. so it seems being Utf8 unnecessarily
>> asks for a lot of transformations.
>>
>> or I guess I'm not getting the correct usage ?
>>
>> Thanks
>> Yang
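
A minimal Java sketch of the two points in Doug's reply: ordering values on their raw UTF-8 bytes without decoding them, and reusing one instance across iterations so that a String is only built when one is actually needed. It uses only the JDK. The names Utf8CompareSketch and ReusableText are hypothetical stand-ins made up for illustration; they are not Avro's Utf8 class and do not claim to match its API or internals.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is NOT Avro's Utf8 class.
final class Utf8CompareSketch {

    // Order two UTF-8 byte ranges by comparing unsigned byte values.
    // No characters are decoded and no objects are allocated.
    static int compareUtf8(byte[] a, int aLen, byte[] b, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask so bytes >= 0x80 are not treated as negative
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // a shorter prefix sorts first
    }

    // A reusable holder: one instance can be refilled for each record.
    static final class ReusableText {
        private byte[] bytes = new byte[16];
        private int length;

        ReusableText set(byte[] src, int len) {
            if (len > bytes.length) {
                bytes = new byte[len]; // grow only when the value does not fit
            }
            System.arraycopy(src, 0, bytes, 0, len);
            length = len;
            return this;
        }

        int compareTo(ReusableText other) {
            return compareUtf8(bytes, length, other.bytes, other.length);
        }

        // Decode to a String only when a String is really required.
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"pear", "apple", "zebra", "kiwi"};
        byte[] limitBytes = "m".getBytes(StandardCharsets.UTF_8);
        ReusableText limit = new ReusableText().set(limitBytes, limitBytes.length);

        ReusableText current = new ReusableText(); // single instance for the whole loop
        for (String s : inputs) {
            byte[] raw = s.getBytes(StandardCharsets.UTF_8); // stands in for bytes read from a file
            current.set(raw, raw.length);
            if (current.compareTo(limit) < 0) {
                // Only values that pass the byte-level check are converted.
                System.out.println(current.toString());
            }
        }
    }
}
```

In the loop, the comparison against the limit never builds a String; the conversion happens only for values handed on to code that needs one (for example, a database client), which is the usage pattern the reply describes.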
