UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage()
-------------------------------------------------------------------------------------------------------------------------------
Key: PIG-560
URL: https://issues.apache.org/jira/browse/PIG-560
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Pradeep Kamath
Fix For: types_branch

BinStorage() uses the Java APIs DataOutput.writeUTF() and DataInput.readUTF() to write Strings out as UTF-8 bytes and to read them back. From the Javadoc: "First, the total number of bytes needed to represent all the characters of s is calculated. If this number is larger than 65535, then a UTFDataFormatException is thrown." The limit exists because writeUTF() uses 2 bytes to store the byte count.

A way to get around this would be to not use writeUTF()/readUTF() and instead convert the string by hand to the corresponding UTF-8 byte[] (using String.getBytes("UTF-8")), then write the length of the byte array as an int, followed by the bytes. Because a Java int is signed, this allows a size of up to 2^31 - 1 bytes (about 2 GB) rather than 65535.
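
The workaround described above could look roughly like the following. This is only a minimal sketch, not the actual BinStorage() patch: the class LengthPrefixedStrings and its methods writeString(), readString(), roundTrip() and writeUtfFails() are hypothetical names made up for illustration, and StandardCharsets.UTF_8 is used as an equivalent of getBytes("UTF-8") on newer JDKs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Pig or BinStorage.
final class LengthPrefixedStrings {

    // Write the UTF-8 byte count as a 4-byte int, then the bytes themselves.
    static void writeString(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // Read the 4-byte length, then exactly that many bytes, and decode them.
    static String readString(DataInput in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative string length: " + length);
        }
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Serialize and deserialize in memory; wraps IOException for convenience.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeString(new DataOutputStream(buffer), s);
            byte[] bytes = buffer.toByteArray();
            return readString(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if DataOutput.writeUTF() rejects the string as too long.
    static boolean writeUtfFails(String s) {
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 70000 ASCII characters encode to 70000 bytes, above the 65535 limit.
        String big = new String(new char[70000]).replace('\0', 'x');
        System.out.println("writeUTF fails: " + writeUtfFails(big));
        System.out.println("round trip ok: " + roundTrip(big).equals(big));
    }
}
```

One thing to keep in mind with this approach: writeUTF() emits Java's "modified UTF-8" with a 2-byte length prefix, so data written in the old format cannot be read back with a reader like readString(), and vice versa. Any change along these lines would presumably need to account for files already written by BinStorage().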