I guess there is no solution or workaround for the addBinaryContent bug, so I will have to write code that reads the raw content directly from the segment data. If I don't write Java, I guess I have to run bin/nutch readseg -dump on each segment and then parse the resulting text file.
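A rough sketch of the parsing side, in case it helps anyone else. It assumes the segment was dumped with only the raw content kept, e.g. bin/nutch readseg -dump <segment_dir> <output_dir> -nofetch -nogenerate -noparse -noparsedata -noparsetext (if I read the SegmentReader usage correctly). It also assumes each record in the dump looks like "Recno:: N", "URL:: <url>", a "Content::" section header, a few header lines, a bare "Content:" line, and then the page body. That layout, the dump file name and the class name SegmentDumpParser are all my assumptions, so check them against your own dump output before relying on this.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: extracts URL -> raw page content pairs from a text
 * file written by "bin/nutch readseg -dump". The record layout it expects is
 * an assumption about the dump format, not a documented contract.
 */
public class SegmentDumpParser {

    // Section headers assumed to appear in the dump; any of them ends a page body.
    private static final Set<String> SECTION_HEADERS = new HashSet<>(
            Arrays.asList("CrawlDatum::", "Content::", "ParseData::", "ParseText::"));

    /** Parses dump lines into an insertion-ordered map of URL -> page body. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> pages = new LinkedHashMap<>();
        String url = null;
        StringBuilder body = null;        // body of the current record, once found
        boolean inContentSection = false; // true after a "Content::" header
        boolean capturing = false;        // true while copying body lines

        for (String line : lines) {
            if (line.startsWith("Recno:: ")) {
                // A new record starts: store the previous one, then reset state.
                flush(pages, url, body);
                url = null;
                body = null;
                inContentSection = false;
                capturing = false;
            } else if (SECTION_HEADERS.contains(line)) {
                capturing = false;
                inContentSection = line.equals("Content::");
            } else if (capturing) {
                body.append(line).append('\n');
            } else if (line.startsWith("URL:: ")) {
                url = line.substring("URL:: ".length()).trim();
            } else if (inContentSection && body == null && line.equals("Content:")) {
                // Everything after the bare "Content:" line is the raw page.
                body = new StringBuilder();
                capturing = true;
            }
        }
        flush(pages, url, body);
        return pages;
    }

    private static void flush(Map<String, String> pages, String url, StringBuilder body) {
        if (url != null && body != null) {
            // Drop the blank lines that separate records in the dump.
            pages.put(url, body.toString().replaceAll("\\s+$", ""));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real dump file as the first argument.
        Path dump = Paths.get(args.length > 0 ? args[0] : "segdump/dump");
        // Assumes the dump is UTF-8; pages in other encodings may not round-trip.
        Map<String, String> pages = parse(Files.readAllLines(dump, StandardCharsets.UTF_8));
        for (Map.Entry<String, String> e : pages.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().length() + " chars");
        }
    }
}
```

The line-based approach is the simplest thing that could work, but it would get confused if a page body itself contained a line such as "Recno:: 5" or "ParseData::". Reading the segment's content data directly from Java avoids that, at the cost of depending on the Nutch and Hadoop jars.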
-- original message --

I think I have an instance of the known bug https://issues.apache.org/jira/browse/NUTCH-2186

I need to keep raw html in my Solr index (or somewhere) so that an external tool can access it and parse it. So, I added addBinaryContent and base64 to my indexing command.

On the very first segment, I get a bunch of failures with messages that say "String length must be a multiple of four." The same is true if I omit the base64 argument.

Is there a workaround or fix for this issue? I am using Nutch 1.12 and Solr 5.4.1.

