I think I have run into the known bug NUTCH-2186:
https://issues.apache.org/jira/browse/NUTCH-2186
I need to keep the raw HTML in my Solr index (or somewhere else) so that an external
tool can access and parse it, so I added the addBinaryContent and base64 options to my
indexing command. While indexing the very first segment, I get a number of failures
with the message "String length must be a multiple of four." The same error occurs
if I omit the base64 option.
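The error text points at a base64 length check: padded base64 turns every 3 input bytes into exactly 4 characters, so a valid encoded string always has a length that is a multiple of four, while raw HTML usually does not. The sketch below illustrates only that property. It is not Nutch's or Solr's code: it uses the JDK's `java.util.Base64` as a stand-in decoder, and the class name and sample HTML string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64LengthCheck {

    // Padded base64 output is always a multiple of four characters long,
    // because each 3-byte input group becomes exactly 4 output characters.
    static boolean hasValidBase64Length(String s) {
        return s.length() % 4 == 0;
    }

    // Stand-in for a strict decoder: true if the JDK basic decoder accepts s.
    static boolean decodes(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical page content standing in for a fetched document.
        String rawHtml = "<html><body>Hello</body></html>";
        String encoded = Base64.getEncoder()
                .encodeToString(rawHtml.getBytes(StandardCharsets.UTF_8));

        System.out.println("raw length " + rawHtml.length()
                + ", multiple of four: " + hasValidBase64Length(rawHtml)
                + ", decodes: " + decodes(rawHtml));
        System.out.println("encoded length " + encoded.length()
                + ", multiple of four: " + hasValidBase64Length(encoded)
                + ", decodes: " + decodes(encoded));
    }
}
```

Running `main` prints both checks for each string: the raw HTML fails them and its base64-encoded form passes them.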
Is there a workaround or fix for this issue? I am using Nutch 1.12 and Solr
5.4.1.