It looks like there is no fix or workaround for the addBinaryContent bug, so I 
will have to write code that reads the raw content directly from the segment 
data. If I don't want to write Java, I suppose I have to run readseg -dump and 
then parse the text file it produces.
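
For the non-Java route, here is a rough sketch of the parsing step in Python. It is only a heuristic and rests on assumptions I have not verified against every Nutch version: that each record in the dump starts with a "Recno:: N" line followed by a "URL:: ..." line, that the raw page follows a line reading exactly "Content:", and that the dump was made with the other sections switched off (-nofetch -nogenerate -noparse -noparsedata -noparsetext) so nothing trails the HTML. The name parse_dump is made up for this example.

```python
import re

# Assumed record header in the readseg dump text: a line like "Recno:: 0".
RECORD_START = re.compile(r"^Recno:: \d+[ \t]*$", re.MULTILINE)
# Assumed per-record URL line: "URL:: http://...".
URL_LINE = re.compile(r"^URL:: (.+)$", re.MULTILINE)
# Assumed marker before the raw page: a line that is exactly "Content:"
# (the section header "Content::" does not match this pattern).
CONTENT_MARKER = re.compile(r"^Content:\s*\n", re.MULTILINE)


def parse_dump(text):
    """Return a list of (url, raw_content) pairs found in dump text.

    Heuristic only: a page whose own text contains a line such as
    "Recno:: 5" would be split in the wrong place.
    """
    pages = []
    # Everything before the first record header is ignored.
    for chunk in RECORD_START.split(text)[1:]:
        url_match = URL_LINE.search(chunk)
        marker = CONTENT_MARKER.search(chunk)
        if url_match is None or marker is None:
            continue  # record without a URL or without a content section
        raw = chunk[marker.end():].strip()
        pages.append((url_match.group(1).strip(), raw))
    return pages
```

To use it, read the dump text file into a string (opening it with errors="replace" is safer, since pages come in mixed encodings) and pass that string to parse_dump; the resulting pairs can then be handed to the external parser.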


-- original message --
I think I have hit the known bug 
https://issues.apache.org/jira/browse/NUTCH-2186

I need to keep the raw HTML in my Solr index (or somewhere) so that an 
external tool can access and parse it. So I added the addBinaryContent and 
base64 arguments to my indexing command. On the very first segment, I get a 
bunch of failures with the message "String length must be a multiple of four." 
The same happens if I omit the base64 argument.

Is there a workaround or fix for this issue? I am using Nutch 1.12 and Solr 
5.4.1.
