Hi Michael,

could you share more information about the Nutch and Solr versions you are using, and at least one document that makes the problem reproducible? It doesn't look like a general problem. At least, I'm not able to reproduce it: indexing with -addBinaryContent -base64 succeeds (recent Nutch snapshot / master, Solr 6.6.0).
Thanks,
Sebastian

On 10/20/2017 06:46 PM, Michael Coffey wrote:
> I guess there is no solution or workaround for the addBinaryContent bug, so I
> have to write code to read directly from segment data. If not writing Java, I
> guess I have to do readseg -dump and then parse the output text file.
>
>
> -- original message --
> I think I have an instance of the known bug
> https://issues.apache.org/jira/browse/NUTCH-2186
>
> I need to keep raw HTML in my Solr index (or somewhere) so that an external
> tool can access it and parse it. So, I added addBinaryContent and base64 to
> my indexing command. On the very first segment, I get a bunch of failures
> with messages that say "String length must be a multiple of four." The same
> is true if I omit the base64 argument.
>
> Is there a workaround or fix for this issue? I am using Nutch 1.12 and Solr
> 5.4.1.
>
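For context on the error message quoted above: standard padded Base64 (RFC 4648) always produces text whose length is a multiple of four, so a strict decoder reporting "String length must be a multiple of four." was most likely handed a string that was truncated, unpadded, or not Base64 at all. The sketch below uses only `java.util.Base64` from the JDK; it is not Nutch's or Solr's code, and the class and helpers (`Base64LengthCheck`, `hasPaddedLength`, `encodeContent`, `decodeStrict`) are hypothetical names chosen for illustration. `decodeStrict` simply reproduces the quoted message when the length check fails.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64LengthCheck {

    // Padded standard Base64 text always has a length divisible by four.
    static boolean hasPaddedLength(String s) {
        return s.length() % 4 == 0;
    }

    // Encodes raw page bytes (e.g. HTML) as padded standard Base64.
    static String encodeContent(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Hypothetical strict decoder: rejects input whose length is not a
    // multiple of four, reusing the message text quoted in the thread.
    static byte[] decodeStrict(String encoded) {
        if (!hasPaddedLength(encoded)) {
            throw new IllegalArgumentException("String length must be a multiple of four.");
        }
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Hypothetical page content standing in for a fetched document.
        byte[] html = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);

        String encoded = encodeContent(html);
        System.out.println(encoded.length() + " " + hasPaddedLength(encoded));

        // A full round trip restores the original HTML.
        System.out.println(new String(decodeStrict(encoded), StandardCharsets.UTF_8));

        // Dropping a single character (e.g. truncation in transit) fails the check.
        String truncated = encoded.substring(0, encoded.length() - 1);
        try {
            decodeStrict(truncated);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running it prints `44 true`, then the original HTML, then `rejected: String length must be a multiple of four.` (the 31-byte input encodes to 11 four-character groups). Note that the JDK decoder itself is lenient about missing padding, which is why the length check is done explicitly here.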