Hi Michael,

Can you share more information about your setup: the Nutch and Solr versions,
and at least one document that makes the problem reproducible? It doesn't look
like a general problem. At least, I'm not able to reproduce it: indexing with
-addBinaryContent -base64 succeeds for me (recent Nutch snapshot / master,
Solr 6.6.0).
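
When picking a document to share, it may help to check whether the value that
reaches the index is well-formed base64 at all, since the reported message
complains about the string length. The following is only a rough Python
sketch of such a check, not anything Nutch or Solr provides: the function name
check_base64 is made up for illustration, and it assumes you have already
pulled the field value out of a failing document as a plain string.

```python
import base64
import binascii


def check_base64(value: str) -> str:
    """Classify a string as 'ok', 'bad-length', or 'bad-chars'.

    Hypothetical helper for illustration; not part of Nutch or Solr.
    """
    # Drop whitespace and line breaks, which some encoders insert.
    stripped = "".join(value.split())
    # Padded base64 always has a length that is a multiple of four.
    if len(stripped) % 4 != 0:
        return "bad-length"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return "bad-chars"
    return "ok"


if __name__ == "__main__":
    encoded = base64.b64encode(b"<html></html>").decode("ascii")
    print(check_base64(encoded))       # properly encoded content
    print(check_base64("<html>abc"))   # raw, unencoded content
```

If the value of a failing document comes back as "bad-length", that would
suggest the content is arriving unencoded or truncated, which would be useful
to mention along with the document.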

Thanks,
Sebastian

On 10/20/2017 06:46 PM, Michael Coffey wrote:
> I guess there is no solution or workaround for the addBinaryContent bug, so I 
> have to write code to read directly from segment data. If not writing Java, I 
> guess I have to do readseg-dump and then parse the output text file.
> 
> 
> -- original message --
> I think I have an instance of the known bug 
> https://issues.apache.org/jira/browse/NUTCH-2186
> 
> I need to keep raw html in my Solr index (or somewhere) so that an external 
> tool can access it and parse it. So, I added addBinaryContent and base64 to 
> my indexing command. On the very first segment, I get a bunch of failures 
> with messages that say "String length must be a multiple of four." The same 
> is true if I omit the base64 argument.
> 
> Is there a workaround or fix for this issue? I am using Nutch 1.12 and Solr 
> 5.4.1.
> 
