Re: addBinaryContent and string length must be a multiple of four

2017-10-24 Thread Sebastian Nagel
Hi Michael, I tried to reproduce the problem with the current Nutch master and Solr 6.6.0 but could not:
- indexing the binary content succeeded for two of the URLs you sent
- those from buzz.money.cnn.com are blocked somehow (fetching failed)
Building Nutch isn't

Re: addBinaryContent and string length must be a multiple of four

2017-10-23 Thread Michael Coffey
Thanks for the reply! I'm not sure of the best way to illustrate the issue, as I struggle with Solr log management within Docker. However, here are a few URLs that have exhibited the problem. In each case, Solr complains "Error adding field 'binaryContent'" ... "msg=String length must be a
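
For context on the message quoted above: standard padded Base64 encodes every 3 bytes as 4 characters, so a valid encoded string always has a length that is a multiple of four. The Python sketch below illustrates only that property. It is not Nutch or Solr code, `is_padded_base64` is a hypothetical helper name, and it assumes the field is meant to carry standard padded Base64 text.

```python
import base64


def is_padded_base64(text: str) -> bool:
    """Return True if text is valid standard Base64 with padding."""
    # Padded Base64 maps each 3-byte group to 4 characters,
    # so any other length cannot be a complete encoding.
    if len(text) % 4 != 0:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True


# Sample payload chosen for illustration only.
encoded = base64.b64encode(b"binary content").decode("ascii")
truncated = encoded.rstrip("=")  # padding removed, length no longer a multiple of 4

print(len(encoded) % 4, is_padded_base64(encoded))
print(len(truncated) % 4, is_padded_base64(truncated))
```

A check like this, run on the value before it is sent to the indexer, could help tell whether the string was truncated or was never Base64 in the first place.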

Re: addBinaryContent and string length must be a multiple of four

2017-10-23 Thread Sebastian Nagel
Hi Michael, can you share more information about your Nutch and Solr versions, and at least one document that makes the problem reproducible? It doesn't look like a general problem - at least, I'm not able to reproduce it, indexing with -addBinaryContent -base64 succeeds (recent Nutch snapshot /

Re: addBinaryContent and string length must be a multiple of four

2017-10-20 Thread Michael Coffey
I guess there is no solution or workaround for the addBinaryContent bug, so I have to write code to read directly from segment data. If I don't write Java, I guess I have to run readseg -dump and then parse the output text file. -- original message -- I think I have an instance of the known bug