Hi Michael,
I tried to reproduce the problem with the current Nutch master and Solr 6.6.0,
but without success, i.e. indexing the binary content succeeded:
- that's the case for two of the URLs you sent
- those from buzz.money.cnn.com are blocked somehow (fetching failed)
Building Nutch isn't
Thanks for the reply!
I'm not sure of the best way to illustrate the issue, as I struggle with Solr log
management within Docker. However, here are a few URLs that have exhibited the
problem. In each case, Solr complains "Error adding field 'binaryContent'" ...
"msg=String length must be a
Hi Michael,
Can you share more information regarding the Nutch and Solr versions, and at
least one document to make the problem reproducible? It looks like that's not a
general problem: at least, I'm not able to reproduce it; indexing with
-addBinaryContent -base64 succeeds (recent Nutch snapshot /
I guess there is no solution or workaround for the addBinaryContent bug, so I
have to write code to read directly from segment data. If not writing Java, I
guess I have to run readseg -dump and then parse the output text file.
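
For the non-Java route, here is a rough Python sketch of splitting such a dump
file into per-URL records. It assumes each record in the dump starts with a
"Recno:: N" line followed by a "URL:: ..." line; check that against your own
dump before relying on it. The function name iter_records and the dictionary
keys are made up for illustration and are not part of Nutch.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed record layout: "Recno:: <n>" opens a record, "URL:: <url>" follows.
RECNO = re.compile(r"^Recno:: (\d+)\s*$")
URL_PREFIX = "URL:: "


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per record: record number, URL, and remaining lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECNO.match(line)
        if match:
            # A new "Recno::" line closes the previous record, if any.
            if current is not None:
                yield current
            current = {"recno": int(match.group(1)), "url": None, "body": []}
        elif current is not None:
            if current["url"] is None and line.startswith(URL_PREFIX):
                current["url"] = line[len(URL_PREFIX):].strip()
            else:
                current["body"].append(line)
        # Lines before the first "Recno::" line are ignored.
    if current is not None:
        yield current
```

To use it, open the dump with open(path, encoding="utf-8", errors="replace")
and pass the file object to iter_records. Note that a text dump may not
preserve binary content byte for byte, so this is probably only good for
metadata and text, not for recovering the original binary documents.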
-- original message --
I think I have an instance of the known bug