Thanks for the reply!

I'm not sure of the best way to illustrate the issue, as I struggle with Solr 
log management within Docker. However, here are a few URLs that have exhibited 
the problem. In each case, Solr complains "Error adding field 'binaryContent'" 
... "msg=String length must be a multiple of four".


http://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html


http://buzz.money.cnn.com/author/ctymkiw/

http://abcnews.go.com/GMA/video/rose-mcgowan-dropped-agent-calling-sexist-casting-note-32047448

http://buzz.money.cnn.com/tag/investing/

Meanwhile, the following URL also gets an "Error adding field" message, but 
with "msg=Illegal character" instead of "String length must be a multiple of 
four". I don't know whether it is related.

http://buzz.money.cnn.com/author/byheatherlong/
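
Both messages read like what a strict Base64 decoder reports, so one way to 
narrow this down might be to check the binaryContent value before it reaches 
Solr. Below is a minimal Python sketch of such a check. It assumes the field 
value is expected to be Base64 text, which I have not confirmed, and the 
function name check_base64 is made up for illustration; it is not part of 
Nutch or Solr.

```python
import base64
import binascii


def check_base64(value: str) -> str:
    """Classify a string roughly the way a strict Base64 decoder would.

    Hypothetical helper for illustration only; not a Nutch or Solr API.
    """
    # A padded Base64 string always has a length divisible by four.
    if len(value) % 4 != 0:
        return "length not a multiple of four"
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently skipping them.
        base64.b64decode(value, validate=True)
    except binascii.Error:
        return "illegal character or bad padding"
    return "ok"


if __name__ == "__main__":
    for sample in ("aGVsbG8=", "aGVsbG8", "aGVs*G8="):
        print(repr(sample), "->", check_base64(sample))
```

If the raw page bytes are being sent to the field without being encoded 
first, a check like this would fail on almost every document, which might 
explain why the two messages vary by URL.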


All tests were done with Nutch 1.12 and Solr 5.4.1.

BTW, I wouldn't mind updating Nutch and Solr. Which combination of versions do 
you consider the most stable? I am using Hadoop 2.7.3 (from Hortonworks).


At one point, Lewis John McG reported a similar issue in 
https://issues.apache.org/jira/browse/NUTCH-2186
