Thanks for the reply! I'm not sure of the best way to illustrate the issue, since I struggle with Solr log management within Docker. However, here are a few URLs that have exhibited the problem. In each case, Solr complains "Error adding field 'binaryContent'" ... "msg=String length must be a multiple of four":
http://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html
http://buzz.money.cnn.com/author/ctymkiw/
http://abcnews.go.com/GMA/video/rose-mcgowan-dropped-agent-calling-sexist-casting-note-32047448
http://buzz.money.cnn.com/tag/investing/

The following URL also produces an "Error adding field" message, but with "msg=Illegal character" instead of "String length must be a multiple of four". I don't know whether it's related:

http://buzz.money.cnn.com/author/byheatherlong/

All tests were done with Nutch 1.12 and Solr 5.4.1.

BTW, I wouldn't mind upgrading Nutch and Solr. Which combination of versions do you recommend as the most stable? I'm using Hadoop 2.7.3 (from Hortonworks).

At one point, Lewis John McG reported a similar issue in https://issues.apache.org/jira/browse/NUTCH-2186
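In case it helps narrow things down: both messages look like failures of strict Base64 decoding of the binaryContent value, which Solr expects for a binary field. Below is a rough Python sketch, assuming the value is meant to be standard (non-URL-safe) Base64. diagnose_base64 is my own hypothetical helper, not part of Nutch or Solr. It reports which of the two checks a captured field value would fail:

```python
import base64
import binascii
import string

# Assumption: binaryContent should be standard Base64 (A-Z, a-z, 0-9, '+', '/')
# with '=' padding; this helper is hypothetical, not a Nutch/Solr API.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def diagnose_base64(value: str) -> str:
    """Describe why `value` would fail strict Base64 decoding, or return 'ok'."""
    # Strict decoders reject input whose length is not a multiple of 4.
    if len(value) % 4 != 0:
        return "String length must be a multiple of four"
    # Any character outside the Base64 alphabet (including newlines) is illegal.
    bad = sorted({c for c in value if c not in B64_ALPHABET})
    if bad:
        return f"Illegal character(s): {bad!r}"
    # Catch remaining problems, e.g. '=' padding in the middle of the string.
    try:
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        return f"Other decode error: {exc}"
    return "ok"


if __name__ == "__main__":
    for sample in ["aGVsbG8=", "aGVsbG8", "aGV-bG8="]:
        print(sample, "->", diagnose_base64(sample))
```

Note that with this sketch, Base64 wrapped with MIME-style line breaks would also be reported as containing illegal characters, since newlines are not in the alphabet.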