Hi Michael,

I tried to reproduce the problem with the current Nutch master and Solr 6.6.0
but could not: indexing the binary content succeeded.
- that's the case for two of the URLs you sent
- those from buzz.money.cnn.com are blocked somehow (fetching failed)
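
To check whether a page made it into the index, I queried Solr by id (the
response is pasted further down). A minimal Python sketch of how such a query
URL could be built follows. The function name and the Solr core URL are
placeholders of mine, not anything from your setup, and only the colon is
escaped, mirroring the query I used.

```python
from urllib.parse import urlencode


def solr_id_query_url(solr_core_url: str, doc_url: str) -> str:
    """Build a Solr /select URL that looks up one document by its id.

    solr_core_url is a hypothetical base such as
    http://localhost:8983/solr/nutch (adjust to your installation).
    """
    # Escape backslashes first, then colons, so the page URL is read as a
    # literal id value and not as a "field:value" expression.
    escaped = doc_url.replace("\\", "\\\\").replace(":", "\\:")
    params = {"q": "id:" + escaped, "wt": "json", "indent": "on"}
    return solr_core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(solr_id_query_url(
        "http://localhost:8983/solr/nutch",  # assumed core URL
        "http://cnnfn.cnn.com/2017/03/07/investing/"
        "carl-icahn-betting-against-trump-rally/index.html"))
```

Opening the printed URL in a browser (or with curl) should return a JSON
response; "numFound":1 means the document is in the index.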

Building Nutch isn't difficult:
 git clone http://github.com/apache/nutch.git
 cd nutch
 ant
You'll find the Nutch runtime in runtime/local/ or runtime/deploy/ (for
use on Hadoop).

The tutorial
  https://wiki.apache.org/nutch/NutchTutorial
should already be up to date on how to use recent
Solr versions.


Best,
Sebastian



{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "q":"id:http\\://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html",
      "indent":"on",
      "wt":"json",
      "_":"1508829081797"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "date":"2017-10-24T07:01:05.593Z",
        "author":"Matt Egan",
        "title":"Trump adviser Carl Icahn bets against the Trump rally - Mar. 7, 2017",
        "type":["application/xhtml+xml",
          "application",
          "xhtml+xml"],
        "url":"http://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html",
        "content":"Trump adviser Carl Icahn bets against the Trump rally - Mar. 7, ...",
        "tstamp":"2017-10-24T07:01:05.593Z",
        "segment":"20171024090054",
        "digest":"cff265f11bd74bd104f3c6e1c7185484",
        "boost":1.0,
        "id":"http://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html",
        "_version_":1582121409782480896,
        "binaryContent":"+IDxzY3JpcHQgdHlwZT0idGV4dC9qYXZhc2NyaXB0Ij4gdmFyIHVybFByZT0iaHR0cDovL21hcmtld..."}]
  }}
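
The two messages you quoted ("String length must be a multiple of four" and
"Illegal character") read like Base64 decoding failures, though that is a
guess on my part. If you want to test a binaryContent value outside of Solr,
here is a rough Python sketch. The function name is made up, and the checks
only approximate what a strict Base64 decoder would reject.

```python
import base64
import binascii


def check_base64(value: str) -> str:
    """Return a short diagnosis for a Base64-encoded field value."""
    # Strict decoders expect padded input whose length is a multiple of four.
    if len(value) % 4 != 0:
        return "length is not a multiple of four"
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently skipping them.
        base64.b64decode(value, validate=True)
    except binascii.Error:
        return "illegal character or bad padding"
    return "ok"


if __name__ == "__main__":
    for sample in ("aGVsbG8=", "aGVsbG8", "aGV$bG8="):
        print(sample, "->", check_base64(sample))
```

A value that fails either check here would be a candidate for the errors you
saw, but I could not reproduce them, so treat this only as a diagnostic aid.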


On 10/24/2017 01:07 AM, Michael Coffey wrote:
> http://cnnfn.cnn.com/2017/03/07/investing/carl-icahn-betting-against-trump-rally/index.html
> 
> 
> http://buzz.money.cnn.com/author/ctymkiw/
> 
> http://abcnews.go.com/GMA/video/rose-mcgowan-dropped-agent-calling-sexist-casting-note-32047448
> 
> http://buzz.money.cnn.com/tag/investing/
> 
> Meanwhile, the following URL also gets an "error adding field" message but 
> with "msg=Illegal character" instead of "String length must be a multiple of 
> four". Don't know if it's related.
> 
> http://buzz.money.cnn.com/author/byheatherlong/
