Hi Yongyao,

This looks like a configuration issue in the index schema.
In the case of Solr (plugin indexer-solr), the fields
inlinks and outlinks should be configured as multivalued.

That's the default for Solr 5; for older versions it has to be
specified explicitly in the index schema.
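
As a rough sketch, the field definitions in schema.xml could look
like the following. Only the field names and multiValued="true" come
from the point above; the type, stored and indexed values are
assumptions on my side and should be adapted to your schema:

```xml
<!-- Sketch only: type/stored/indexed are assumed values, not taken from the Nutch default schema -->
<field name="inlinks"  type="string" stored="true" indexed="false" multiValued="true"/>
<field name="outlinks" type="string" stored="true" indexed="false" multiValued="true"/>
```

After changing the schema, the core needs to be reloaded and the
pages reindexed before the additional values show up.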

Please also open an issue at
  https://issues.apache.org/jira/browse/NUTCH
so that appropriate values are added to the default schema.xml.

Also, which Nutch version and which indexer are you using?

Best,
Sebastian

On 04/18/2017 09:12 PM, Yongyao Jiang wrote:
> Hi,
> 
> I have crawled 10K web pages with "index-links" turned on, and
> "linkdb.ignore.internal.links" set to false. But pretty much all pages I
> have got only have one outlink and one inlink. This makes me very confused.
> 
> Here is a sample,
> 
>              {
>                "inlinks": "http://www.planetary.org/blogs/bruce-betts/",
>                "tstamp": "2017-04-18T15:45:31.457Z",
>                "nutch_score": 0.439538,
>                "segment": "20170418154526",
>                "digest": "1ef28e97795b40be08d312f630b1728f",
>                "host": "www.planetary.org",
>                "boost": "1.0",
>                "contentLength": "10355",
>                "outlinks": "http://ajax.googleapis.com/",
>                 }
> 
> Thanks,
> Yongyao
> 
