Hi Yongyao,

this looks like a configuration issue of the index. In the case of Solr (plugin indexer-solr), inlinks and outlinks should be configured as multivalued. That's the default for Solr 5; for older versions this needs to be specified in the index configuration schema.

Please also open an issue on https://issues.apache.org/jira/browse/NUTCH to add appropriate values to the default schema.xml.

But what Nutch version and what indexer are you using?

Best,
Sebastian

On 04/18/2017 09:12 PM, Yongyao Jiang wrote:
> Hi,
>
> I have crawled 10K web pages with "index-links" turned on, and
> "linkdb.ignore.internal.links" set to false. But pretty much all pages I
> have got only have one outlink and one inlink. This makes me very confused.
>
> Here is a sample,
>
> {
> "inlinks": "http://www.planetary.org/blogs/bruce-betts/",
> "tstamp": "2017-04-18T15:45:31.457Z",
> "nutch_score": 0.439538,
> "segment": "20170418154526",
> "digest": "1ef28e97795b40be08d312f630b1728f",
> "host": "www.planetary.org",
> "boost": "1.0",
> "contentLength": "10355",
> "outlinks": "http://ajax.googleapis.com/",
> }
>
> Thanks,
> Yongyao
>
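
P.S. As a rough illustration of the multivalued setting mentioned above, the two fields could be declared in the Solr schema.xml along these lines. The field type "string" and the stored/indexed values are assumptions made for this sketch, not values taken from the Nutch default schema, so adjust them to your setup.

```xml
<!-- Sketch only: type="string" and the stored/indexed values are assumed. -->
<!-- multiValued="true" lets each document keep a list of links per field. -->
<field name="inlinks" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="outlinks" type="string" stored="true" indexed="true" multiValued="true"/>
```

After a schema change like this, the pages would most likely need to be re-indexed before more than one link per field shows up.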

