Hi Sebastian,

I am using Nutch 1.x, built from the source code of the master branch, and
the indexer is Elasticsearch (ES) 2.3.5.

Thanks,
Yongyao

On Thu, Apr 20, 2017 at 6:03 AM, Sebastian Nagel <[email protected]> wrote:

> Hi Yongyao,
>
> this looks like a configuration issue of the index.
> In case of Solr (plugin indexer-solr):
>  inlinks and outlinks should be configured as multivalued
>
> That's the default for Solr 5; older versions need to specify
> this in the index configuration schema.
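>
> As a rough sketch only, such a declaration in schema.xml could look
> like the lines below. The field type "string" and the stored/indexed
> settings are assumptions for illustration, not taken from the
> shipped schema; only multiValued="true" is the point here.
>
> ```xml
> <!-- Assumed type and flags; multiValued lets one document hold many links -->
> <field name="inlinks" type="string" stored="true" indexed="false" multiValued="true"/>
> <field name="outlinks" type="string" stored="true" indexed="false" multiValued="true"/>
> ```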
>
> Please also open an issue on
>   https://issues.apache.org/jira/browse/NUTCH
> to add appropriate values to the default schema.xml
>
> But what Nutch version and what indexer are you using?
>
> Best,
> Sebastian
>
> On 04/18/2017 09:12 PM, Yongyao Jiang wrote:
> > Hi,
> >
> > I have crawled 10K web pages with "index-links" turned on, and
> > "linkdb.ignore.internal.links" set to false. But pretty much all pages I
> > have got only have one outlink and one inlink. This makes me very
> > confused.
> >
> > Here is a sample,
> >
> >              {
> >                "inlinks": "http://www.planetary.org/blogs/bruce-betts/",
> >                "tstamp": "2017-04-18T15:45:31.457Z",
> >                "nutch_score": 0.439538,
> >                "segment": "20170418154526",
> >                "digest": "1ef28e97795b40be08d312f630b1728f",
> >                "host": "www.planetary.org",
> >                "boost": "1.0",
> >                "contentLength": "10355",
> >                "outlinks": "http://ajax.googleapis.com/",
> >              }
> >
> > Thanks,
> > Yongyao
> >
>
>


-- 
Yongyao Jiang
https://www.linkedin.com/in/yongyao-jiang-42516164
Ph.D. Student in Earth Systems and GeoInformation Sciences
NSF Spatiotemporal Innovation Center
George Mason University
