Hello,

I read the segment containing the site, which I am sure links to another site, and was surprised to find that the stored outlinks
all belong to the same domain. I then came across this issue:

https://issues.apache.org/jira/browse/NUTCH-1346

It seems a patch is available for 1.6. I am currently using 1.2, and the latest Nutch release is 1.4. Would it be safe to switch directly to 1.6?



On 5/29/2012 10:19 AM, Dustine Rene Bernasor wrote:
Hello,

Whenever I set link.ignore.internal.host and link.ignore.internal.domain
in nutch-site.xml to "true", I get the "No links to process, is the
webgraph empty?" error when performing LinkRank. However, if I set them to
"false", LinkRank works just fine. I have been searching for information on
this error but I haven't found anything conclusive so far. Btw, I have also
set both the link.ignore.limit.page and the link.ignore.limit.domain to
"true".

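For reference, the settings described above would look roughly like this in
nutch-site.xml. This is only a sketch assembled from the property names and
values mentioned in this message; the rest of the file is omitted.

```xml
<configuration>
  <!-- Setting these two to "true" is what triggers the
       "No links to process" error in LinkRank -->
  <property>
    <name>link.ignore.internal.host</name>
    <value>true</value>
  </property>
  <property>
    <name>link.ignore.internal.domain</name>
    <value>true</value>
  </property>

  <!-- Also set to "true", as noted above -->
  <property>
    <name>link.ignore.limit.page</name>
    <value>true</value>
  </property>
  <property>
    <name>link.ignore.limit.domain</name>
    <value>true</value>
  </property>
</configuration>
```
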
Furthermore, if I run NodeReader on a certain page A, it reports that
the page has 0 inlinks and 0 outlinks, even though I know there is
another page B that links to A. If I run NodeReader on B, it reports
1 inlink and 1 outlink, although B links to many other sites.

I hope someone can shed light on this matter.

Thanks.

Dustine

