Hello,
I read the segment containing a site that I am sure links to another
site, and I was surprised to find that the stored outlinks all belong
to the same domain. I then came across this issue:
https://issues.apache.org/jira/browse/NUTCH-1346
It seems a patch is available for 1.6. I am currently using 1.2, and the
latest Nutch release is 1.4. Would it be safe to switch directly to
1.6?
On 5/29/2012 10:19 AM, Dustine Rene Bernasor wrote:
Hello,
Whenever I set link.ignore.internal.host and link.ignore.internal.domain
in nutch-site.xml to "true", I get the "No links to process, is the
webgraph empty?" error when performing LinkRank. However, if I set them to
"false", LinkRank works just fine. I have been searching for information on
this error but haven't found anything conclusive so far. Btw, I have also
set both the link.ignore.limit.page and the link.ignore.limit.domain to
"true".
Furthermore, if I run NodeReader on a certain page A, it says that
the page has 0 inlinks and 0 outlinks, but I know that there's
another page B that links to A. If I run NodeReader on B, it says
there's 1 inlink and 1 outlink, although B links to many other sites.
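For reference, the four link.ignore.* settings mentioned above would look
roughly like this in nutch-site.xml. This is only a sketch of the relevant
properties with the values that trigger the error; the rest of the file is
omitted:

```xml
<configuration>
  <!-- Setting these two to "true" triggers the
       "No links to process, is the webgraph empty?" error in LinkRank;
       with "false", LinkRank runs fine. -->
  <property>
    <name>link.ignore.internal.host</name>
    <value>true</value>
  </property>
  <property>
    <name>link.ignore.internal.domain</name>
    <value>true</value>
  </property>

  <!-- These two are also set to "true". -->
  <property>
    <name>link.ignore.limit.page</name>
    <value>true</value>
  </property>
  <property>
    <name>link.ignore.limit.domain</name>
    <value>true</value>
  </property>
</configuration>
```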
I hope someone can shed light on this matter.
Thanks.
Dustine