Hi,

I noticed that for the other URLs in the seed list, inlinks are saved as ol. I
checked the code and found that this is done by the part that saves anchors.
So, in my case, inlinks are saved as anchors in the ol field in HBase. But for
one of the URLs, the title and inlinks are not retrieved, although its parse
status is marked success/ok (1/0), args=[].

Alex.

-----Original Message-----
From: kiran chitturi <[email protected]>
To: user <[email protected]>
Sent: Wed, Feb 13, 2013 12:40 pm
Subject: Re: nutch cannot retrive title and inlinks of a domain


Hi Alex,

Inlinks do not currently work for me either for the same domain [0]. I am
using Nutch-2.x and HBase. Do the inlinks get saved for you for some of
the crawl seeds?

Surprisingly, the title does not get saved. Did you try using parsechecker?


[0] - http://www.mail-archive.com/[email protected]/msg08627.html


On Wed, Feb 13, 2013 at 3:26 PM, <[email protected]> wrote:

> Hello,
>
> I noticed that nutch cannot retrieve title and inlinks of one of the
> domains in the seed list. However, if I run identical code from the server
> where this domain is hosted then it correctly parses it. The surprising
> thing is that in both cases this URL has
>
> status: 2 (status_fetched)
> parseStatus:    success/ok (1/0), args=[]
>
>
> I used nutch-2.1 with hbase-0.92.1 and nutch 1.4.
>
>
> Any ideas why this happens?
>
> Thanks.
>
> Alex.
>



-- 
Kiran Chitturi

 
