I agree with Sebastian's suggestion that you can use a network traffic
analyzer to inspect the HTTP request and response headers sent by Nutch
and by the browser. They may be sending different request headers.
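As a quick self-contained illustration of what Sebastian described (a
hypothetical sketch, not the actual site), here is a local Python server
that returns 403 to crawler-like User-Agents and 200 to browser-like
ones. Replaying the same URL with two different User-Agent headers shows
how the status code can differ even though the browser "works fine":

```python
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

class UAFilterHandler(BaseHTTPRequestHandler):
    """Hypothetical server that blocks non-browser User-Agents."""
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if "Mozilla" in ua:          # crude "looks like a browser" check
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(403)  # deny unknown/crawler agents
            self.end_headers()
    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), UAFilterHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

def status_for(user_agent):
    """Fetch url with the given User-Agent and return the HTTP status."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        return urllib.request.urlopen(req).status
    except urllib.error.HTTPError as e:
        return e.code

browser = status_for("Mozilla/5.0 (X11; Linux x86_64)")
crawler = status_for("MyNutchSpider/1.0")   # hypothetical crawler agent
print(browser, crawler)  # 200 403
server.shutdown()
```

If the real server behaves like this, setting a browser-like agent string
(in Nutch, via http.agent.name in nutch-site.xml) or replaying the
browser's exact headers with curl will confirm it.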


On Fri, Aug 2, 2013 at 7:16 AM, A Laxmi <[email protected]> wrote:

> Sebastian - thanks for your help!
>
> I can access the link from a browser without any issue. I am getting "fetch
> failed with http code = 403" only while the crawler is trying to fetch.
>
> On Thursday, August 1, 2013, Sebastian Nagel <[email protected]>
> wrote:
> > Hi,
> >
> > why are you sure that you didn't get a real 403 (forbidden)?
> > - the answering web server logs a delivery with 200 (ok)?
> > - a network traffic analyzer (wireshark, tcpdump) shows
> >   that HTTP response headers have a different status code?
> >
> > In general, servers may deliver different responses to a crawler
> > and a browser, or even refuse to deliver a document.
> >
> > Sebastian
> >
> > On 08/01/2013 10:56 PM, A Laxmi wrote:
> >> For some reason, I am not able to crawl; the fetcher seems to have an
> >> issue. It complains: "fetch of http://www.someurldomain.com/ failed
> >> with: Http code = 403, url = http://www.someurldomain.com/"
> >>
> >> Please help. I tried to google this issue but could not find anything
> >> that addresses it.
> >>
> >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
