Hi again,
I've been looking into this more closely, and the error may occur because the
server filters requests by the value of the User-Agent header. In fact, if I
send requests to the server with curl or wget using Mozilla/4.0 as the
User-Agent value, it returns the URL content correctly:
wget -U Mozilla/4.0 <url>
curl -A "Mozilla/4.0" <url>
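To repeat the same comparison from a script instead of curl or wget, here is a minimal Python sketch that uses only the standard library's urllib.request. It is not part of Nutch. `build_request` and `try_fetch` are hypothetical helper names, and `<url>` still has to be replaced by the real seed URL. The idea is to call `try_fetch` once with no User-Agent, in which case urllib sends its default `Python-urllib/x.y` header, and once with `Mozilla/4.0`. That should show whether the server really treats the two requests differently.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def build_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, overriding the User-Agent header when one is given
    (the same effect as `curl -A` or `wget -U`)."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def try_fetch(url: str, user_agent: Optional[str] = None, timeout: float = 10.0) -> str:
    """Fetch `url` and return a one-line description of the outcome."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            return f"HTTP {response.status}, {len(body)} bytes"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (403, 404, ...).
        return f"HTTP error {err.code}"
    except (OSError, http.client.HTTPException) as err:
        # A server that closes the connection without a (complete) response
        # ends up here, e.g. http.client.RemoteDisconnected or IncompleteRead.
        return f"failed: {err!r}"
```

Usage would be `try_fetch(url)` versus `try_fetch(url, "Mozilla/4.0")` for the same `url`. Network errors are reported as strings rather than raised so the two outcomes can be printed side by side.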

Therefore, my goal now is to configure Nutch so that it sends the correct
User-Agent header value. To do this, I modified the nutch-default.xml file:
 <property>
   <name>http.agent.name</name>
   <value>Mozilla/4.0</value>
 </property>
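XML text content is taken verbatim, so any quotes or spaces inside `<value>` become part of the property value. This differs from `curl -A "Mozilla/4.0"`, where the shell removes the quotes before curl sees them. The following is a minimal Python sketch that reads a property from a file laid out like nutch-default.xml and flags those two cases. The layout it assumes is a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children. The helpers `read_property` and `agent_name_problems` are hypothetical and not part of Nutch or Hadoop. The sketch does not check whether Nutch itself trims whitespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def read_property(xml_text: str, name: str) -> Optional[str]:
    """Return the raw <value> text of the first <property> whose <name>
    matches `name` (the name itself is compared after stripping spaces)."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None


def agent_name_problems(value: Optional[str]) -> List[str]:
    """List likely mistakes in an http.agent.name value."""
    if value is None or not value.strip():
        return ["http.agent.name is empty or missing"]
    problems = []
    if value != value.strip():
        problems.append("leading/trailing whitespace in <value>")
    if '"' in value:
        problems.append("literal double quotes in <value>")
    return problems


if __name__ == "__main__":
    # A property written with spaces and quotes around the value.
    quoted = """<configuration><property>
      <name> http.agent.name </name>
      <value> "Mozilla/4.0" </value>
    </property></configuration>"""
    value = read_property(quoted, "http.agent.name")
    print(repr(value), agent_name_problems(value))
```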

Is it enough?

Thanks

2010/11/16 Markus Jelsma-2 [via Lucene]

> definitely!
>
> On Tuesday 16 November 2010 18:28:17 matinte wrote:
>
> > The URL does exist, but when I try, for example, curl <url>, it returns:
> > curl: (56) Failure when receiving data from the peer
> >
> > Could it be a problem with the server?
> >
> > 2010/11/16 Markus Jelsma-2 [via Lucene]
> >
> > > That should generate an IOException, if I'm not mistaken.
> > >
> > > On Tuesday 16 November 2010 18:16:45 Ye T Thet wrote:
> > > > Matinte,
> > > >
> > > > I have encountered that before.
> > > >
> > > > In my experience, it is caused by <url>: the URL you are trying to
> > > > crawl does not exist or the server is not responding.
> > > >
> > > > Warm Regards,
> > > >
> > > > YT Thet
> > > >
> > > > On Wed, Nov 17, 2010 at 12:44 AM, matinte wrote:
> > > > > Hi,
> > > > > I am trying to crawl from a given seed URL, but I'm getting the
> > > > > following error:
> > > > > ...
> > > > > fetch of <url> failed with: java.io.EOFException
> > > > > -finishing thread FetcherThread, activeThreads=0
> > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > > -activeThreads=0
> > > > > Fetcher: done
> > > > >
> > > > > Do you have any ideas?
> > > > >
> > > > > Thanks in advance
> > > > > --
> > > > > View this message in context:
> > > > > http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847p1911847.html
> > > > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536600 / 06-50258350
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847p1924795.html
Sent from the Nutch - User mailing list archive at Nabble.com.
