Yes, but enclose the string in double quotes. Also check the other
http.agent.* configuration options; they control what appears between the
parentheses in the User-Agent string, among other things.
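
As a rough sketch of how those options fit together: the block below assumes
the usual Nutch property names (http.agent.name, http.agent.version,
http.agent.description, http.agent.url, http.agent.email) and assumes the
header is assembled roughly as name/version (description; url; email). All
values are placeholders. Putting the override in conf/nutch-site.xml is also
an assumption, since local changes normally go there rather than in
nutch-default.xml.

```xml
<!-- Sketch only: placeholder values, not taken from a working setup. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- Shown bare here; add the double quotes if you follow the advice above. -->
    <value>Mozilla</value>
  </property>
  <property>
    <!-- Assumed to be appended to the name after a slash, giving Mozilla/4.0. -->
    <name>http.agent.version</name>
    <value>4.0</value>
  </property>
  <property>
    <!-- The next three are assumed to fill the part between the parentheses. -->
    <name>http.agent.description</name>
    <value>example crawler</value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://example.com/crawler</value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>crawler@example.com</value>
  </property>
</configuration>
```

If the server only checks the leading Mozilla/4.0 token, the name and version
may be all that matters. Comparing the result against the curl -A test is a
quick way to confirm.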

On Thursday 18 November 2010 17:08:09 matinte wrote:
> Hi again,
> I've looked deeper, and the error may occur because the server filters
> requests by the value of the User-Agent header. In fact, if I make requests
> to the server with curl or wget and set the User-Agent to Mozilla/4.0, it
> returns the URL content correctly:
> wget -U Mozilla/4.0 <url>
> curl -A "Mozilla/4.0" <url>
> 
> So my goal now is to configure Nutch to send the correct User-Agent header
> value. To do this, I modified the nutch-default.xml file:
>  <name>http.agent.name</name>
>  <value>"Mozilla/4.0"</value>
> 
> Is that enough?
> 
> Thanks
> 
> 2010/11/16 Markus Jelsma-2 [via Lucene] <ml-node+1912155-1367979579-224...@n3.nabble.com>
> 
> > Definitely!
> > 
> > On Tuesday 16 November 2010 18:28:17 matinte wrote:
> > > The URL does exist, but when I try, for example, curl <url>, it returns:
> > > curl: (56) Failure when receiving data from the peer
> > > 
> > > Could it be a problem with the server?
> > > 
> > > 2010/11/16 Markus Jelsma-2 [via Lucene] <[hidden email]>
> > > 
> > > > That should generate an IOException, if I'm not mistaken.
> > > > 
> > > > On Tuesday 16 November 2010 18:16:45 Ye T Thet wrote:
> > > > > Matinte,
> > > > > 
> > > > > I have encountered that before.
> > > > > 
> > > > > > In my experience, it is caused by <url>. The URL you are trying to
> > > > > > crawl does not exist, or the server is not responding.
> > > > > 
> > > > > Warm Regards,
> > > > > 
> > > > > YT Thet
> > > > > 
> > > > > > On Wed, Nov 17, 2010 at 12:44 AM, matinte <[hidden email]> wrote:
> > > > > > > Hi,
> > > > > > > I am trying to crawl from a given seed URL, but I'm getting the
> > > > > > > following error:
> > > > > > ...
> > > > > > fetch of <url> failed with: java.io.EOFException
> > > > > > -finishing thread FetcherThread, activeThreads=0
> > > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > -activeThreads=0
> > > > > > Fetcher: done
> > > > > > 
> > > > > > Do you have any idea?
> > > > > > 
> > > > > > Thanks in advance
> > > > > > --
> > 
> > > > > > View this message in context:
> > http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847p
> > <http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847
> > p?by-user=t>
> > 
> > > > <
> > 
> > http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847<
> > http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847?
> > by-user=t>
> > 
> > > > p?by-user=t>
> > > > 
> > > > > > 1911847.html Sent from the Nutch - User mailing list archive at
> > > > > > Nabble.com.
> > > > 
> > > > --
> > > > Markus Jelsma - CTO - Openindex
> > > > http://www.linkedin.com/in/markus17
> > > > 050-8536600 / 06-50258350
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536600 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
