Yes, but enclose the string in double quotes. Also check the other http.agent.* configuration options; they control what appears between the parentheses, among other things.
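
For reference, a minimal sketch of what such an override could look like. It assumes the overrides go in conf/nutch-site.xml (the usual place for local settings, so nutch-default.xml stays untouched) and that the properties named below exist in your Nutch version's nutch-default.xml. The empty values are placeholders, not recommendations, and the quoted name follows the advice above. Compare the header the server actually receives with the curl -A test from the quoted message.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- value taken from the thread, quoted as advised above -->
    <value>"Mozilla/4.0"</value>
  </property>
  <!-- Placeholders: these are assumed to feed the rest of the header,
       including the part between the parentheses -->
  <property>
    <name>http.agent.description</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.version</name>
    <value></value>
  </property>
</configuration>
```

If the server still rejects the fetch, the remaining http.agent.* values are the first thing to compare against the User-Agent string that worked with curl.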

On Thursday 18 November 2010 17:08:09 matinte wrote:
> Hi again,
> I've been looking deeper and the error may be because the server filters
> requests by the User-Agent header value. In fact, if I make requests to
> the server with curl or wget and a User-Agent value of Mozilla/4.0, it
> returns the url content correctly:
>   wget -U Mozilla/4.0 <url>
>   curl -A "Mozilla/4.0" <url>
>
> Therefore, my goal now is to configure nutch so that the User-Agent header
> value is correct. To do this, I modified the nutch-default.xml file:
>   <name>http.agent.name</name>
>   <value>"Mozilla/4.0"</value>
>
> Is it enough?
>
> Thanks
>
> 2010/11/16 Markus Jelsma-2 [via Lucene]:
> > Definitely!
> >
> > On Tuesday 16 November 2010 18:28:17 matinte wrote:
> > > The url does exist but, for example, when I try curl <url> it returns:
> > >   curl: (56) Failure when receiving data from the peer
> > >
> > > Could it be a problem with the server?
> > >
> > > 2010/11/16 Markus Jelsma-2 [via Lucene]:
> > > > That should generate an IOException if I'm not mistaken.
> > > >
> > > > On Tuesday 16 November 2010 18:16:45 Ye T Thet wrote:
> > > > > Matinte,
> > > > >
> > > > > I have encountered that before.
> > > > >
> > > > > In my experience, it is caused by <url>. The url you are trying to
> > > > > crawl does not exist or the server is not responding.
> > > > >
> > > > > Warm Regards,
> > > > >
> > > > > YT Thet
> > > > >
> > > > > On Wed, Nov 17, 2010 at 12:44 AM, matinte <[hidden email]> wrote:
> > > > > > Hi,
> > > > > > I am trying to crawl with a given seed url but I'm getting the
> > > > > > following error:
> > > > > > ...
> > > > > > fetch of <url> failed with: java.io.EOFException
> > > > > > -finishing thread FetcherThread, activeThreads=0
> > > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > -activeThreads=0
> > > > > > Fetcher: done
> > > > > >
> > > > > > Do you have any idea?
> > > > > >
> > > > > > Thanks in advance
> > > > > > --
> > > > > > View this message in context:
> > > > > > http://lucene.472066.n3.nabble.com/Fetch-error-during-crawling-tp1911847p1911847.html
> > > > > > Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350