This information is not enough to understand the problem. The log you have sent seems to be the messages that appear on the console, whereas I had asked for the 'logs/hadoop.log' file.
The log in this file is usually in this format:

  2008-01-03 00:00:16,652 INFO fetcher.Fetcher - fetching http://www.example.com/
  2008-01-03 00:00:17,029 INFO fetcher.Fetcher - fetching http://www.example.net/

Please send the following information:

1. The Nutch version you are using. (NUTCH-559v0.5 was generated against the trunk. If you are using Nutch-0.9, the patch might not apply cleanly. You might have to check manually whether the patch was applied correctly.)

2. The output of your patch command, if possible.

3. The relevant logs from 'logs/hadoop.log' with DEBUG enabled. Please make sure before sending that the log file has the DEBUG lines.

4. The output of a sample HTTP query to your proxy server with netcat or telnet. For example:

  $ nc -v 192.168.101.1 80
  intproxy [192.168.101.1] 80 (www) open
  GET http://www.google.com/ HTTP/1.0
  Host: www.google.com

  HTTP/1.1 407 Proxy Authentication Required ( The Server requires authorization to fulfill the request. Access to the Web Proxy filter is denied. )
  Via: 1.1 INTPROXY
  Proxy-Authenticate: Negotiate
  Proxy-Authenticate: Kerberos
  Proxy-Authenticate: NTLM
  Proxy-Authenticate: Basic realm="INTPROXY"
  Connection: Keep-Alive
  Proxy-Connection: Keep-Alive
  Pragma: no-cache
  Cache-Control: no-cache
  Content-Type: text/html
  Content-Length: 4119

Only the response header, as shown above, is enough. There is no need to send the complete response.

5. The value of the 'http.proxy.realm' property you have used in your 'conf/nutch-site.xml'. (I assume you have provided the correct host, port, username and password in the other http.proxy.* properties. Ideally, you should also set the http.agent.host property properly, though I have never found this to cause a problem.)
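
If nc or telnet is not at hand, the query in point 4 can be approximated with a short Python script. This is only a sketch under my own assumptions: the function names (build_proxy_request, extract_header, proxy_auth_schemes, fetch_header) are made up for illustration, and the proxy address in the usage comment is the example address from point 4, not yours. It sends the same HTTP/1.0 request, keeps only the response header, and pulls out the Proxy-Authenticate lines so the offered schemes are easy to see.

```python
import socket


def build_proxy_request(url, host):
    """Build a minimal HTTP/1.0 request using the absolute URL, as in the nc example."""
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(url, host).encode("ascii")


def extract_header(raw):
    """Return the status line and headers, i.e. everything before the first blank line."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1")


def proxy_auth_schemes(header_text):
    """Return the values of all Proxy-Authenticate header lines, in order."""
    schemes = []
    for line in header_text.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "proxy-authenticate":
            schemes.append(value.strip())
    return schemes


def fetch_header(proxy_host, proxy_port, url, host, timeout=10):
    """Send the request to the proxy and read until the end of the header (or EOF)."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url, host))
        received = b""
        while b"\r\n\r\n" not in received:
            data = sock.recv(4096)
            if not data:  # connection closed by the proxy
                break
            received += data
    return extract_header(received)


# Usage (hypothetical values taken from the example in point 4):
#   header = fetch_header("192.168.101.1", 80, "http://www.google.com/", "www.google.com")
#   print(header)
#   print(proxy_auth_schemes(header))
```

The output of the first print is what I am asking for in point 4. The output of nc or telnet is just as good.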

Regards,
Susam Pal

On Jan 3, 2008 12:47 PM, Nidhi malik <[EMAIL PROTECTED]> wrote:
> I am sending my Hadoop file and I apllied also patch559V0.5
>
> at the time of fetching I am getting this messages
> ---------------------------------------------------------
> Fetcher: starting
> Fetcher: segment: crawl/segments/20080103125023
> Fetcher: threads: 10
> fetching http://www.w3schools.com/
> http.proxy.host = netmon.iitb.ac.in
> http.proxy.port = 80
> http.timeout = 100000
> http.content.limit = 65536
> http.agent = digi/Nutch-0.9 (digvijay; http://www.google.com;
> [EMAIL PROTECTED])
> protocol.plugin.check.blocking = true
> protocol.plugin.check.robots = true
> fetcher.server.delay = 5000
> http.max.delays = 100
> Configured Client
> fetch of http://www.w3schools.com/ failed with: Http code=407, url=
> http://www.w3schools.com/
> Fetcher: done
>
> ----------------------------------------------------------------------------
