Thank you for your response and help, Otis! I greatly appreciate it, and I am sure others will too.
I did a wget from the machine where I was running Nutch and got the following:

-bash-2.05b$ wget http://v4:10000/lib
--10:37:52-- http://v4:10000/lib
 => `lib.1'
Resolving v4... done.
Connecting to v4:10000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,717 [text/html]

100%[====================================>] 2,717 2.59M/s ETA 00:00

10:37:52 (2.59 MB/s) - `lib.1' saved [2717/2717]

Then I tried telnet as well, and the connection was closed:

-bash-2.05b$ telnet
telnet> open
(to) v4 10000
Trying xxx.xxx.231.40...
Connected to xxxx.ebay.com (xxx.xxx.231.40).
Escape character is '^]'.
Connection closed by foreign host.

Don't the telnet service/ports need to be enabled on the remote server before we can telnet to it? Does the Nutch crawler use telnet to fetch the URL?

Apparently, we do not use proxy hosts and ports here at eBay in any of our APIs, so I am not sure how to get those, but I will still ask around whether anyone knows which proxy hosts and ports we use. Also, the URL loads fine in my browser, so I checked the LAN Settings in my IE browser options for a proxy address and port, and we are not using one there either.

Thanks,
Ann Del Rio

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, May 30, 2008 10:17 AM
To: [email protected]
Subject: Re: Indexing XML-based document format per DITA standard

Can you connect to it (telnet to it, for example) directly from the machine(s) where you are running Nutch? (This is a network issue, nothing to do with XML/parsing.)

Maybe you need to go through some eBay proxy?
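Otis's proxy question can be checked from the Nutch machine by fetching the same URL directly and, if a proxy ever turns up, through that proxy. Below is a minimal sketch using the JDK's HttpURLConnection and java.net.Proxy. It is not the Commons HttpClient code path Nutch runs (visible in the stack trace), so it only approximates the crawler's request. The class name DirectFetch and the proxy host/port parameters are hypothetical placeholders; by default it connects with no proxy, matching what Ann found. It also prints the port parsed from the URL, since the :10000 in http://v4:10000/lib is what selects the port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

public class DirectFetch {

    // Parses a URL string, turning the checked exception into an unchecked one.
    public static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    // Port the client will connect to: the explicit port in the URL if present,
    // otherwise the scheme's default (80 for http).
    public static int effectivePort(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    // Direct connection unless a proxy host is given. The proxy host/port are
    // hypothetical; the thread reports that no proxy is in use.
    public static Proxy chooseProxy(String proxyHost, int proxyPort) {
        if (proxyHost == null || proxyHost.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // Left unresolved so no DNS lookup happens until a connection is made.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    // Sends a GET for the URL through the chosen proxy (or directly) and returns the HTTP status code.
    public static int fetchStatus(URL url, Proxy proxy, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default URL is the one from the thread; pass another as the first argument.
        URL url = toUrl(args.length > 0 ? args[0] : "http://v4:10000/lib");
        // Optional hypothetical proxy: second argument host, third argument port.
        String proxyHost = args.length > 1 ? args[1] : null;
        int proxyPort = args.length > 2 ? Integer.parseInt(args[2]) : 0;
        System.out.println("host=" + url.getHost() + " port=" + effectivePort(url) + " path=" + url.getPath());
        System.out.println("HTTP status: " + fetchStatus(url, chooseProxy(proxyHost, proxyPort), 10_000));
    }
}
```

Running it as `java DirectFetch http://v4:10000/lib` on the Nutch host gives a JVM-based result to set beside the wget run.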
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "Del Rio, Ann" <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 30, 2008 6:24:01 PM
> Subject: Indexing XML-based document format per DITA standard
>
> I added a new URL to index, which is in an XML-based document format per
> the DITA standard, and I get the following error:
>
> java.net.SocketException: Connection reset
> 2008-05-27 17:56:58 ERROR Http at java.net.SocketInputStream.read(SocketInputStream.java:168)
> 2008-05-27 17:56:58 ERROR Http at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 2008-05-27 17:56:58 ERROR Http at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
> 2008-05-27 17:56:58 ERROR Http at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
> 2008-05-27 17:56:58 ERROR Http at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:96)
> 2008-05-27 17:56:58 ERROR Http at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:99)
> 2008-05-27 17:56:58 ERROR Http at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
> 2008-05-27 17:56:58 ERROR Http at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
> 2008-05-27 17:56:58 INFO Fetcher fetch of http://v4:10000/lib failed with: java.net.SocketException: Connection reset
>
> I googled and have found no solution so far.
>
> Do I need to set up some config / hosts file to specify the ports?
> The URL is an internal website.
>
> Any response will be appreciated.
>
> Thanks,
> Ann Del Rio
> Senior Developer
> eBay, Inc
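The trace fails while Commons HttpClient is reading the response status line (HttpMethodBase.readStatusLine calls HttpParser.readRawLine, which reads from SocketInputStream). So the TCP connection had been opened, and it was reset before a complete status line arrived. A telnet client performs only the first step: it opens a TCP connection to whatever port it is given, not only to a telnet service. The minimal sketch below, using the hypothetical class name HttpProbe, replays both steps against v4:10000 by default. It connects, sends a bare HTTP/1.0 GET for /lib, and prints the first response line. It assumes plain HTTP with no proxy, and the User-Agent value is an arbitrary placeholder, not what Nutch sends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HttpProbe {

    // Builds a minimal HTTP/1.0 GET request. The User-Agent is a placeholder.
    public static String buildRequest(String host, int port, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + ":" + port + "\r\n"
                + "User-Agent: HttpProbe/0.1\r\n"
                + "\r\n";
    }

    // Extracts the numeric code from a status line such as "HTTP/1.1 200 OK".
    // Returns -1 if the line does not look like an HTTP status line.
    public static int parseStatusCode(String statusLine) {
        if (statusLine == null || !statusLine.startsWith("HTTP/")) {
            return -1;
        }
        String[] parts = statusLine.split(" ");
        if (parts.length < 2) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Opens a TCP connection (the part telnet does), sends the GET request and
    // returns the first response line. A server reset here surfaces as a
    // java.net.SocketException, as in the Nutch log.
    public static String probe(String host, int port, String path, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, port, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine(); // null if the server closes without sending anything
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults taken from the failing URL http://v4:10000/lib in the thread.
        String host = args.length > 0 ? args[0] : "v4";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        String path = args.length > 2 ? args[2] : "/lib";
        String statusLine = probe(host, port, path, 10_000);
        System.out.println("Status line: " + statusLine);
        System.out.println("Status code: " + parseStatusCode(statusLine));
    }
}
```

A SocketException from `probe` would reproduce the reset outside Nutch. A `HTTP/1.x 200` line, matching the wget run, would instead suggest comparing the request Nutch sends rather than the network path.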
