Thank you for your response and help, Otis!
I greatly appreciate it, and I am sure others will too.


I ran wget from the machine where I run Nutch and got the
following:

-bash-2.05b$ wget http://v4:10000/lib
--10:37:52--  http://v4:10000/lib
           => `lib.1'
Resolving v4... done.
Connecting to v4:10000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,717 [text/html]
100%[====================================>] 2,717          2.59M/s    ETA 00:00
10:37:52 (2.59 MB/s) - `lib.1' saved [2717/2717]

Then I tried telnet as well, and the connection was closed right after
it was established:

-bash-2.05b$ telnet
telnet> open
(to) v4 10000
Trying xxx.xxx.231.40...
Connected to xxxx.ebay.com (xxx.xxx.231.40).
Escape character is '^]'.
Connection closed by foreign host.

Doesn't the telnet service/port need to be enabled on the remote server
before we can telnet to it? Does the Nutch crawler use telnet to fetch
the URL?
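
To tell a TCP-level problem apart from an HTTP-level one, a check like
the following could help. It is only a minimal sketch in Python, not how
Nutch itself issues requests: it opens a plain TCP connection, sends a
bare HTTP/1.0 GET, and returns the first line of the response. The
function name `probe_http` and its parameters are hypothetical, and it
assumes the server accepts such a minimal request.

```python
import socket


def probe_http(host, port, path="/", timeout=5.0):
    """Open a TCP connection, send a minimal HTTP/1.0 GET, return the status line."""
    # create_connection performs a plain TCP connect to host:port.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        request = (
            f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Read until the first response line is complete or the peer closes.
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    # An empty string here means the peer closed without sending anything.
    return data.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Calling `probe_http("v4", 10000, "/lib")` from the crawl machine would
return the status line if the exchange succeeds, or raise an `OSError`
(for example a connection reset) if it fails at the socket level.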

Apparently, we do not use proxy hosts and ports in any of our APIs here
at eBay, so I am not sure how to find them. I will still ask around in
case someone knows of a proxy host and port in use.

Also, the URL loads fine in my browser, so I checked the LAN Settings in
my IE options for a proxy address and port, and none is configured there
either.
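
The browser settings only cover my desktop. To see whether a proxy is
configured on the crawl machine itself, something like the sketch below
might be useful. It is a hedged example in Python: `describe_proxies` is
a hypothetical helper, and it assumes any proxy would be exposed through
the usual `http_proxy`-style environment variables (or the system
settings on Windows and macOS), which may not be how it is set up here.

```python
import urllib.request


def describe_proxies():
    """Summarize the proxy settings visible to this Python process."""
    # getproxies() looks at <scheme>_proxy environment variables first and,
    # on Windows and macOS, falls back to the system proxy configuration.
    # A no_proxy variable, if set, appears under the key "no".
    proxies = urllib.request.getproxies()
    if not proxies:
        return "no proxy configured"
    return ", ".join(f"{scheme} -> {url}" for scheme, url in sorted(proxies.items()))
```

Printing `describe_proxies()` in the same shell that starts the crawl
shows what that environment exposes; it says nothing about proxies
configured elsewhere, such as in the Nutch configuration files.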


Thanks,
Ann Del Rio

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 30, 2008 10:17 AM
To: [email protected]
Subject: Re: Indexing XML-based document format per DITA standard

Can you connect to it (telnet to it, for example) directly from the
machine(s) where you are running Nutch?
(This is a network issue; it has nothing to do with XML/parsing.)


Maybe you need to go through some eBay proxy?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: "Del Rio, Ann" <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 30, 2008 6:24:01 PM
> Subject: Indexing XML-based document format per DITA standard
> 
> I added a new URL to index, which is in an XML-based document format
> per the DITA standard, and I get the following error.
> 
> java.net.SocketException: Connection reset
> 2008-05-27 17:56:58 ERROR Http  at java.net.SocketInputStream.read(SocketInputStream.java:168)
> 2008-05-27 17:56:58 ERROR Http  at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 2008-05-27 17:56:58 ERROR Http  at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:96)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:99)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
> 2008-05-27 17:56:58 ERROR Http  at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
> 2008-05-27 17:56:58 INFO  Fetcher  fetch of http://v4:10000/lib failed with: java.net.SocketException: Connection reset
> 
> I googled and have found no solution so far.
> 
> Do I need to set up some config / host file to specify the ports?
> The URL is an internal website.
> 
> Any response will be appreciated.
> 
> Thanks,
> Ann Del Rio
> Senior Developer
> eBay, Inc
