Re: invalid uri with three dots

2012-02-01 Thread remi tassing
Problem solved!

I replaced all whitespace with %20 in the URL before fetching the
content in HttpResponse.java (httpclient plugin).

Dirty solution? Yes, but it works for me now.

Remi
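
A minimal sketch of the workaround described above, assuming it amounts to replacing each literal space with %20 before the URL reaches HttpClient. The names SpaceEscaper and escapeSpaces and the example.com URL are hypothetical; this is not the actual change made in HttpResponse.java.

```java
// Hypothetical helper illustrating the "replace spaces with %20" workaround.
final class SpaceEscaper {

    // Replaces every literal space character with its percent-encoded form.
    // All other characters are left exactly as they were.
    static String escapeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Made-up URL containing spaces in both the path and the query.
        System.out.println(escapeSpaces("https://example.com/a b?From=stats x"));
    }
}
```

This only handles spaces. Other characters that are illegal in a URI (tabs, quotes, non-ASCII) would still be rejected, which is one reason to treat it as a stopgap.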


Re: invalid uri with three dots

2012-01-26 Thread remi tassing
Hey guys,

Any ideas on how to properly escape non-URI characters? I'm getting
invalid URI for URLs that contain three dots, spaces...

//Remi

[1] https://issues.apache.org/jira/browse/HTTPCLIENT-858


Ortwin Glück added a comment - 30/Jun/09 14:46
Properly escape non-URI characters. HttpClient is not a browser and thus
does not, can not and will never try to fix invalid input.
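
One way to do the escaping on the caller's side, sketched with the JDK only: java.net.URL splits the raw string leniently, and the multi-argument java.net.URI constructor quotes characters that are illegal in each component. UrlEscaper and escape are hypothetical names, the example.com URL is made up, and this is not what Nutch or HttpClient do internally.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper: percent-encodes illegal characters component by component.
final class UrlEscaper {

    static String escape(String raw) {
        try {
            // URL does not validate characters, so it can split a "dirty" string.
            URL u = new URL(raw);
            // The 7-argument URI constructor quotes illegal characters
            // (such as spaces) in the path, query and fragment.
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(),
                              u.getPort(), u.getPath(), u.getQuery(), u.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("https://example.com/a b/c?q=x y"));
    }
}
```

A caveat with this approach: the multi-argument URI constructors always quote '%', so a URL that is already escaped would be double-encoded (%20 becomes %2520). Dots need no escaping at all, since '.' is a legal URI character.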


Re: invalid uri with three dots

2012-01-18 Thread remi tassing
I posted a question on this JIRA:
https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481


It looks like the same problem.


Re: invalid uri with three dots

2012-01-17 Thread remi tassing
Hi,

The problem is really similar to this:
http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html

Unfortunately, I have no clue on what to update in Nutch ...


Re: invalid uri with three dots

2012-01-17 Thread Lewis John Mcgibbney
Hi Remi,

This also looks like we need to document and address it.

Can you log a Jira issue, and we will try to get on to it? Can you also have
a look through some of the existing issues in case there is something
similar, and possibly relate them?

Thank you in advance

Lewis


-- 
*Lewis*


Re: invalid uri with three dots

2012-01-17 Thread Markus Jelsma
This may also be an issue with protocol-httpclient.



Re: invalid uri with three dots

2012-01-16 Thread remi tassing
It comes under the error java.lang.IllegalArgumentException

On Mon, Jan 16, 2012 at 3:58 PM, remi tassing tassingr...@gmail.com wrote:

 Hello all,

 I'm getting an invalid URI error with some links that have three dots, i.e.
  They work perfectly well in browsers (IE and Chrome) but,
 apparently, not with Nutch.

 Is this a known issue? Any idea on how to handle it?

 Remi



Re: invalid uri with three dots

2012-01-16 Thread Markus Jelsma
copy the stack trace please

On Monday 16 January 2012 14:58:46 remi tassing wrote:
 Hello all,
 
 I'm getting an invalid URI error with some links that have three dots, i.e.
  They work perfectly well in browsers (IE and Chrome) but,
 apparently, not with Nutch.
 
 Is this a known issue? Any idea on how to handle it?
 
 Remi

-- 
Markus Jelsma - CTO - Openindex


Re: invalid uri with three dots

2012-01-16 Thread remi tassing
Hello,

This is a snapshot of the log:

-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...From=stats': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)
fetch of https://uri1...From=stats failed with:
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...From=stats': Invalid query
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96

On Mon, Jan 16, 2012 at 4:05 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 copy the stack trace please

 On Monday 16 January 2012 14:58:46 remi tassing wrote:
  Hello all,
 
  I'm getting invalid uri error with some link that have three dots, i.e.
   They work perfectly well in browsers (IE and Chrome) but,
  apparently, not with Nutch.
 
  Is this a known issue? Any idea on how to handle it?
 
  Remi

 --
 Markus Jelsma - CTO - Openindex



Re: invalid uri with three dots

2012-01-16 Thread Markus Jelsma
This? https://uri1...From=stats

That's not a correct or valid URL if you ask me.

-- 
Markus Jelsma - CTO - Openindex
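
To see which characters a strict parser actually objects to, here is a rough check using the JDK's own java.net.URI: it accepts three consecutive dots but rejects a literal space. java.net.URI is not the parser Commons HttpClient uses, so this only approximates its behavior; UriCheck, isValidUri and the example.com URLs are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: reports whether java.net.URI can parse a string.
final class UriCheck {

    static boolean isValidUri(String candidate) {
        try {
            new URI(candidate);   // throws if the string violates URI syntax
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Three dots are ordinary legal characters.
        System.out.println(isValidUri("https://example.com/page...From=stats"));
        // An unescaped space in the query is not.
        System.out.println(isValidUri("https://example.com/page?From=stats x"));
    }
}
```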


Re: invalid uri with three dots

2012-01-16 Thread remi tassing
Hello Markus,

thanks for the help!

Just to clarify a little: in my previous message, uri1 stood for a
normal, ordinary URL; I just didn't want to copy the exact URL.

The weird part is that it all works in the browser...
