Problem solved! I replaced all whitespace with "%20" in the URL before getting the "content" in HttpResponse.java (protocol-httpclient plugin).
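For anyone who finds this thread later, here is a rough sketch of the whitespace replacement described above. It is not the actual patch to HttpResponse.java: the class name, the `escapeWhitespace` helper and the example URL are all made up for illustration, and it uses the JDK's `java.net.URI` as a stand-in validity check instead of Commons HttpClient.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlSpaceEscaper {

    // Hypothetical helper (not part of Nutch): replaces every whitespace
    // character in the URL string with the percent-encoded space "%20".
    static String escapeWhitespace(String url) {
        return url.replaceAll("\\s", "%20");
    }

    public static void main(String[] args) throws URISyntaxException {
        // Made-up URL with a space in the query, standing in for the real one.
        String raw = "https://example.com/search?q=two words&From=stats";
        String escaped = escapeWhitespace(raw);

        // new URI(raw) would throw URISyntaxException because of the space;
        // the escaped form parses.
        URI uri = new URI(escaped);
        System.out.println(uri);
    }
}
```

This only deals with whitespace. Any other character that is illegal in a URI would still be rejected, so it is a workaround and not a general escaping routine.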
Dirty solution? Yes, but it works for me now.

Remi

On Thursday, January 26, 2012, remi tassing <tassingr...@gmail.com> wrote:
> Hey guys,
> any ideas on how to "properly escape non-URI characters"? I'm getting invalid URI for URLs that contain "three dots", "space"...
> //Remi
> [1] https://issues.apache.org/jira/browse/HTTPCLIENT-858
>
> Ortwin Glück added a comment - 30/Jun/09 14:46
> Properly escape non-URI characters. HttpClient is not a browser and thus does not, can not and will never try to fix invalid input.
>
> On Wed, Jan 18, 2012 at 4:51 PM, remi tassing <tassingr...@gmail.com> wrote:
>
> I posted a question on this JIRA: https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481
> It looks like the same problem
>
> On Tue, Jan 17, 2012 at 6:41 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> this may also be an issue of protocol-httpclient.
>
>> Hi Remi,
>>
>> This also looks like we need to document and address it.
>>
>> Can you log a Jira issue and we will try to get on to it. Can you also have
>> a look through some of the existing issues in case there is something
>> similar, possibly relate them.
>>
>> Thank you in advance
>>
>> Lewis
>>
>> On Tue, Jan 17, 2012 at 9:38 AM, remi tassing <tassingr...@gmail.com> wrote:
>> > Hi,
>> >
>> > The problem is really similar to this:
>> >
>> > http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html
>> >
>> > Unfortunately, I have no clue on what to update in Nutch ...
>> >
>> > On Mon, Jan 16, 2012 at 4:41 PM, remi tassing <tassingr...@gmail.com> wrote:
>> > > Hello Markus,
>> > >
>> > > thanks for the help!
>> > >
>> > > Just to clarify a little bit. In my previous message, "uri1" represented a
>> > > normal, ordinary URL, I just didn't want to copy the exact URL.
>> > >
>> > > The weird part is that it all works in the browser...
>> > >
>> > > On Mon, Jan 16, 2012 at 4:35 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> > >> This? https://uri1...&From=stats
>> > >>
>> > >> That's not a correct or valid URL if you ask me.
>> > >>
>> > >> On Monday 16 January 2012 15:12:51 remi tassing wrote:
>> > >> > Hello,
>> > >> >
>> > >> > this is a snapshot of the log:
>> > >> >
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
>> > >> >     at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
>> > >> >     at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
>> > >> >     at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
>> > >> >     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
>> > >> >     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
>> > >> >     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run