Problem solved!

I replaced all whitespace in the URL with "%20" before getting the
"content" in HttpResponse.java (protocol-httpclient plugin).

Dirty solution? Yes, but it works for me now.
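A minimal sketch of that quick fix in plain Java, for illustration only. It assumes the whitespace in question is the ordinary space character. The helper name `escapeSpaces` and the sample URL are hypothetical, and the exact spot in HttpResponse.java where the URL string is handed to HttpClient is not shown here. Any other character HttpClient rejects will still trigger an "Invalid uri" error.

```java
public class SpaceEscape {

    // Hypothetical helper: replaces every literal space with "%20".
    // Other illegal characters (e.g. '|', '{', '"') are left untouched,
    // which is why this is only a stopgap.
    static String escapeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Hypothetical URL with spaces in both the path and the query.
        String raw = "https://example.com/some page?q=a b&From=stats";
        // Prints https://example.com/some%20page?q=a%20b&From=stats
        System.out.println(escapeSpaces(raw));
    }
}
```

Because the replacement is purely textual, a URL that is already encoded (it contains "%20" but no literal spaces) passes through unchanged.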

Remi

On Thursday, January 26, 2012, remi tassing <tassingr...@gmail.com> wrote:
> Hey guys,
> Any ideas on how to "properly escape non-URI characters"? I'm getting
> "invalid URI" errors for URLs that contain "three dots", spaces...
> //Remi
> [1] https://issues.apache.org/jira/browse/HTTPCLIENT-858
>
> Ortwin Glück added a comment - 30/Jun/09 14:46
> Properly escape non-URI characters. HttpClient is not a browser and thus
> does not, can not and will never try to fix invalid input.
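One way to "properly escape" in Java is to let the multi-argument constructors of `java.net.URI` do the quoting: they percent-encode characters that are illegal in the component where they appear. This is a sketch under stated assumptions, not something HttpClient or Nutch provides: it assumes the URL has already been split into scheme, host, path and query, and the helper name `escape` and the sample values are hypothetical. The resulting string is what would then be passed to HttpClient.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEscape {

    // Hypothetical helper: builds a URI from already-split components and
    // returns its escaped, US-ASCII form. The 7-argument URI constructor
    // quotes characters that are illegal in each component (a space becomes
    // %20, '|' becomes %7C). It also always quotes '%', so components that
    // are already percent-encoded would end up double-encoded.
    static String escape(String scheme, String host, String path, String query) {
        try {
            // No user info, undefined port (-1), no fragment.
            URI uri = new URI(scheme, null, host, -1, path, query, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String escaped = escape("https", "example.com", "/some page", "q=a b|c&From=stats");
        // Prints https://example.com/some%20page?q=a%20b%7Cc&From=stats
        System.out.println(escaped);
    }
}
```

Compared with a plain space-to-"%20" replacement, this also covers other illegal characters, at the cost of needing the URL split into its components first.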
> On Wed, Jan 18, 2012 at 4:51 PM, remi tassing <tassingr...@gmail.com> wrote:
>
> I posted a question on this JIRA:
> https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481
>
> It looks like the same problem.
>
> On Tue, Jan 17, 2012 at 6:41 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> This may also be an issue with protocol-httpclient.
>
>> Hi Remi,
>>
>> This looks like something we need to document and address.
>>
>> Can you log a Jira issue? We will try to get on to it. Can you also
>> have a look through some of the existing issues in case there is something
>> similar, and possibly link them?
>>
>> Thank you in advance
>>
>> Lewis
>>
>> On Tue, Jan 17, 2012 at 9:38 AM, remi tassing <tassingr...@gmail.com> wrote:
>> > Hi,
>> >
>> > The problem is really similar to this:
>> >
>> > http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html
>> >
>> > Unfortunately, I have no clue what to update in Nutch...
>> >
>> > On Mon, Jan 16, 2012 at 4:41 PM, remi tassing <tassingr...@gmail.com> wrote:
>> > > Hello Markus,
>> > >
>> > > Thanks for the help!
>> > >
>> > > Just to clarify a little: in my previous message, "uri1" stands for a
>> > > normal, ordinary URL; I just didn't want to copy the exact one.
>> > >
>> > > The weird part is that it all works in the browser...
>> > >
>> > >
>> > > On Mon, Jan 16, 2012 at 4:35 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> > >> This? https://uri1...&From=stats
>> > >>
>> > >> That's not a correct or valid URL if you ask me.
>> > >>
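To see which character makes a URL invalid, one option is to parse it with `java.net.URI`, whose `URISyntaxException` reports a reason and a character index. This is only an approximation: Commons HttpClient 3.x parses the string with its own `org.apache.commons.httpclient.URI` class rather than `java.net.URI`, so the two may not agree on every input. The helper name `findProblem` and the sample URLs are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCheck {

    // Hypothetical helper: returns null if java.net.URI accepts the string,
    // otherwise the parser's reason plus the offending position when known.
    static String findProblem(String url) {
        try {
            new URI(url);
            return null;
        } catch (URISyntaxException e) {
            int i = e.getIndex(); // -1 when the position is unknown
            if (i >= 0 && i < url.length()) {
                return e.getReason() + " at index " + i + " ('" + url.charAt(i) + "')";
            }
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // Unescaped space in the query: prints Illegal character in query at index 28 (' ')
        System.out.println(findProblem("https://example.com/page?q=a b&From=stats"));
        // Same URL with the space escaped: prints null
        System.out.println(findProblem("https://example.com/page?q=a%20b&From=stats"));
    }
}
```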
>> > >> On Monday 16 January 2012 15:12:51 remi tassing wrote:
>> > >> > Hello,
>> > >> >
>> > >> > This is a snapshot of the log:
>> > >> >
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
>> > >> > java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
>> > >> >     at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
>> > >> >     at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
>> > >> >     at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
>> > >> >     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
>> > >> >     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
>> > >> >     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run
