Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-25, Alex Shinn
On Wed, Jan 23, 2013 at 5:09 PM, Alex Shinn alexsh...@gmail.com wrote: On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote: Yes, I ran into this when I was adding UTF-8 support to mbox... If you were to add wide char support in srfi-14, is there a way to quantify the

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-23, Peter Bex
On Wed, Jan 23, 2013 at 03:29:01PM +0900, Ivan Raikov wrote: Hi Peter, I think uri-generic does not silently mangle input upon receiving UTF-8; it just returns #f. When parsing, yes. I think this should stay the way it is (see below). What I was referring to here was the example in my

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-22, Ivan Raikov
Hi Peter, I think uri-generic does not silently mangle input upon receiving UTF-8; it just returns #f. I think it is not a bad idea to raise an exception instead. I have not yet had the chance to thoroughly test the UTF-8 mapping constructor, but will try to do this during the weekend.

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-22, Alex Shinn
On Thu, Jan 17, 2013 at 4:51 AM, Peter Bex peter@xs4all.nl wrote: On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote: This result looks broken. As I noted in my previous mail, the URI representation already handles non-ASCII characters and escapes on output: $ csi -R

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-17, Peter Bex
On Thu, Jan 17, 2013 at 09:35:36AM +0900, Ivan Raikov wrote: Hi Peter, I think that allowing raw UTF-8 sequences in uri-generic breaks compatibility with RFC 3986. In other words, if you construct a URI with a UTF-8 sequence that happens to include reserved ASCII characters, those ASCII

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-16, Peter Bex
On Wed, Jan 16, 2013 at 11:22:57AM +0900, Alex Shinn wrote: Anyway, this isn't really important. I'm mostly concerned with making utf8 do the right thing, and was wondering what the API was because it's not clear from the docs. OK, I think it's worth figuring this out. Put another way, do

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-16, Peter Bex
On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote: This result looks broken. As I noted in my previous mail, the URI representation already handles non-ASCII characters and escapes on output: $ csi -R uri-common #;1 (make-uri scheme: http host: 127.0.0.1 path: '(/ 삼계탕))
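The idea quoted above, that the URI representation keeps path segments in decoded form and only escapes them on output, can be sketched in Python. This is an illustration under stated assumptions, not uri-common's API: the helper `serialize_path` is hypothetical, and a path is assumed to be a list of decoded segment strings plus an "absolute" flag (loosely mirroring `'(/ ...)`).

```python
from urllib.parse import quote


def serialize_path(segments, absolute=True):
    """Render decoded path segments as an RFC 3986 path string (hypothetical helper).

    Each segment is stored decoded (e.g. "삼계탕") and escaped only here, on output.
    safe="" makes quote() escape "/" too, so a slash that is part of a segment's
    data becomes %2F instead of acting as a path separator.
    """
    encoded = [quote(seg, safe="") for seg in segments]
    return ("/" if absolute else "") + "/".join(encoded)


print(serialize_path(["삼계탕"]))     # /%EC%82%BC%EA%B3%84%ED%83%95
print(serialize_path(["a/b", "c"]))  # /a%2Fb/c
```

Keeping the decoded form internally means the encoding decision is made exactly once, at serialization time, so nothing gets escaped twice.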

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-16, Ivan Raikov
Hi Peter, I think that allowing raw UTF-8 sequences in uri-generic breaks compatibility with RFC 3986. In other words, if you construct a URI with a UTF-8 sequence that happens to include reserved ASCII characters, those ASCII characters will not get escaped, and you could potentially be

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Alex Shinn
On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote: Percent-encoded sequences of more than one octet will not get touched by pct-decode in the current implementation, so you will not get double escaping. Percent-encoded sequences of one octet will get decoded if they

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Peter Bex
On Tue, Jan 15, 2013 at 06:07:06PM +0900, Alex Shinn wrote: On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote: Percent-encoded sequences of more than one octet will not get touched by pct-decode in the current implementation, so you will not get double escaping.

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Peter Bex
On Tue, Jan 15, 2013 at 07:30:07PM +0900, Alex Shinn wrote: Right, I'm familiar with the evil standards :) I'm also hoping that we can have some basic compatibility between Chicken's uri module and Chibi's (and whatever R7RS WG2 comes up with). That would be nice indeed. It seems to me the

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Alex Shinn
On Tue, Jan 15, 2013 at 7:48 PM, Peter Bex peter@xs4all.nl wrote: These special characters are called "reserved" in the BNF. As you can see, the question mark, equals sign and ampersand are in there. For urlencoded query strings, these *cannot* be decoded, because then you can't
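The point about reserved characters can be demonstrated with Python's standard query-string parser (used only as an illustration; uri-common is not involved, and the query below is an illustrative value). Decoding the whole query before splitting it on `&` and `=` turns an escaped ampersand inside a value into a new parameter:

```python
from urllib.parse import parse_qsl, unquote

raw_query = "bool-expr=x%26y%3D1"  # the value is "x&y=1", with & and = escaped

# Correct: split on the literal delimiters first, then decode each name/value.
print(parse_qsl(raw_query))           # [('bool-expr', 'x&y=1')]

# Wrong: decoding first turns %26 and %3D back into delimiters.
print(parse_qsl(unquote(raw_query)))  # [('bool-expr', 'x'), ('y', '1')]
```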

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Peter Bex
On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote: The internal representation is either decoded, or it is encoded. Either can be made to work. In this case, the decoded uri-common representation of the former is: ((bool-expr . x&y=1)) and the decoded representation of the

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-15, Alex Shinn
On Wed, Jan 16, 2013 at 12:59 AM, Peter Bex peter@xs4all.nl wrote: On Wed, Jan 16, 2013 at 12:39:16AM +0900, Alex Shinn wrote: The internal representation is either decoded, or it is encoded. Either can be made to work. In this case, the decoded uri-common representation of the

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Peter Bex
On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote: On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun chu...@gmail.com wrote: As far as I know, revised RFC permits UTF-8 characters in the URL without encoding. Am I wrong here? Thus you can't use raw non-ASCII bytes in a URI - they must

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, .alyn.post.
On Mon, Jan 14, 2013 at 09:18:52AM +0100, Peter Bex wrote: On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote: On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun chu...@gmail.com wrote: As far as I know, revised RFC permits UTF-8 characters in the URL without encoding. Am I wrong here?

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Sungjin Chun
Thank you very much. :-) My proposed hack (yes, not a solution) just works for me, but I found that it is just wrong w.r.t. the RFC. I'll try your modification and let you know whether it works or not. Thank you again. On Mon, Jan 14, 2013 at 5:08 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote: Hi

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Alex Shinn
On Tue, Jan 15, 2013 at 7:35 AM, Sungjin Chun chu...@gmail.com wrote: Thank you very much. :-) My proposed hack (yes, not a solution) just works for me, but I found that it is just wrong w.r.t. the RFC. I'll try your modification and let you know whether it works or not. Thank you again. On

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Sungjin Chun
My intention is to create a search client for Solr (a search server using Lucene), where I have to send request URLs like this: http://127.0.0.1:8983/solr/select?q=삼계탕&start=0&rows=10 I've tried to create this client using the http-client egg and found that it does not like UTF-8 characters in the
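For a Solr request like the one described, the Korean query term has to be UTF-8 encoded and percent-escaped before it goes on the wire. A sketch using Python's standard library, assuming the host, port and `q`/`start`/`rows` parameters from Sungjin's example URL (this does not use the http-client egg):

```python
from urllib.parse import urlencode

base = "http://127.0.0.1:8983/solr/select"
params = {"q": "삼계탕", "start": 0, "rows": 10}

# urlencode() UTF-8-encodes each value and percent-escapes the bytes;
# the & and = separators are inserted by urlencode itself.
url = base + "?" + urlencode(params)
print(url)
# http://127.0.0.1:8983/solr/select?q=%EC%82%BC%EA%B3%84%ED%83%95&start=0&rows=10
```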

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Alex Shinn
On Tue, Jan 15, 2013 at 11:50 AM, Sungjin Chun chu...@gmail.com wrote: My intention is to create a search client for Solr (a search server using Lucene), where I have to send request URLs like this: http://127.0.0.1:8983/solr/select?q=삼계탕&start=0&rows=10 I've tried to create this client using

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Ivan Raikov
Hi all, I realized that I replied only to Sungjin and neglected to include the mailing list, so let me repeat. Section 3.1 of RFC 3987 defines a mapping between IRIs and URIs such that UTF-8 sequences are percent-encoded. So I implemented a procedure iri-uri, which percent-encodes a UTF-8
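The RFC 3987 section 3.1 mapping described above can be sketched as follows. This is a simplified Python illustration, not Ivan's `iri-uri` procedure: it assumes the input is already a valid IRI, copies every ASCII character unchanged (so reserved delimiters such as `?`, `&` and `=` keep their meaning), and percent-encodes the UTF-8 octets of every non-ASCII character. The name `iri_to_uri` is hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters (RFC 3987, 3.1).

    Simplification: assumes `iri` is already a syntactically valid IRI, so ASCII
    characters are copied as-is and only non-ASCII characters are encoded.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # Encode the character as UTF-8 and emit one %XX escape per octet.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.com/삼계탕?q=1&x=2"))
# http://example.com/%EC%82%BC%EA%B3%84%ED%83%95?q=1&x=2
```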

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Ivan Raikov
Hi again, I have now extended the utf8 code in uri-generic, so that UTF-8 sequences are percent-encoded as lists of the form '(% h1 h2 [% h3 h4 ...])). The percent-decoding routine is not going to decode sequences of more than one byte, so that now percent encoding normalization will not
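For comparison, RFC 3986 (sections 6.2.2.1 and 6.2.2.2) describes percent-encoding normalization as uppercasing the hex digits of every escape and decoding only those octets that correspond to unreserved characters. Under that rule the octets of a multi-byte UTF-8 sequence are never decoded, since none of them is an unreserved ASCII character. A Python sketch of the RFC rule, not of uri-generic's pct-decode; `normalize_pct_encoding` is a hypothetical name:

```python
import re
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct_encoding(s: str) -> str:
    """Decode %XX only for unreserved characters; uppercase all other escapes."""

    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:  # only ASCII octets can map to unreserved chars
            return ch
        return "%" + match.group(1).upper()

    return PCT_OCTET.sub(fix, s)


print(normalize_pct_encoding("/%7euser/%ec%82%bc%41%2F"))
# /~user/%EC%82%BCA%2F
```

Here `%7e` and `%41` decode to `~` and `A`, the UTF-8 octets of 삼 stay encoded, and `%2F` stays encoded because `/` is reserved.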

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Alex Shinn
On Tue, Jan 15, 2013 at 2:23 PM, Ivan Raikov ivan.g.rai...@gmail.com wrote: Hi again, I have now extended the utf8 code in uri-generic, so that UTF-8 sequences are percent-encoded as lists of the form '(% h1 h2 [% h3 h4 ...])). The percent-decoding routine is not going to decode sequences

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Ivan Raikov
Hi Alex, I understand your point about make-uri, but I want to provide a uri constructor that takes a UTF-8 input string and maps it in accordance with RFC 3986 / 3987. So we still have to perform path and percent-encoding normalization steps for the ASCII portions of the string. make-uri
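The path normalization step mentioned here corresponds to RFC 3986's remove_dot_segments algorithm (section 5.2.4). A direct Python transcription of that algorithm, offered as a sketch rather than as uri-generic's implementation:

```python
def remove_dot_segments(path: str) -> str:
    """RFC 3986, section 5.2.4: remove "." and ".." segments from a path."""

    def drop_last_segment(buf: str) -> str:
        # Remove the last segment and its preceding "/" (if any).
        i = buf.rfind("/")
        return buf[:i] if i >= 0 else ""

    inp, out = path, ""
    while inp:
        if inp.startswith("../"):        # rule A
            inp = inp[3:]
        elif inp.startswith("./"):       # rule A
            inp = inp[2:]
        elif inp.startswith("/./"):      # rule B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                # rule B
            inp = "/"
        elif inp.startswith("/../"):     # rule C: "/../x" -> "/x", pop output
            inp = inp[3:]
            out = drop_last_segment(out)
        elif inp == "/..":               # rule C
            inp = "/"
            out = drop_last_segment(out)
        elif inp in (".", ".."):         # rule D
            inp = ""
        else:                            # rule E: move first segment to output
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out


print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

The two sample inputs are the worked examples given in RFC 3986 itself.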

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-14, Ivan Raikov
Oops, the second example should have been: For the string 삼계탕 the octets are EC 82 BC EA B3 84 ED 83 95 and (utf8-string-uri "http://example.com/삼계탕") produces #(URI scheme=http authority=#(URIAuth host=example.com port=#f) path=(/ %EC%82%BC%EA%B3%84%ED%83%95) query=#f fragment=#f) Sorry about
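The octet sequence in the corrected example can be cross-checked with Python's standard library. This only verifies the bytes; it does not exercise uri-generic or `utf8-string-uri`:

```python
from urllib.parse import quote

word = "삼계탕"

# Each Hangul syllable here is a single code point that encodes to three UTF-8 octets.
for ch in word:
    print(ch, f"U+{ord(ch):04X}", " ".join(f"{b:02X}" for b in ch.encode("utf-8")))

octets = word.encode("utf-8")
print(" ".join(f"{b:02X}" for b in octets))  # EC 82 BC EA B3 84 ED 83 95
print(quote(word))                           # %EC%82%BC%EA%B3%84%ED%83%95
```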

[Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Sungjin Chun
For testing a Solr (Lucene-based) client, I have to create a URL which contains UTF-8 encoded text (Korean), but uri-common cannot create a URI with this encoding. Can anyone help me with this? Thanks.

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Peter Bex
On Mon, Jan 14, 2013 at 07:04:05AM +0900, Sungjin Chun wrote: For testing a Solr (Lucene-based) client, I have to create a URL which contains UTF-8 encoded text (Korean), but uri-common cannot create a URI with this encoding. Can anyone help me with this? Thanks. Hello Sungjin, As far as I

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Sungjin Chun
Though I'm not that fluent in Scheme, I'll try to make a test case for uri-generic with a UTF-8 string. Thanks. On Mon, Jan 14, 2013 at 7:15 AM, Peter Bex peter@xs4all.nl wrote: On Mon, Jan 14, 2013 at 07:04:05AM +0900, Sungjin Chun wrote: For testing a Solr (Lucene-based) client, I have to

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Sungjin Chun
First, I might have found the wrong place, but... It seems that the main source of my problem is this part of uri-generic.scm: (define char-set:uri-unreserved (char-set-union char-set:letter+digit (string->char-set "-_.~"))) If I change this part to: (define
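Why widening the unreserved set conflicts with RFC 3986 can be shown with a small Python comparison. The RFC's unreserved set is ASCII-only; the second predicate models a hypothetical set widened to all Unicode letters and digits (Sungjin's actual replacement definition is cut off in this excerpt, so this is only an assumption about its effect):

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~",
# where ALPHA and DIGIT are defined over ASCII only.
RFC3986_UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def is_unreserved(ch: str) -> bool:
    """RFC 3986 check: may `ch` appear in a URI without percent-encoding?"""
    return ch in RFC3986_UNRESERVED


def is_unreserved_widened(ch: str) -> bool:
    """Hypothetical widened set: any Unicode letter or digit, plus -._~"""
    return ch.isalnum() or ch in "-._~"


for ch in ["a", "~", "삼", "?"]:
    print(repr(ch), is_unreserved(ch), is_unreserved_widened(ch))
```

The widened predicate accepts 삼, so a URI built with it would carry raw non-ASCII characters, which RFC 3986 does not permit; RFC 3987 instead maps such characters to percent-encoded UTF-8 octets.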

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Sungjin Chun
As far as I know, revised RFC permits UTF-8 characters in the URL without encoding. Am I wrong here? Even Solr (the search engine) permits them. On Mon, Jan 14, 2013 at 1:26 PM, Alex Shinn alexsh...@gmail.com wrote: Hi, On Mon, Jan 14, 2013 at 12:52 PM, Sungjin Chun chu...@gmail.com wrote:

Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

2013-01-13, Alex Shinn
On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun chu...@gmail.com wrote: As far as I know, revised RFC permits UTF-8 characters in the URL without encoding. Am I wrong here? The latest URI RFC is 3986. The relevant description in prose is: Local names, such as file system names, are stored