enhancement requests

2011-03-17 Thread Andy Wingo
Hi Wolfgang, (I have my own scheme to play with, written in LISP, recently enhanced by adding call-with-prompt. Still trying to figure out all of its implications ...) Hey, me too... it's nice to have company in that regard :) You mentioned a number of wishlist items as well, that I wanted

scm_{to,from}_locale_string

2011-03-17 Thread Andy Wingo
Hi Mike, I'm looking at changing to use the helper locale_charset() function from libunistring in the scm_to_locale_string and scm_from_locale_string functions. It seems like that's more correct than snarfing through the current input/output ports. Likewise I'll just use the

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Ludovic Courtès
Hi Mark, Mark H Weaver m...@netris.org writes: I have a compromise proposal, which could be implemented for 2.0.x: We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs to UTF-8, along with a flag that indicates whether it is known to be ASCII-only. The whole point of the

Re: `regexp-exec' and non-ascii strings

2011-03-17 Thread Andy Wingo
On Sun 06 Mar 2011 20:52, Clinton Ebadi clin...@unknownlamer.org writes: While debugging[0] an issue with Bobot++ (poor sneek!) aborting after calling scm_regexp_exec on any utf-8 strings I eventually realized that... the string was actually single-byte encoded internally. After taking that

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes: We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs to UTF-8, along with a flag that indicates whether it is known to be ASCII-only. The whole point of the narrow/wide distinction was to avoid variable-width encodings. In addition,

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mike Gran
From: Ludovic Courtès l...@gnu.org Can we first check what would need to be done to fix this in 2.0.x? At first glance: - “Straße” is normally stored as a Latin1 string, so it would need to be converted to UTF-* before it can be passed to one of the unicase.h functions.

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Ludovic Courtès
Hi! Mark H Weaver m...@netris.org writes: (string-upcase "Straße") => "STRAßE" (should be "STRASSE") (string-downcase "ΧΑΟΣΣ") => "χαοσσ" (should be "χαoσς") (string-downcase "ΧΑΟΣ Σ") => "χαοσ σ" (should be "χαoς σ") (string-ci=? "Straße" "Strasse") => #f (should be #t)

Re: scm_{to,from}_locale_string

2011-03-17 Thread Andy Wingo
Evening, On Thu 17 Mar 2011 19:38, Mike Gran spk...@yahoo.com writes: So, if I have a CGI script where the stdout could have one of a couple of different encodings based on a web client's language preference settings, but where the CGI program is running in a C or en_US.utf8 locale, this might

Re: Cross-compiling Guile 2.0

2011-03-17 Thread Andy Wingo
On Sun 06 Mar 2011 23:12, l...@gnu.org (Ludovic Courtès) writes: Neil Jerram n...@ossau.uklinux.net writes: In principle, how should Guile 2.0 be cross-compiled? I'm thinking mostly of the part of the build that compiles all the installed modules. Guile 2.0 can only be cross-compiled when

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Thien-Thi Nguyen
() Mark H Weaver m...@netris.org () Thu, 17 Mar 2011 13:58:42 -0400 * regexp search: The search itself can be implemented bytewise, exactly as if it was a fixed-width encoding. Compiling the regexp can _almost_ be implemented as if the UTF-8-encoded regexp was in a fixed-width

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mark H Weaver
Thien-Thi Nguyen t...@gnuvola.org writes: In unibyte land, . matches a byte. OK. In multibyte land done bytewise, . matches ___. (What goes in the blank?) . (and more generally [^...]) is equivalent to (a|b|c|d|...) where every valid UTF-8 character is present in the disjunction