Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mark H Weaver
Thien-Thi Nguyen writes:

> In unibyte land, "." matches a byte. OK.
>
> In multibyte land done "bytewise", "." matches ____.
> (What goes in the blank?)

"." (and more generally [^...]) is equivalent to (a|b|c|d|...) where every valid UTF-8 character is present in the disjunction except f
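One way to picture the disjunction described above is as a byte-level recognizer for a single well-formed UTF-8 character. The following is only a minimal sketch, not code from Guile or libunistring: the names `in_range` and `utf8_char_len` are hypothetical, the byte ranges follow the standard well-formed UTF-8 table (RFC 3629), and the exclusion mentioned at the truncated end of the message is not modelled here.

```c
#include <stddef.h>

/* Nonzero if lo <= b <= hi. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* Hypothetical helper: length in bytes of the well-formed UTF-8
 * character at the start of s (at most n bytes available), or 0 if
 * no well-formed character starts there.  Each branch corresponds to
 * one alternative of the byte-level disjunction that "." stands for. */
size_t utf8_char_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;

    unsigned char b0 = s[0];

    if (b0 <= 0x7F)                       /* ASCII: one byte */
        return 1;

    if (in_range(b0, 0xC2, 0xDF))         /* two-byte sequences */
        return (n >= 2 && in_range(s[1], 0x80, 0xBF)) ? 2 : 0;

    if (in_range(b0, 0xE0, 0xEF)) {       /* three-byte sequences */
        /* E0 must not be overlong; ED must not encode a surrogate. */
        unsigned char lo = (b0 == 0xE0) ? 0xA0 : 0x80;
        unsigned char hi = (b0 == 0xED) ? 0x9F : 0xBF;
        return (n >= 3 && in_range(s[1], lo, hi)
                && in_range(s[2], 0x80, 0xBF)) ? 3 : 0;
    }

    if (in_range(b0, 0xF0, 0xF4)) {       /* four-byte sequences */
        /* F0 must not be overlong; F4 must stay below U+110000. */
        unsigned char lo = (b0 == 0xF0) ? 0x90 : 0x80;
        unsigned char hi = (b0 == 0xF4) ? 0x8F : 0xBF;
        return (n >= 4 && in_range(s[1], lo, hi)
                && in_range(s[2], 0x80, 0xBF)
                && in_range(s[3], 0x80, 0xBF)) ? 4 : 0;
    }

    return 0;                             /* stray continuation or invalid lead byte */
}
```

A bytewise matcher could call such a helper wherever the pattern contains ".", advancing by the returned length and failing the match when it returns 0.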

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Thien-Thi Nguyen
() Mark H Weaver
() Thu, 17 Mar 2011 13:58:42 -0400

   * regexp search: The search itself can be implemented bytewise, exactly as if it was a fixed-width encoding. Compiling the regexp can _almost_ be implemented as if the UTF-8-encoded regexp was in a fixed-width encoding, with j

Re: Cross-compiling Guile 2.0

2011-03-17 Thread Andy Wingo
On Sun 06 Mar 2011 23:12, l...@gnu.org (Ludovic Courtès) writes:

> Neil Jerram writes:
>
>> In principle, how should Guile 2.0 be cross-compiled? I'm thinking
>> mostly of the part of the build that compiles all the installed modules.
>
> Guile 2.0 can only be cross-compiled when the endianness

Re: scm_{to,from}_locale_string

2011-03-17 Thread Andy Wingo
Evening,

On Thu 17 Mar 2011 19:38, Mike Gran writes:

> So, if I have a CGI script where the stdout could have one
> of a couple of different encodings based on a web client's language
> preference settings, but, where the CGI program is running in a "C"
> or "en_US.utf8" locale, this might count.

Th

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Ludovic Courtès
Hi!

Mark H Weaver writes:

> (string-upcase "Straße") => "STRAßE" (should be "STRASSE")
> (string-downcase "ΧΑΟΣΣ") => "χαοσσ" (should be "χαoσς")
> (string-downcase "ΧΑΟΣ Σ") => "χαοσ σ" (should be "χαoς σ")
> (string-ci=? "Straße" "Strasse") => #f (should

Re: scm_{to,from}_locale_string

2011-03-17 Thread Mike Gran
> From: Andy Wingo
>
> Hi Mike,
>
> I'm looking at changing to use the helper "locale_charset()" function
> from libunistring in the scm_to_locale_string and scm_from_locale_string
> functions.  It seems like that's more correct than snarfing through the
> current input/output ports.
>
> Like

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mike Gran
> From: Ludovic Courtès
>
>> Can we first check what would need to be done to fix this in 2.0.x?
>>
>> At first glance:
>>
>>   - “Straße” is normally stored as a Latin1 string, so it would need to
>>     be converted to UTF-* before it can be passed to one of the
>>     unicase.h fun

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes:

>> We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
>> to UTF-8, along with a flag that indicates whether it is known to be
>> ASCII-only.
>
> The whole point of the narrow/wide distinction was to avoid
> variable-width encodings.

In ad

Re: `regexp-exec' and non-ascii strings

2011-03-17 Thread Andy Wingo
On Sun 06 Mar 2011 20:52, Clinton Ebadi writes:

> While debugging[0] an issue with Bobot++ (poor sneek!) aborting after
> calling scm_regexp_exec on any utf-8 strings I eventually realized
> that... the string was actually single-byte encoded internally. After
> taking that down the wrong path I

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Ludovic Courtès
Hi Mark,

Mark H Weaver writes:

> I have a compromise proposal, which could be implemented for 2.0.x:
>
> We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
> to UTF-8, along with a flag that indicates whether it is known to be
> ASCII-only.

The whole point of the narrow/wid

scm_{to,from}_locale_string

2011-03-17 Thread Andy Wingo
Hi Mike,

I'm looking at changing to use the helper "locale_charset()" function from libunistring in the scm_to_locale_string and scm_from_locale_string functions. It seems like that's more correct than snarfing through the current input/output ports.

Likewise I'll just use the scm_i_get_conversi

Re: Using libunistring for string comparisons et al

2011-03-17 Thread Mark H Weaver
I have a compromise proposal, which could be implemented for 2.0.x:

We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs to UTF-8, along with a flag that indicates whether it is known to be ASCII-only.

Applying string-ref or string-set! to a narrow stringbuf would upgrade it to
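To make the role of the ASCII-only flag concrete, here is a rough sketch, assuming a simplified buffer layout that is not Guile's actual stringbuf representation. The type `narrow_buf` and the functions `is_ascii_only`, `narrow_buf_make`, and `narrow_buf_ref_fast` are hypothetical names invented for illustration. The sketch covers only the case where the flag lets a character index be used directly as a byte index; what happens otherwise is cut off in the message above and is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical narrow buffer: UTF-8 bytes plus a cached flag. */
struct narrow_buf {
    const unsigned char *bytes; /* UTF-8 encoded contents */
    size_t len;                 /* length in bytes */
    int ascii_only;             /* nonzero if every byte is below 0x80 */
};

/* Scan once to decide whether the contents are pure ASCII. */
int is_ascii_only(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (bytes[i] >= 0x80)
            return 0;
    return 1;
}

/* Build a buffer descriptor and compute its flag up front. */
struct narrow_buf narrow_buf_make(const unsigned char *bytes, size_t len)
{
    struct narrow_buf buf = { bytes, len, is_ascii_only(bytes, len) };
    return buf;
}

/* Constant-time character reference for the ASCII-only case, where
 * character index and byte index coincide.  Returns -1 when the fast
 * path does not apply (non-ASCII contents or index out of range);
 * the slow path is not sketched here. */
long narrow_buf_ref_fast(struct narrow_buf buf, size_t i)
{
    if (buf.ascii_only && i < buf.len)
        return (long)buf.bytes[i];
    return -1;
}
```

In this sketch, a buffer holding "abc" answers index 1 with the code for 'b' directly, while a buffer holding the UTF-8 bytes of "Straße" reports -1 and would need whatever fallback the proposal goes on to describe.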

enhancement requests

2011-03-17 Thread Andy Wingo
Hi Wolfgang,

> (I have "my own" scheme to play with, written in LISP, recently enhanced
> by adding "call-with-prompt". Still trying to figure out all of its
> implications ...)

Hey, me too... it's nice to have company in that regard :)

You mentioned a number of wishlist items as well, that I