Thien-Thi Nguyen writes:
> In unibyte land, "." matches a byte. OK.
>
> In multibyte land done "bytewise", "." matches _____.
> (What goes in the blank?)
"." (and more generally [^...]) is equivalent to (a|b|c|d|...) where
every valid UTF-8 character is present in the disjunction except f
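Here is a minimal sketch of the disjunction described above. It is written in Python purely for illustration; the thread itself is about Guile's C regexp code. The byte-level alternation accepts exactly one well-formed UTF-8 character and follows the well-formed byte-sequence table in RFC 3629. The quoted sentence is cut off at "except f" in this archive, so the sketch does not try to reproduce whatever exclusion it goes on to name. `UTF8_CHAR`, `DOT` and `utf8_chars` are hypothetical names.

```python
import re

# One well-formed UTF-8 character, written as a disjunction of byte
# sequences (RFC 3629, section 4). Overlong forms and UTF-16 surrogates
# (ED A0..BF ..) are not accepted.
UTF8_CHAR = (
    rb"(?:[\x00-\x7F]"
    rb"|[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})"
)
DOT = re.compile(UTF8_CHAR)


def utf8_chars(data: bytes) -> list:
    """Split well-formed UTF-8 bytes into one byte string per character."""
    return DOT.findall(data)
```

A negated bracket expression [^...] would be the same alternation with the byte sequences of the excluded characters left out.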
Mark H Weaver
Thu, 17 Mar 2011 13:58:42 -0400
* regexp search: The search itself can be implemented bytewise, exactly
as if it were a fixed-width encoding. Compiling the regexp can
_almost_ be implemented as if the UTF-8-encoded regexp was in a
fixed-width encoding, with j
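The bytewise-search claim depends on UTF-8 being self-synchronizing. Lead bytes and continuation bytes (10xxxxxx) never overlap, so a byte-for-byte search for an encoded literal can only match at character boundaries. The Python sketch below checks that property. It also shows one place where compiling a pattern byte-for-byte does go wrong: a bracket expression that contains a multibyte character. It does not try to list the exceptions that the truncated text goes on to describe, and the helper names are hypothetical.

```python
import re


def is_char_boundary(data: bytes, i: int) -> bool:
    """True if byte offset i starts a character (or is the end of data)."""
    # UTF-8 continuation bytes look like 10xxxxxx; no lead byte does.
    return i == len(data) or (data[i] & 0xC0) != 0x80


def find_literal(needle: str, haystack: str) -> list:
    """Byte offsets of `needle` in `haystack`, searched purely bytewise."""
    data = haystack.encode("utf-8")
    pattern = re.escape(needle.encode("utf-8"))
    return [m.start() for m in re.finditer(pattern, data)]


text = "naïve café, ïle"
data = text.encode("utf-8")
# Every bytewise hit lands on a character boundary.
assert all(is_char_boundary(data, i) for i in find_literal("ï", text))

# Compiling the bracket expression "[é]" byte-for-byte yields a class of the
# two bytes C3 and A9, which falsely matches the first byte of "ï" (C3 AF).
naive = re.compile(b"[" + "é".encode("utf-8") + b"]")
# Treating the character as one multibyte alternative avoids that.
fixed = re.compile(b"(?:" + re.escape("é".encode("utf-8")) + b")")
```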
On Sun 06 Mar 2011 23:12, l...@gnu.org (Ludovic Courtès) writes:
> Neil Jerram writes:
>
>> In principle, how should Guile 2.0 be cross-compiled? I'm thinking
>> mostly of the part of the build that compiles all the installed modules.
>
> Guile 2.0 can only be cross-compiled when the endianness
Evening,
On Thu 17 Mar 2011 19:38, Mike Gran writes:
> So, if I have a CGI script where the stdout could have one of
> a couple of different encodings based on a web client's language
> preference settings, but, where the CGI program is running in a "C"
> or "en_US.utf8" locale, this might count.
Th
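When the output encoding depends on each request rather than on the process locale, the program has to choose the charset per response and encode explicitly. The following is a hypothetical Python sketch of that situation. The language-to-charset table is invented for illustration and is not a recommendation.

```python
# Hypothetical table: client language -> charset for the response body.
CHARSET_FOR_LANG = {"ja": "shift_jis", "ru": "koi8-r", "de": "iso-8859-1"}


def render(body: str, lang: str) -> bytes:
    """Encode one CGI response in the charset chosen for this client."""
    charset = CHARSET_FOR_LANG.get(lang, "utf-8")
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    # Encode explicitly instead of relying on the process locale.
    return header.encode("ascii") + body.encode(charset)
```

The script would then write `render(...)` to `sys.stdout.buffer`. That bypasses whatever encoding the "C" or "en_US.utf8" locale would have picked for text output.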
Hi!
Mark H Weaver writes:
> (string-upcase "Straße") => "STRAßE" (should be "STRASSE")
> (string-downcase "ΧΑΟΣΣ") => "χαοσσ" (should be "χαοσς")
> (string-downcase "ΧΑΟΣ Σ") => "χαοσ σ" (should be "χαος σ")
> (string-ci=? "Straße" "Strasse") => #f (should
> From: Andy Wingo
>
> Hi Mike,
>
> I'm looking at changing to use the helper "locale_charset()" function
> from libunistring in the scm_to_locale_string and scm_from_locale_string
> functions. It seems like that's more correct than snarfing through the
> current input/output ports.
>
> Like
> From: Ludovic Courtès
> >> Can we first check what would need to be done to fix this in 2.0.x?
> >>
> >> At first glance:
> >>
> >> - “Straße” is normally stored as a Latin1 string, so it would need to
> >> be converted to UTF-* before it can be passed to one of the
> >> unicase.h fun
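For comparison, Python's `str` applies the full Unicode case mappings, including SpecialCasing and the Final_Sigma rule. `str.casefold` implements full case folding. Together they produce the results Mark lists as correct above. The sketch also follows the first step Ludovic describes: a narrow, one-byte-per-character (Latin-1) buffer is decoded before the case algorithm runs. The Scheme-like function names are hypothetical, and this is not Guile's implementation.

```python
def string_upcase(s: str) -> str:
    # Full case mapping: the length can change ("ß" -> "SS").
    return s.upper()


def string_downcase(s: str) -> str:
    # str.lower applies the Final_Sigma rule to capital sigma.
    return s.lower()


def string_ci_equal(a: str, b: str) -> bool:
    # Compare full case foldings, not per-character lowercase.
    return a.casefold() == b.casefold()


# A narrow stringbuf holds one Latin-1 byte per character; decode it into a
# full Unicode string before applying the case algorithms.
narrow = "Straße".encode("latin-1")   # b'Stra\xdfe'
s = narrow.decode("latin-1")
```

The upcased "Straße" is one character longer than the input. A mapping like this therefore cannot be done in place over a fixed-width buffer.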
l...@gnu.org (Ludovic Courtès) writes:
>> We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
>> to UTF-8, along with a flag that indicates whether it is known to be
>> ASCII-only.
>
> The whole point of the narrow/wide distinction was to avoid
> variable-width encodings. In ad
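Here is the cost in concrete terms. In a fixed-width encoding the k-th character starts at a computed offset. In UTF-8 it has to be found by scanning from the start, or from some cached position. The Python sketch below assumes well-formed input, and the function names are hypothetical.

```python
def utf8_ref(data: bytes, k: int) -> str:
    """k-th character of well-formed UTF-8 `data`, found by scanning: O(n)."""
    count = 0
    i = 0
    while i < len(data):
        b = data[i]
        # The lead byte tells how many bytes this character occupies.
        width = 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
        if count == k:
            return data[i:i + width].decode("utf-8")
        count += 1
        i += width
    raise IndexError(k)


def utf32_ref(data: bytes, k: int) -> str:
    """k-th character of UTF-32LE `data`: it starts at byte 4*k, so O(1)."""
    if not 0 <= 4 * k < len(data):
        raise IndexError(k)
    return data[4 * k:4 * k + 4].decode("utf-32-le")
```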
On Sun 06 Mar 2011 20:52, Clinton Ebadi writes:
> While debugging[0] an issue with Bobot++ (poor sneek!) aborting after
> calling scm_regexp_exec on any utf-8 strings I eventually realized
> that... the string was actually single-byte encoded internally. After
> taking that down the wrong path I
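The following hypothetical Python illustration shows the kind of mismatch described here. A regexp compiled over UTF-8 bytes does not match the raw bytes of a string stored one byte per character (Latin-1), even though the text is the same. It only matches once the subject is converted to the encoding the regexp engine expects. This illustrates the encoding mismatch only and says nothing about why the program aborted.

```python
import re

# Hypothetical: a regexp engine that works on UTF-8 bytes.
PATTERN = re.compile("caf(é)".encode("utf-8"))

# How a narrow (one byte per character) string would hold "café".
narrow = "café".encode("latin-1")   # b'caf\xe9'


def to_utf8(stored: bytes, encoding: str) -> bytes:
    """Re-encode a string's stored bytes into what the regexp engine expects."""
    return stored.decode(encoding).encode("utf-8")


raw_match = PATTERN.search(narrow)                        # byte E9 is not C3 A9
fixed_match = PATTERN.search(to_utf8(narrow, "latin-1"))
```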
Hi Mark,
Mark H Weaver writes:
> I have a compromise proposal, which could be implemented for 2.0.x:
>
> We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
> to UTF-8, along with a flag that indicates whether it is known to be
> ASCII-only.
The whole point of the narrow/wid
Hi Mike,
I'm looking at changing to use the helper "locale_charset()" function
from libunistring in the scm_to_locale_string and scm_from_locale_string
functions. It seems like that's more correct than snarfing through the
current input/output ports.
Likewise I'll just use the scm_i_get_conversi
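On POSIX systems, gnulib's `locale_charset()`, which libunistring provides, is built mainly on `nl_langinfo(CODESET)` plus normalization of platform-specific charset names. The Python sketch below is a rough analogue for illustration only. It reads the codeset of the current LC_CTYPE locale instead of inspecting a port's encoding. The ASCII fallback for names Python does not recognize is an assumption of this sketch. It is Unix-only, because `locale.nl_langinfo` is not available on Windows.

```python
import codecs
import locale


def locale_charset() -> str:
    """Codec name for the charset of the current LC_CTYPE locale."""
    name = locale.nl_langinfo(locale.CODESET)  # e.g. "UTF-8", "ISO-8859-1"
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Assumption of this sketch: treat unknown or empty names as ASCII.
        return "ascii"


def to_locale_string(s: str) -> bytes:
    """Encode s using the locale charset, not any port's encoding."""
    return s.encode(locale_charset())
```

The two approaches can disagree because a port's encoding can be set independently of the locale. In Python, for example, `sys.stdout.encoding` can be overridden with `PYTHONIOENCODING`.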
I have a compromise proposal, which could be implemented for 2.0.x:
We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
to UTF-8, along with a flag that indicates whether it is known to be
ASCII-only.
Applying string-ref or string-set! to a narrow stringbuf would upgrade
it to
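The hypothetical Python model below makes the bookkeeping of the compromise concrete. The quoted text is truncated exactly where it says what the stringbuf is upgraded to. This sketch assumes the target is the wide (UTF-32) representation mentioned earlier. It also assumes that an ASCII-only narrow buffer can serve `string-ref` directly without upgrading, and that `string-set!` always upgrades. All three are guesses for illustration, not what the proposal says.

```python
class Stringbuf:
    """Hypothetical model: narrow = UTF-8 bytes plus an ASCII-only flag,
    wide = fixed-width code points (standing in for UTF-32)."""

    def __init__(self, text: str):
        self.narrow = text.encode("utf-8")
        self.ascii_only = text.isascii()
        self.wide = None  # list of code points once upgraded

    def _upgrade(self) -> None:
        # Assumption: "upgrade" converts the narrow buffer to the wide form.
        if self.wide is None:
            self.wide = [ord(c) for c in self.narrow.decode("utf-8")]
            self.narrow = None

    def ref(self, k: int) -> str:
        # Assumption: ASCII-only narrow buffers are indexable in O(1).
        if self.wide is None and self.ascii_only:
            return chr(self.narrow[k])
        self._upgrade()
        return chr(self.wide[k])

    def set(self, k: int, ch: str) -> None:
        # Assumption: string-set! always upgrades, even for ASCII.
        self._upgrade()
        self.wide[k] = ord(ch)

    def __str__(self) -> str:
        if self.wide is None:
            return self.narrow.decode("utf-8")
        return "".join(map(chr, self.wide))
```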
Hi Wolfgang,
> (I have "my own" scheme to play with, written in LISP, recently enhanced
> by adding "call-with-prompt". Still trying to figure out all of its
> implications ...)
Hey, me too... it's nice to have company in that regard :)
You mentioned a number of wishlist items as well, that I