Hi-
I wonder if I could get someone to check out what I've done with wide
strings and the VM on the string_abstraction2 branch with commit
efb042...
That code is quite confusing, and I may have made a mess of it. And, I
wasn't sure about alignment.
I daresay that this Unicode string stuff
Hello!
Mike Gran spk...@yahoo.com writes:
On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:
You seem to imply that `scm_getc ()' will now return a Unicode
codepoint, is that right? What about `scm_c_{read,write} ()', and
`scm_{get,put}s ()'?
I vacillate on this, but, I think
Hello!
Mike Gran spk...@yahoo.com writes:
Strings are internally encoded either as narrow 8-bit ISO-8859-1
strings or as wide UTF-32 strings. Strings are usually created as
narrow strings. Narrow strings get automatically widened to wide
strings if non-8-bit characters are set! or appended
that are unclear to me now. Wide strings are
currently getting truncated to 8-bit somewhere in there.
The compiler could use bytevectors when dealing with bytecode. Maybe
that would clarify things.
On those issues, I'll have to concede to the wisdom of others. I'll do
what I can with the C code.
Strings are internally encoded either as narrow 8-bit ISO-8859-1
strings or as wide UTF-32 strings. Strings are usually created as
narrow strings. Narrow strings get automatically widened to wide
strings if non-8-bit characters are set! or appended to them.
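A minimal sketch of the narrow/wide scheme described above, for illustration only. The type `sketch_string` and all `sketch_*` functions are hypothetical names invented here; they are not Guile's actual string layout or API. The sketch assumes that a narrow string stores one ISO-8859-1 byte per character, and that widening is a value-preserving copy because Latin-1 code points coincide with the first 256 Unicode code points.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string object: either narrow (Latin-1) or wide (UTF-32). */
typedef struct {
    size_t len;
    int is_wide;              /* 0: one byte per char, 1: UTF-32 */
    union {
        uint8_t  *narrow;     /* ISO-8859-1, code points 0..255 */
        uint32_t *wide;       /* UTF-32, one code point per element */
    } u;
} sketch_string;

/* Strings start out narrow, filled with a single 8-bit character. */
sketch_string *sketch_make(size_t len, uint8_t fill)
{
    sketch_string *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->len = len;
    s->is_wide = 0;
    s->u.narrow = malloc(len ? len : 1);
    if (s->u.narrow == NULL) {
        free(s);
        return NULL;
    }
    for (size_t i = 0; i < len; i++)
        s->u.narrow[i] = fill;
    return s;
}

/* Convert a narrow string to wide in place; no-op if already wide. */
int sketch_widen(sketch_string *s)
{
    if (s->is_wide)
        return 0;
    uint32_t *w = malloc((s->len ? s->len : 1) * sizeof *w);
    if (w == NULL)
        return -1;
    for (size_t i = 0; i < s->len; i++)
        w[i] = s->u.narrow[i];    /* Latin-1 value == Unicode code point */
    free(s->u.narrow);
    s->u.wide = w;
    s->is_wide = 1;
    return 0;
}

/* Store a code point, widening first if it does not fit in 8 bits. */
int sketch_set(sketch_string *s, size_t i, uint32_t cp)
{
    assert(i < s->len);
    if (!s->is_wide && cp > 0xFF && sketch_widen(s) != 0)
        return -1;
    if (s->is_wide)
        s->u.wide[i] = cp;
    else
        s->u.narrow[i] = (uint8_t) cp;
    return 0;
}

/* Read a code point regardless of the internal representation. */
uint32_t sketch_ref(const sketch_string *s, size_t i)
{
    assert(i < s->len);
    return s->is_wide ? s->u.wide[i] : s->u.narrow[i];
}

void sketch_free(sketch_string *s)
{
    if (s == NULL)
        return;
    if (s->is_wide)
        free(s->u.wide);
    else
        free(s->u.narrow);
    free(s);
}
```

In this sketch, setting a character such as U+00E9 leaves the string narrow, while setting U+03BB triggers the one-time widening; callers only ever see code points through `sketch_ref`.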
Outside of the core strings module and srfi
seem to think so too.
Yes, as far as I'm concerned. I know you're probably more knowledgeable
than I am on this issue and I'm confident.
For the record, I'm happy too - in fact I'm excited that Guile is
finally going to have wide strings. Technically I think I'm less of
an expert here than
Hi,
Let's say that one possible goal is to add wide strings
* using Gnulib functions
* with minimal changes to the public Guile API
* where chars become 4-byte codepoints and strings are internally
either UTF-32 or ISO-8859-1
Since I need this functionality taken care of, and since I have
Hi,
On Wed 28 Jan 2009 17:44, Mike Gran spk...@yahoo.com writes:
Since I need this functionality taken care of, and since I have some
time to play with it, what's the procedure here?
The best thing IMO would be to hack on it on a Git branch, with small
and correct patches. We could get you
Mike Gran spk...@yahoo.com writes:
Hi,
Let's say that one possible goal is to add wide strings
* using Gnulib functions
* with minimal changes to the public Guile API
* where chars become 4-byte codepoints and strings are internally
either UTF-32 or ISO-8859-1
Since I need
Hello,
Clinton Ebadi clin...@unknownlamer.org writes:
The `scm_{to|from}_locale_string' functions provide enough abstraction
to make this doable without breaking anything that doesn't use
`scm_take_locale_string' (and even then Guile can detect when the locale
is not UCS-4, revert to
On Tue 27 Jan 2009 06:52, Mike Gran spk...@yahoo.com writes:
I said
(Though, such a scheme would force scm_take_locale_string to become
scm_take_iso88591_string.)
which is incorrect. Under the proposed scheme, scm_take_locale_string
would only be able to use that storage directly if it
Hi!
Mike Gran spk...@yahoo.com writes:
Gnulib works for me. Bruno is the maintainer of those funcs, so I'm
sure they work great.
Good!
So really the first questions to answer are the encoding question and
whether the R6RS string API is the goal.
SRFI-1[34] (i.e., status quo in terms of
Ludo sez,
Mike sez,
1. IMO it'd be nice to have ASCII strings special-cased so that they
are always encoded in ASCII. This would allow for memory savings
since, e.g., most symbols are expected to contain only ASCII
characters. It might also simplify interaction with C in
by providing a specific API.
Fair enough. With wide strings in place, this could all be done in
pure Scheme anyway, and end up in some library. I brought it up
really just to note the codepoint / grapheme problem.
[...] We need to look more closely at what Gnulib has to offer, IMO.
Gnulib
I said
(Though, such a scheme would force scm_take_locale_string to become
scm_take_iso88591_string.)
which is incorrect. Under the proposed scheme, scm_take_locale_string
would only be able to use that storage directly if it happened to be
ASCII or 8859-1.
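The condition in the correction above could be sketched roughly as follows. This is a hypothetical helper, not part of Guile; the name `sketch_can_adopt_as_narrow` and the `locale_is_latin1` flag are assumptions made for illustration. It also assumes that any non-Latin-1 locale encoding is ASCII-compatible (as UTF-8 is), so a buffer whose bytes are all below 0x80 means the same thing in both encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if a locale-encoded buffer could be used directly as
   narrow (ISO-8859-1) storage without conversion.  Hypothetical helper. */
bool sketch_can_adopt_as_narrow(const unsigned char *buf, size_t len,
                                bool locale_is_latin1)
{
    assert(buf != NULL || len == 0);

    /* A Latin-1 locale already matches the narrow representation. */
    if (locale_is_latin1)
        return true;

    /* Otherwise only pure-ASCII content is byte-identical in both. */
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return false;
    return true;
}
```

When the check fails, the buffer would have to be decoded into a fresh string rather than taken over as-is.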
Hi. I know there has been a lot of talk about wide characters and
Unicode over the years. I'd like to see it happen because how they are
implemented will determine the future of a couple of my side-projects.
I could pitch in, if you needed some help.
I looked over the history of guile-devel, and
Hello!
Mike Gran spk...@yahoo.com writes:
Hi. I know there has been a lot of talk about wide characters and
Unicode over the years. I'd like to see it happen because how they are
implemented will determine the future of a couple of my side-projects.
I could pitch in, if you needed some
2009/1/25 Ludovic Courtès l...@gnu.org:
I agree it would be really nice to have Unicode support, but I'm not
aware of any plan, so please go ahead! :-)
Indeed.
A few considerations regarding the inevitable debate about the internal
string representation:
[...]
But what about the other
of deciding which actively developed library of wide character
functions is to be used and how to integrate it.
There are 3 good, actively developed solutions of which I am aware.
1. Use GNU libc functionality. Encode wide strings as wchar_t.
2. Use GLib functionality. Encode wide strings as UTF