Wide strings and the VM

2009-07-26 Thread Mike Gran
Hi- I wonder if I could get someone to check out what I've done with wide strings and the VM on the string_abstraction2 branch with commit efb042... That code is quite confusing, and I may have made a mess of it. And, I wasn't sure about alignment. I daresay that this Unicode string stuff

Re: Wide strings status

2009-04-22 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote: You seem to imply that `scm_getc ()' will now return a Unicode codepoint, is that right? What about `scm_c_{read,write} ()', and `scm_{get,put}s ()'? I vacillate on this, but, I think

Re: Wide strings status

2009-04-21 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: Strings are internally encoded either as narrow 8-bit ISO-8859-1 strings or as wide UTF-32 strings. Strings are usually created as narrow strings. Narrow strings get automatically widened to wide strings if non-8-bit characters are set! or appended

Re: Wide strings status

2009-04-21 Thread Mike Gran
that are unclear to me now. Wide strings are currently getting truncated to 8-bit somewhere in there. The compiler could use bytevectors when dealing with bytecode. Maybe that would clarify things. On those issues, I'll have to concede to the wisdom of others. I'll do what I can with the C code

Wide strings status

2009-04-20 Thread Mike Gran
. Strings are internally encoded either as narrow 8-bit ISO-8859-1 strings or as wide UTF-32 strings. Strings are usually created as narrow strings. Narrow strings get automatically widened to wide strings if non-8-bit characters are set! or appended to them. Outside of the core strings module and srfi
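The narrow/wide scheme described in this message can be sketched in C. This is only an illustration under stated assumptions, not the code on the branch: the `sketch_string` type and every `sketch_*` function are hypothetical names invented here. The one fact it relies on is that ISO-8859-1 byte values coincide with Unicode code points U+0000 through U+00FF, so widening amounts to zero-extending each byte to 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical illustration; these are NOT Guile's actual data structures. */
typedef struct {
    size_t len;     /* length in characters */
    int is_wide;    /* 0: 1 byte per char (ISO-8859-1); 1: 4 bytes per char (UTF-32) */
    union {
        uint8_t  *narrow;
        uint32_t *wide;
    } buf;
} sketch_string;

/* Strings start out narrow, filled with NUL characters. */
static sketch_string *sketch_make(size_t len)
{
    sketch_string *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->len = len;
    s->is_wide = 0;
    s->buf.narrow = calloc(len ? len : 1, sizeof (uint8_t));
    if (s->buf.narrow == NULL) {
        free(s);
        return NULL;
    }
    return s;
}

/* Widen in place: each Latin-1 byte maps to the same code point value. */
static int sketch_widen(sketch_string *s)
{
    if (s->is_wide)
        return 0;
    uint32_t *w = malloc((s->len ? s->len : 1) * sizeof *w);
    if (w == NULL)
        return -1;
    for (size_t i = 0; i < s->len; i++)
        w[i] = s->buf.narrow[i];
    free(s->buf.narrow);
    s->buf.wide = w;
    s->is_wide = 1;
    return 0;
}

/* Analogue of string-set!: widen first if the code point needs more than 8 bits. */
static int sketch_set(sketch_string *s, size_t i, uint32_t cp)
{
    if (i >= s->len)
        return -1;
    if (!s->is_wide && cp > 0xFF && sketch_widen(s) != 0)
        return -1;
    if (s->is_wide)
        s->buf.wide[i] = cp;
    else
        s->buf.narrow[i] = (uint8_t) cp;
    return 0;
}

/* Analogue of string-ref: callers always see a code point, whatever the storage. */
static uint32_t sketch_ref(const sketch_string *s, size_t i)
{
    assert(i < s->len);
    return s->is_wide ? s->buf.wide[i] : s->buf.narrow[i];
}

static void sketch_free(sketch_string *s)
{
    if (s == NULL)
        return;
    if (s->is_wide)
        free(s->buf.wide);
    else
        free(s->buf.narrow);
    free(s);
}
```

In this sketch, storing U+00E9 leaves the string narrow, while storing U+03BB triggers a one-time widening; the accessor hides the difference, which is the property the message describes for code outside the core strings module.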

Re: Wide strings

2009-01-29 Thread Neil Jerram
seem to think so too. Yes, as far as I'm concerned. I know you're probably more knowledgeable than I am on this issue and I'm confident. For the record, I'm happy too - in fact I'm excited that Guile is finally going to have wide strings. Technically I think I'm less of an expert here than

Re: Wide strings

2009-01-28 Thread Mike Gran
Hi, Let's say that one possible goal is to add wide strings * using Gnulib functions * with minimal changes to the public Guile API * where chars become 4-byte codepoints and strings are internally either UTF-32 or ISO-8859-1 Since I need this functionality taken care of, and since I have

Re: Wide strings

2009-01-28 Thread Andy Wingo
Hi, On Wed 28 Jan 2009 17:44, Mike Gran spk...@yahoo.com writes: Since I need this functionality taken care of, and since I have some time to play with it, what's the procedure here? The best thing IMO would be to hack on it on a Git branch, with small and correct patches. We could get you

Re: Wide strings

2009-01-28 Thread Clinton Ebadi
Mike Gran spk...@yahoo.com writes: Hi, Let's say that one possible goal is to add wide strings * using Gnulib functions * with minimal changes to the public Guile API * where chars become 4-byte codepoints and strings are internally either UTF-32 or ISO-8859-1 Since I need

Re: Wide strings

2009-01-28 Thread Ludovic Courtès
Hello, Clinton Ebadi clin...@unknownlamer.org writes: The `scm_{to|from}_locale_string' functions provide enough abstraction to make this doable without breaking anything that doesn't use `scm_take_locale_string' (and even then Guile can detect when the locale is not UCS-4, revert to

Re: Wide strings

2009-01-27 Thread Andy Wingo
On Tue 27 Jan 2009 06:52, Mike Gran spk...@yahoo.com writes: I said (Though, such a scheme would force scm_take_locale_string to become scm_take_iso88591_string.) which is incorrect. Under the proposed scheme, scm_take_locale_string would only be able to use that storage directly if it

Re: Wide strings

2009-01-27 Thread Ludovic Courtès
Hi! Mike Gran spk...@yahoo.com writes: Gnulib works for me. Bruno is the maintainer of those funcs, so I'm sure they work great. Good! So really the first questions to answer are the encoding question and whether the R6RS string API is the goal. SRFI-1[34] (i.e., status quo in terms of

Re: Wide strings

2009-01-26 Thread Mike Gran
Ludo sez, Mike sez, 1. IMO it'd be nice to have ASCII strings special-cased so that they are always encoded in ASCII. This would allow for memory savings since, e.g., most symbols are expected to contain only ASCII characters. It might also simplify interaction with C in

Re: Wide strings

2009-01-26 Thread Mike Gran
by providing a specific API. Fair enough. With wide strings in place, this could all be done in pure Scheme anyway, and end up in some library. I brought it up really just to note the codepoint / grapheme problem. [...] We need to look more closely at what Gnulib has to offer, IMO. Gnulib

Re: Wide strings

2009-01-26 Thread Mike Gran
I said (Though, such a scheme would force scm_take_locale_string to become scm_take_iso88591_string.) which is incorrect. Under the proposed scheme, scm_take_locale_string would only be able to use that storage directly if it happened to be ASCII or 8859-1.

Wide strings

2009-01-25 Thread Mike Gran
Hi. I know there has been a lot of talk about wide characters and Unicode over the years. I'd like to see it happen because how they are implemented will determine the future of a couple of my side-projects. I could pitch in, if you needed some help. I looked over the history of guile-devel, and

Re: Wide strings

2009-01-25 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: Hi. I know there has been a lot of talk about wide characters and Unicode over the years. I'd like to see it happen because how they are implemented will determine the future of a couple of my side-projects. I could pitch in, if you needed some

Re: Wide strings

2009-01-25 Thread Neil Jerram
2009/1/25 Ludovic Courtès l...@gnu.org: I agree it would be really nice to have Unicode support, but I'm not aware of any plan, so please go ahead! :-) Indeed. A few considerations regarding the inevitable debate about the internal string representation: [...] But what about the other

Re: Wide strings

2009-01-25 Thread Mike Gran
of deciding which actively developed library of wide character functions is to be used and how to integrate it. There are 3 good, actively developed solutions of which I am aware. 1. Use GNU libc functionality. Encode wide strings as wchar_t. 2. Use GLib functionality. Encode wide strings as UTF