Re: Wide strings status

2009-04-22 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote: You seem to imply that `scm_getc ()' will now return a Unicode codepoint, is that right? What about `scm_c_{read,write} ()', and `scm_{get,put}s ()'? I vacillate on this, but I think
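If `scm_getc ()` were to return a Unicode codepoint, it would have to decode the port's bytes rather than hand them back one at a time. The C sketch below assumes the port's encoding is UTF-8 and reads from an in-memory buffer; `struct byte_source`, `utf8_getc`, `UTF8_EOF` and `UTF8_INVALID` are hypothetical names for illustration, not Guile's API or the design settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory byte source standing in for a port. */
struct byte_source {
  const unsigned char *buf;
  size_t len;
  size_t pos;
};

#define UTF8_EOF (-1)
#define UTF8_INVALID (-2)

/* Read one UTF-8 encoded character and return its codepoint,
   UTF8_EOF at end of input, or UTF8_INVALID for a malformed sequence. */
static int32_t
utf8_getc (struct byte_source *src)
{
  assert (src != NULL);
  if (src->pos >= src->len)
    return UTF8_EOF;

  unsigned char b0 = src->buf[src->pos++];
  int32_t cp;
  int extra;

  if (b0 < 0x80)
    return b0;                          /* ASCII: one byte */
  else if (b0 >= 0xC2 && b0 <= 0xDF)
    { cp = b0 & 0x1F; extra = 1; }      /* 2-byte sequence */
  else if (b0 >= 0xE0 && b0 <= 0xEF)
    { cp = b0 & 0x0F; extra = 2; }      /* 3-byte sequence */
  else if (b0 >= 0xF0 && b0 <= 0xF4)
    { cp = b0 & 0x07; extra = 3; }      /* 4-byte sequence */
  else
    return UTF8_INVALID;                /* stray continuation or bad lead byte */

  for (int i = 0; i < extra; i++)
    {
      if (src->pos >= src->len)
        return UTF8_INVALID;            /* truncated sequence */
      unsigned char b = src->buf[src->pos];
      if ((b & 0xC0) != 0x80)
        return UTF8_INVALID;            /* not a continuation byte */
      src->pos++;
      cp = (cp << 6) | (b & 0x3F);
    }

  /* Reject overlong 3/4-byte forms, surrogates and values past U+10FFFF. */
  if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
      || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return UTF8_INVALID;

  return cp;
}
```

With this approach the two bytes `0xCE 0xBB` come back as the single codepoint U+03BB, so callers that currently treat the result as a byte would see different values.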

Re: Wide strings status

2009-04-21 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: Strings are internally encoded either as narrow 8-bit ISO-8859-1 strings or as wide UTF-32 strings. Strings are usually created as narrow strings. Narrow strings get automatically widened to wide strings if non-8-bit characters are set! or appended
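The narrow/wide scheme can be sketched in C roughly as follows. The `struct str` layout and the `str_*` function names are hypothetical, chosen for illustration; they are not Guile's real data structures. The key step is that storing a character above U+00FF first converts the whole buffer to UTF-32, after which the string stays wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one byte per character while every character
   fits in ISO-8859-1, four bytes (UTF-32) once any character does not. */
struct str {
  size_t len;
  int wide;                 /* 0: narrow (Latin-1), 1: wide (UTF-32) */
  union {
    uint8_t *narrow;
    uint32_t *wide;
  } chars;
};

/* Create a narrow string from Latin-1 bytes; returns 0 on success. */
static int
str_from_latin1 (struct str *s, const uint8_t *bytes, size_t len)
{
  s->len = len;
  s->wide = 0;
  s->chars.narrow = malloc (len ? len : 1);
  if (s->chars.narrow == NULL)
    return -1;
  memcpy (s->chars.narrow, bytes, len);
  return 0;
}

/* Convert narrow storage to UTF-32; returns 0 on success. */
static int
str_widen (struct str *s)
{
  if (s->wide)
    return 0;
  uint32_t *w = malloc ((s->len ? s->len : 1) * sizeof *w);
  if (w == NULL)
    return -1;
  for (size_t i = 0; i < s->len; i++)
    w[i] = s->chars.narrow[i];   /* Latin-1 byte values equal U+0000..U+00FF */
  free (s->chars.narrow);
  s->chars.wide = w;
  s->wide = 1;
  return 0;
}

/* string-set! analogue: widen first if the character is outside Latin-1. */
static int
str_set (struct str *s, size_t i, uint32_t cp)
{
  assert (i < s->len);
  if (!s->wide && cp > 0xFF && str_widen (s) != 0)
    return -1;
  if (s->wide)
    s->chars.wide[i] = cp;
  else
    s->chars.narrow[i] = (uint8_t) cp;
  return 0;
}

/* string-ref analogue: works the same on either representation. */
static uint32_t
str_ref (const struct str *s, size_t i)
{
  assert (i < s->len);
  return s->wide ? s->chars.wide[i] : s->chars.narrow[i];
}

static void
str_free (struct str *s)
{
  if (s->wide)
    free (s->chars.wide);
  else
    free (s->chars.narrow);
}
```

Widening copies the buffer once, so a string that never holds a character above U+00FF keeps paying only one byte per character.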

Re: Wide strings status

2009-04-21 Thread Mike Gran
On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote: This is all going to be slower than before because of the string conversion operations, but I didn't want to do any premature optimization. First, I wanted to get it working, but there is plenty of room for optimization later.

Re: Wide strings

2009-01-29 Thread Neil Jerram
l...@gnu.org (Ludovic Courtès) writes: Do we need to talk more about what needs to be accomplished? Do we need a complete specification? Do we need a vote on whether it is a good idea? I think you're going in the right direction. More importantly, although I can't speak for them, Neil and Ludo

Re: Wide strings

2009-01-28 Thread Mike Gran
Hi, Let's say that one possible goal is to add wide strings
* using Gnulib functions
* with minimal changes to the public Guile API
* where chars become 4-byte codepoints and strings are internally either UTF-32 or ISO-8859-1
Since I need this functionality taken care of, and since I have
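Under this goal, the choice between the two internal encodings can be made when a string is built from 4-byte codepoints: scan the characters and use one byte each only if all of them are at most U+00FF. A minimal sketch; `valid_codepoint` and `bytes_per_char` are hypothetical names, and the code only computes the storage width rather than allocating anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A Unicode scalar value: U+0000..U+10FFFF, excluding surrogates. */
static int
valid_codepoint (uint32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Bytes per character needed to store the string: 1 if every character
   fits in ISO-8859-1, 4 (UTF-32) otherwise, 0 if any value is not a
   valid codepoint. */
static int
bytes_per_char (const uint32_t *cps, size_t n)
{
  assert (cps != NULL || n == 0);
  int width = 1;
  for (size_t i = 0; i < n; i++)
    {
      if (!valid_codepoint (cps[i]))
        return 0;
      if (cps[i] > 0xFF)
        width = 4;
    }
  return width;
}
```

For example, a string built from `{ 'n', 0xE9, 'e' }` would be stored narrow in 3 bytes, while one containing U+03BB would need 4 bytes per character.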

Re: Wide strings

2009-01-28 Thread Andy Wingo
Hi, On Wed 28 Jan 2009 17:44, Mike Gran spk...@yahoo.com writes: Since I need this functionality taken care of, and since I have some time to play with it, what's the procedure here? The best thing IMO would be to hack on it on a Git branch, with small and correct patches. We could get you

Re: Wide strings

2009-01-28 Thread Clinton Ebadi
Mike Gran spk...@yahoo.com writes: Hi, Let's say that one possible goal is to add wide strings
* using Gnulib functions
* with minimal changes to the public Guile API
* where chars become 4-byte codepoints and strings are internally either UTF-32 or ISO-8859-1
Since I need this

Re: Wide strings

2009-01-28 Thread Ludovic Courtès
Hello, Clinton Ebadi clin...@unknownlamer.org writes: The `scm_{to|from}_locale_string' functions provide enough abstraction to make this doable without breaking anything that doesn't use `scm_take_locale_string' (and even then Guile can detect when the locale is not UCS-4, revert to

Re: Wide strings

2009-01-27 Thread Andy Wingo
On Tue 27 Jan 2009 06:52, Mike Gran spk...@yahoo.com writes: I said (Though, such a scheme would force scm_take_locale_string to become scm_take_iso88591_string.) which is incorrect. Under the proposed scheme, scm_take_locale_string would only be able to use that storage directly if it

Re: Wide strings

2009-01-27 Thread Ludovic Courtès
Hi! Mike Gran spk...@yahoo.com writes: Gnulib works for me. Bruno is the maintainer of those funcs, so I'm sure they work great. Good! So really the first questions to answer are the encoding question and whether the R6RS string API is the goal. SRFI-1[34] (i.e., status quo in terms of

Re: Wide strings

2009-01-26 Thread Mike Gran
Ludo sez, Mike sez, 1. IMO it'd be nice to have ASCII strings special-cased so that they are always encoded in ASCII. This would allow for memory savings since, e.g., most symbols are expected to contain only ASCII characters. It might also simplify interaction with C in

Re: Wide strings

2009-01-26 Thread Mike Gran
Hello, Ludo' sez Mike Gran spk...@yahoo.com writes: BTW, Gnulib has a wealth of modules that could be helpful here: http://www.gnu.org/software/gnulib/MODULES.html#posix_ext_unicode I used a few of them in Guile-R6RS-Libs to implement `string-utf8' and such like. The Gnulib routines
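As an illustration of what a `string-utf8`-style conversion does with a wide (UTF-32) string, here is a hand-rolled C encoder. Gnulib's `unistr` module provides conversion routines such as `u32_to_u8` for this job; the sketch avoids them only so that it compiles without Gnulib, and the names `utf8_encode` and `utf32_to_utf8` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one codepoint as UTF-8 into out (room for at least 4 bytes).
   Returns the number of bytes written, or 0 for an invalid codepoint. */
static size_t
utf8_encode (uint32_t cp, uint8_t *out)
{
  if (cp < 0x80)
    {
      out[0] = (uint8_t) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (uint8_t) (0xC0 | (cp >> 6));
      out[1] = (uint8_t) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                       /* surrogates are not characters */
  if (cp < 0x10000)
    {
      out[0] = (uint8_t) (0xE0 | (cp >> 12));
      out[1] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (uint8_t) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (uint8_t) (0xF0 | (cp >> 18));
      out[1] = (uint8_t) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (uint8_t) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}

/* Encode a UTF-32 string as UTF-8. out must hold 4 * n bytes.
   Returns the bytes written, or (size_t) -1 on an invalid codepoint. */
static size_t
utf32_to_utf8 (const uint32_t *s, size_t n, uint8_t *out)
{
  assert (out != NULL);
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t k = utf8_encode (s[i], out + total);
      if (k == 0)
        return (size_t) -1;
      total += k;
    }
  return total;
}
```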

Re: Wide strings

2009-01-26 Thread Mike Gran
I said (Though, such a scheme would force scm_take_locale_string to become scm_take_iso88591_string.) which is incorrect. Under the proposed scheme, scm_take_locale_string would only be able to use that storage directly if it happened to be ASCII or 8859-1.
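The corrected condition can be written down as a check. In this minimal sketch, the hypothetical `can_adopt_buffer` decides whether a buffer passed to something like `scm_take_locale_string` could become narrow ISO-8859-1 storage without copying. It assumes the caller supplies the locale's codeset, for instance from `nl_langinfo (CODESET)`, and that any other locale encoding is ASCII-compatible; codeset spellings differ between C libraries, so the two names compared here are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Can a C buffer be kept as-is as narrow (ISO-8859-1) string storage? */
static int
can_adopt_buffer (const char *buf, size_t len, const char *codeset)
{
  assert (buf != NULL || len == 0);

  /* In a Latin-1 locale every byte already equals its codepoint.
     (Assumed spellings: glibc "ISO-8859-1", BSD "ISO8859-1".) */
  if (strcmp (codeset, "ISO-8859-1") == 0 || strcmp (codeset, "ISO8859-1") == 0)
    return 1;

  /* Otherwise only pure ASCII is safe to reuse without decoding. */
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] >= 0x80)
      return 0;
  return 1;
}
```

When the check fails, the string would have to be decoded into freshly allocated (possibly wide) storage and the original buffer freed, so "taking" the string would cost as much as copying it.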

Re: Wide strings

2009-01-25 Thread Ludovic Courtès
Hello! Mike Gran spk...@yahoo.com writes: Hi.  I know there has been a lot of talk about wide characters and Unicode over the years.  I'd like to see it happen because how they are implemented will determine the future of a couple of my side-projects. I could pitch in, if you needed some

Re: Wide strings

2009-01-25 Thread Neil Jerram
2009/1/25 Ludovic Courtès l...@gnu.org: I agree it would be really nice to have Unicode support, but I'm not aware of any plan, so please go ahead! :-) Indeed. A few considerations regarding the inevitable debate about the internal string representation: [...] But what about the other

Re: Wide strings

2009-01-25 Thread Mike Gran
From: Ludovic Courtès l...@gnu.org I believe that we should aim for R6RS strings. I think the most important thing is to have humility in the face of an impossible problem: how to encode all textual information.  It is important to stand on the shoulders of giants here.  It becomes a matter of