Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread John Cowan
Alex Shinn scripsit: The clean way to handle this is to duplicate the useful string APIs for bytevectors. This could be done without code duplication with the use of functors, though compiler assistance may be needed for efficiency (e.g. for inlined procedures). Even without code
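Here is a minimal Python sketch of the functor idea. It is an analogy, not CHICKEN's module system. Generic code is written once against a small record of basic string operations, then instantiated for character strings and for byte strings. The names `StringOps`, `make_splitter` and `split_on` are hypothetical.

```python
from typing import Any, Callable, NamedTuple


class StringOps(NamedTuple):
    """Hypothetical 'signature' of the basic operations generic code needs."""
    length: Callable[[Any], int]
    ref: Callable[[Any, int], Any]            # element at index i
    substring: Callable[[Any, int, int], Any]  # elements [start, end)


def make_splitter(ops: StringOps):
    """Functor-like factory: builds a splitter from a StringOps 'module'."""

    def split_on(s, sep):
        """Split s at every element equal to sep; written once for any ops."""
        parts, start = [], 0
        for i in range(ops.length(s)):
            if ops.ref(s, i) == sep:
                parts.append(ops.substring(s, start, i))
                start = i + 1
        parts.append(ops.substring(s, start, ops.length(s)))
        return parts

    return split_on


# Character-oriented instance: str indexes code points.
text_ops = StringOps(len, lambda s, i: s[i], lambda s, a, b: s[a:b])
# Byte-oriented instance: bytes[i] is an int in Python 3.
byte_ops = StringOps(len, lambda s, i: s[i], lambda s, a, b: s[a:b])

split_text = make_splitter(text_ops)
split_bytes = make_splitter(byte_ops)
```

In Python the two instances look alike because of duck typing. In Scheme they would bind different string and bytevector accessors, and that is where compiler help with inlining would matter for efficiency.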

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread John Cowan
Alex Shinn scripsit: Hmmm... that's upsetting. Python 3 is a notorious dead-end language. That's a premature judgment (and rather Google-centric). Since Python 3.0 was DOA, the intended five-year transition plan to Python 3 by default ended only a month ago (Python 3.1 was released in June

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread John Cowan
Yaroslav Tsarko scripsit: Why don't we just add a *(use utf8)* line explicitly to all the eggs that handle strings? That will ultimately fix the problem and will clearly indicate that the egg performs string manipulations and is capable of handling UTF-8 encoding. That will certainly work, but has an

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread Yaroslav Tsarko
On 09.07.2014 02:15, Oleg Kolosov wrote: IMO just enable utf8 by default and let them break. It's not the '80s anymore, latin1-only software should die. +1. For specific use cases (backward compatibility, logging or minimizing runtime size for example) it should be possible to disable

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread Michele La Monaca
On Wed, Jul 9, 2014 at 7:00 AM, Alex Shinn alexsh...@gmail.com wrote: However, I don't think that's the real problem. The issue as I understand is that although Chicken has both strings and bytevectors in the core, historically and for continued simplicity strings are abused as bytevectors
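Treating strings as bytevectors conflicts with UTF-8 strings, and a Python analogy shows why. This is an assumed illustration, not CHICKEN's actual behavior. Decoding bytes as Latin-1 gives a one-byte-per-character string that can hold any byte sequence. A UTF-8 string type, by contrast, must reject malformed sequences. The helper names are hypothetical.

```python
def fits_byte_string(data: bytes) -> bool:
    """A Latin-1 'string' maps every byte 0-255 to one character, so any
    byte sequence round-trips unchanged and keeps its length."""
    s = data.decode("latin-1")
    return s.encode("latin-1") == data and len(s) == len(data)


def fits_utf8_string(data: bytes) -> bool:
    """A UTF-8 string type only accepts well-formed UTF-8 sequences."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Arbitrary binary data: the 8-byte PNG file signature.
blob = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
```

Code that stores such binary data in strings keeps working with byte-oriented strings, but it breaks once strings are required to be valid UTF-8.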

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread Oleg Kolosov
On 07/09/14 09:00, Alex Shinn wrote: However, I don't think that's the real problem. The issue as I understand is that although Chicken has both strings and bytevectors in the core, historically and for continued simplicity strings are abused as bytevectors in many cases. ... And this is a

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread Alex Shinn
On Fri, Jul 11, 2014 at 7:20 AM, Oleg Kolosov bazur...@gmail.com wrote: On 07/09/14 09:00, Alex Shinn wrote: The clean way to handle this is to duplicate the useful string APIs for bytevectors. This could be done without code duplication with the use of functors, though compiler

Re: [Chicken-users] UTF-8 support in eggs

2014-07-10 Thread Alex Shinn
On Fri, Jul 11, 2014 at 6:53 AM, Michele La Monaca mikele.chic...@lamonaca.net wrote: Wouldn't this other path be simpler and more effective? 1) keep current string functions as they are (i.e. byte-oriented) and keep abusers abusing (and happy) 2) provide new utf8/cursor-oriented functions
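Point 2 of that proposal calls for cursor-oriented functions layered over the existing byte-oriented strings. The Python sketch below shows roughly what that could look like. A cursor is simply a byte offset, and stepping to the next character inspects the UTF-8 lead byte. The function names are hypothetical and not taken from any egg.

```python
def utf8_char_width(lead: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")


def cursor_next(s: bytes, cur: int) -> int:
    """Advance a byte-offset cursor to the start of the next character."""
    return cur + utf8_char_width(s[cur])


def cursor_ref(s: bytes, cur: int) -> str:
    """Decode the single character that starts at cursor cur."""
    return s[cur:cursor_next(s, cur)].decode("utf-8")


def chars(s: bytes):
    """Iterate over the characters of UTF-8 data using cursors."""
    cur = 0
    while cur < len(s):
        yield cursor_ref(s, cur)
        cur = cursor_next(s, cur)
```

Byte-oriented functions (point 1) keep operating on the same data unchanged. Only code that wants characters goes through cursors.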

Re: [Chicken-users] UTF-8 support in eggs

2014-07-09 Thread Alex Shinn
On Wed, Jul 9, 2014 at 7:15 AM, Oleg Kolosov bazur...@gmail.com wrote: IMO just enable utf8 by default and let them break. It's not the '80s anymore, latin1-only software should die. I agree that if people want latin1-only there should at best be a compiler option for this which is disabled

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Mario Domenech Goulart
Hi, On Tue, 08 Jul 2014 08:57:43 +0400 Yaroslav Tsarko eriktsa...@googlemail.com wrote: Why don't we just add a (use utf8) line explicitly to all the eggs that handle strings? That will ultimately fix the problem and will clearly indicate that the egg performs string manipulations and is capable of

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Yaroslav Tsarko
Hi, On 08.07.2014 16:40, Mario Domenech Goulart wrote: On the other hand, we risk breaking eggs that operate on latin1 text. UTF-8 support may also affect performance-sensitive code. Best wishes. Mario Isn't UTF-8 backward-compatible with Latin-1 and ASCII encodings? AFAIR UTF-8 is the

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Mario Domenech Goulart
Hi Alex, On Tue, 8 Jul 2014 12:42:21 +0900 Alex Shinn alexsh...@gmail.com wrote: On Tue, Jul 8, 2014 at 5:58 AM, Mario Domenech Goulart mario.goul...@gmail.com wrote: I want to use some eggs and I need them to handle UTF-8. By handle UTF-8 I mean treat strings as UTF-8, so that

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Mario Domenech Goulart
On Tue, 08 Jul 2014 17:27:27 +0400 Yaroslav Tsarko eriktsa...@googlemail.com wrote: On 08.07.2014 16:40, Mario Domenech Goulart wrote: On the other hand, we risk breaking eggs that operate on latin1 text. UTF-8 support may also affect performance-sensitive code. Isn't UTF-8

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Yaroslav Tsarko
Hi, On 08.07.2014 18:03, Mario Domenech Goulart wrote: They are compatible only in the 7-bit ASCII range. The remaining bit in the byte makes the whole difference. :-) In UTF-8 it means either here's your 8-bit character or look at the next byte. In latin1 it always means here's your 8-bit
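Mario's point can be checked directly with a Python sketch that uses only the standard codecs. Code points below 128 encode to the same single byte in both encodings, but everything above that differs. Reading one encoding as the other then produces either errors or mojibake.

```python
def same_bytes(ch: str) -> bool:
    """True if ch encodes to identical bytes in Latin-1 and UTF-8."""
    return ch.encode("latin-1") == ch.encode("utf-8")


# Every 7-bit ASCII character is encoded identically...
ascii_compatible = all(same_bytes(chr(i)) for i in range(128))
# ...but no Latin-1 character in the 128-255 range is.
upper_half_compatible = any(same_bytes(chr(i)) for i in range(128, 256))

latin1_c = "ç".encode("latin-1")      # one byte with the high bit set
utf8_c = "ç".encode("utf-8")          # a two-byte sequence
mojibake = utf8_c.decode("latin-1")   # UTF-8 bytes misread as Latin-1
```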

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Oleg Kolosov
On 07/08/14 16:40, Mario Domenech Goulart wrote: Hi, On Tue, 08 Jul 2014 08:57:43 +0400 Yaroslav Tsarko eriktsa...@googlemail.com wrote: Why don't we just add a (use utf8) line explicitly to all the eggs that handle strings? That will ultimately fix the problem and will clearly indicate that

Re: [Chicken-users] UTF-8 support in eggs

2014-07-08 Thread Alex Shinn
On Tue, Jul 8, 2014 at 11:00 PM, Mario Domenech Goulart mario.goul...@gmail.com wrote: On Tue, 8 Jul 2014 12:42:21 +0900 Alex Shinn alexsh...@gmail.com wrote: 4. Make affected eggs functors on the set of basic string operations. Wouldn't 4 be an implementation method of 2? Yes. --

Re: [Chicken-users] UTF-8 support in eggs

2014-07-07 Thread Alex Shinn
Hi, On Tue, Jul 8, 2014 at 5:58 AM, Mario Domenech Goulart mario.goul...@gmail.com wrote: Hi, I want to use some eggs and I need them to handle UTF-8. By handle UTF-8 I mean treat strings as UTF-8, so that (string (string-ref "ç" 0)) => "ç" for example. CHICKEN's string-related
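The difference behind Mario's example can be sketched in Python as an analogy. The sketch assumes a byte-oriented string-ref that returns one byte, compared with a UTF-8-aware one that returns one code point. `byte_string_ref` and `char_string_ref` are hypothetical names.

```python
def byte_string_ref(s: bytes, k: int) -> str:
    """Byte-oriented ref: the k-th byte, read as a Latin-1 character."""
    return s[k:k + 1].decode("latin-1")


def char_string_ref(s: bytes, k: int) -> str:
    """UTF-8-aware ref: the k-th code point of the decoded text."""
    return s.decode("utf-8")[k]


data = "ç".encode("utf-8")  # two bytes: 0xC3 0xA7

byte_result = byte_string_ref(data, 0)  # only the lead byte 0xC3, i.e. "Ã"
char_result = char_string_ref(data, 0)  # the whole character "ç"
```

With byte-oriented indexing, turning the result back into a string yields "Ã" rather than "ç", which breaks the identity Mario asks for.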

Re: [Chicken-users] UTF-8 support in eggs

2014-07-07 Thread John Cowan
Alex Shinn scripsit: On Tue, Jul 8, 2014 at 5:58 AM, Mario Domenech Goulart mario.goul...@gmail.com wrote: It might help the discussion if we had a list of eggs which are known to break on UTF-8 inputs. Indeed. 1. Have egg and egg-utf8 variants. Or, more generally, egg and

Re: [Chicken-users] UTF-8 support in eggs

2014-07-07 Thread Alex Shinn
On Tue, Jul 8, 2014 at 12:59 PM, John Cowan co...@mercury.ccil.org wrote: The same approaches also apply to eggs needing the full numeric tower, though with UTF-8 there's less chance of breakage when mixing eggs which do and don't use the utf8 egg. I would say that UTF-8 has *more*
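One way such breakage can arise when byte-oriented and character-oriented code are mixed is sketched below in Python. Components "A" and "B" are hypothetical, not specific eggs. An offset computed in bytes is silently wrong when reused as a character index, and the bug only appears once non-ASCII text shows up.

```python
text = "ça va"
data = text.encode("utf-8")  # "ç" takes two bytes, every other char one

# Component A (byte-oriented) reports where "va" starts, as a byte offset.
byte_pos = data.find(b"va")

# Component B (character-oriented) treats that number as a character index.
wrong = text[byte_pos:]

# A character offset computed on the text itself gives the intended result.
right = text[text.find("va"):]
```

For pure ASCII input the two kinds of offset coincide, which is why such mismatches can go unnoticed.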

Re: [Chicken-users] UTF-8 support in eggs

2014-07-07 Thread Yaroslav Tsarko
Hi, On 08.07.2014 00:58, Mario Domenech Goulart wrote: To properly handle UTF-8, we have the utf8 egg. If I understand correctly, the only way for eggs to properly support UTF-8 is by using the utf8 egg (or an equivalent implementation). Best wishes. Mario Why don't we just add a *(use utf8)*