Re: Collation support (minimal)

2013-03-22 Thread Vladimir Támara Patiño

On Sat, Mar 09, 2013 at 02:45:00PM +0100, Stefan Sperling wrote:

I'd like to concentrate on collation. To properly review your diff,
I need to understand what needs to be added to OpenBSD' locale implementation.


I hope the attached patch helps.


Instead, you should consider OpenBSD's code as your starting point.
And add to that _only_ what needs to be done to support collation.


Not so easy, because the implementation of collations that I know is
the one of FreeBSD (by the way mklocale and the ctype catalogues come from 
NetBSD, as well as part of the libc implementation --both from citrus). 
In future would be nice to implement collation in UTF-8 complete, but at 
least here something for single-byte encondings and UTF-8 translatable 
to ISO8859-1.


By the way to support UTF-8 translatable to ISO8859-1 I changed the
functions in collate.c with respect to the implementation of FreeBSD.
I don't have a FreeBSD at hand, but I guess it doesn't support
collations in locales like es_CO.UTF-8, while the attached
implementation does.


Adding new functions like strxfrm_l() etc. just doesn't make sense to me.
You shouldn't need to add any new functions. We already have strcoll()
and strxfrm(). Those are interfaces from C89, so they're a bit dated.
But they should be good enough for getting some basic collation support
going. We can talk about adding support for newer interfaces later, but
those should not matter for the initial implementation.


Those functions were removed (but I left the credit of the 
implementations that I used).



You have other unrelated changes in there, such as shuffling of MLINKS lists.
Those add noise to the diff and make it harder to review.


Other changes I sent in separate emails (including man pages for
wcscoll and wcsxfrm).

If there is interest in xlocale support after this or after having
other kind of collation support in OpenBSD, please let me know.  I 
also want to advance in other LC_* implementations.


Best regards.
--
Dios, gracias por tu amor infinito.
--  
 Vladimir Támara Patiño.  http://vtamara.pasosdeJesus.org/

 http://www.pasosdejesus.org/dominio_publico_colombia.html

diff -ruN -x obj -x CVS -x *~ src53orig/lib/libc/locale/Makefile.inc 
src/lib/libc/locale/Makefile.inc
--- src53orig/lib/libc/locale/Makefile.inc  Sat Mar  9 11:16:47 2013
+++ src/lib/libc/locale/Makefile.incTue Mar 19 05:52:30 2013
@@ -9,7 +9,7 @@
wcstombs.c wctob.c wctomb.c wcstof.c wcstod.c wcstold.c wcstol.c \
wcstoul.c wcstoll.c wcstoull.c wcstoimax.c wcstoumax.c \
setrunelocale.c runeglue.c rune.c runetable.c ___runetype_mb.c \
-   _wctrans.c wcsxfrm.c
+   _wctrans.c wcsxfrm.c collate.c
 
 MAN+=  nl_langinfo.3 setlocale.3 iswalnum.3 towlower.3 \
btowc.3 mblen.3 mbrlen.3 mbrtowc.3 mbsinit.3 mbsrtowcs.3 \
diff -ruN -x obj -x CVS -x *~ src53orig/lib/libc/locale/collate.c 
src/lib/libc/locale/collate.c
--- src53orig/lib/libc/locale/collate.c Wed Dec 31 19:00:00 1969
+++ src/lib/libc/locale/collate.c   Mon Mar 18 07:24:12 2013
@@ -0,0 +1,430 @@
+/*-
+ * Copyright (c) 1995 Alex Tatmanjants 
+ * at Electronni Visti IA, Kiev, Ukraine.
+ * All rights reserved.
+ *
+ * Copyright (c) 2011 The FreeBSD Foundation
+ * All rights reserved.
+ * Portions of this software were developed by David Chisnall
+ * under sponsorship from the FreeBSD Foundation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+/**
+ * Public domain according to Colombian Legislation. 
+ * http://www.pasosdejesus.org/dominio_publico_colombia.html
+ * 2013. vtam...@pasosdejesus.org.
+ */
+
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rune.h"
+#include "runetype.

Re: Collation support (minimal)

2013-03-09 Thread Stefan Sperling
On Fri, Mar 08, 2013 at 08:10:23AM -0500, Vladimir Támara Patiño wrote:
> 
> Following the advice of Stefan Sperling (see
> http://marc.info/?l=openbsd-tech&m=136029207723238&w=4 )
> the attached patch is minimal to support collation in
> OpenBSD-current it works with ISO-8859-1 and UTF-8 translatable
> to ISO-8859-1.   It includes:
> * Program colldef to generate collation tables  (for single-byte
>  encodings)
> * Collation tables for spanish to be generated  in
> /usr/share/locale. * Implementation of strcoll, wcscoll, strxfrm and
> wcsxfrm having into account that these functions are particular
> cases of strcoll_l, wcscoll_l, strxfrm_l and wcsxfrm_l from xlocale
> and that in near future possibly OpenBSD will implement xlocale
> * Small changes to documentation * Regression test (test spanish
> with ISO-8859-1 and specially UTF-8)
> 
> The implementation is based on the one I sent before
> ( http://marc.info/?l=openbsd-tech&m=136050321323182&w=4 and
> http://marc.info/?l=openbsd-tech&m=136034304709498&w=4 ), that in
> turn is based on FreeBSD implementation.

Thanks! This looks a bit better but is still too large for my taste.

I'd like to concentrate on collation. To properly review your diff,
I need to understand what needs to be added to OpenBSD' locale implementation.
But I cannot clearly see that from reading your diff. There is still so much
unrelated stuff in there. I'll cross-check other sources of information such
as POSIX as additional references, of course. But first I need to understand
what your new code really wants to do.

It looks like your approach to adding collation support to OpenBSD is
to take FreeBSD's code as a starting point and fudge that code into
OpenBSD's libc. This approach reduces the quality of the code you end
up with and makes it hard to review your changes.

Instead, you should consider OpenBSD's code as your starting point.
And add to that _only_ what needs to be done to support collation.

Adding new functions like strxfrm_l() etc. just doesn't make sense to me.
You shouldn't need to add any new functions. We already have strcoll()
and strxfrm(). Those are interfaces from C89, so they're a bit dated.
But they should be good enough for getting some basic collation support
going. We can talk about adding support for newer interfaces later, but
those should not matter for the initial implementation.

If it helps, consider writing a new, clean, and small collation
implementation for our libc, using C89 interfaces. Instead of porting
over FreeBSD's implementation. If you can reuse bits from FreeBSD for your
own implementation, fine (just don't forget to copy licence information).
But adjust whatever FreeBSD code you use to fit our libc's model.
Don't adjust our libc to fit FreeBSD's model.

You have other unrelated changes in there, such as shuffling of MLINKS lists.
Those add noise to the diff and make it harder to review.