Re: svn commit: r202572 - head/lib/libc/gen

2010-01-21 Thread Andrey Chernov
On Thu, Jan 21, 2010 at 04:25:53PM +1100, Bruce Evans wrote:
  To be usable in practice, strcoll() should never fail, falling back to
  strcmp() instead, not only in that case but in lots of other cases too (it
  may set errno, e.g. to EILSEQ, but should not fail). The next important
  thing is to return 0 only for truly binary-equal strings, additionally
  ranking (e.g. by strcmp()) anything inside an equivalence class to
  stabilize the result.
 
  I hope our strcoll() will be kept in that state after implementing
  UCA too.
 
 What is UCA?

http://unicode.org/reports/tr10/

 Can it return equal for non-binary-equal strings?  I think it can -- the
 locale might have different encodings for strings that are considered
 identical.  Then duplicates should be according to strcoll() and file
 systems would have a hard time managing such duplicates when they are
 created in a locale where they are non-duplicates.

It can, but that isn't convenient. It depends on how equivalence classes
are treated. If strings belong to such a class, they have the same weight
when compared with other strings, but additional ranking inside the class
is possible.
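
The "additional ranking inside the class" idea can be sketched in C as
follows. This is only an illustration under stated assumptions, not
FreeBSD's actual strcoll(): the names collate_total() and
collate_total_compar() are hypothetical, and the tie-break by strcmp() is
the one suggested in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not FreeBSD's strcoll()): order by the locale's
 * collation first, then break ties with strcmp(), so that 0 is returned
 * only for byte-identical strings.
 */
int
collate_total(const char *a, const char *b)
{
	int r;

	assert(a != NULL && b != NULL);
	r = strcoll(a, b);
	if (r == 0)
		r = strcmp(a, b);	/* rank inside the equivalence class */
	return (r);
}

/* qsort()-style adapter for an array of C strings. */
int
collate_total_compar(const void *p1, const void *p2)
{

	return (collate_total(*(const char *const *)p1,
	    *(const char *const *)p2));
}
```

With such a comparison, two strings that merely collate as equal still get
a fixed relative order, so the result of a sort does not depend on the
input order.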

-- 
http://ache.pp.ru/
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Unicode collation [Was: Re: svn commit: r202572 - head/lib/libc/gen]

2010-01-21 Thread Gabor Kovesdan

On 2010-01-21 12:57, Andrey Chernov wrote:

On Thu, Jan 21, 2010 at 04:25:53PM +1100, Bruce Evans wrote:
   

To be usable in practice, strcoll() should never fail, falling back to
strcmp() instead, not only in that case but in lots of other cases too (it
may set errno, e.g. to EILSEQ, but should not fail). The next important
thing is to return 0 only for truly binary-equal strings, additionally
ranking (e.g. by strcmp()) anything inside an equivalence class to
stabilize the result.

I hope our strcoll() will be kept in that state after implementing
UCA too.
   

What is UCA?
 

http://unicode.org/reports/tr10/
   
IIRC, there was a SoC student working on collation. Does anyone know
anything about him and the status of that project?


Cheers,

--
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: ga...@freebsd.org .:|:. ga...@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org



Re: svn commit: r202572 - head/lib/libc/gen

2010-01-20 Thread Bruce Evans

On Wed, 20 Jan 2010, Andrey Chernov wrote:


On Wed, Jan 20, 2010 at 01:42:08AM +1100, Bruce Evans wrote:

The comment was correct.  It says that POSIX requires strcoll() for
alphasort(), not for opendir().  Since opendir() is not alphasort(),
and it wants plain ASCII sorting to support union file systems, it
intentionally doesn't use either alphasort() or strcoll().


Yes, the comment _alone_ was correct, but its placement wasn't. Along with
the function name containing the _alphasort part, it gives the impression
that opendir() uses this type of sort too.


No, it is a comment about opendir()'s comparison function.  It has nothing
to do with scandir(), and the only thing that it has to do with alphasort()
is that it must be different for the reasons described.


BTW, we already have the same correct comment, but in the proper place, in
scandir.c.


That one is quite different.  It describes why alphasort() (now) uses
strcoll().  It is because POSIX says so.  This comment is relatively
useless.  It obviously uses strcoll(), and it is a POSIX interface so
this would be surprising only if it conflicted with POSIX.  The comment
is there mainly for historical reasons, and history belongs in the man
page more than here.  BTW, I don't remember any man page updates for
this.  The man page still only says that alphasort() can be used to
give alphabetical sorting in scandir().
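
For reference, the usage that the man page describes looks roughly like
the sketch below. The helper name print_sorted_dir() is hypothetical;
scandir() and alphasort() are the POSIX interfaces under discussion, and
the order of the output depends on the current locale once alphasort()
uses strcoll().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: print the entries of `path' in alphasort() order.
 * Returns the number of entries, or -1 if scandir() fails.
 */
int
print_sorted_dir(const char *path)
{
	struct dirent **list;
	int i, n;

	assert(path != NULL);
	n = scandir(path, &list, NULL, alphasort);
	if (n < 0)
		return (-1);
	for (i = 0; i < n; i++) {
		puts(list[i]->d_name);
		free(list[i]);		/* each entry is malloc()ed */
	}
	free(list);
	return (n);
}
```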


Was correct, but it could have been clearer by saying ", so opendir()
uses this comparison function instead of alphasort()".


So what? The two mentioned things are unrelated and can't be
joined by "so".


They are related.


I forget what the old name was.  Having alphasort in the name here was
wrong 3 layers deep, since this is not alphasort(), and alphasort() is not
an alpha sorting function -- it is a lexicographically-on-the-whole-
character-set comparison function.


Yes.


Correct modulo the name.


What name do you suggest, opendir_compar()?


OK.


New bug in a comment in scandir(): now has an extra blank line, due to
partial removal.


Ok, will remove it a bit later.


I can't see this now (some illusion from my mailer or $TERMCAP
misformatting the patch?), but now I see an extra "the" in it:

requires the alphasort() to use strcoll()

should be either

requires that alphasort() uses strcoll()

(preferred) or

requires alphasort() to use strcoll()

(probably intended, but not too passive).  I thought that you removed
this line completely.  The previous line is even less useful.

Bruce


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-20 Thread Andrey Chernov
On Wed, Jan 20, 2010 at 07:43:29PM +1100, Bruce Evans wrote:
 No, it is a comment about opendir()'s comparison function.  It has nothing
 to do with scandir(), and the only thing that it has to do with alphasort()
 is that it must be different for the reasons described.

Then the comment was plain wrong (not misplaced), so removing it becomes
right again, because the comment states: opendir()'s comparison function
according to POSIX 2008 and XSI 7 should use strcoll(). But nothing is
said about any relation between opendir() and strcoll() in the mentioned
standards. The only thing I found is that opendir() returns an ordered
sequence, but nowhere is it mentioned by what criteria it is ordered, so
perhaps they mean stable:

"The type DIR, which is defined in the <dirent.h> header, represents a
directory stream, which is an ordered sequence of all the directory
entries in a particular directory."

 page more than here.  BTW, I don't remember any man page updates for
 this.  The man page still only says that alphasort() can be used to
 give alphabetical sorting in scandir(). 

"Alphabetically" already means sorted according to the collation order;
otherwise it is called binary. Perhaps the man page should refer to
strcoll() directly.

 I can't see this now (some illusion from my mailer or $TERMCAP
 misformatting the patch?), but now I see an extra "the" in it:
 
  requires the alphasort() to use strcoll()
 
 should be either
 
  requires that alphasort() uses strcoll()
 
 (preferred) or
 
  requires alphasort() to use strcoll()
 
 (probably intended, but not too passive).  I thought that you removed
 this line completely.  The previous line is even less useful.

I didn't add an extra "the" there :) What do you want? Please clarify:
1) Remove the whole comment.
2) Remove only the first line and correct the second to "that".
3) Just correct the second to "that".

-- 
http://ache.pp.ru/


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-20 Thread Bruce Evans

On Wed, 20 Jan 2010, Andrey Chernov wrote:


On Wed, Jan 20, 2010 at 07:43:29PM +1100, Bruce Evans wrote:

No, it is a comment about opendir()'s comparison function.  It has nothing
to do with scandir(), and the only thing that it has to do with alphasort()
is that it must be different for the reasons described.


Then the comment was plain wrong (not misplaced), so removing it becomes
right again because the comment states: opendir()'s comparison function
according to POSIX 2008 and XSI 7 should use strcoll().


No.  It never mentioned opendir() or even scandir().  It only mentioned the
relevant things.  Here it is:

1.25 (delphij  16-Apr-08): /*
1.27 (kib  05-Jan-10):  * POSIX 2008 and XSI 7 require alphasort() to call strcoll() for
1.27 (kib  05-Jan-10):  * directory entries ordering.  Use local copy that uses strcmp().
1.27 (kib  05-Jan-10):  */


But nothing is said about any relation between opendir() and strcoll() in
the mentioned standards. The only thing I found is that opendir() returns
an ordered sequence, but nowhere is it mentioned by what criteria it is
ordered, so perhaps they mean stable:


As I said before, sorting in opendir() has nothing to do with POSIX!  It
is an implementation detail for union file systems/mounts.


page more than here.  BTW, I don't remember any man page updates for
this.  The man page still only says that alphasort() can be used to
give alphabetical sorting in scandir().


"Alphabetically" already means sorted according to the collation order;
otherwise it is called binary. Perhaps the man page should refer to
strcoll() directly.


Yes it should, like POSIX does.  It should also give the FreeBSD
extension of POSIX.  POSIX says: "If the strcoll() function fails,
then the return value of alphasort() is unspecified.", but this makes
alphasort() unusable since a qsort() comparison function must return
a specified value.




I can't see this now (some illusion from my mailer or $TERMCAP
misformatting the patch?), but now I see an extra "the" in it:

 requires the alphasort() to use strcoll()

should be either

 requires that alphasort() uses strcoll()

(preferred) or

 requires alphasort() to use strcoll()

(probably intended, but not too passive).  I thought that you removed
this line completely.  The previous line is even less useful.


I didn't add an extra "the" there :) What do you want? Please clarify:
1) Remove the whole comment.
2) Remove only the first line and correct the second to "that".
3) Just correct the second to "that".


Correct the second line to "that ... uses".

Bruce


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-20 Thread Andrey Chernov
On Wed, Jan 20, 2010 at 09:33:08PM +1100, Bruce Evans wrote:
  But nothing is said about any relation between opendir() and strcoll() in
  the mentioned standards. The only thing I found is that opendir() returns
  an ordered sequence, but nowhere is it mentioned by what criteria it is
  ordered, so perhaps they mean stable:
 
 As I said before, sorting in opendir() has nothing to do with POSIX!  It
 is an implementation detail for union file systems/mounts.

Moreover, even sorting itself is not required here. We sort just to remove 
dups.

 It should also give the FreeBSD
 extension of POSIX.  POSIX says: "If the strcoll() function fails,
 then the return value of alphasort() is unspecified.", but this makes
 alphasort() unusable since a qsort() comparison function must return
 a specified value.

To be usable in practice, strcoll() should never fail, falling back to
strcmp() instead, not only in that case but in lots of other cases too (it
may set errno, e.g. to EILSEQ, but should not fail). The next important
thing is to return 0 only for truly binary-equal strings, additionally
ranking (e.g. by strcmp()) anything inside an equivalence class to
stabilize the result.

I hope our strcoll() will be kept in that state after implementing 
UCA too.
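
A caller that cannot rely on this behaviour could approximate the "never
fails" part with a wrapper like the sketch below. The name
strcoll_nofail() is hypothetical, and the error check follows the POSIX
advice to set errno to 0 before calling strcoll() and inspect it
afterwards; it is an illustration, not the libc implementation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical wrapper: always returns a usable ordering.  If strcoll()
 * reports an error through errno, fall back to plain byte order and
 * leave errno set (e.g. to EILSEQ) for callers that care.
 */
int
strcoll_nofail(const char *a, const char *b)
{
	int r, saved_errno;

	assert(a != NULL && b != NULL);
	saved_errno = errno;
	errno = 0;
	r = strcoll(a, b);
	if (errno != 0)
		return (strcmp(a, b));	/* errno stays set */
	errno = saved_errno;		/* success: do not clobber errno */
	return (r);
}
```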

-- 
http://ache.pp.ru/


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-20 Thread Bruce Evans

On Wed, 20 Jan 2010, Andrey Chernov wrote:


On Wed, Jan 20, 2010 at 09:33:08PM +1100, Bruce Evans wrote:

But nothing is said about any relation between opendir() and strcoll() in
the mentioned standards. The only thing I found is that opendir() returns
an ordered sequence, but nowhere is it mentioned by what criteria it is
ordered, so perhaps they mean stable:


As I said before, sorting in opendir() has nothing to do with POSIX!  It
is an implementation detail for union file systems/mounts.


Moreover, even sorting itself is not required here. We sort just to remove
dups.


Interesting.  Why does it require a stable sort then?  It only removes
duplicates by name.  At least with strcmp() in the compare function, such
dups will remain together although they may be moved.  The stable sort
would be needed if it must keep the original first of duplicates by name,
but it doesn't say that.
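
The point that duplicates by name end up adjacent after a strcmp() sort
can be shown with a small sketch. It is not the opendir.c code: the
function names are hypothetical, it works on plain strings rather than
struct dirent, and it uses qsort() for portability, which is not stable
(opendir() uses the BSD mergesort()).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytewise comparison of two C strings, in the form qsort() expects. */
static int
name_compar(const void *p1, const void *p2)
{

	return (strcmp(*(const char *const *)p1, *(const char *const *)p2));
}

/*
 * Hypothetical helper: sort `names' bytewise and drop duplicates in
 * place.  Equal names are adjacent after the sort, so one linear pass
 * is enough.  Returns the number of unique names kept at the front.
 */
size_t
sort_and_dedup(const char **names, size_t n)
{
	size_t i, out;

	assert(names != NULL || n == 0);
	if (n == 0)
		return (0);
	qsort(names, n, sizeof(*names), name_compar);
	out = 1;
	for (i = 1; i < n; i++) {
		if (strcmp(names[i], names[out - 1]) != 0)
			names[out++] = names[i];
	}
	return (out);
}
```

With byte-identical names it does not matter which copy survives, which
is why stability only becomes relevant when the entries carry other
fields that differ.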

BTW, the statfs() to determine if this sort is necessary is a large
pessimization for nfs file systems.  Nfs caches most things but not
statfs().  Thus a readdir() over nfs does an expensive statfs() every
time although the directory contents will normally be cached after the
first time.  I think the sorting belongs in file systems, not in
readdir() where it affects file systems that don't need it.


It should also give the FreeBSD
extension of POSIX.  POSIX says: "If the strcoll() function fails,
then the return value of alphasort() is unspecified.", but this makes
alphasort() unusable since a qsort() comparison function must return
a specified value.


To be usable in practice, strcoll() should never fail, falling back to
strcmp() instead, not only in that case but in lots of other cases too (it
may set errno, e.g. to EILSEQ, but should not fail). The next important
thing is to return 0 only for truly binary-equal strings, additionally
ranking (e.g. by strcmp()) anything inside an equivalence class to
stabilize the result.

I hope our strcoll() will be kept in that state after implementing
UCA too.


What is UCA?

Failing is a POSIX bug -- C99 doesn't allow it to fail.  I think it
should at least be specified to return nonzero (unequal) on failure.
This is like comparisons of NaNs returning unequal even for comparisons
of identical NaNs.
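
The NaN analogy is easy to check in C; the sketch below assumes IEEE 754
floating point and default compiler options (no -ffast-math), and the
function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Returns true if x compares equal to itself; false only for a NaN. */
bool
equals_itself(double x)
{

	return (x == x);
}

/* Self-check: a NaN is unequal to everything, including itself. */
void
nan_demo(void)
{
	double x = NAN;

	assert(x != x);
	assert(!equals_itself(x));
	assert(equals_itself(1.0));
}
```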

Can it return equal for non-binary-equal strings?  I think it can -- the
locale might have different encodings for strings that are considered
identical.  Then duplicates should be according to strcoll() and file
systems would have a hard time managing such duplicates when they are
created in a locale where they are non-duplicates.

Bruce


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-19 Thread Bruce Evans

On Mon, 18 Jan 2010, Andrey A. Chernov wrote:


Log:
 Double checking my commit I found that comment saying that
 POSIX 2008 and XSI 7 require strcoll() for opendir() is not true.
 I can't find such requirement in POSIX 2008 and XSI 7.


The comment was correct.  It says that POSIX requires strcoll() for
alphasort(), not for opendir().  Since opendir() is not alphasort(),
and it wants plain ASCII sorting to support union file systems, it
intentionally doesn't use either alphasort() or strcoll().


 So, back out that part of my commit, returning old strcmp(), and remove
 this misleading comment.

Modified:
 head/lib/libc/gen/opendir.c

Modified: head/lib/libc/gen/opendir.c
==============================================================================
--- head/lib/libc/gen/opendir.c Mon Jan 18 13:38:45 2010 (r202571)
+++ head/lib/libc/gen/opendir.c Mon Jan 18 13:44:44 2010 (r202572)
@@ -92,15 +92,11 @@ __opendir2(const char *name, int flags)
return __opendir_common(fd, name, flags);
}

-/*
- * POSIX 2008 and XSI 7 require alphasort() to call strcoll() for
- * directory entries ordering.
- */


Was correct, but it could have been clearer by saying ", so opendir()
uses this comparison function instead of alphasort()".


static int
-opendir_alphasort(const void *p1, const void *p2)
+opendir_sort(const void *p1, const void *p2)


I forget what the old name was.  Having alphasort in the name here was
wrong 3 layers deep, since this is not alphasort(), and alphasort() is not
an alpha sorting function -- it is a lexicographically-on-the-whole-
character-set comparison function.

fts's internal comparison function is named correctly -- fts_compar.


{

-   return (strcoll((*(const struct dirent **)p1)->d_name,
+   return (strcmp((*(const struct dirent **)p1)->d_name,
    (*(const struct dirent **)p2)->d_name));


Now correct (was broken by previous commit).


}

@@ -253,7 +249,7 @@ __opendir_common(int fd, const char *nam
 * This sort must be stable.
 */
mergesort(dpv, n, sizeof(*dpv),
-   opendir_alphasort);
+   opendir_sort);


Correct modulo the name.



dpv[n] = NULL;
xp = NULL;



New bug in a comment in scandir(): now has an extra blank line, due to
partial removal.

Bruce


Re: svn commit: r202572 - head/lib/libc/gen

2010-01-19 Thread Andrey Chernov
On Wed, Jan 20, 2010 at 01:42:08AM +1100, Bruce Evans wrote:
 The comment was correct.  It says that POSIX requires strcoll() for
 alphasort(), not for opendir().  Since opendir() is not alphasort(),
 and it wants plain ASCII sorting to support union file systems, it
 intentionally doesn't use either alphasort() or strcoll().

Yes, the comment _alone_ was correct, but its placement wasn't. Along with
the function name containing the _alphasort part, it gives the impression
that opendir() uses this type of sort too.

BTW, we already have the same correct comment, but in the proper place, in
scandir.c.

 Was correct, but it could have been clearer by saying ", so opendir()
 uses this comparison function instead of alphasort()".

So what? The two mentioned things are unrelated and can't be
joined by "so".

 I forget what the old name was.  Having alphasort in the name here was
 wrong 3 layers deep, since this is not alphasort(), and alphasort() is not
 an alpha sorting function -- it is a lexicographically-on-the-whole-
 character-set comparison function.

Yes.

 Correct modulo the name.

What name do you suggest, opendir_compar()?

 New bug in a comment in scandir(): now has an extra blank line, due to
 partial removal.

Ok, will remove it a bit later.

-- 
http://ache.pp.ru/