bug#11620: Bug in sleep--linux 2.6

2012-06-03 Thread vikas bansal
The sleep function, when written at the end, forces the program to sleep,
preventing it from executing any printf() instructions that come before it.
Attaching a program.
-- 
Vikas Bansal
Student(ECE undergraduate final year)
The L.N.Mittal Institute of Information Technology,Rajasthan,India.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHMSIZE 27

int main(void)
{
    printf("going to sleep-start");
    char c;
    int shmid;
    key_t key;
    char *shm, *s;
    printf("going to sleep-start");
    /* We'll name our shared memory segment "5678". */

    key = 5678;

    /* Create the segment. */
    if ((shmid = shmget((key_t)key, SHMSIZE, IPC_CREAT | 0666)) < 0)
    {
        perror("shmget");
        printf("error");
        exit(1);
    }

    /* Now we attach the segment to our data space. */
    if ((shm = shmat(shmid, 0, 0)) == (char *) -1)
    {
        perror("shmat");
        printf("error");
        exit(1);
    }
    printf("going to sleep-mid");

    /* Now put some things into the memory for the other process to read. */
    s = shm;
    for (c = 'a'; c <= 'z'; c++)
        *s++ = c;
    *s = '\0';

    /*
     * Finally, we wait until the other process
     * changes the first character of our memory
     * to '*', indicating that it has read what
     * we put there.
     */

    printf("going to sleep");
    while (*shm != '*')
        sleep(1);
    exit(0);
}


bug#11620: Bug in sleep--linux 2.6

2012-06-03 Thread Pádraig Brady
tag 11620 notabug

On 06/03/2012 01:13 PM, vikas bansal wrote:
> The sleep function, when written at the end, forces the program to sleep,
> preventing it from executing any printf() instructions that come before it.
> Attaching a program.

You got the wrong list because you followed `man 1 sleep`
rather than `man 3 sleep`.

I've not looked at the program at all,
but you may need to add some fflush(stdout) calls,
to ensure timely output.

cheers,
Pádraig.





bug#11620: Bug in sleep--linux 2.6

2012-06-03 Thread Bob Proulx
retitle 11620 libc I/O buffers not flushed confusion
tag 11620 + notabug
close 11620
thanks

vikas bansal wrote:
> The sleep function, when written at the end, forces the program to sleep,
> preventing it from executing any printf() instructions that come before it.

You have confused the sleep(3) libc C library routine with the
coreutils sleep(1) command line utility.

Please see this reference:

  
http://www.gnu.org/software/coreutils/faq/#I-am-trying-to-compile-a-C-program-_002e_002e_002e

> Attaching a program.

Thank you for showing us the program even if it isn't anything to do
with coreutils.

> printf("going to sleep");

Missing \n newline at end of line?

> while (*shm != '*')
>     sleep(1);
> exit(0);

The misunderstanding here is that printf() is the libc stdio function,
which buffers its output data.  In your program that data is only
flushed when exit() is called, because no newline is ever written and
fflush() is never called.  The libc buffers data differently depending
upon where the output goes, such as a tty or a file.

If you wish to ensure that data is output then you should fflush() the
data in the libc I/O buffers.

  http://pubs.opengroup.org/onlinepubs/009696799/functions/fflush.html

Bob





bug#11621: questionable locale sorting order (especially as related to char ranges in REs)

2012-06-03 Thread Linda Walsh

Within the past few years, use of ranges in REs has become
unreliable due to some locale changes sorting their native character
sets such that aAbByYzZ (vs. 'C' ordering ABYZabyz).

Additionally, many distros have switched to UTF-8, resulting in
localizations like en_GB.UTF-8, en_US.UTF-8, etc...

There seems to be a problem when a user has set their system to use
Unicode: it is no longer using the locale-specific character set
(iso-8859-x, or others).

In Unicode, it is recommended that upper case be uniformly sorted
below lower case (section 6.6, http://www.unicode.org/reports/tr10/).

A chart, including accent variations is at

http://unicode.org/charts/case/chart_Latin.htm.

Temporarily ignoring accents, only talking about lower and upper
case letters, you will note that the sorting order of A=41, B=42, C=43,
while the lower case letters from 'a', have weights a=61, b=62, c=63.

This uniformly puts all lower case letters after any upper case letters.

Thus -- I am asserting, that any computer using a locale for country
preferences, BUT is also using a unicode character set (e.g. UTF-8),
should return sorted results as specified by the character set.

I.e. the utility 'sort' (and any programs that use the collation/sorting
order specified in the core-utils libs) should return A-Z  a-z.


This is currently not the case and is leading to erroneous results
in programs written before locales were considered.  The thing is --
in many cases, within some short period of locales being implemented,
many or most distros also switched to UTF-8.

Unfortunately its collation order has not been respected.

I would assert this is a serious bug that should be addressed ASAP...


Thanks,
Linda W.







bug#11621: questionable locale sorting order (especially as related to char ranges in REs)

2012-06-03 Thread Pádraig Brady
On 06/03/2012 11:13 PM, Linda Walsh wrote:
> Within the past few years, use of ranges in REs has become
> unreliable due to some locale changes sorting their native character
> sets such that aAbByYzZ (vs. 'C' ordering ABYZabyz).
>
> Additionally, many distros have switched to UTF-8, resulting in
> localizations like en_GB.UTF-8, en_US.UTF-8, etc...
>
> There seems to be a problem when a user has set their system to use
> Unicode: it is no longer using the locale-specific character set
> (iso-8859-x, or others).

It's not specific to Unicode. Sorting in an ISO-8859-1 charset
results in locale ordering:

$ printf '%s\n' A b a á | iconv -t iso-8859-1 | LC_ALL=en_US sort | iconv -f iso-8859-1
a
A
á
b

> In Unicode, it is recommended that upper case be uniformly sorted
> below lower case (section 6.6, http://www.unicode.org/reports/tr10/).
>
> A chart, including accent variations is at
>
> http://unicode.org/charts/case/chart_Latin.htm.

http://unicode.org/charts/case/chart_Latin.html

> Temporarily ignoring accents, only talking about lower and upper
> case letters, you will note that the sorting order of A=41, B=42, C=43,
> while the lower case letters from 'a', have weights a=61, b=62, c=63.
>
> This uniformly puts all lower case letters after any upper case letters.
>
> Thus -- I am asserting, that any computer using a locale for country
> preferences, BUT is also using a unicode character set (e.g. UTF-8),
> should return sorted results as specified by the character set.
>
> I.e. the utility 'sort' (and any programs that use the collation/sorting
> order specified in the core-utils libs) should return A-Z  a-z.

Well, case comparison is a complicated area.

For the special case of discounting accented chars etc.
you can use an attribute of the well-designed UTF-8.
Enabling traditional byte comparison on (normalized) UTF-8 data
will result in data sorted in Unicode code point order:

$ printf '%s\n' A b a á | LC_ALL=C sort
A
a
b
á

> This is currently not the case and is leading to erroneous results
> in programs written before locales were considered.  The thing is --
> in many cases, within some short period of locales being implemented,
> many or most distros also switched to UTF-8.
>
> Unfortunately its collation order has not been respected.
>
> I would assert this is a serious bug that should be addressed ASAP...

As for the question in the subject for handling ranges in REs,
there has been recent work in changing as you suggest:

http://lists.gnu.org/archive/html/bug-gnulib/2011-06/threads.html#00105

cheers,
Pádraig.





bug#11621: questionable locale sorting order (especially as related to char ranges in REs)

2012-06-03 Thread Linda A. Walsh



Pádraig Brady wrote:

> On 06/03/2012 11:13 PM, Linda Walsh wrote:
>> Within the past few years, use of ranges in REs has become
>> unreliable due to some locale changes sorting their native character
>> sets such that aAbByYzZ (vs. 'C' ordering ABYZabyz).
>>
>> There seems to be a problem when a user has set their system to use
>> Unicode: it is no longer using the locale-specific character set
>> (iso-8859-x, or others).


To clarify my above statement:


   There seems to be a problem when a user has set their system to use
Unicode: It is no longer using the locale-specific character set (iso-8859-x,
or others) -- ***or*** *their* *orderings*.  I.e. Unicode defines a collation
order -- I don't know that the others do ('C' does, but I don't know about
other locale-specific character sets).



> It's not specific to Unicode. Sorting in an ISO-8859-1 charset
> results in locale ordering:


Can you cite a source specifying the sort/collation order of the
iso-8859-1 charset that would prove that it is non-conforming to the collation
specification for that charset?

I.e. If there is no official source, then the order with that charset
is undefined, and while it may not be desirable, returning aAbB, would not
be an error.





>> http://unicode.org/charts/case/chart_Latin.htm.
>
> http://unicode.org/charts/case/chart_Latin.html

---
^^Correct^^ (typo)


>> Temporarily ignoring accents, only talking about lower and upper
>> case letters, ...
>
> Well, case comparison is a complicated area.


A bit, but it's mostly just wrong in the GNU library concerning Unicode, and,
as you are pointing out -- the 'C' encoding as well.
The 'C' locale was the original charset used by the 'C' language -- only 8 bits
wide.

So how can it sort characters beyond the lower 256?
This would seem to be meaningless and bogus output.


Is it?...   When the case comparison ordering is specified in a
standard, it makes it fairly clear that one is either compliant with the
standard or not.

In this case, the GNU sort/collation lib is not Unicode/UTF-8 compliant.

What happens in other charsets may or may not be covered under some
other standard -- e.g. the 'C'/ascii ordering is specified.  But I don't know
if others have relevant standards or not.



> For the special case of discounting accented chars etc.
> you can use an attribute of the well-designed UTF-8.

---
This is not exactly the point -- the point is that the core sort
DOESN'T use that ordering.  That's the bug I am reporting.

In reporting this, I'm trying to keep the argument 'simple' and focus on
the problem of widely used ranges in the first 256 code-points of
Unicode.

Unicode gives a fairly extensive algorithm for handling accents,
but I didn't want to complicate the discussion by going there.  Please
focus this bug on the lower 128 code points, as full unicode compliance
with the full collation algorithm that is specified is likely to be a
larger task.  HOWEVER, fixing the sorting/collation order of the lower
128 code points is, comparatively, a small task that conceivably could be
fixed in the next release.



> Enabling traditional byte comparison on (normalized) UTF-8 data
> will result in data sorted in Unicode code point order:
> A b a á => A a b á


But you are missing the point (as well as raising an interesting
'feature'(?bug?)).

How is it that 'C' collation collates characters that are outside the ASCII
range?

I.e. -- you can't interpret input data as 'unicode' in the 'C' locale.
So how does this work in the 'C' locale?  AND more importantly -- it SHOULD work
when charset is unicode (UTF-8)... and does not.  Test prog:
---
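#!/bin/bash
set -m
# vals to test:
declare -a vals=( A a B b X x Y y Z z Ⅷ  Ⅴ Ⅲ Ⅰ Ⅿ Ⅽ ⅶ  ⅼ ⅲ )
COLLATE_ORDER=C

# succeed if the given file descriptor (default: stdout) is a terminal
function isatty {
    local fd=${1:-1} ;
    0<&$fd tty -s
}

# print the numeric code of the first character of $1
function ord {
    local nl="";
    isatty && nl="\n"
    printf "%d$nl" "'$1"
}

# read sorted lines from stdin and print each with its code point
function background_print {
    readarray -t inp
    for ch in "${inp[@]}"; {
        printf "%s   (U+%x)\n" "$ch" "$(ord "$ch")"
    }
}


printf "%s\n" "${vals[@]}" |
LC_COLLATE=$COLLATE_ORDER sort |
background_print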



Note, that the above produces:

/tmp/stest
Ⅷ   (U+2167)
Ⅴ   (U+2164)
Ⅲ   (U+2162)
Ⅰ   (U+2160)
Ⅿ   (U+216f)
Ⅽ   (U+216d)
ⅶ   (U+2176)
ⅼ   (U+217c)
ⅲ   (U+2172)
a   (U+61)
A   (U+41)
b   (U+62)
B   (U+42)
x   (U+78)
X   (U+58)
y   (U+79)
Y   (U+59)
z   (U+7a)
Z   (U+5a)

NOT the output you showed...Seems there's a bug in the C collation order?

Changing collation order to UTF-8:

Same thing:
 /tmp/stest
Ⅷ   (U+2167)
Ⅴ   (U+2164)
Ⅲ   (U+2162)
Ⅰ   (U+2160)
Ⅿ   (U+216f)
Ⅽ   (U+216d)
ⅶ   (U+2176)
ⅼ   (U+217c)
ⅲ   (U+2172)
a   (U+61)
A   (U+41)
b   (U+62)
B   (U+42)
x   (U+78)
X   (U+58)
y   (U+79)
Y   (U+59)
z   (U+7a)
Z   (U+5a)



>> I would assert this is a serious bug that should be addressed ASAP...


As for the question in