Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Ian Jackson
Peter Silva writes ("Re: Bug#877900: How to get 24-hour time on en_US.UTF-8 
locale now?"):
> iso_en ?  That sounds smart...
> 
> English for most of the world that aren't necessarily native English speakers?
> https://en.wikipedia.org/wiki/International_English
> Use ISO dates and stuff, and pick a random spelling. As a Canadian, I'm pretty
> sure about colour, but unclear about whether we should standardize on disc.
> Dates should be iso, even better if it used UTC as the timezone.   This would
> be a default that would include US keyboard bindings (by default.)
> as the easiest thing to default to during installation, etc.. but perhaps I
> should be disqualified, being both a unix greybeard, and a recovering ntp
> admin.

I don't see that this exists as a locale already.  It is probably too
late for buster to introduce it.

Realistically our sensible choices for the default are
  C.UTF-8
  One of en_{AU,GB,NZ}.UTF-8

All of these would be better than en_US.UTF-8 for the reasons given
by Adam (although, Adam, really, could you try to be a little less
rude?).

The middle-endian dates and 12-hour clock are particularly poor
defaults.
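
To make the difference concrete, here is a minimal C sketch (mine, not
from any package) that formats one timestamp both ways with strftime.
The function names are made up for illustration, and the output of %p
assumes the program runs in the default "C" locale.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format a broken-down time in ISO 8601 order with a 24-hour clock. */
size_t format_iso(const struct tm *tm, char *buf, size_t len) {
    assert(tm != NULL && buf != NULL);
    return strftime(buf, len, "%Y-%m-%d %H:%M", tm);
}

/* Format the same time month-first with a 12-hour clock, for contrast. */
size_t format_month_first(const struct tm *tm, char *buf, size_t len) {
    assert(tm != NULL && buf != NULL);
    return strftime(buf, len, "%m/%d/%Y %I:%M %p", tm);
}
```

Only the ISO form sorts correctly as plain text, which is one reason it
makes the better default.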

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: gnupg2 autopkgtest uses multi-arch which seems fragile

2018-07-09 Thread Ian Jackson
Ian Jackson writes ("Re: gnupg2 autopkgtest uses multi-arch which seems 
fragile"):
> I looked in:
> 
> * debian/tests/control in the gnupg2 source tree.
>   One test, of gpgv-win32.  Depends on gpgv-win32, gnupg2,

I have found it:

debian/tests/gpgv-win32 manually installs wine32 using apt.

This seems quite wrong.  If a package needs to be installed, it should
be handled via Depends in debian/tests/control.  Otherwise all of the
machinery to select which packages are being tested is utterly
defeated.

Ian.



Re: gnupg2 autopkgtest uses multi-arch which seems fragile

2018-07-09 Thread Ian Jackson
Paul Gevers writes ("gnupg2 autopkgtest uses multi-arch which seems fragile"):
> The following packages have unmet dependencies:
>  wine32:i386 : Depends: libc6:i386 (>= 2.17) but it is not going to be
> installed

I am at a loss to understand why anything is trying to install wine32
here.

I looked in:

* debian/tests/control in the gnupg2 source tree.

  One test, of gpgv-win32.  Depends on gpgv-win32, gnupg2,

* debian/control

  gpgv-win32 Suggests wine.  No stronger dependency apparent.

* https://packages.debian.org/sid/gpgv-win32

  One dependency: the Suggests on wine.

* ci.d.n "test log" and "test artifacts" from 2018-07-09 04:35:19 UTC.

  The artifacts.tar.gz contains only these hits for wine:
 $ zgrep wine *
 gpgv-win32-stderr.gz:dpkg-query: package 'wine32' is not installed and no information is available
 gpgv-win32-stdout.gz: wine32:i386 : Depends: libc6:i386 (>= 2.17) but it is not going to be installed
 gpgv-win32-stdout.gz:   Depends: libwine:i386 (= 3.0.2-1) but it is not going to be installed
 $

  The test log is no more informative.

Ian.




Bug#28250: closed by Credible Finance <y_mai...@yahoo.com> (reply to Credible Finance <crediblefinance....@gmail.com>) (Apply for a Personal/Business Loan)

2018-02-26 Thread Ian Jackson
Control: reopen -1
Control: retitle -1 perl can lose output due to stdio buffering
Control: found -1 5.20.2-3+deb8u9

Debian Bug Tracking System writes ("Bug#28250 closed by Credible Finance
(reply to Credible Finance) (Apply for a Personal/Business Loan)"):
> It has been closed by Credible Finance (reply to Credible Finance).

spam

Also, here is a modern repro.

Reference behaviour in non-buggy case:

chiark:~> perl -mautodie -e '$|=1; print "hi" or die $!; flush STDOUT' >/dev/full
No space left on device at -e line 1.
chiark:~> echo $?
28

Observed behaviour in buggy case:

chiark:~> perl -mautodie -e 'print "hi" or die $!; flush STDOUT' >/dev/full 
chiark:~> echo $?
0

Desired behaviour: print a message to stderr, mentioning at least the
errno value and ideally also the descriptor number or stream name, and
exit nonzero.

I think this is a bug in the libc (#28251).  This does not seem to
have been accepted by libc developers.  Instead the Unix world has
engaged in a decades-long campaign to work around it in all
application software.
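
For comparison, this is roughly what that workaround looks like in a C
application: a minimal sketch, assuming the application is willing to
flush explicitly and check the result.  write_checked is a name I made
up; it is not a libc function.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Write text to fp and force it out of the stdio buffer.
 * Returns 0 on success, or the errno value from the failing call
 * (EIO if stdio reported failure without setting errno). */
int write_checked(FILE *fp, const char *text) {
    assert(fp != NULL && text != NULL);
    errno = 0;
    if (fputs(text, fp) == EOF)
        return errno ? errno : EIO;
    /* The data may still be sitting in the buffer; only the flush
     * reveals errors such as ENOSPC from the underlying write. */
    if (fflush(fp) == EOF)
        return errno ? errno : EIO;
    return 0;
}
```

A caller would print strerror() of the returned value to stderr and
exit nonzero, which is the desired behaviour described above.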

Ian.



Bug#28251: closed by Credible Finance <y_mai...@yahoo.com> (reply to Credible Finance <crediblefinance....@gmail.com>) (Apply for a Personal/Business Loan)

2018-02-26 Thread Ian Jackson
Control: reopen -1
Control: retitle -1 libc should unilaterally report buffered write error on close

Debian Bug Tracking System writes ("Bug#28251 closed by Credible Finance
(reply to Credible Finance) (Apply for a Personal/Business Loan)"):
> It has been closed by Credible Finance (reply to Credible Finance).

spam



Bug#850182: Please disable TSX in stretch and backport to jessie

2017-01-04 Thread Ian Jackson
Package: eglibc

Gilles Filippini writes ("Request for help - scilab segfaults with TSX"):
> I've just noticed this RC bug [1] against scilab. [...]
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844134
> [2] https://lists.debian.org/debian-devel/2016/11/threads.html#00210
...
> I don't have access to any box with TSX enabled, and failed to find any
> porterbox as well. [...]

amd64 with TSX is for the purposes of pthreads like a new
architecture: the locking primitives behave differently and expose
extra bugs.

These extra bugs will be discovered only by chance (as we see in that
bug report and in the earlier bugs #843324 and maybe #842796).  As
more TSX-capable hardware becomes available, we will discover more of
them, during the life of stretch, when they are hard to fix.

Also, we don't have the capability to debug them.  I don't think we
can have a release architecture for stretch that has no porterboxes.

So please would the libc be changed not to make use of these features
for stretch.  The downsides will be somewhat lower performance and
not detecting some preexisting bugs; but the upsides are not shipping
undetected bugs, and not throwing useful software out of Debian.

Please would you make a decision quickly.

Regards,
Ian.




Re: libc recently more aggressive about pthread locks in stable ?

2016-11-06 Thread Ian Jackson
Henrique de Moraes Holschuh writes ("Re: libc recently more aggressive about 
pthread locks in stable ?"):
> Per logs from message #15 on bug #842796:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=842796#15
> 
> SIGSEGV on __lll_unlock_elision is a signature (IME with very high
> confidence) of an attempt to unlock an already unlocked lock while
> running under hardware lock elision.

I don't know anything about hardware lock elision...

> Well, unlocking an already unlocked lock is a pthreads API rule
> violation, and it is going to crash the process on something that
> implements hardware lock elision.

... but you are of course correct about this.  I debugged the problem
with ghostscript, and it was indeed violating the pthreads rules.  I
have filed #843324 with a patch for Debian to backport the
corresponding upstream fix.  I don't understand the wider logic in
ghostscript; the bug was in the colour space management code and
occurred when a function was called with two pointer arguments which
were actually aliases of the same colourspace-related data structure.
Converting ghostscript to use recursive mutexes was IMO clearly
correct and fixed the bug.
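
To illustrate the aliasing case (this is a sketch of the general
pattern, not the ghostscript code; struct resource and both function
names are hypothetical): with a recursive mutex, locking the same
object through two aliased pointers and then unlocking it twice is
well defined.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

struct resource {
    pthread_mutex_t lock;
    int value;
};

/* Initialise r with a recursive mutex. Returns 0 or an error number. */
int resource_init(struct resource *r) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(&r->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    r->value = 0;
    return rc;
}

/* Lock both arguments, update them, unlock both. a and b may alias.
 * Lock ordering between distinct resources is ignored in this sketch. */
int bump_both(struct resource *a, struct resource *b) {
    int rc = pthread_mutex_lock(&a->lock);
    if (rc) return rc;
    rc = pthread_mutex_lock(&b->lock); /* relocks the same mutex if a == b */
    if (rc) {
        pthread_mutex_unlock(&a->lock);
        return rc;
    }
    a->value++;
    b->value++;
    pthread_mutex_unlock(&b->lock);
    pthread_mutex_unlock(&a->lock);
    return 0;
}
```

With a default mutex the second lock in the aliased case would be
undefined behaviour (typically a deadlock), and a code path that skips
the second lock but still unlocks twice is exactly the violation
discussed above.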

> If the problem is too widespread and too hard to fix on a large number
> of packages, I suppose we could ask the glibc maintainers to consider
> disabling hardware lock elision support in stable through a stable
> update.

I think this would be a good idea.

ogg123 and ghostscript are hardly obscure programs.  It's difficult to
know how bad this problem is, but we would like stable to be useful
even on recent hardware.

> And what should we do about Debian stretch, then?

Perhaps we could add the assert you suggest, on non-lock-elision
hardware.  Whether to do that would depend on its performance impact.
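
This is not the assert proposed in the thread, but applications can
already get a similar check for themselves: POSIX error-checking
mutexes report an unlock of an unlocked mutex as an error instead of
leaving it undefined.  A minimal sketch, with made-up function names:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise m as an error-checking mutex. Returns 0 or an error number. */
int errorcheck_mutex_init(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Deliberately unlock a mutex that was never locked and return what
 * pthread_mutex_unlock reports. */
int unlock_unlocked_result(void) {
    pthread_mutex_t m;
    int rc = errorcheck_mutex_init(&m);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m); /* the API violation under discussion */
    pthread_mutex_destroy(&m);
    return rc;
}
```

The reported error is EPERM, so a debugging build of a suspect program
could assert on the return value of every unlock.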

TBH I wonder whether we really want to be giving an evidently shonky
codebase boobytrapped mutexes by default.  We could change the default
mutex type to recursive and make all of these bugs go away.

Ian.




Re: libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Ian Jackson
Ian Jackson writes ("libc recently more aggressive about pthread locks in 
stable ?"):
> I have just been debugging a ghostscript segfault on jessie amd64.
...
> I recently encountered what seems to be a similar bug in ogg123 in
> stable.  #842796.
> 
> Has something changed in jessie's libc recently ?  I find it difficult
> to imagine that these bugs would have been missed earlier during the
> life of jessie.
> 
> I will try to make a patch to fix ghostscript, or at least file a
> proper bug.  But, if there was a libc change, would it be possible to
> revert it or make some kind of workaround ?

FYI, the ghostscript bug, with patch for jessie, is #843324.
sid's ghostscript is fine and I think stretch's is too.

Ian.




libc recently more aggressive about pthread locks in stable ?

2016-11-05 Thread Ian Jackson
I have just been debugging a ghostscript segfault on jessie amd64.

Looking at the code, I think that gs in jessie is plainly violating
the rules about the use of pthread locks.  On my partner's machine,
this makes it segfault on termination (with some input files, at
least).  On my machine it works just fine.  The code in sid is better.

I recently encountered what seems to be a similar bug in ogg123 in
stable.  #842796.

Has something changed in jessie's libc recently ?  I find it difficult
to imagine that these bugs would have been missed earlier during the
life of jessie.

I will try to make a patch to fix ghostscript, or at least file a
proper bug.  But, if there was a libc change, would it be possible to
revert it or make some kind of workaround ?

Thanks,
Ian.




Re: Replacing ldconfig maintscripts with declarative methods

2015-08-25 Thread Ian Jackson
Niels Thykier writes ("Replacing ldconfig maintscripts with declarative
methods"):
> A possible solution is to replace these scripts with an
> activate-no-await trigger (again, no-await to avoid trigger cycles).
> I would need libc-bin to promote its trigger to part of its API for this
> to work.

I think this is a good idea.

>  * The major concern I have, is that activate-triggers are done for
>    - unpack (is this ok?)

It had better be !  (Ie I think it is OK.)


>  * Performance-wise we would see up to 5 calls to ldconfig instead of
>    1-2 per dpkg run (that processes triggers).

OTOH the reduced number of maintscript invocations might well outweigh
that.

Thanks,
Ian.



Bug#749345: if_nametoindex error behaviour assumptions

2014-05-26 Thread Ian Jackson
Package: libc6
Version: 2.13-38+deb7u1

As part of trying to determine the error behaviour of if_nametoindex,
I wrote and ran the attached test program.  I discovered by looking at
the strace of a test run that if_nametoindex doesn't always properly
check the errno values from its system calls.

To reproduce:
  * Compile the attached test program
      cc -Wall if-nametoindex-test.c -o if-nametoindex-test
  * Run it in a way which will make it fail
      strace sh -c 'ulimit -n 4; exec ./if-nametoindex-test x'
  * Observe the strace output.

In my setup I see this:

open("/dev/null", O_RDONLY) = 3
access("/proc/net", R_OK)   = 0
access("/proc/net/unix", R_OK)  = 0
socket(PF_FILE, SOCK_DGRAM|SOCK_CLOEXEC, 0) = -1 EMFILE (Too many open files)
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = -1 EMFILE (Too many open files)
access("/proc/net/if_inet6", R_OK)  = 0
socket(PF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = -1 EMFILE (Too many open files)
access("/proc/net/ax25", R_OK)  = -1 ENOENT (No such file or directory)
access("/proc/net/nr", R_OK)= -1 ENOENT (No such file or directory)
access("/proc/net/rose", R_OK)  = -1 ENOENT (No such file or directory)
access("/proc/net/ipx", R_OK)   = -1 ENOENT (No such file or directory)
access("/proc/net/appletalk", R_OK) = -1 ENOENT (No such file or directory)
access("/proc/sys/net/econet", R_OK)= -1 ENOENT (No such file or directory)
access("/proc/sys/net/ash", R_OK)   = -1 ENOENT (No such file or directory)
access("/proc/net/x25", R_OK)   = -1 ENOENT (No such file or directory)
write(2, "got 0, No such file or directory"..., 33got 0, No such file or directory
) = 33

I think if_nametoindex should have immediately stopped when it got
EMFILE, and returned to the caller with errno still set to EMFILE.

It's also far from clear that ENOENT is the right return value to
provide if the protocol type walk falls off the end.  ENXIO would seem
better.  But I haven't looked at the glibc source code to see exactly
what code I'm exercising here.

Ian.

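/* if-nametoindex-test.c: show what if_nametoindex returns, and the errno
 * it leaves behind, for the interface name given as the first argument. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <net/if.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/fcntl.h>
#include <assert.h>
int main(int argc, char **argv) {
  /* Use up one descriptor so that a low "ulimit -n" makes socket() fail. */
  int fd = open("/dev/null",O_RDONLY);
  assert(fd>=0);

  errno = 0;
  int x = if_nametoindex(argv[1]);
  fprintf(stderr, "got %d, %s\n", x, strerror(errno));
  return 0;
}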


Bug#749349: Please document if_nametoindex error behaviour

2014-05-26 Thread Ian Jackson
Package: glibc-doc-reference, libc6
Version: 2.13-1, 2.13-38+deb7u1

I have been writing a program which needs to call if_nametoindex.
Naturally I need to deal with the possible error cases.


RFC3493, which appears to be the standards document describing it, has
this to say about its error behaviour:

   If ifname is the name of an interface, the if_nametoindex() function
   shall return the interface index corresponding to name ifname;
   otherwise, it shall return zero.  No errors are defined.

The glibc info document says, very similarly,

 -- Function: unsigned int if_nametoindex (const char *ifname)
 This function yields the interface index corresponding to a
 particular name.  If no interface exists with the name given, it
 returns 0.

Of course this leaves the implementors of if_nametoindex in a quandary:
what to do if the operation fails ?  And correspondingly, it doesn't
say how the callers are to distinguish "no such interface" from
"arrgh!".


I wrote a test program (see #749345) to discover the real behaviour.
It appears that if if_nametoindex fails it sets errno - on Debian
squeeze, FreeBSD 10 and OpenBSD 4.9, at least.

If the name is not that of an interface, it sets errno to ENXIO (which
is specified in the RFC for if_indextoname in the analogous case).

If if_nametoindex fails for another reason, it returns 0 and leaves
errno set to whatever it got from the underlying failing system call.
We (mostly Mark Wooding and Richard Kettlewell) haven't been able to
get the BSDs to actually fail - the implementation has a fallback
strategy which means that the ulimit trick doesn't work.  But FreeBSD
documents the behaviour (mostly):
  http://www.freebsd.org/cgi/man.cgi?query=getifaddrs&sektion=3&apropos=0&manpath=FreeBSD+10.0-RELEASE


It would be good if the error behaviour were documented in the glibc
reference manual.


Thanks,
Ian.


-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/21379.23603.494700.761...@chiark.greenend.org.uk



Bug#683826: cfsetspeed, real baud rates, custom baud rates (BOTHER)

2012-08-04 Thread Ian Jackson
Package: libc6
Version: 2.13-33

In summary, please:

   - Fix the documentation so that it no longer claims that
     cfgetispeed and cfgetospeed return actual baud rates.

   - Provide new functions cfgetispeedbps and cfgetospeedbps
     which do return actual baud rates.

   - Make all the cf*speed* functions support arbitrary baud
     rates using (c_cflag&CBAUD)==BOTHER / c_ispeed / c_ospeed.

   - When on Linux use TCSETS2 if (c_cflag&CBAUD)==BOTHER.
     Use TCGETS2 when it is available, or at least when necessary.



According to the info manual for glibc, node `(libc) Line Speed':

 *Portability note:* In the GNU library, the functions above accept
  speeds measured in bits per second as input, and return speed values
  measured in bits per second.  Other libraries require speeds to be
  indicated by special codes.  For POSIX.1 portability, you must use one
  of the following symbols to represent the speed; their precise numeric
  values are system-dependent, but each name has a fixed meaning: `B110'
  stands for 110 bps, `B300' for 300 bps, and so on.  There is no
  portable way to represent any speed but these, but these are the only
  speeds that typical serial lines can support.

However, actually, this does not work as described.  cfget{i,o}speed
return a traditional B9600 value, not the number 9600 (or whatever).
(cfsetspeed does accept baud rates in bps as documented and converts
them to magic constants in the struct termios.)

The only way to make this work as documented in the glibc manual would
be to change the B<rate> #defines each to have the value <rate>.  That
would break compatibility with programs which know about setting
c_cflag&CBAUD to B<rate>.  (I think such programs will exist.)

I therefore suggest that the specification is wrong.  cfget{i,o}speed
are defined in POSIX to return these B<rate> constants, and the values
of the constants can't be changed, so users of cfget{i,o}speed will
never be able to expect speeds in bps.  cfgetispeed and cfgetospeed have
to continue to return magic constants rather than baud rates.

There should however be a function with the semantics described.  This
is useful because (i) we would like to be able to easily print out or
compare or compute with the actual baud rate without having a B<rate>
table in the application and (ii) the application may want to set
nonstandard baud rates (see below).

I therefore propose that we should introduce:

 -- Function: speed_t cfgetispeedbps (struct termios *TERMIOS-P)
 This function returns the input line speed stored in the
 structure `*TERMIOS-P' as a number of bits per second.

and the corresponding cfgetospeedbps.
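
For illustration, this is the kind of table every application
currently has to carry, and which the proposed functions would make
unnecessary.  It is only a sketch: speed_code_to_bps is a hypothetical
helper, not the proposed libc interface, and it covers only some of
the POSIX B constants.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Translate a termios speed code (as returned by cfgetispeed or
 * cfgetospeed) into bits per second. Returns 0 and stores the rate in
 * *bps on success, or -1 if the code is not in the table. */
int speed_code_to_bps(speed_t code, unsigned long *bps) {
    static const struct { speed_t code; unsigned long bps; } table[] = {
        { B0, 0 },         { B50, 50 },       { B75, 75 },
        { B110, 110 },     { B150, 150 },     { B200, 200 },
        { B300, 300 },     { B600, 600 },     { B1200, 1200 },
        { B1800, 1800 },   { B2400, 2400 },   { B4800, 4800 },
        { B9600, 9600 },   { B19200, 19200 }, { B38400, 38400 },
    };
    assert(bps != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].code == code) {
            *bps = table[i].bps;
            return 0;
        }
    }
    return -1;
}
```

An application would call it as speed_code_to_bps(cfgetispeed(&to), &bps)
after a successful tcgetattr.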

Also it would be very desirable to be able to support arbitrary baud
rates.  At least Linux does support arbitrary baud rates where the
hardware can do it: you are supposed to set c_cflag&CBAUD to BOTHER
and c_ispeed and c_ospeed to the desired rates in bps.

This does not currently work properly with glibc on Linux even if you
do it by hand because glibc uses the Linux ioctls TCGETS/TCSETS whose
Linux termios struct does not contain c_ispeed and c_ospeed.  (glibc
appears to convert between its idea of termios and the kernel's.)  If
glibc used TCGETS2/TCSETS2 where available or where necessary, then
arbitrary baud rates would work.

TCSETS2 needs to be used if (c_cflag&CBAUD)==BOTHER.  Otherwise I
think TCSETS is sufficient (but there may be other reasons why TCSETS
is insufficient - I haven't checked).  For getting, ideally TCGETS2
would be used all the time, but some old kernels don't support it, so
glibc should try TCGETS2 first and then TCGETS, probably.
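
The "try TCGETS2 first and then TCGETS" idea could be sketched as
follows, using the kernel's own structures directly.  This is an
illustration under my assumptions, not glibc code: get_termios2 is a
made-up name, and reporting unknown speeds as 0 after the fallback is
my choice.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <asm/termbits.h>

/* Fill *out from the terminal on fd. Returns 0 on success, -1 with
 * errno set on failure. */
int get_termios2(int fd, struct termios2 *out) {
    assert(out != NULL);
    if (ioctl(fd, TCGETS2, out) == 0)
        return 0;

    /* Fall back to the older ioctl, whose structure has no
     * c_ispeed/c_ospeed fields. */
    struct termios old; /* the kernel's struct termios, not glibc's */
    if (ioctl(fd, TCGETS, &old) != 0)
        return -1;

    memset(out, 0, sizeof *out); /* speeds stay 0, meaning "unknown" */
    out->c_iflag = old.c_iflag;
    out->c_oflag = old.c_oflag;
    out->c_cflag = old.c_cflag;
    out->c_lflag = old.c_lflag;
    out->c_line = old.c_line;
    memcpy(out->c_cc, old.c_cc, sizeof old.c_cc);
    return 0;
}
```

Note that <termios.h> must not be included in the same file, because
its struct termios clashes with the kernel's.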

Ian.





Bug#683826: Acknowledgement (cfsetspeed, real baud rates, custom baud rates (BOTHER))

2012-08-04 Thread Ian Jackson
PS here are the two test programs I wrote while preparing the bug
report.

/* Test program 1: query or set the line speed through the glibc termios API. */
#include <termios.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef BOTHER
#define BOTHER 0010000  /* value from Linux asm-generic/termbits.h */
#endif

int main(int argc, const char **argv) {
  struct termios to;
  int r;
  r = tcgetattr(0, &to);  if (r) { perror("tcgetattr"); exit(-1); }

  const char *arg = argv[1];

  if (!arg) {
    /* No argument: report what the library says the speeds are. */
    printf("cfgetispeed=%lu cfgetospeed=%lu",
           (unsigned long)cfgetispeed(&to),
           (unsigned long)cfgetospeed(&to));
    if ((to.c_cflag & CBAUD) == BOTHER)
      printf(" BOTHER c_ispeed=%lu c_ospeed=%lu",
             (unsigned long)to.c_ispeed,
             (unsigned long)to.c_ospeed);
    putchar('\n');
    exit(0);
  }

  /* Argument given: try to set that speed, by hand if cfsetspeed refuses. */
  speed_t spd = atoi(arg);
  r = cfsetspeed(&to, spd);
  if (r) {
    perror("cfsetspeed failed (will do by hand)");
    to.c_ispeed = to.c_ospeed = spd;
    to.c_cflag &= ~CBAUD;
    to.c_cflag |= BOTHER;
  }
  r = tcsetattr(0, TCSANOW, &to);  if (r) { perror("tcsetattr"); exit(-1); }
  return 0;
}

/* Test program 2: the same, but talking to the kernel directly via TCGETS2/TCSETS2. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <asm/termbits.h>
#include <asm/ioctls.h>

#ifndef BOTHER
#define BOTHER 0010000  /* normally already defined by termbits.h */
#endif

int main(int argc, const char **argv) {
  struct termios2 to;
  int r;

  r = ioctl(0, TCGETS2, &to);  if (r) { perror("TCGETS2"); exit(-1); }

  const char *arg = argv[1];
  if (!arg) {
    /* No argument: report the kernel's idea of the speed settings. */
    printf("c_cflag&CBAUD=0%lo", (unsigned long)(to.c_cflag&CBAUD));
    if ((to.c_cflag & CBAUD) == BOTHER)
      printf(" BOTHER c_ispeed=%lu c_ospeed=%lu",
             (unsigned long)to.c_ispeed,
             (unsigned long)to.c_ospeed);
    putchar('\n');
    exit(0);
  }

  /* Argument given: request that exact rate using BOTHER. */
  speed_t spd = atoi(arg);
  to.c_ispeed = to.c_ospeed = spd;
  to.c_cflag &= ~CBAUD;
  to.c_cflag |= BOTHER;
  r = ioctl(0, TCSETS2, &to);  if (r) { perror("TCSETS2"); exit(-1); }
  return 0;
}


Bug#682972: ttyname() (and /bin/tty) on /dev/tty return /dev/tty

2012-07-27 Thread Ian Jackson
Package: libc6
Version: 2.11.2-10

ttyname() should return the name of the tty device in a form that can
be used by other processes (with different controlling terminals) to
open it.

Steps to reproduce:

  $ perl -e 'use POSIX; printf "%s\n", ttyname(STDIN)'
  /dev/pts/117
  $ perl -e 'use POSIX; printf "%s\n", ttyname(STDIN)' </dev/tty
  /dev/tty
  $ tty
  /dev/pts/117
  $ tty </dev/tty
  /dev/tty
  $

Expected output: in a normal xterm or ssh session, `tty </dev/tty'
should produce the same output as `tty'.

Thanks,
Ian.





Bug#682972: ttyname() (and /bin/tty) on /dev/tty return /dev/tty

2012-07-27 Thread Ian Jackson
Ian Jackson writes ("ttyname() (and /bin/tty) on /dev/tty return /dev/tty"):
> Package: libc6
> Version: 2.11.2-10
> 
> ttyname() should return the name of the tty device in a form that can
> be used by other processes (with different controlling terminals) to
> open it.

Having considered and discussed this some more I think it will be
difficult to fix this without a kernel change.

Consider the following case:

$ exec 4>/dev/tty
$ xterm &
[1] 15604
$

And then in the resulting xterm:

$ tty <&4
/dev/tty
$ echo hi >&4  [ prints hi on the original terminal, as expected ]
$ echo hi >`tty <&4`
hi [ this is wrong, it should go to the original terminal ]
$

AFAICT Linux (2.6.32-5-686-bigmem at least) doesn't provide any way to
get the right answer.

My friends report that the behaviour across other operating systems is
not consistent, but I think the behaviour we see here is clearly
wrong.  (I don't know what Debian/kFreeBSD and Hurd do.)

SUSv3 says that ttyname returns
  "a string containing a null-terminated pathname of the terminal
  associated with file descriptor fildes"

I can't see how that can be satisfied by returning a pathname which,
when opened, gives a new descriptor referring to a different terminal.
(Notwithstanding the non-normative Application Usage section giving
/dev/tty as an example of a possible return value.)

There is a similar problem with ctermid, which the glibc info manual
documents as always returning /dev/tty.
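
Until this is fixed, about the best a C program can do is notice when
the name it got back is only meaningful relative to its own
controlling terminal.  A minimal sketch, with a made-up function name:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Classify the name ttyname reports for fd.
 * Returns  1 if the name can be handed to another process,
 *          0 if it is "/dev/tty" (relative to the controlling terminal),
 *         -1 if fd has no terminal name at all.
 * *name_out receives the string from ttyname, or NULL. */
int tty_name_kind(int fd, const char **name_out) {
    assert(name_out != NULL);
    const char *name = ttyname(fd);
    *name_out = name;
    if (name == NULL)
        return -1;
    return strcmp(name, "/dev/tty") == 0 ? 0 : 1;
}
```

This detects the problem but cannot recover the real device name,
which is why I think a kernel change is needed.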

Thanks,
Ian.





Bug#438179: RFC3484 rule 9 active again in glibc 2.7-5.

2008-02-26 Thread Ian Jackson
Aurelien Jarno writes ("Re: RFC3484 rule 9 active again in glibc 2.7-5."):
> IP on different subnet are not sorted, IP on some local subnet are
> sorted by a longer common prefix with the interface address.

Err, pardon my language, but WTF ?!

What on earth is the justification for that ?

Ian.






Bug#438179: RFC3484 rule 9 active again in glibc 2.7-5.

2008-02-24 Thread Ian Jackson
Aurelien Jarno writes ("Re: RFC3484 rule 9 active again in glibc 2.7-5."):
> An IP which uses the same IP range as your computer, as defined by the
> netmask. In short a local server which can be reached without a
> gateway.

Ah.  I see.

So what you mean is that it will now:
  * prefer a server in the same subnet as one of the local interfaces
as defined by the netmask on that interface, to a server which
is not;
  * not otherwise sort servers according to their IPv4 address
unless specifically configured
?

That sounds exactly right.

If you mean that _for servers on some local subnet_ it will prefer to
use servers with a longer common prefix with the interface address,
then I think that's wrong.

So for example, my machine here has eth0 172.18.45.2/24.  You're
saying (I hope) that it would prefer 172.18.45.6 (because it's on the
subnet local to eth0) to 172.31.80.8 (which is not), which is fine.

If you're saying that it would prefer 172.18.45.6 to 172.18.45.11
because .6 has a longer common prefix with .2 than .11, then I think
that's wrong.  But that would be pretty weird so I assume that's not
what you mean.
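
To put numbers on that example, here is a small C sketch (mine, not
glibc's sorting code; all three function names are made up) which
computes the common prefix lengths involved.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Number of leading bits (0..32) that x and y have in common. */
int common_prefix_len(uint32_t x, uint32_t y) {
    uint32_t diff = x ^ y;
    int len = 0;
    while (len < 32 && (diff & UINT32_C(0x80000000)) == 0) {
        diff <<= 1;
        len++;
    }
    return len;
}

/* Nonzero if other lies in the subnet local/prefix_bits. */
int same_subnet(uint32_t local, int prefix_bits, uint32_t other) {
    assert(prefix_bits >= 0 && prefix_bits <= 32);
    return common_prefix_len(local, other) >= prefix_bits;
}
```

With eth0 at 172.18.45.2/24, the common prefix is 29 bits for
172.18.45.6, 28 bits for 172.18.45.11 and 12 bits for 172.31.80.8.
Preferring the first two over the third is the subnet test; preferring
.6 over .11 on the strength of one extra bit is the behaviour I think
is wrong.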

Thanks,
Ian.






Bug#438179: RFC3484 rule 9 active again in glibc 2.7-5.

2008-02-22 Thread Ian Jackson
Aurelien Jarno writes ("Re: RFC3484 rule 9 active again in glibc 2.7-5."):
> Upstream has committed a fix in the CVS (without telling anybody) so
> that for IPv4 addresses rule 9 is only applied when source and
> destination addresses are in the same subnet. I guess this is very close
> to the wanted behaviour reported in this bug log, so I am reassigning the
> bug back to the libc6 package. It will be closed by the next upload.

I see.  Thanks for letting us know.

What does `in the same subnet' mean ?

Ian.






Bug#447609: ldconfig triggerisation

2007-10-23 Thread Ian Jackson
Daniel Jacobowitz writes ("Re: Bug#447609: ldconfig triggerisation"):
> On Mon, Oct 22, 2007 at 04:07:05PM +0100, Ian Jackson wrote:
> > [assumptions]
> 
> Note that this is usually true but not always; it may be true
> enough for our purposes but I want to set the record straight.

Thanks.

> The only failure case I can think of would be a package which places
> libraries in the multi-arch directories, which Debian locates using a
> file in /etc/ld.so.conf.d, and the same or another package which runs
> a newly installed program using the library from the first package
> in its postinst.

Do such packages typically invoke ldconfig in the same way as a normal
package does ?  If not then we could distinguish the two kinds of
ldconfig call.

> If usage of those directories is planned to increase this may become
> a problem.

Yes, I see.

Joey Hess writes ("Re: Bug#447609: ldconfig triggerisation"):
> Couldn't file triggers be used, so ldconfig is triggered after any
> package installs a library file? That's much more how I expected
> triggers to be used, rather than needing an ugly ldconfig wrapper. I
> think it also addresses drow's point about libraries in nonstandard
> locations, since those packages could just run ldconfig as usual.
> Meanwhile, packages installing libraries to standard locations could
> stop calling ldconfig.

Unfortunately because of the way /lib and particularly /usr/lib are
dumping grounds, file triggers wouldn't work properly.  A file trigger
is activated by packages' files whose pathnames start with the name of
the trigger.  So the trigger would have to be /usr/lib/ but that
covers enormous amounts of junk.

Also, doing it like that would mean we should probably wait with
starting on those changes to packages until the dpkg (and apt and
aptitude) with triggers were actually deployed, as otherwise in the
meantime ldconfig wouldn't run at all.

Ian.






Bug#447609: ldconfig triggerisation

2007-10-23 Thread Ian Jackson
Daniel Jacobowitz writes ("Re: Bug#447609: ldconfig triggerisation"):
> On Tue, Oct 23, 2007 at 11:11:37AM +0100, Ian Jackson wrote:
> > > The only failure case I can think of would be a package which places
> > > libraries in the multi-arch directories, which Debian locates using a
> > > file in /etc/ld.so.conf.d, and the same or another package which runs
> > > a newly installed program using the library from the first package
> > > in its postinst.
> > 
> > Do such packages typically invoke ldconfig in the same way as a normal
> > package does ?  If not then we could distinguish the two kinds of
> > ldconfig call.
> 
> I don't think there would be any difference.

Hrm.  That's troublesome.  Can we insist on them making such a
different invocation ?

If we can't then the ldconfig provided by glibc can't tell the
difference and AFAICS that means we can't provide the optimisation
without updating all packages.

Ian.






Bug#447609: ldconfig triggerisation

2007-10-23 Thread Ian Jackson
Clint Adams writes ("Re: Bug#447609: ldconfig triggerisation"):
> Why not just move to use of dpkg-trigger piecemeal in each postinst?
> That would involve no more invocations of ldconfig than we are already
> enduring.

Because that would involve a great deal of additional dependency
complexity: each such package would have to depend on the
triggers-capable dpkg and the triggers-interested libc.

Ian.






Bug#447609: ldconfig triggerisation

2007-10-22 Thread Ian Jackson
Source: glibc
Version: 2.6.1-6
Severity: wishlist
Tags: patch

The attached patch triggerises the invocation of ldconfig by package
maintainer scripts.

By `triggerises' I mean that the patch arranges for ldconfig
invocations by maintainer scripts to call dpkg-trigger instead of
ldconfig.  ldconfig will be actually run out of glibc's maintainer
script during trigger processing.  The consequence is that all of the
ldconfig invocations during a dpkg run are deferred, and instead
ldconfig is run once at the end.

The understanding on which we base this approach is that after library
installation (which is when ldconfig is used in maintainer scripts) it
is always safe to defer the execution of ldconfig.  Ie, that after a
new library has been installed or an existing library upgraded,
programs which link against the library will work even though ldconfig
hasn't been run.  We understand that not running ldconfig will incur
some performance penalty during the upgrade process but in practice
this is far outweighed by the cost of repeatedly running ldconfig.

We took the approach of renaming ldconfig to ldconfig.real and
replacing it with a wrapper script.  This is unfortunately necessary
because maintainer scripts are in the habit of calling ldconfig
directly.  An alternative approach would be to change all of the
packages not to call ldconfig but instead to call a new script but
this would involve a much more complicated and lengthy transition.

The patch is safe to use with a non-triggers-supporting dpkg and in
all transitional states: where the trigger system is not properly set
up yet, ldconfig is run as normal.  Note that sid's dpkg does not yet
have the triggers patch merged but there has been extensive discussion
of the design and interfaces for triggers and the API should IMO be
considered stable.

These changes have been tested and released as part of Ubuntu 7.10 aka
`gutsy gibbon'.  The patch below is the consolidation of the results
of our testing.

So we believe that this patch can and should be safely applied to
sid's glibc straight away.

Ian.

diff --exclude='*.orig' -ruN orig/glibc-2.6.1/debian/debhelper.in/libc.postinst glibc-2.6.1/debian/debhelper.in/libc.postinst
--- orig/glibc-2.6.1/debian/debhelper.in/libc.postinst  2007-10-22 15:40:11.0 +0100
+++ glibc-2.6.1/debian/debhelper.in/libc.postinst   2007-10-22 15:38:11.0 +0100
@@ -5,6 +5,15 @@
 type=$1
 preversion=$2
 
+if [ x$type = xtriggered ]
+then
+   LDCONFIG_NOTRIGGER=y
+   export LDCONFIG_NOTRIGGER
+   echo ldconfig deferred processing now taking place
+   ldconfig
+   exit 0
+fi
+
 package_name()
 {
 echo LIBC
diff --exclude='*.orig' -ruN orig/glibc-2.6.1/debian/debhelper.in/libc.triggers glibc-2.6.1/debian/debhelper.in/libc.triggers
--- orig/glibc-2.6.1/debian/debhelper.in/libc.triggers  1970-01-01 01:00:00.0 +0100
+++ glibc-2.6.1/debian/debhelper.in/libc.triggers   2007-10-22 15:38:11.0 +0100
@@ -0,0 +1 @@
+interest ldconfig
diff --exclude='*.orig' -ruN orig/glibc-2.6.1/debian/local/ldconfig_wrap glibc-2.6.1/debian/local/ldconfig_wrap
--- orig/glibc-2.6.1/debian/local/ldconfig_wrap 1970-01-01 01:00:00.0 +0100
+++ glibc-2.6.1/debian/local/ldconfig_wrap  2007-10-22 15:39:01.0 +0100
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+if  test $# = 0 \
+ && test x"$LDCONFIG_NOTRIGGER" = x \
+ && test x"$DPKG_MAINTSCRIPT_PACKAGE" != x \
+ && dpkg-trigger --check-supported 2>/dev/null \
+ && dpkg --compare-versions "$DPKG_RUNNING_VERSION" ge '1.14.5ubuntu10~~'
+then
+   if dpkg-trigger --no-await ldconfig; then
+       if test x"$LDCONFIG_TRIGGER_DEBUG" != x; then
+           echo "ldconfig: wrapper deferring update (trigger activated)"
+       fi
+       exit 0
+   fi
+fi
+
+exec /sbin/ldconfig.real "$@"
diff --exclude='*.orig' -ruN orig/glibc-2.6.1/debian/rules.d/debhelper.mk glibc-2.6.1/debian/rules.d/debhelper.mk
--- orig/glibc-2.6.1/debian/rules.d/debhelper.mk    2007-10-22 15:40:11.0 +0100
+++ glibc-2.6.1/debian/rules.d/debhelper.mk 2007-10-22 15:38:11.0 +0100
@@ -59,6 +59,13 @@
dh_install -p$(curpass) debian/bug/$(curpass) usr/share/bug; \
fi
 
+   set -ex; case $(curpass) in libc6|libc6.1) \
+   mv debian/$(curpass)/sbin/ldconfig \
+   debian/$(curpass)/sbin/ldconfig.real; \
+   install -m755 -o0 -g0 debian/local/ldconfig_wrap \
+   debian/$(curpass)/sbin/ldconfig; \
+   ;; esac
+
# extra_debhelper_pkg_install is used for debhelper.mk only.
# when you want to install extra packages, use extra_pkg_install.
$(call xx,extra_debhelper_pkg_install)
@@ -118,6 +125,11 @@

debian/$(curpass)/usr/share/lintian/overrides/$(curpass) ; \
fi
 
+   

Re: getaddrinfo() behaviour

2007-10-02 Thread Ian Jackson
Anthony Towns writes ("Re: getaddrinfo() behaviour"):
> The only reason suitability for release is relevant is in overriding the
> directive that we'll not make a technical decision until efforts to
> resolve it via consensus have been tried and failed. We haven't made
> efforts to get a consensus with the IETF working group and change the
> standard that all this stems from, so making a decision before that's
> happened requires further justification in my view.

That refers to efforts within Debian, not to efforts in standards
bodies.

Standards bodies generally make and implement decisions on a timescale
that makes Technical Committee decisions look frenetic and rushed.

Ian.





Re: getaddrinfo() behaviour

2007-10-01 Thread Ian Jackson
Anthony Towns writes ("Re: getaddrinfo() behaviour"):
> In my opinion, if this isn't an RC issue, there's no urgency to having
> glibc changed prior to the standards changing, and as such, this isn't
> the last resort so the tech ctte shouldn't be deciding the issue, let
> alone overruling the maintainer.

You are assuming that the documented standard[1] will change, and that
it will change in a timely manner.  As I have said before, it is not
the job of the TC (or of Debian!) to slavishly follow standards.

[1] I'll go along for the sake of argument with the proposition that
the documented standard is rule 9 for IPv4, even though that
proposition is actually false.

Standards like those in the IETF and elsewhere often allege that they
document existing practice, and when we follow some incorrect
documentation we are in fact undermining the quality of the
standards-setting process.

It is wrong of Debian to follow incorrect standards.  We should fix
brokenness straight away, not wait for a glacial standards body to
react.

Also, you suggest that it would be wrong of the TC to overrule a
maintainer for a non-RC reason.  I think that is absurd.  The TC
should overrule a maintainer whenever it is sufficiently clear that
the maintainer is wrong, and the supermajority requirement
specifically serves to ensure that the TC will only overrule in that
case.

Limiting the TC's power to overrule a technical decision to only cases
where the TC believes that the wrong behaviour makes the package
unsuitable for release would eviscerate the only mechanism we have for
dealing with errors by maintainers.

Ian.
(Added CC to glibc)





Re: getaddrinfo() behaviour

2007-10-01 Thread Ian Jackson
Ian Jackson writes ("Re: getaddrinfo() behaviour"):
> Limiting the TC's power to overrule a technical decision to only cases
> where the TC believes that the wrong behaviour makes the package
> unsuitable for release would eviscerate the only mechanism we have for
> dealing with errors by maintainers.

I should have said, for dealing with errors by maintainers which
persist after persuasion has been tried.

Ian.





Re: getaddrinfo: DNS round robin vs RFC3484 s6 rule 9, for etch

2007-09-28 Thread Ian Jackson
Pierre Habouzit writes ("Re: getaddrinfo: DNS round robin vs RFC3484 s6 rule 9, for etch"):
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
>   [EMAIL PROTECTED]
> 
>   Thanks to have kept glibc maintainers in the loop, that was
> considerate.

I had assumed that you'd be following the discussion on debian-ctte.
I'm sorry you hadn't.  I'll start CCing [EMAIL PROTECTED]
Will that help ?

> On Fri, Sep 28, 2007 at 03:56:31PM +, Ian Jackson wrote:
> > I don't know if you've been following the argument on the TC list
> > about bug #438179.  I think the Technical Committee are probably going
> > to rule that sid's glibc ought to be changed so that it does not
> > implement RFC3484 section 6 rule 9 prefix-length based sorting for
> > IPv4.  This will restore the traditional DNS round robin.
> 
>   FWIW I still believe that someone in the thread had a point and that
> getaddrinfo should use an extension being e.g. AI_UNSORTED and that the
> issue should be raised to the IETF.

An extension along these lines is no good because the purpose is to
make applications which are updated (usually by upstream) to use IPv6
to continue to behave the same way they used to when they do IPv4.

Saying that we should offer an extension to getaddrinfo to have the
correct behaviour amounts to saying that all callers of getaddrinfo in
Debian should be changed.

>   But such a ruling in Debian
> (disregarding Debian's internal power games) has a pretty limited scope,
> and won't fix the fact that most OSes follow Rule 9 and that people that
> use Debian on their servers will still need to use other techniques than
> DNS RR until total world domination is achieved.

Well, it will allow Debian's own ftpmasters to use DNS round robin
because nearly all of the systems which access ftp.*.debian.org are
running Debian.

I'm not sure what you mean by "Debian's internal power games".  This
conversation seems to have been entirely about the correct behaviour
of glibc, and how and where to fix it, and pretty much entirely on a
technical level.  If you have some conspiracy theory or something
perhaps you'd like to discuss it under a different Subject line ?

>   Oh and above anything else I find really intriguing that such a bad
> functionality (at least it seems to be a pretty grave problem given the
> length of some mails on the CT list) has slept in Debian for more than 2
> years unnoticed.

AIUI it was noticed by the ftpmasters when ftp.us.debian.org broke
when large numbers of users upgraded to etch and got the new
behaviour.

>   This argument is pure crap and prevent anyone interested to post to
> the TC list. This has pissed me beyond repair on this problem, and I
> believe I wasn't the only one. IMHO, the TC isn't functional with a
> restricted mailing list. debian-release is not under the same
> censorship, and looks though pretty functional to me.

I'm sorry you don't like the way we run our mailing list but that is a
matter for us.

Ian.





Re: glibc's getaddrinfo() sort order

2007-09-21 Thread Ian Jackson
Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> On Thu, Sep 20, 2007 at 06:19:10PM -0700, Steve Langasek wrote:
> > So do you have a use case where you think the behavior described in rule 9
> > *is* desirable?
> 
> Any application written assuming this behaviour, works correctly on
> Windows, Solaris, *BSD and glibc based systems in general, but not
> on Debian.

You're completely missing the point.

Applications are NOT written assuming this behaviour.

Applications are written assuming the behaviour of gethostbyname and
then later the call to gethostbyname is replaced by getaddrinfo when
the application is upgraded to support IPv6.


Let us take a concrete example: http over tcp as implemented by curl.

Originally, curl would call gethostbyname.  gethostbyname would get
its answers from the DNS, with the usual round robin.  curl would then
connect to the first address in the list.

Across the population of calling clients, curl would (just as with
other applications such as web browsers) pick the one of the available
addresses with roughly equal probability.

So that is what the DNS administrator for the site intends by the
publication of multiple address records.

Now, curl is changed to support IPv6.  These are not very intrusive
changes but one of them involves a pretty much direct replacement of
gethostbyname with getaddrinfo.

After being changed in this way curl will sort the addresses according
to rule 9.  This means for each client the address is always the same,
and which one depends on the client's idea of its own address.  Since
clients are nowhere near uniformly distributed in the address space,
this will direct the traffic quite non-uniformly.
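
A small sketch of this effect, in Python.  It is deliberately
simplified and extreme: the addresses are documentation-range
placeholders, every client is assumed to sit in the same network, the
function names are invented here, and only the prefix-length
comparison of rule 9 is modelled (none of the earlier RFC3484 rules).

```python
import ipaddress
from collections import Counter

SERVERS = ["198.51.100.5", "203.0.113.5"]             # hypothetical A records
CLIENTS = ["192.0.2.%d" % n for n in range(1, 101)]   # hypothetical clients


def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def pick_round_robin(clients, servers):
    # The nameserver hands each enquirer the next rotation of the list
    # and the client connects to the first address it is given.
    counts = Counter()
    for i, _client in enumerate(clients):
        shift = i % len(servers)
        rotated = servers[shift:] + servers[:shift]
        counts[rotated[0]] += 1
    return counts


def pick_rule9(clients, servers):
    # Each client prefers the server sharing the longest prefix with
    # its own address, whatever order the nameserver returned.
    counts = Counter()
    for client in clients:
        counts[max(servers, key=lambda s: common_prefix_len(client, s))] += 1
    return counts


print(dict(pick_round_robin(CLIENTS, SERVERS)))  # {'198.51.100.5': 50, '203.0.113.5': 50}
print(dict(pick_rule9(CLIENTS, SERVERS)))        # {'198.51.100.5': 100}
```

With real client populations the skew is less total than this, but the
choice is still fixed per client rather than spread across servers.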

This is not what the DNS administrator had intended and represents a
change to the behaviour.


So to recap the three possibilities I mentioned were:

] (a) It is correct that the behaviour of applications (and hence of
] hosts) should be changed to comply with rule 9.

Ie the DNS administrator was wrong, even if these DNS records were
published before the change was made or before RFC3484 was written.
I assume you're not proposing this.

] (b) Application behaviour should not change; getaddrinfo should
] behave the same way as gethostbyname.

This seems obviously correct to me.

] (c) Application behaviour should not change but getaddrinfo should
] comply with rule 9.  Applications should therefore not be changed
] to use getaddrinfo instead of gethostbyname.

And yours, which seems like a version of (c) to me:

] (d) Applications should use getaddrinfo(), and if the ordering behaviour
] it uses is not desired, they should use an ordering that is desired.

Is the ordering behaviour desired ?  Obviously not.

So you seem to be suggesting that the direct replacement of
gethostbyname with getaddrinfo is wrong in this case.

So how should curl be changed to use the desired (DNS round robin, or
equivalent) ordering ?

What is special about curl ?  I could replace curl with almost any
other application in the argument above and come to the same
conclusions.


Ian.





Re: glibc's getaddrinfo() sort order

2007-09-21 Thread Ian Jackson
Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> As it happens I largely agree with that. I don't agree with making a
> decision to go against an IETF standard and glibc upstream lightly,
> though, no matter how many caps Ian expends repeating that it's at the
> least mature level of Internet standard.


Firstly: the STANDARD BEHAVIOUR FOR IPV4 IS THAT IMPLEMENTED BY
GETHOSTBYNAME.  I wonder how familiar you are with Internet protocol
standardisation, and the IETF ?  The purely document-oriented and de
jure approach you're taking doesn't seem to match actual Internet
practice very well. Internet standards are living documents describing
an evolving network.  It is well known that if you read the RFCs as
your only source of guidance for implementation you will go badly
wrong.  Making reference to an RFC which contradicts long-established
existing behaviour is rather beside the point.

Secondly: RFC3484 mandates that all applications should change, even
those using gethostbyname.  (You have completely ignored this point.)

Thirdly: I'm not saying we should make this decision lightly.  Saying
"we shouldn't go against ... lightly" is just weasel-words.  Is this
discussion "[going] against ... lightly" ?  No, of course not.  What
that argument would really be if you had any confidence in it would be
"shouldn't go against ... at all" - but of course that's absurd.


I would like to expand on this point about standards.

Slavish adherence to standards, or to the views of mistaken upstreams,
is generally a mistake.  This is particularly the case for the
Debian Technical Committee.

The TC's job is to decide what the correct behaviour is, by
considering the technical merits.  The TC's job is not to interpret
standards documents.  (Indeed, within our jurisdiction, our job
includes changing them if we disagree with them.)

Obviously we need to use standards documents to help understand the
behaviour of the actual computing systems, to understand what is
expected of our systems and what responses other systems are likely to
produce.  If we find ourselves in clear disagreement with a standard we
ought to ask ourselves whether we're sure we really understand the
situation fully.

As the implementor of a DNS resolver library, a past IETF participant,
a DNS administrator, and someone who's followed some of the IPv6
transition work, I'm convinced I have that understanding.

If you feel you don't have that understanding then please ask the
questions which would help you gain it.  I think we should be able to
answer them.


Ian.





Re: glibc's getaddrinfo() sort order

2007-09-18 Thread Ian Jackson
Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> I'm not familiar with how getaddrinfo() has been implemented in the
> past

I think this is an important point.  If you're not familiar with the
history then perhaps I can help explain.


Hostname-to-address lookups have until recently generally been done
with gethostbyname.

The addresses from gethostbyname are ordered as they were returned by
the nameserver (unless special configuration is made locally to
override this, which is rarely done).  When multiple addresses are
available for a single lookup, special code in all widely-deployed
nameservers arranges to rotate or round-robin the returned
addresses: each enquirer gets a new ordering.  This is so that a
single service name can be made to refer to a number of physical
network interfaces (perhaps on different hosts) and the load shared
across them.  This is known as DNS based load balancing.  If the
protocol is one like mail, where callers can be expected to try
multiple addresses if the first doesn't work, this gives you a failover
as well.
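
To make the mechanism concrete, here is a minimal Python sketch.  It
is a toy model, not the code of any real nameserver or resolver; the
RoundRobinRRset class, the first_reachable helper and the addresses
are all invented for the illustration.

```python
from collections import deque


class RoundRobinRRset:
    """Toy nameserver-side rotation of one name's address records."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        reply = list(self._addresses)
        self._addresses.rotate(-1)   # next enquirer sees the next address first
        return reply


def first_reachable(addresses, is_up):
    # Client side: try the addresses in the order given, as a mail
    # sender would, and stop at the first one that works.
    for address in addresses:
        if is_up(address):
            return address
    return None


rrset = RoundRobinRRset(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print([rrset.answer()[0] for _ in range(6)])
# ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.1', '192.0.2.2', '192.0.2.3']
```

Clients that simply take the first address are spread evenly across
the three interfaces; clients that walk the list get failover too.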

So far so good.  (For clarity, it is the above round-robin
functionality that I am arguing ought to be preserved.)


gethostbyname can theoretically support IPv6 but it can only return
one address type per call.  While there is a way to embed an IPv4
address in an IPv6 address, for circumstances like these, there is no
clear way to tell gethostbyname that the calling application (and the
rest of the stack on which the application relies) will cope with
getting a pile of AF_INET6 back rather than AF_INET.

Therefore for IPv6, a new interface was needed.  The interface
(defined in RFC3493 s6.1 and its predecessors) is getaddrinfo.  It has
several new features most of which aren't relevant here.  The
critical new feature is this:

getaddrinfo allows the application to specify whether it wants to get
only IPv4 addresses or IPv6 addresses as well, and if getting mixed
addresses, whether to encode them as AF_INET or as `v6-mapped'
AF_INET6 (ie, the 32 bits of IPv4 address padded with a specific
prefix to make up an IPv6 address, where the prefix means "no actually
this is not an IPv6 address but an IPv4 address and should be used
with IPv4").
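
The mapping itself is simple arithmetic.  A minimal sketch using
Python's standard ipaddress module, assuming the usual ::ffff:0:0/96
mapped prefix (the two helper names are made up for this example):

```python
import ipaddress


def to_v4_mapped(ipv4):
    """Pad a 32-bit IPv4 address with the ::ffff:0:0/96 prefix."""
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ipaddress.IPv4Address(ipv4)))


def from_v4_mapped(ipv6):
    """Return the embedded IPv4 address, or None if the address is not mapped."""
    return ipaddress.IPv6Address(ipv6).ipv4_mapped


mapped = to_v4_mapped("192.0.2.1")
print(mapped == ipaddress.IPv6Address("::ffff:192.0.2.1"))   # True
print(from_v4_mapped("2001:db8::1"))                         # None
```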

Combined with various other new facilities, this makes it reasonably
straightforward to convert an IPv4-only application to be
IPv6-capable.

So, in summary: getaddrinfo is intended to replace gethostbyname.


However, additionally, it was realised that if getaddrinfo can return
a mixture of IPv4 and v6 addresses it was necessary to specify in what
order they ought to be returned.

When RFC3484 was written its authors evidently felt that the best way
to do this was to define a comparison function over all addresses,
which would define which address was to be preferred.

Heedless of the effect on the DNS round-robin functionality I describe
above, the authors of RFC3484 specified (s6 rule 9) that all addresses
should be sorted by proximity to the host making the choice - where
proximity is defined as the length of the common initial address
prefix.
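
For concreteness, the comparison can be sketched in Python as below.
This is a simplification under stated assumptions: it handles IPv4
only, takes a single fixed source address, ignores rules 1 to 8, and
the function names and addresses are invented for illustration.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def rule9_sort(source, candidates):
    # Longest shared prefix first.  sorted() is stable, so candidates
    # with equal prefix length keep the order the nameserver gave them.
    return sorted(candidates, key=lambda dest: -common_prefix_len(source, dest))


print(common_prefix_len("192.0.2.1", "192.0.2.129"))   # 24
print(rule9_sort("192.0.2.10", ["203.0.113.5", "198.51.100.5"]))
# ['198.51.100.5', '203.0.113.5']
```

The second result is the same whichever order the two candidates
arrive in, which is exactly how the nameserver's rotation gets
discarded.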

This may have been a disputed but arguable definition of real network
proximity for IPv6 at the time 3484 was written.  But it is clear
now that it is not such a measure in the real IPv6 internet, and it
has never been such a measure in the IPv4 internet.

So RFC3484 s6 rule 9 is just wrong, because the reasons behind it do
not apply any more if they ever did.


However, it's worse than that: rule 9 is trying to change the
behaviour of existing systems.  If we agree with rule 9 it ought to
apply just as well to applications using gethostbyname.

All existing applications using gethostbyname are not in compliance
with rule 9.  It would perhaps be possible to modify gethostbyname to
sort addresses according to RFC3484 s5 and s6.

But would it be a good idea ?  No, obviously not.  It would change the
behaviour of all of the applications which currently use
gethostbyname.

Currently such applications pick addresses "at random" (according to
the DNS round robin).  Rule 9 would have applications pick them
according to longest-common-prefix.  This would destroy the DNS based
load balancing arrangements.


What about getaddrinfo ?  Well, there is no reason why a change in API
(to add additional richness needed for new functionality) should so
radically change the behaviour.

And indeed, we see that the DNS load balancing of our own
servers has been broken by this change !

That is, applications are changed from using non-rule-9 gethostbyname
to rule-9 getaddrinfo, and the servers experience wildly unbalanced
load and break.



> The RFC tries to make getaddrinfo return a predictable ordering in the
> face of random orderings from DNS. That seems a perfectly reasonable
> way to define a function in the abstract; though certainly the ordering
> it comes up with can be criticised.

It is not reasonable for the RFC to attempt to specify that the
addresses be returned in a predictable 

Re: glibc's getaddrinfo() sort order

2007-09-18 Thread Ian Jackson
Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> So if getaddrinfo() has always behaved in this way, I don't see a great
> deal of justification in changing it. The bug log indicated that there
> were pre-rfc implementations of getaddrinfo() that behaved more like
> gethostbyname() at least wrt round-robin DNS; but I've got no way of
> verifying that.

I don't know whether or not there were previous versions of
getaddrinfo with the same behaviour as gethostbyname, but that is the
wrong way of looking at it.  getaddrinfo wasn't in widespread use
until the recent efforts to support IPv6.

Did you miss the bits where I said that
 * getaddrinfo is supposed to replace gethostbyname
 * applications are being changed to call getaddrinfo instead of
   gethostbyname
?

There are only three possibilities:

(a) It is correct that the behaviour of applications (and hence of
hosts) should be changed to comply with rule 9.
(b) Application behaviour should not change; getaddrinfo should
behave the same way as gethostbyname.
(c) Application behaviour should not change but getaddrinfo should
comply with rule 9.  Applications should therefore not be changed
to use getaddrinfo instead of gethostbyname.

Which of these are you proposing ?  RFC3484 says (a) but is wrong for
the reasons I have explained.  (b) is my view.   (c) is obviously
unreasonable.

Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> > All existing applications using gethostbyname are not in compliance
> > with rule 9.
> 
> The RFC specifies the behaviour of getaddrinfo(), not gethostbyname(),

Nonsense.  It doesn't specify the behaviour of any such API at all.

RFCs like this one specify the behaviour of _hosts_.  That is, it
specifies what kind of packets the host should emit and accept, on
what interfaces.

There is nothing in RFC3484 that limits its application to getaddrinfo
rather than gethostbyname.

There is discussion in s8 which suggests some possible behaviours of
getaddrinfo as an `implementation strategy' for RFC3484 - but note
that our getaddrinfo doesn't do what s8 suggests (because s8 is
barking mad).  If you agree with RFC3484 s8 then you ought to conclude
that similar changes ought to be made to other internal interfaces
which do the same job as getaddrinfo.

> so doesn't affect any apps that solely use gethostbyname(). So no, it
> shouldn't be applied to other functions anymore than the definition of
> tm_year should mean we count from 1900 in every year related function.

This business about tm_year is a complete red herring.

In fact, you've got my argument completely backwards.

If someone wrote in a standards document that tm_year should be zero
at 0AD (whatever that means) rather than 1900AD, what should we do ?

Well, the answer would be obvious: we should continue to do what we
have done forever, so as not to change the meaning of existing
infrastructure: zero at 1900AD.

This is what RFC3484 s6 is doing.  It is trying to change the meaning
of existing deployments of multiple IPv4 addresses in the global DNS.

> I think we can safely say that Rule 9 isn't useful for IPv4 addresses.

Are you happy then that we should mandate that the Debian libc
maintainer should change our libc accordingly ?

> I'm not sure that's true or not for IPv6 addresses -- it certainly seems
> an inappropriately hierarchial way of viewing a network that's connected
> much more ... fluidly than that, at any rate. But even if Rule 9 is
> completely useless and counterproductive, it's still the standard for
> that function, which, afaics, we should be meeting.

It is NOT THE STANDARD as I have previously pointed out.

An IETF working group proposed that it ought to become the standard
but 1. the standard has not advanced further 2. that was in a time
when IPv6 addressing structure was understood very differently.

To justify my point 2, that RFC3484 predates substantial changes in
the IPv6 addressing architecture:

Site-local addresses are one of the key features that motivates the
rules in RFC3484.  These were deprecated by RFC3879 (status: PROPOSED)
and this was confirmed in RFC4291 (status: DRAFT).

(The standards track goes PROPOSED -> DRAFT -> STANDARD.)

DNS for IPv6 was originally intended to be supported with A6, DNAME
and bitstring labels according to RFC2874.  This was originally
Standards Track and was designed to support rapid and continuous
renumbering.  With the publication of RFC3363 (s1.1) and supported by
the arguments in RFC3364, 2874 was moved to EXPERIMENTAL (ie, off the
Standards Track), because rapid and continuous renumbering is no
longer planned.

Ie, the addressing and numbering arrangements for IPv6 have changed
significantly since 3484 was written.  That could well be why 3484
hasn't progressed.

> > What about getaddrinfo ?  Well, there is no reason why a change in API
> > (to add additional richness needed for new functionality) should so
> > radically change the behaviour.
> 
> Agreed in principle, but this is a rule 

Re: glibc's getaddrinfo() sort order

2007-09-12 Thread Ian Jackson
Anthony Towns writes ("Re: glibc's getaddrinfo() sort order"):
> On Fri, Sep 07, 2007 at 01:06:06AM +0200, Kurt Roeckx wrote:
> > It's atleast in the spirit of the rfc to prefer one that's on the local
> > network.  It might be the intention of rule 9, but then rule 9 isn't
> > very well written.
> 
> Rule 9 seems perfectly well written, it just does something you
> (reasonably) consider undesirable.

Should I take that as agreement with Steve's and my view, that we
should by default not apply rule 9 to IPv4 ?  Your opinion seems
unclear to me.

We haven't heard from the rest of the committee.

Does anyone have an answer to my point that application of rule 9
changes the long-established meaning of existing DNS data ?  (In ways,
I would add, which have proven to cause significant operational
problems in practice.)  As I say, I think that point is unanswerable
and leads inevitably to the conclusion that we should disable this
behaviour by default.


The rest of your (AJ's) mail seems to be getting bogged down a bit.
I'll try to answer what I see as the key aspects.

> In addition, I think there's two different aspects here: the first is
> "should getaddrinfo() return results in random order to aid in load
> distribution?" and the second is "is prefix matching a reasonable way
> to determine a good host to use?"

I disagree with your answer to that first question.  gethostbyname
returns results in random order.  getaddrinfo should do the same.
("random" isn't quite true but it's true enough in the usual case.)

> AFAICS, the answer to the first question is simply "no, it shouldn't" --
> randomised load balancing like that needs to be done at the application
> level,

You are mistaken.  Randomised load balancing like that is _already
done_ using multiple IPv4 addresses in the DNS.  It has been done this
way for nearly two decades.

> [stuff]
> Doing it by changing Rule 9 to:

I don't think this kind of complexity is warranted here.  Even if it
were, you seem to be proposing a strategy which depends on guessing
whether communication with a particular destination address would
involve NAT, which would be fragile.

Ian.





Re: glibc's getaddrinfo() sort order

2007-09-07 Thread Ian Jackson
Kurt Roeckx writes ("Re: glibc's getaddrinfo() sort order"):
> It's atleast in the spirit of the rfc to prefer one that's on the local
> network.  It might be the intention of rule 9, but then rule 9 isn't
> very well written.

I agree that applying RFC3484 section 6 rule 9 to IPv4 addresses is a
mistake and that therefore we should change the default in Debian
accordingly.  I would encourage Kurt to take this matter up with the
relevant IETF working group.


Others have already written about problems involving NAT.  I agree
with this argument (although I don't approve of NAT and it galls me to
use some braindamage involving NAT as an argument for anything).

However there is another argument I would like to make:

A host using getaddrinfo configured to apply rule 9 to IPv4 addresses
will behave quite differently to a host using gethostbyname.  I think
that this change in behaviour is unwarranted.  Whether an application
uses gethostbyname or getaddrinfo is an implementation detail (related
closely to whether that particular application's source code has been
modified to try to support IPv6) and this should not change the
behaviour.

Presently when connecting to a service offering only IPv4 addresses,
most hosts will use gethostbyname and use the addresses offered in
round-robin DNS order.  That is to say, the meaning (pre-RFC3484, and
current de-facto) of a DNS RRset containing several IP addresses is
that the addresses should be tried `uniformly at random' by callers,
as done by the nameserver round-robin RRset rotation algorithm.

RFC3484 section 6 rule 9 applied to IPv4 appears to be an attempt to
change that meaning.  This interpretation of rule 9 for IPv4 as an
attempt to change the meaning of existing deployed DNS RRsets is
supported by the fact that proponents of rule 9 for IPv4 claim that it
will fix existing problems, as in
http://udrepper.livejournal.com/16116.html.

However, it is obviously wrongheaded to attempt to change the defined
meaning of all existing multi-record A RRsets.  On the existing
Internet, zone administrators use multi-record A RRsets in the
knowledge that those RRsets will be used by callers in an
evenly-distributed round-robin fashion as currently implemented by
bind and gethostbyname.

This meaning for multiple A records had been established for well over
a decade by the time 3484 was written and in the intervening years it
has continued to be dominant.  New systems, and systems newly modified
to support IPv6, should continue to interpret existing A RRsets in the
same way as before.

A few cursory web searches show that this new behaviour of getaddrinfo
is indeed causing trouble as applications are converted to IPv6 and
the change in behaviour with IPv4 is found to be undesirable.


Finally, I would like to preemptively address the line "but this is an
RFC and we must do what it says".  There are two responses:

The most obvious one is that RFC3484 is merely Proposed Standard.  At
this stage of the standardisation process one can expect to find
errors, mistaken deviations from existing practice, and so on.
(The IETF standardisation process has been broken so that documents
often get stuck in this state; but that doesn't mean that we should
treat draft documents as if they were gospel, let alone documents that
aren't even drafts.)

The second is a more general point: if a standards document tells us
to do something which is wrong, then we should not do it.  Obviously
we should think fairly hard before making the decision to go against a
standard, but our job is to do the right thing and standards documents
are there to help us, not to constrain us.  I think my argument above
about the existing meaning of multiple A records is irrefutable.


> I already suggested that maybe rule 9 should be limited to the common
> prefix length of the netmask you're using.  An other option is that you
> extend rule 2 to have the same behaviour with ipv4, and that 10/8,
> 172.16/12 and 192.168/16 should be considered organization-local.

Replacing rule 9 with something more limited based on local network
interfaces (ie, prefer what appear to be locally-attached addresses)
would be fine.  Or a default based on routing metrics would be fine
too.  (Although I think these may be too much work to do in
getaddrinfo.)
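
A rough sketch of the more limited rule, under the assumption that the caller already knows the networks attached to its own interfaces (here they are simply passed in by hand; `prefer_local` is a made-up name, not an existing API):

```python
import ipaddress

def prefer_local(candidates, local_networks):
    """Move addresses on locally attached networks to the front.

    Everything else keeps the order in which the nameserver returned
    it, so round-robin rotation still works for foreign addresses.
    """
    nets = [ipaddress.ip_network(n) for n in local_networks]

    def is_local(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    # False sorts before True, and sorted() is stable.
    return sorted(candidates, key=lambda a: not is_local(a))

print(prefer_local(
    ["203.0.113.7", "192.168.1.20", "198.51.100.7"],
    ["192.168.1.0/24"],
))
```

The point of the design is that no ranking at all is applied to addresses about which the host has no topological information.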

The problem occurs when we start ranking IPv4 addresses of foreign
systems about whose topology we have no special knowledge.

Ranking RFC1918 addresses ahead of others is not entirely a safe thing
to do because people sometimes foolishly publish RFC1918 addresses for
public services and expect callers to skip those addresses somehow.
But at least it wouldn't break people who weren't already doing wrong
things.


Ian.





Bug#438184: getpwnam and getgrnam astonishing inefficiency

2007-08-15 Thread Ian Jackson
Package: libc6
Version: 2.3.6.ds1-13

While stracing dpkg I saw something strange so I investigated.

Below is a fragment showing dpkg on sarge installing libadns1-bin.
This is from the unpack phase and covers the installation of two
files.  I see similar behaviour on etch.

18 out of the 32 system calls for installing each file are the libc
reading /etc/{passwd,group} for get{pw,gr}nam in order to map root
to 0 for the nth time.

Surely it could be at least slightly more intelligent.
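
As an illustration of the kind of thing I mean, here is a sketch of a caller-side cache, written in Python rather than C for brevity.  It is not a proposal for how libc or dpkg should implement this, and the names `make_cached_lookup`, `passwd_uid` and `uid_for` are invented for the example.

```python
def make_cached_lookup(lookup):
    """Wrap a name -> id lookup so each distinct name is resolved once."""
    cache = {}

    def cached(name):
        if name not in cache:
            # Only the first request for a name reaches the real lookup.
            cache[name] = lookup(name)
        return cache[name]

    return cached

def passwd_uid(name):
    # Real lookup through the standard pwd module (Unix only).
    import pwd
    return pwd.getpwnam(name).pw_uid

uid_for = make_cached_lookup(passwd_uid)
```

With this, repeated calls such as `uid_for("root")` would read the password database only the first time.  The obvious caveat is staleness: a cache like this gives wrong answers if /etc/passwd changes during the run, so it would need invalidating at suitable points.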

Ian.

00:54:58.376667 read(7, ./usr/bin/adnsheloex\0\0\0\0\0\0\0\0\0\0\0\0..., 512) = 512
00:54:58.376735 open(/etc/passwd, O_RDONLY) = 8
00:54:58.376791 fcntl64(8, F_GETFD) = 0
00:54:58.376833 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
00:54:58.376876 _llseek(8, 0, [0], SEEK_CUR) = 0
00:54:58.376922 fstat64(8, {st_mode=S_IFREG|0644, st_size=5251, ...}) = 0
00:54:58.376996 mmap2(NULL, 5251, PROT_READ, MAP_SHARED, 8, 0) = 0x40022000
00:54:58.377041 _llseek(8, 5251, [5251], SEEK_SET) = 0
00:54:58.377093 munmap(0x40022000, 5251) = 0
00:54:58.377137 close(8)= 0
00:54:58.377185 open(/etc/group, O_RDONLY) = 8
00:54:58.377238 fcntl64(8, F_GETFD) = 0
00:54:58.377280 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
00:54:58.377321 _llseek(8, 0, [0], SEEK_CUR) = 0
00:54:58.377367 fstat64(8, {st_mode=S_IFREG|0644, st_size=1884, ...}) = 0
00:54:58.377440 mmap2(NULL, 1884, PROT_READ, MAP_SHARED, 8, 0) = 0x40022000
00:54:58.377487 _llseek(8, 1884, [1884], SEEK_SET) = 0
00:54:58.377538 munmap(0x40022000, 1884) = 0
00:54:58.377580 close(8)= 0
00:54:58.377635 lstat64(/usr/bin/adnsheloex, {st_mode=S_IFREG|0755, st_size=9720, ...}) = 0
00:54:58.377721 rmdir(/usr/bin/adnsheloex.dpkg-new) = -1 ENOENT (No such file or directory)
00:54:58.377829 rmdir(/usr/bin/adnsheloex.dpkg-tmp) = -1 ENOENT (No such file or directory)
00:54:58.377888 open(/usr/bin/adnsheloex.dpkg-new, O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0) = 8
00:54:58.378071 read(7, [EMAIL PROTECTED]..., 9720) = 9720
00:54:58.378324 write(8, [EMAIL PROTECTED]..., 9720) = 9720
00:54:58.381611 read(7, \0\0\0\0\0\0\0\0, 8) = 8
00:54:58.381688 fchown32(8, 0, 0)   = 0
00:54:58.381746 fchmod(8, 0755) = 0
00:54:58.381794 close(8)= 0
00:54:58.381844 utime(/usr/bin/adnsheloex.dpkg-new, [2007/08/16-00:53:53, 2006/10/17-17:47:03]) = 0
00:54:58.381924 link(/usr/bin/adnsheloex, /usr/bin/adnsheloex.dpkg-tmp) = 0
00:54:58.382067 rename(/usr/bin/adnsheloex.dpkg-new, /usr/bin/adnsheloex) = 0

00:54:58.382224 read(7, ./usr/bin/adnsresfilter\0\0\0\0\0\0\0\0\0..., 512) = 512
00:54:58.382410 open(/etc/passwd, O_RDONLY) = 8
00:54:58.382478 fcntl64(8, F_GETFD) = 0
00:54:58.382521 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
00:54:58.382566 _llseek(8, 0, [0], SEEK_CUR) = 0
00:54:58.382614 fstat64(8, {st_mode=S_IFREG|0644, st_size=5251, ...}) = 0
00:54:58.382689 mmap2(NULL, 5251, PROT_READ, MAP_SHARED, 8, 0) = 0x40022000
00:54:58.382739 _llseek(8, 5251, [5251], SEEK_SET) = 0
00:54:58.382853 munmap(0x40022000, 5251) = 0
00:54:58.382900 close(8)= 0
00:54:58.382952 open(/etc/group, O_RDONLY) = 8
00:54:58.383006 fcntl64(8, F_GETFD) = 0
00:54:58.383048 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
00:54:58.383092 _llseek(8, 0, [0], SEEK_CUR) = 0
00:54:58.383138 fstat64(8, {st_mode=S_IFREG|0644, st_size=1884, ...}) = 0
00:54:58.383211 mmap2(NULL, 1884, PROT_READ, MAP_SHARED, 8, 0) = 0x40022000
00:54:58.383257 _llseek(8, 1884, [1884], SEEK_SET) = 0
00:54:58.383307 munmap(0x40022000, 1884) = 0
00:54:58.383348 close(8)= 0
00:54:58.383404 lstat64(/usr/bin/adnsresfilter, {st_mode=S_IFREG|0755, st_size=11528, ...}) = 0
00:54:58.383492 rmdir(/usr/bin/adnsresfilter.dpkg-new) = -1 ENOENT (No such file or directory)
00:54:58.383598 rmdir(/usr/bin/adnsresfilter.dpkg-tmp) = -1 ENOENT (No such file or directory)
00:54:58.383659 open(/usr/bin/adnsresfilter.dpkg-new, O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0) = 8
00:54:58.383826 read(7, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0\0\213\4..., 11528) = 11528
00:54:58.384050 write(8, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0\0\213\4..., 11528) = 11528
00:54:58.384206 read(7, \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..., 248) = 248
00:54:58.384276 fchown32(8, 0, 0)   = 0
00:54:58.384326 fchmod(8, 0755) = 0
00:54:58.384371 close(8)= 0
00:54:58.384413 utime(/usr/bin/adnsresfilter.dpkg-new, [2007/08/16-00:53:53, 2006/10/17-17:47:03]) = 0
00:54:58.384486 link(/usr/bin/adnsresfilter, /usr/bin/adnsresfilter.dpkg-tmp) = 0
00:54:58.384622 rename(/usr/bin/adnsresfilter.dpkg-new, /usr/bin/adnsresfilter) = 0




Bug#369940: libc6 upgrade objects to my old a.out libc in /usr/local

2006-06-02 Thread Ian Jackson
Package: libc6
Version: 2.3.2.ds1-22sarge3

 Preparing to replace libc6 2.3.2.ds1-22 (using .../libc6_2.3.2.ds1-22sarge3_i386.deb) ...
 These libraries were found in /usr/local/lib:
 libc.so.2
 libc.so.2.2
 libm.so.2
 libm.so.2.2

 A copy of glibc was found in an unexpected directory.
 It is not safe to upgrade the C library in this situation;
 please remove that copy of the C library and try again.
 dpkg: error processing
 debian/pool/main/g/glibc/libc6_2.3.2.ds1-22sarge3_i386.deb (--install):
  subprocess pre-installation script returned error exit status 1

...

-chiark:~ file /usr/local/lib/libc.so.2.2
/usr/local/lib/libc.so.2.2: Linux/i386 demand-paged executable (ZMAGIC), stripped
-chiark:~

What exactly is the problem here ?
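
I don't know how the preinst decides what counts as a copy of glibc.  Purely as a sketch of one check that would tell these cases apart: ELF objects begin with the four bytes 0x7f 'E' 'L' 'F', which an a.out ZMAGIC file does not.  The function names below are invented for the illustration and are not taken from the libc6 packaging.

```python
ELF_MAGIC = b"\x7fELF"

def looks_like_elf(header):
    """True if the given leading bytes of a file carry the ELF magic."""
    return header[:4] == ELF_MAGIC

def is_elf_file(path):
    """Read the first four bytes of a file and test them for ELF magic."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))
```

A check along these lines could skip old a.out libraries, which the ELF dynamic linker would presumably never load anyway.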

Ian.





Re: timezone data packaged separately and in volatile?

2006-02-07 Thread Ian Jackson
Martijn van Oosterhout writes ("Re: timezone data packaged separately and in 
volatile?"):
> The requirements for getting into a stable release update are not
> black magic, they're quite well known:
> 
> http://people.debian.org/~joey/3.1r1/

> 2. The package fixes a critical bug which can lead into data loss,
> data corruption, or an overly broken system, or the package is broken
> or not usable (anymore).

That seems to be true in this case.  I think a system which gets the
clock wrong in this way is `overly broken'.

There doesn't seem to be anything in those rules which allows for an
analysis of the risk, so that it can be compared to the benefit.
(Perhaps that's implicit, although it's not stated.)  A timezone
update, carefully built against the right dependencies, could be
diffed (that is, the .deb could be diffed) against the old version and
carefully tested, which would provide us with confidence that the new
package is right to install.
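
As a sketch of what such a comparison might look like, assuming both .debs have already been unpacked into directories (for instance with `dpkg-deb -x`), the following compares the two trees file by file.  The function names are made up for the example, and it looks only at file contents and symlink targets, not at permissions or maintainer scripts.

```python
import hashlib
import os

def manifest(root):
    """Map each file path (relative to root) to a digest of its contents.

    Symlinks to files are recorded by their target rather than followed.
    Symlinks to directories are not recorded at all.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                result[rel] = "link:" + os.readlink(full)
            else:
                with open(full, "rb") as f:
                    result[rel] = hashlib.sha256(f.read()).hexdigest()
    return result

def diff_manifests(old, new):
    """Return (added, removed, changed) path lists between two manifests."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

For a timezone update one would expect the changed list to contain only the zone files affected by the rule change, which is easy to review by eye.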

Ian.





Bug#28250: Ping ? (re libc lost output bug)

2002-12-28 Thread Ian Jackson
On the 10th of July I wrote:
> So, could you please apply it to the libc in unstable ?

What more needs to happen before you apply this patch ?

Ian.






Bug#12411: [PATCH] A better Directory Lister example

2002-12-28 Thread Ian Jackson
H. S. Teoh writes ("Bug#12411: [PATCH] A better Directory Lister example"):
...
> I have declined to address this, since this example is mainly concerned
> with using the libc directory reading functions, not with handling stdout
> errors.

I actually think it's a libc bug, #28250.

> >  * it prints the error message about failing to open the directory to
> >    stdout instead of stderr;
> 
> The current version of the info file uses perror(), which, as far as I
> know, print to stderr, not stdout.

I think some of the things I reported must have been fixed in the
meantime.
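
For reference, the behaviour I would expect from such an example can be sketched as follows.  This is a Python analogue written for illustration, not the C example from the libc manual, and `list_directory` is a made-up name.  Errors go to stderr, and a failure to write the listing is reported rather than silently ignored.

```python
import os
import sys

def list_directory(path, out=sys.stdout, err=sys.stderr):
    """Print the entries of a directory, one per line.

    Returns 0 on success and 1 on failure.  Error messages are written
    to err, never to out.
    """
    try:
        names = sorted(os.listdir(path))
    except OSError as e:
        print(f"{path}: {e.strerror}", file=err)
        return 1
    try:
        for name in names:
            print(name, file=out)
        # Flush explicitly so that a write error is noticed here.
        out.flush()
    except OSError as e:
        print(f"write error: {e.strerror}", file=err)
        return 1
    return 0
```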

Thanks,
Ian.