Bug#848508: LANG=C wormhole :/

2017-01-03 Thread Antoine Beaupré
On 2017-01-03 15:13:23, Daniel Kahn Gillmor wrote:
> On Tue 2017-01-03 14:20:43 -0500, anarcat wrote:
>> I'm happy to follow whatever upstream decides, but I'd like to point out
>> that this is not just a feature request ("non-ASCII wordlist", which can
>> be supported fine even if we go back to py2 btw), but an actual bug
>> ("fails to work").
>
> "fails to work when the user explicitly sets LANG=C on a program that
> deals with human-readable text in 2017" :)

I think this is inaccurate: users do not need to explicitly set LANG=C,
it is the default when unset. :)

>> C.UTF-8 is not necessarily available on all Debian systems, let alone the
>> default. In fact, I believe the default locale, on Debian systems, is
>> still C. Having our package fail to work in that locale breaks the
>> Principle Of Least Astonishment.
>
> I don't think this is the case, but i could be wrong.  What makes you
> think that this is true?

primarily out of gut feeling: i commonly get pushback from people not
running unicode locales when i report bugs triggered by my funny name.

but also, i have a debian wheezy VM here created with vmdebootstrap (so
fairly plain). here's the locale:

root@debian:~# echo $LANG


ie. not set. i believe that a minimal Debian install will not set LANG
if you login through the console, nor if you login through
SSH. graphical environments typically set that variable, but we can't
assume the users will have a display manager to set that up.
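A quick way to check what a Python 3 program actually sees in that
situation is to run the interpreter with every locale variable stripped
from the environment. This is only a sketch: the helper name is made up
for illustration, and the answer depends on the Python version and the C
library (newer interpreters may coerce the C locale to a UTF-8 one and so
report UTF-8 here).

```python
import os
import subprocess
import sys


def preferred_encoding_without_locale_vars():
    """Report the encoding a child Python picks when no locale is configured.

    Hypothetical helper: it copies the current environment, drops LANG,
    LANGUAGE and every LC_* variable, and asks a fresh interpreter.
    """
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("LANG", "LANGUAGE") and not key.startswith("LC_")
    }
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(preferred_encoding_without_locale_vars())
```

An ASCII-only answer here would be the condition under discussion; a
UTF-8 answer would mean the interpreter or the system has already papered
over the unset locale.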

i believe we may be conflating "having C.UTF-8 *available*" with "having
LANG set to some UTF-8 locale". it may be true that most systems will
have a UTF-8 locale available (although I question even that, given my
experience with this VM), but i am pretty certain that we can't assume
LANG will be properly set.
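The distinction between "available" and "set" can be sketched in Python
as two separate questions. Both function names are invented for this
example. The first follows the usual POSIX precedence (LC_ALL, then
LC_CTYPE, then LANG, with empty values counting as unset); the second
asks the C library whether it can activate a given locale at all.

```python
import locale


def effective_ctype_setting(environ):
    """Return the value that selects the character encoding, or None.

    None means no locale variable is set, i.e. the C/POSIX locale applies.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:  # an empty string is treated like an unset variable
            return value
    return None


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore
```

On a machine like the VM above, one would expect
`effective_ctype_setting(os.environ)` to return None regardless of what
`locale_is_available("C.UTF-8")` says, which is the point being made.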

> the default for debian systems is to install
> a task-$LANGUAGE package based on the choice made during d-i, and
> configures a sensible locale C.UTF-8
> is always available.

in this (vm)debootstrap-built chroot here, it is not the case:
task-language is not installed, and the C.UTF-8 locale is not
configured. and even if it was, it is not necessarily set.

root@debian:~# apt-cache policy task-english
task-english:
  Installed: (none)
  Candidate: 3.14.1
  Version table:
     3.14.1 0
        500 http://httpredir.debian.org/debian/ wheezy/main amd64 Packages

d-i is not the only way to install debian...

(i'd be curious to see if debirf actually sets that up correctly, btw ;)

> I do note that when LANG is completely unset, we see the same failure,
> even though C.UTF-8 is available.  In that case, i'd recommend that we
> just explicitly set LANG=C.UTF-8 (within wormhole) to work around
> python-click's idiosyncrasies on py3.

i think that's not necessarily a good idea: this is *exactly* the kind
of stuff the python-click warning is there for - to avoid assuming any
sort of encoding or locale, and forcing the user to decide on it.

by setting the locale, we are basically ignoring the warning, and we
might as well just catch the exception and/or silence it (which is
possible with monkeypatching).
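For reference, the workaround being debated could look roughly like the
sketch below. This is not the actual wormhole patch: the function name
and the constant are hypothetical, and it assumes the call happens at
startup, before anything else inspects the locale.

```python
import os

# Variables that select the character encoding, in precedence order.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def default_to_c_utf8(environ):
    """Set LANG=C.UTF-8 only when no locale variable is set at all.

    Returns True if the mapping was changed. An explicit setting such as
    LANG=C is left untouched, so this covers the unset case only.
    """
    if any(environ.get(name) for name in LOCALE_VARS):
        return False
    environ["LANG"] = "C.UTF-8"
    return True


if __name__ == "__main__":
    changed = default_to_c_utf8(os.environ)
    print("LANG defaulted to C.UTF-8" if changed else "locale left as configured")
```

Note that this does not check whether C.UTF-8 is actually installed, and
it overrides the unset locale silently, which is the objection raised
above.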

> But if the user deliberately sets LANG=whatever to something
> non-unicode, i don't think it's unreasonable for wormhole to decline to
> work in that environment if one of its dependencies is dependent on a
> UTF-8 locale.

as we have seen, the problem arises not only when a user deliberately
configures a "wrong" locale, but also when no locale is configured, which
is a surprisingly common situation.

>> I still believe the simplest fix, in the short term, is to revert back
>> packaging to Python2. We could (and should, anyways) provide both
>> python2 and python3 bindings for the magic-wormhole *libraries* and make
>> the binary use the python2 libraries until the click bug is fixed or
>> Debian defaults to a UTF-8 locale.
>
> why not just (a) fix the unset $LANG situation with a small patch, and

because that silences a real issue with python3-click that we do not
want to silence. click needs to be fixed, we shouldn't hide potential
errors like this.

what if the user is running under a latin1 locale that just happens to
work because it's an extension of ASCII? before you tell me how wrong
that sounds, consider that i have done exactly that for about a decade,
over various operating systems...
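To illustrate why a latin1 setup "just happens to work": pure ASCII text
encodes to the same bytes under ASCII, Latin-1 and UTF-8, so the choice
of locale makes no difference until a non-ASCII character shows up. The
helper and the sample strings below are made up for the example.

```python
def encodes_identically(text, encodings=("ascii", "latin-1", "utf-8")):
    """Return True if `text` yields the same bytes under every encoding."""
    try:
        return len({text.encode(enc) for enc in encodings}) == 1
    except UnicodeEncodeError:
        # At least one encoding cannot represent the text at all.
        return False


ascii_only = "7-crossover-clockwork"  # illustrative ASCII-only string
name = "Beaupr\u00e9"                 # "Beaupré", written with an escape

if __name__ == "__main__":
    print(encodes_identically(ascii_only))
    print(encodes_identically(name))
    print(name.encode("latin-1"), name.encode("utf-8"))
```

The ASCII-only string is locale-independent, while the name differs
between Latin-1 (one byte for the accented letter) and UTF-8 (two bytes)
and cannot be encoded as ASCII at all.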

> (b) tag the python-click bug as "affects: magic-wormhole" and leave it
> as is?

that sounds like a good idea in any case...

my bottom line on this bug is that wormhole is a file transfer program
that doesn't, a priori, have to specifically deal with locale
problems. garbage in, garbage out. i agree that if someone has the wrong
locale and/or passes corrupt data to wormhole, it should bail out
preemptively.

but in this case, there is a legitimate use case where no locale is
configured, or, actually, the C locale is configured (by default) and
only ASCII data is passed. we shouldn't bail out in that specific case
and i don't know of anything special in wormhole that should make 

Bug#848508: LANG=C wormhole :/

2017-01-03 Thread Daniel Kahn Gillmor
On Tue 2017-01-03 14:20:43 -0500, anarcat wrote:
> I'm happy to follow whatever upstream decides, but I'd like to point out
> that this is not just a feature request ("non-ASCII wordlist", which can
> be supported fine even if we go back to py2 btw), but an actual bug
> ("fails to work").

"fails to work when the user explicitly sets LANG=C on a program that
deals with human-readable text in 2017" :)

> C.UTF-8 is not necessarily available on all Debian systems, let alone the
> default. In fact, I believe the default locale, on Debian systems, is
> still C. Having our package fail to work in that locale breaks the
> Principle Of Least Astonishment.

I don't think this is the case, but i could be wrong.  What makes you
think that this is true?  the default for debian systems is to install
a task-$LANGUAGE package based on the choice made during d-i, and
configures a sensible locale C.UTF-8
is always available.

I do note that when LANG is completely unset, we see the same failure,
even though C.UTF-8 is available.  In that case, i'd recommend that we
just explicitly set LANG=C.UTF-8 (within wormhole) to work around
python-click's idiosyncrasies on py3.

But if the user deliberately sets LANG=whatever to something
non-unicode, i don't think it's unreasonable for wormhole to decline to
work in that environment if one of its dependencies is dependent on a
UTF-8 locale.

> I still believe the simplest fix, in the short term, is to revert back
> packaging to Python2. We could (and should, anyways) provide both
> python2 and python3 bindings for the magic-wormhole *libraries* and make
> the binary use the python2 libraries until the click bug is fixed or
> Debian defaults to a UTF-8 locale.

why not just (a) fix the unset $LANG situation with a small patch, and
(b) tag the python-click bug as "affects: magic-wormhole" and leave it
as is?

  --dkg




Bug#848508: LANG=C wormhole :/

2017-01-03 Thread anarcat
On Tue, Jan 03, 2017 at 02:03:44PM -0500, Daniel Kahn Gillmor wrote:
> Control: forwarded 848508 https://github.com/warner/magic-wormhole/issues/127
> 
> It'd be really nice for wormhole to stay on python 3 -- i would like to
> be able to run a system free of python2 in the near future, and i'd also
> like to be able to have wormhole available.

I'd like to have Debian free of Python2 as well, but it is not going to
happen in stretch, not by a long shot.

> I've forwarded the debian bug report upstream to see whether Brian has
> any suggested resolution.

Great, thanks!

> but I note that once we're talking about wormhole using non-ASCII
> wordlists (see https://github.com/warner/magic-wormhole/issues/26),
> "LANG=C wormhole receive" is going to be a buggy invocation no matter
> what anyway.

I'm happy to follow whatever upstream decides, but I'd like to point out
that this is not just a feature request ("non-ASCII wordlist", which can
be supported fine even if we go back to py2 btw), but an actual bug
("fails to work").

> I'm inclined to just say "don't do that" on debian systems, where we
> expect C.UTF-8 to be available anyway.

C.UTF-8 is not necessarily available on all Debian systems, let alone the
default. In fact, I believe the default locale, on Debian systems, is
still C. Having our package fail to work in that locale breaks the
Principle Of Least Astonishment.

I still believe the simplest fix, in the short term, is to revert back
packaging to Python2. We could (and should, anyways) provide both
python2 and python3 bindings for the magic-wormhole *libraries* and make
the binary use the python2 libraries until the click bug is fixed or
Debian defaults to a UTF-8 locale.

Until then, we are, by default, not working at all unless the user
does an extra configuration. I think this is an unacceptable situation
and a bug that should be fixed.

Failing that, someone should mark this bug as unfixed and just close
this issue. I certainly wouldn't do that myself as I believe this is a
bug that we should work around until Click does the right thing.

Not everyone is a unicode geek like we are. ;)

A.

-- 
I worry about my child and the Internet all the time, even though
she's too young to have logged on yet. Here's what I worry about. I
worry that 10 or 15 years from now, she will come to me and say
'Daddy, where were you when they took freedom of the press away from
the Internet?'   - Mike Godwin, Electronic Frontier Foundation




Bug#848508: LANG=C wormhole :/

2017-01-03 Thread Daniel Kahn Gillmor
Control: forwarded 848508 https://github.com/warner/magic-wormhole/issues/127

It'd be really nice for wormhole to stay on python 3 -- i would like to
be able to run a system free of python2 in the near future, and i'd also
like to be able to have wormhole available.

I've forwarded the debian bug report upstream to see whether Brian has
any suggested resolution.

but I note that once we're talking about wormhole using non-ASCII
wordlists (see https://github.com/warner/magic-wormhole/issues/26),
"LANG=C wormhole receive" is going to be a buggy invocation no matter
what anyway.

I'm inclined to just say "don't do that" on debian systems, where we
expect C.UTF-8 to be available anyway.

--dkg

