Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
On Fri, 27 May 2016 10:40:46 -0400
"Anthony G. Basile" wrote:

> On 5/23/16 10:25 AM, Michał Górny wrote:
> > On Mon, 23 May 2016 08:08:18 -0400
> > "Anthony G. Basile" wrote:
> >
> >> On 5/23/16 2:44 AM, Michał Górny wrote:
> >>> On Sun, 22 May 2016 13:04:40 -0400
> >>> "Anthony G. Basile" wrote:
> >>>
> >
> > 1. I think old versions of Python did not support named properties
> > in sys.version_info, back when Portage was written.
>
> I didn't need this because I found _unicode_decode() which does what I
> want. Thanks for the clue. BTW, why are those functions/classes in
> pym/portage/__init__.py? All that code in there is just cluttering
> __init__.py. Shouldn't that stuff be pulled into a separate file and
> imported cleanly?
>

Yes, there is generally far too much code in many of the __init__.py
files. There are many with over 1K LOC.

-- 
Brian Dolbec
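On the `sys.version_info` point above: named attributes such as `sys.version_info.major` only exist from Python 2.7 and 3.1 onward, while tuple indexing and `sys.hexversion` work on every version. A minimal sketch (the helper `py_version_tuple` is hypothetical, not a Portage function) showing that the spellings agree:

```python
import sys


def py_version_tuple():
    """Decode (major, minor, micro) from sys.hexversion.

    The value is packed as
    major << 24 | minor << 16 | micro << 8 | releaselevel << 4 | serial.
    """
    h = sys.hexversion
    return (h >> 24) & 0xFF, (h >> 16) & 0xFF, (h >> 8) & 0xFF


# Portage-style check: compare hexversion against a packed constant.
IS_PY3 = sys.hexversion >= 0x3000000

# Index access is the portable alternative to sys.version_info.major.
print(IS_PY3, sys.version_info[0] >= 3, py_version_tuple())
```

Comparing against a single packed integer keeps every version check in one style, which is the consistency the review below asks for.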
Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
On 5/23/16 10:25 AM, Michał Górny wrote:
> On Mon, 23 May 2016 08:08:18 -0400
> "Anthony G. Basile" wrote:
>
>> On 5/23/16 2:44 AM, Michał Górny wrote:
>>> On Sun, 22 May 2016 13:04:40 -0400
>>> "Anthony G. Basile" wrote:
>>>
>>>> From: "Anthony G. Basile"
>>>>
>>>> The current method to check for a sane system locale is to use python's
>>>> ctypes.util.find_library() to construct a full library path to the
>>>> system libc.so and pass that path to ctypes.CDLL() so we can call
>>>> toupper() and tolower() directly. However, this gets bogged down in
>>>> implementation details and fails with musl.
>>>>
>>>> We work around this design flaw in ctypes with a small python module
>>>> written in C which provides thin wrappers to toupper() and tolower(),
>>>> and only fall back on the current ctypes-based check when this module
>>>> is not available.
>>>>
>>>> This has been tested on glibc, uClibc and musl systems.
>>>>
>>>> X-Gentoo-bug: 571444
>>>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>>>
>>>> Signed-off-by: Anthony G. Basile
>>>> ---
>>>>  pym/portage/util/locale.py   | 32 ++-
>>>>  setup.py                     |  6 ++-
>>>>  src/portage_c_convert_case.c | 94
>>>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>>>  create mode 100644 src/portage_c_convert_case.c
>>>>
>>>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>>>> index 2a15ea1..85ddd2b 100644
>>>> --- a/pym/portage/util/locale.py
>>>> +++ b/pym/portage/util/locale.py
>>>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>>>  import locale
>>>>  import logging
>>>>  import os
>>>> +import sys
>>>>  import textwrap
>>>>  import traceback
>>>>
>>>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>>>  	"""
>>>>  	The inner locale check function.
>>>>  	"""
>>>> -
>>>> -	libc_fn = find_library("c")
>>>> -	if libc_fn is None:
>>>> -		return None
>>>> -	libc = LoadLibrary(libc_fn)
>>>> -	if libc is None:
>>>> -		return None
>>>> +	try:
>>>> +		from portage_c_convert_case import _c_toupper, _c_tolower
>>>> +		libc_tolower = _c_tolower
>>>> +		libc_toupper = _c_toupper
>>>
>>> Now I'm being picky... but if you named the functions toupper()
>>> and tolower(), you could actually import the whole module as 'libc'
>>> and have less code!
>>
>> I see what you're saying, and it's tempting because it's elegant, but I'm
>> afraid of a clash of names. I've got a bad feeling this will get us
>> into trouble later.
>>
>> Let me play with this and see what happens.
>
> I don't think this will be problematic since things like this happen
> in Python all the time ;-). And after all, C function names can be
> different than Python function names.

It works fine, so my last set of patches adopts this approach.

>
>>> Also it would be nice to actually make the module more generic. There
>>> are more places where we use CDLL, and all of them could eventually be
>>> supported by the module (unshare() would be much better done in C, for
>>> example).
>>
>> Yeah I get your point here. Let me convince myself first.
>
> I've got a killer argument: right now we hardcode constants from Linux
> headers in the Python code!
>
> Not that I'm asking you to actually add code for that as well. Just
> rename the module to something more generic like portage.util.libc ;-).

Well, you might as well point me in this direction since I'm working on
this now.

>>>> +	except ImportError:
>>>> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
>>>> +			level=logging.WARNING, noiselevel=-1)
>>>
>>> Do we really want to warn verbosely about this? I think it'd be
>>> a pretty common case for people running the git checkout.
>>
>> This should stay. It's good to know that the module is not being
>> imported and silently falling back on the ctypes stuff.
>>
>> 1) it's only going to happen in the rare occasion that you're using
>> something like a Turkish locale and can't import the module.
>
> Wrong. This happens before the check is done, so it will be output
> every time Portage is started, also with good locale.

Right, I dropped it.

>
>> 2) people who do a git checkout should add
>> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
>> I can add something to testpath. Users will have to be instructed to
>> run `./setup.py build` and then the script should read something like this:
>>
>> unamem=$(uname -m)
>>
>> pythonversion=$(python --version 2>&1 | cut -c8-)
>> pythonversion=${pythonversion%\.*}
>>
>> portagedir=$(dirname ${BASH_SOURCE[0]})
>>
>> export PATH="${portagedir}/bin:${PATH}"
>>
>> export PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym${PYTHONPATH:+:}${PYTHONPATH}"
>>
>> export PYTHONWARNINGS=d,i::ImportWarning
>>
>>
>> BTW, the original code must have a bug in i
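The PYTHONPATH logic from the proposed testpath script can be sketched in Python for comparison. This is a minimal sketch: `build_pythonpath` is a hypothetical helper, `platform.machine()` and `sys.version_info` stand in for `uname -m` and the `python --version` parsing, and the `build/lib.linux-<machine>-<X.Y>` directory name is the distutils-style layout the script assumes (other build tooling may choose a different name).

```python
import os
import platform
import sys


def build_pythonpath(portagedir, existing=None):
    """Return a PYTHONPATH value: built C extensions first, then pym/.

    Hypothetical helper mirroring the testpath proposal; the build
    directory name is an assumption, not something setup.py guarantees.
    """
    pyver = "%d.%d" % sys.version_info[:2]
    build_lib = os.path.join(
        portagedir, "build", "lib.linux-%s-%s" % (platform.machine(), pyver))
    parts = [build_lib, os.path.join(portagedir, "pym")]
    if existing:
        # Only add a separator when there is an existing value to keep,
        # so the result never contains an empty path component.
        parts.append(existing)
    return os.pathsep.join(parts)


print(build_pythonpath("/src/portage", os.environ.get("PYTHONPATH")))
```

Joining a list with `os.pathsep` sidesteps the separator bookkeeping that `${PYTHONPATH:+:}` handles in the shell version.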
Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
On Mon, 23 May 2016 08:08:18 -0400
"Anthony G. Basile" wrote:

> On 5/23/16 2:44 AM, Michał Górny wrote:
>> On Sun, 22 May 2016 13:04:40 -0400
>> "Anthony G. Basile" wrote:
>>
>>> From: "Anthony G. Basile"
>>>
>>> The current method to check for a sane system locale is to use python's
>>> ctypes.util.find_library() to construct a full library path to the
>>> system libc.so and pass that path to ctypes.CDLL() so we can call
>>> toupper() and tolower() directly. However, this gets bogged down in
>>> implementation details and fails with musl.
>>>
>>> We work around this design flaw in ctypes with a small python module
>>> written in C which provides thin wrappers to toupper() and tolower(),
>>> and only fall back on the current ctypes-based check when this module
>>> is not available.
>>>
>>> This has been tested on glibc, uClibc and musl systems.
>>>
>>> X-Gentoo-bug: 571444
>>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>>
>>> Signed-off-by: Anthony G. Basile
>>> ---
>>>  pym/portage/util/locale.py   | 32 ++-
>>>  setup.py                     |  6 ++-
>>>  src/portage_c_convert_case.c | 94
>>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>>  create mode 100644 src/portage_c_convert_case.c
>>>
>>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>>> index 2a15ea1..85ddd2b 100644
>>> --- a/pym/portage/util/locale.py
>>> +++ b/pym/portage/util/locale.py
>>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>>  import locale
>>>  import logging
>>>  import os
>>> +import sys
>>>  import textwrap
>>>  import traceback
>>>
>>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>>  	"""
>>>  	The inner locale check function.
>>>  	"""
>>> -
>>> -	libc_fn = find_library("c")
>>> -	if libc_fn is None:
>>> -		return None
>>> -	libc = LoadLibrary(libc_fn)
>>> -	if libc is None:
>>> -		return None
>>> +	try:
>>> +		from portage_c_convert_case import _c_toupper, _c_tolower
>>> +		libc_tolower = _c_tolower
>>> +		libc_toupper = _c_toupper
>>
>> Now I'm being picky... but if you named the functions toupper()
>> and tolower(), you could actually import the whole module as 'libc'
>> and have less code!
>
> I see what you're saying, and it's tempting because it's elegant, but I'm
> afraid of a clash of names. I've got a bad feeling this will get us
> into trouble later.
>
> Let me play with this and see what happens.

I don't think this will be problematic since things like this happen
in Python all the time ;-). And after all, C function names can be
different than Python function names.

>> Also it would be nice to actually make the module more generic. There
>> are more places where we use CDLL, and all of them could eventually be
>> supported by the module (unshare() would be much better done in C, for
>> example).
>
> Yeah I get your point here. Let me convince myself first.

I've got a killer argument: right now we hardcode constants from Linux
headers in the Python code!

Not that I'm asking you to actually add code for that as well. Just
rename the module to something more generic like portage.util.libc ;-).

>>> +	except ImportError:
>>> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
>>> +			level=logging.WARNING, noiselevel=-1)
>>
>> Do we really want to warn verbosely about this? I think it'd be
>> a pretty common case for people running the git checkout.
>
> This should stay. It's good to know that the module is not being
> imported and silently falling back on the ctypes stuff.
>
> 1) it's only going to happen in the rare occasion that you're using
> something like a Turkish locale and can't import the module.

Wrong. This happens before the check is done, so it will be output
every time Portage is started, also with good locale.

> 2) people who do a git checkout should add
> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
> I can add something to testpath. Users will have to be instructed to
> run `./setup.py build` and then the script should read something like this:
>
> unamem=$(uname -m)
>
> pythonversion=$(python --version 2>&1 | cut -c8-)
> pythonversion=${pythonversion%\.*}
>
> portagedir=$(dirname ${BASH_SOURCE[0]})
>
> export PATH="${portagedir}/bin:${PATH}"
>
> export PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym${PYTHONPATH:+:}${PYTHONPATH}"
>
> export PYTHONWARNINGS=d,i::ImportWarning
>
>
> BTW, the original code must have a bug in it. It reads
>
> export PYTHONPATH=PYTHONPATH="$(dirname $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
>
> The double PYTHONPATH=PYTHONPATH= can't be right.

You are probably right. However:

1. Since bin/ scripts are setting PYTHONPATH appropriately, you sho
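The suggestion to import the helper module under the name `libc`, with a quiet fallback to ctypes, could look roughly like this. It is a minimal sketch: `load_libc` and the default module name `portage_libc_example` are hypothetical placeholders (not Portage code), and `ctypes.CDLL` is used directly in place of the patch's `LoadLibrary`.

```python
import ctypes
import ctypes.util
import importlib


def load_libc(module_name="portage_libc_example"):
    """Return an object exposing tolower()/toupper(), or None.

    Tries a compiled helper module first (hypothetical name), then the
    C library via ctypes. Callers use libc.tolower(c) either way.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # No warning: this path is normal for a git checkout without a
        # built extension, and it would run on every startup.
        pass

    libc_fn = ctypes.util.find_library("c")
    if libc_fn is None:
        # The ctypes lookup can fail (the patch reports problems on musl).
        return None
    try:
        return ctypes.CDLL(libc_fn)
    except OSError:
        return None


libc = load_libc()
if libc is not None:
    print(chr(libc.toupper(ord("a"))))
```

Because both branches expose the same attribute names, the rest of the check does not need separate `libc_tolower`/`libc_toupper` variables.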
Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
On 5/23/16 2:44 AM, Michał Górny wrote:
> On Sun, 22 May 2016 13:04:40 -0400
> "Anthony G. Basile" wrote:
>
>> From: "Anthony G. Basile"
>>
>> The current method to check for a sane system locale is to use python's
>> ctypes.util.find_library() to construct a full library path to the
>> system libc.so and pass that path to ctypes.CDLL() so we can call
>> toupper() and tolower() directly. However, this gets bogged down in
>> implementation details and fails with musl.
>>
>> We work around this design flaw in ctypes with a small python module
>> written in C which provides thin wrappers to toupper() and tolower(),
>> and only fall back on the current ctypes-based check when this module
>> is not available.
>>
>> This has been tested on glibc, uClibc and musl systems.
>>
>> X-Gentoo-bug: 571444
>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>>
>> Signed-off-by: Anthony G. Basile
>> ---
>>  pym/portage/util/locale.py   | 32 ++-
>>  setup.py                     |  6 ++-
>>  src/portage_c_convert_case.c | 94
>>  3 files changed, 121 insertions(+), 11 deletions(-)
>>  create mode 100644 src/portage_c_convert_case.c
>>
>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
>> index 2a15ea1..85ddd2b 100644
>> --- a/pym/portage/util/locale.py
>> +++ b/pym/portage/util/locale.py
>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>>  import locale
>>  import logging
>>  import os
>> +import sys
>>  import textwrap
>>  import traceback
>>
>> @@ -34,18 +35,26 @@ def _check_locale(silent):
>>  	"""
>>  	The inner locale check function.
>>  	"""
>> -
>> -	libc_fn = find_library("c")
>> -	if libc_fn is None:
>> -		return None
>> -	libc = LoadLibrary(libc_fn)
>> -	if libc is None:
>> -		return None
>> +	try:
>> +		from portage_c_convert_case import _c_toupper, _c_tolower
>> +		libc_tolower = _c_tolower
>> +		libc_toupper = _c_toupper
>
> Now I'm being picky... but if you named the functions toupper()
> and tolower(), you could actually import the whole module as 'libc'
> and have less code!

I see what you're saying, and it's tempting because it's elegant, but I'm
afraid of a clash of names. I've got a bad feeling this will get us
into trouble later.

Let me play with this and see what happens.

>
> Also it would be nice to actually make the module more generic. There
> are more places where we use CDLL, and all of them could eventually be
> supported by the module (unshare() would be much better done in C, for
> example).

Yeah I get your point here. Let me convince myself first.

>
>> +	except ImportError:
>> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
>> +			level=logging.WARNING, noiselevel=-1)
>
> Do we really want to warn verbosely about this? I think it'd be
> a pretty common case for people running the git checkout.

This should stay. It's good to know that the module is not being
imported and silently falling back on the ctypes stuff.

1) it's only going to happen in the rare occasion that you're using
something like a Turkish locale and can't import the module.

2) people who do a git checkout should add
PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
I can add something to testpath. Users will have to be instructed to
run `./setup.py build` and then the script should read something like this:

unamem=$(uname -m)

pythonversion=$(python --version 2>&1 | cut -c8-)
pythonversion=${pythonversion%\.*}

portagedir=$(dirname ${BASH_SOURCE[0]})

export PATH="${portagedir}/bin:${PATH}"

export PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym${PYTHONPATH:+:}${PYTHONPATH}"

export PYTHONWARNINGS=d,i::ImportWarning


BTW, the original code must have a bug in it. It reads

export PYTHONPATH=PYTHONPATH="$(dirname $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"

The double PYTHONPATH=PYTHONPATH= can't be right.

>
>> +		libc_fn = find_library("c")
>> +		if libc_fn is None:
>> +			return None
>> +		libc = LoadLibrary(libc_fn)
>> +		if libc is None:
>> +			return None
>> +		libc_tolower = libc.tolower
>> +		libc_toupper = libc.toupper
>>
>>  	lc = list(range(ord('a'), ord('z')+1))
>>  	uc = list(range(ord('A'), ord('Z')+1))
>> -	rlc = [libc.tolower(c) for c in uc]
>> -	ruc = [libc.toupper(c) for c in lc]
>> +	rlc = [libc_tolower(c) for c in uc]
>> +	ruc = [libc_toupper(c) for c in lc]
>>
>>  	if lc != rlc or uc != ruc:
>>  		if silent:
>> @@ -62,7 +71,10 @@ def _check_locale(silent):
>>  			"as LC_CTYPE in make.conf.")
>>  		msg = [l for l in textwrap.wrap(msg, 70)]
>>  		msg.append("")
>> -		chars = lambda l: ''.join(chr(x) f
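The core of the check in the hunk above is a round trip over ASCII letters: map `A`-`Z` through `tolower()` and `a`-`z` through `toupper()`, then compare with the expected ranges. A minimal sketch that factors this into a standalone function (the name `case_mapping_is_sane` and the stand-in functions are hypothetical) so it can be exercised without libc:

```python
def case_mapping_is_sane(tolower, toupper):
    """Return True if tolower/toupper map ASCII A-Z and a-z onto each other.

    Same comparison as the patch's lc/uc/rlc/ruc lists; tolower and
    toupper are any callables taking and returning an int, such as
    libc_tolower/libc_toupper in the patch.
    """
    lc = list(range(ord('a'), ord('z') + 1))
    uc = list(range(ord('A'), ord('Z') + 1))
    rlc = [tolower(c) for c in uc]
    ruc = [toupper(c) for c in lc]
    return lc == rlc and uc == ruc


# Pure-Python stand-ins so the check can run without libc.
def ascii_tolower(c):
    return ord(chr(c).lower())


def ascii_toupper(c):
    return ord(chr(c).upper())


def turkish_like_toupper(c):
    # Made-up mapping imitating a Turkish single-byte locale, where
    # toupper('i') is the dotted capital I (0xDD in ISO-8859-9).
    return 0xDD if c == ord('i') else ascii_toupper(c)


print(case_mapping_is_sane(ascii_tolower, ascii_toupper))         # True
print(case_mapping_is_sane(ascii_lower := ascii_tolower, turkish_like_toupper) if False else case_mapping_is_sane(ascii_tolower, turkish_like_toupper))  # False
```

Passing the two functions in as arguments means the C-module path and the ctypes path feed into one comparison, which is what the `libc_tolower`/`libc_toupper` indirection in the patch achieves.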
Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
On Sun, 22 May 2016 13:04:40 -0400
"Anthony G. Basile" wrote:

> From: "Anthony G. Basile"
>
> The current method to check for a sane system locale is to use python's
> ctypes.util.find_library() to construct a full library path to the
> system libc.so and pass that path to ctypes.CDLL() so we can call
> toupper() and tolower() directly. However, this gets bogged down in
> implementation details and fails with musl.
>
> We work around this design flaw in ctypes with a small python module
> written in C which provides thin wrappers to toupper() and tolower(),
> and only fall back on the current ctypes-based check when this module
> is not available.
>
> This has been tested on glibc, uClibc and musl systems.
>
> X-Gentoo-bug: 571444
> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
>
> Signed-off-by: Anthony G. Basile
> ---
>  pym/portage/util/locale.py   | 32 ++-
>  setup.py                     |  6 ++-
>  src/portage_c_convert_case.c | 94
>  3 files changed, 121 insertions(+), 11 deletions(-)
>  create mode 100644 src/portage_c_convert_case.c
>
> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
> index 2a15ea1..85ddd2b 100644
> --- a/pym/portage/util/locale.py
> +++ b/pym/portage/util/locale.py
> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
>  import locale
>  import logging
>  import os
> +import sys
>  import textwrap
>  import traceback
>
> @@ -34,18 +35,26 @@ def _check_locale(silent):
>  	"""
>  	The inner locale check function.
>  	"""
> -
> -	libc_fn = find_library("c")
> -	if libc_fn is None:
> -		return None
> -	libc = LoadLibrary(libc_fn)
> -	if libc is None:
> -		return None
> +	try:
> +		from portage_c_convert_case import _c_toupper, _c_tolower
> +		libc_tolower = _c_tolower
> +		libc_toupper = _c_toupper

Now I'm being picky... but if you named the functions toupper()
and tolower(), you could actually import the whole module as 'libc'
and have less code!

Also it would be nice to actually make the module more generic. There
are more places where we use CDLL, and all of them could eventually be
supported by the module (unshare() would be much better done in C, for
example).

> +	except ImportError:
> +		writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
> +			level=logging.WARNING, noiselevel=-1)

Do we really want to warn verbosely about this? I think it'd be
a pretty common case for people running the git checkout.

> +		libc_fn = find_library("c")
> +		if libc_fn is None:
> +			return None
> +		libc = LoadLibrary(libc_fn)
> +		if libc is None:
> +			return None
> +		libc_tolower = libc.tolower
> +		libc_toupper = libc.toupper
>
>  	lc = list(range(ord('a'), ord('z')+1))
>  	uc = list(range(ord('A'), ord('Z')+1))
> -	rlc = [libc.tolower(c) for c in uc]
> -	ruc = [libc.toupper(c) for c in lc]
> +	rlc = [libc_tolower(c) for c in uc]
> +	ruc = [libc_toupper(c) for c in lc]
>
>  	if lc != rlc or uc != ruc:
>  		if silent:
> @@ -62,7 +71,10 @@ def _check_locale(silent):
>  			"as LC_CTYPE in make.conf.")
>  		msg = [l for l in textwrap.wrap(msg, 70)]
>  		msg.append("")
> -		chars = lambda l: ''.join(chr(x) for x in l)
> +		if sys.version_info.major >= 3:

Portage uses hexversion for comparisons. Please be consistent.

> +			chars = lambda l: ''.join(chr(x) for x in l)
> +		else:
> +			chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)

This looks like an unrelated change. Was the original code buggy?
If this is the case, then please fix it in a separate commit.

>  		if uc != ruc:
>  			msg.extend([
>  				" %s -> %s" % (chars(lc), chars(ruc)),
> diff --git a/setup.py b/setup.py
> index 25429bc..8b6b408 100755
> --- a/setup.py
> +++ b/setup.py
> @@ -47,7 +47,11 @@ x_scripts = {
>  # Dictionary custom modules written in C/C++ here. The structure is
>  # key = module name
>  # value = list of C/C++ source code, path relative to top source directory
> -x_c_helpers = {}
> +x_c_helpers = {
> +	'portage_c_convert_case' : [
> +		'src/portage_c_convert_case.c',
> +	],
> +}
>
>  class x_build(build):
>  	""" Build command with extra build_man call. """
> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
> new file mode 100644
> index 000..f60b0c2
> --- /dev/null
> +++ b/src/portage_c_convert_case.c
> @@ -0,0 +1,94 @@
> +/* Copyright 2005-2016 Gentoo Foundation
> + * Distributed under the terms of the GNU General Public License v2
> + */
> +
> +#
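The review asks for `sys.hexversion` rather than `sys.version_info.major`. A minimal sketch of the patch's `chars` helper written that way; the Python 2 branch only restates the decode call from the patch, and the `ruc` list here is a stand-in for real `toupper()` results:

```python
import sys

# Branch on sys.hexversion (the style the review asks for) instead of
# sys.version_info.major, which older Pythons do not provide.
if sys.hexversion >= 0x3000000:
    def chars(codes):
        """Join integer character codes into a text string."""
        return ''.join(chr(x) for x in codes)
else:
    def chars(codes):
        """Python 2: chr() returns a byte string, so decode each one."""
        return u''.join(chr(x).decode('utf-8', 'replace') for x in codes)


lc = list(range(ord('a'), ord('z') + 1))
ruc = [c - 32 for c in lc]  # stand-in for toupper() results
print("%s -> %s" % (chars(lc), chars(ruc)))
# abcdefghijklmnopqrstuvwxyz -> ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

Defining the function once at import time keeps the version test out of the message-building code path.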