Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage
On 10/21/19 2:16 AM, Michał Górny wrote:
> On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
>> On 10/21/19 1:43 AM, Michał Górny wrote:
>>> Use real os.walk() when getting filenames for FlatLayout. Unlike
>>> the wrapped Portage module, it return str output for str path parameter,
>>> so we don't have to recode it back and forth.
>>>
>>> Signed-off-by: Michał Górny
>>> ---
>>>  lib/portage/package/ebuild/fetch.py | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/portage/package/ebuild/fetch.py
>>> b/lib/portage/package/ebuild/fetch.py
>>> index cedf12b19..be277f1a3 100644
>>> --- a/lib/portage/package/ebuild/fetch.py
>>> +++ b/lib/portage/package/ebuild/fetch.py
>>> @@ -11,6 +11,7 @@ import io
>>>  import itertools
>>>  import json
>>>  import logging
>>> +import os as real_os
>>>  import random
>>>  import re
>>>  import stat
>>> @@ -270,7 +271,7 @@ class FlatLayout(object):
>>>          return filename
>>>
>>>      def get_filenames(self, distdir):
>>> -        for dirpath, dirnames, filenames in os.walk(distdir,
>>> +        for dirpath, dirnames, filenames in real_os.walk(distdir,
>>>                  onerror=_raise_exc):
>>>              return iter(filenames)
>>>
>>>
>>
>> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
>> encoded with sys.getfilesystemencoding(). It's an edge case, but
>> generally I prefer to handle it.
>>
>> We can continue to use portage.os for the os.walk call, and turn
>> get_filenames into a generator method like this:
>>
>>     for filename in filenames:
>>         try:
>>             yield portage._unicode_decode(filename, errors='strict')
>>         except UnicodeDecodeError:
>>             # Ignore it. Distfiles names must have valid UTF8 encoding.
>>             pass
>
> Since you've already written it, could you commit it? I don't wish to
> have my name on the implicit module overrides hackery I don't approve
> of.

Done:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d9855418352398013ae787bb73f70e935ec109ca

I don't really like the portage.os unicode wrapper either, but I'm not
aware of a good alternative to solve the pervasive UnicodeEncodeError
issue that I've mentioned.
-- 
Thanks,
Zac

Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage
On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
> On 10/21/19 1:43 AM, Michał Górny wrote:
> > Use real os.walk() when getting filenames for FlatLayout. Unlike
> > the wrapped Portage module, it return str output for str path parameter,
> > so we don't have to recode it back and forth.
> >
> > Signed-off-by: Michał Górny
> > ---
> >  lib/portage/package/ebuild/fetch.py | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/portage/package/ebuild/fetch.py
> > b/lib/portage/package/ebuild/fetch.py
> > index cedf12b19..be277f1a3 100644
> > --- a/lib/portage/package/ebuild/fetch.py
> > +++ b/lib/portage/package/ebuild/fetch.py
> > @@ -11,6 +11,7 @@ import io
> >  import itertools
> >  import json
> >  import logging
> > +import os as real_os
> >  import random
> >  import re
> >  import stat
> > @@ -270,7 +271,7 @@ class FlatLayout(object):
> >          return filename
> >
> >      def get_filenames(self, distdir):
> > -        for dirpath, dirnames, filenames in os.walk(distdir,
> > +        for dirpath, dirnames, filenames in real_os.walk(distdir,
> >                  onerror=_raise_exc):
> >              return iter(filenames)
> >
> >
>
> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
> encoded with sys.getfilesystemencoding(). It's an edge case, but
> generally I prefer to handle it.
>
> We can continue to use portage.os for the os.walk call, and turn
> get_filenames into a generator method like this:
>
>     for filename in filenames:
>         try:
>             yield portage._unicode_decode(filename, errors='strict')
>         except UnicodeDecodeError:
>             # Ignore it. Distfiles names must have valid UTF8 encoding.
>             pass

Since you've already written it, could you commit it? I don't wish to
have my name on the implicit module overrides hackery I don't approve
of.

-- 
Best regards,
Michał Górny

Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage
On 10/21/19 1:43 AM, Michał Górny wrote:
> Use real os.walk() when getting filenames for FlatLayout. Unlike
> the wrapped Portage module, it return str output for str path parameter,
> so we don't have to recode it back and forth.
>
> Signed-off-by: Michał Górny
> ---
>  lib/portage/package/ebuild/fetch.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/portage/package/ebuild/fetch.py
> b/lib/portage/package/ebuild/fetch.py
> index cedf12b19..be277f1a3 100644
> --- a/lib/portage/package/ebuild/fetch.py
> +++ b/lib/portage/package/ebuild/fetch.py
> @@ -11,6 +11,7 @@ import io
>  import itertools
>  import json
>  import logging
> +import os as real_os
>  import random
>  import re
>  import stat
> @@ -270,7 +271,7 @@ class FlatLayout(object):
>          return filename
>
>      def get_filenames(self, distdir):
> -        for dirpath, dirnames, filenames in os.walk(distdir,
> +        for dirpath, dirnames, filenames in real_os.walk(distdir,
>                  onerror=_raise_exc):
>              return iter(filenames)
>
>

The real_os.walk will trigger UnicodeEncodeError if distdir can't be
encoded with sys.getfilesystemencoding(). It's an edge case, but
generally I prefer to handle it.

We can continue to use portage.os for the os.walk call, and turn
get_filenames into a generator method like this:

    for filename in filenames:
        try:
            yield portage._unicode_decode(filename, errors='strict')
        except UnicodeDecodeError:
            # Ignore it. Distfiles names must have valid UTF8 encoding.
            pass

-- 
Thanks,
Zac
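
For readers without a Portage checkout, the generator idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code that was committed: `iter_valid_filenames` and `_raise_exc` are hypothetical names defined here, plain `bytes.decode` stands in for `portage._unicode_decode`, and the directory is walked as bytes (encoded explicitly as UTF-8) so that `sys.getfilesystemencoding()` is never involved.

```python
import os


def _raise_exc(exc):
    # Hypothetical stand-in for the onerror callback used in the patch:
    # re-raise the OSError that os.walk would otherwise swallow.
    raise exc


def iter_valid_filenames(distdir):
    """Yield names of files directly inside distdir as str.

    Hypothetical helper, not Portage API. Names that are not valid
    UTF-8 are skipped silently.
    """
    # Passing bytes makes os.walk return bytes names. A str argument is
    # encoded explicitly as UTF-8 here (an assumption of this sketch).
    top = distdir if isinstance(distdir, bytes) else distdir.encode("utf-8")
    for _dirpath, _dirnames, filenames in os.walk(top, onerror=_raise_exc):
        for filename in filenames:
            try:
                yield filename.decode("utf-8", errors="strict")
            except UnicodeDecodeError:
                # Ignore it: the name is not valid UTF-8.
                continue
        # Like the original `return iter(filenames)`, stop after the
        # top-level directory instead of descending further.
        return
```

Because this is a generator, nothing touches the filesystem until the caller starts iterating, so an error for a missing directory surfaces at the first `next()` rather than at call time.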