Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Zac Medico
On 10/21/19 2:16 AM, Michał Górny wrote:
> On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
>> On 10/21/19 1:43 AM, Michał Górny wrote:
>>> Use real os.walk() when getting filenames for FlatLayout.  Unlike
>>> the wrapped Portage module, it return str output for str path parameter,
>>> so we don't have to recode it back and forth.
>>>
>>> Signed-off-by: Michał Górny 
>>> ---
>>>  lib/portage/package/ebuild/fetch.py | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/portage/package/ebuild/fetch.py 
>>> b/lib/portage/package/ebuild/fetch.py
>>> index cedf12b19..be277f1a3 100644
>>> --- a/lib/portage/package/ebuild/fetch.py
>>> +++ b/lib/portage/package/ebuild/fetch.py
>>> @@ -11,6 +11,7 @@ import io
>>>  import itertools
>>>  import json
>>>  import logging
>>> +import os as real_os
>>>  import random
>>>  import re
>>>  import stat
>>> @@ -270,7 +271,7 @@ class FlatLayout(object):
>>> return filename
>>>  
>>> def get_filenames(self, distdir):
>>> -   for dirpath, dirnames, filenames in os.walk(distdir,
>>> +   for dirpath, dirnames, filenames in real_os.walk(distdir,
>>> onerror=_raise_exc):
>>> return iter(filenames)
>>>  
>>>
>>
>> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
>> encoded with sys.getfilesystemencoding(). It's an edge case, but
>> generally I prefer to handle it.
>>
>> We can continue to use portage.os for the os.walk call, and turn
>> get_filenames into a generator method like this:
>>
>> for filename in filenames:
>> try:
>> yield portage._unicode_decode(filename, errors='strict')
>> except UnicodeDecodeError:
>> # Ignore it. Distfiles names must have valid UTF8 encoding.
>> pass
> 
> Since you've already written it, could you commit it?  I don't wish to
> have my name on the implicit module overrides hackery I don't approve
> of.

Done:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d9855418352398013ae787bb73f70e935ec109ca

I don't really like the portage.os unicode wrapper either, but I'm not
aware of a good alternative to solve the pervasive UnicodeEncodeError
issue that I've mentioned.
-- 
Thanks,
Zac



signature.asc
Description: OpenPGP digital signature


Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Michał Górny
On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
> On 10/21/19 1:43 AM, Michał Górny wrote:
> > Use real os.walk() when getting filenames for FlatLayout.  Unlike
> > the wrapped Portage module, it return str output for str path parameter,
> > so we don't have to recode it back and forth.
> > 
> > Signed-off-by: Michał Górny 
> > ---
> >  lib/portage/package/ebuild/fetch.py | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/portage/package/ebuild/fetch.py 
> > b/lib/portage/package/ebuild/fetch.py
> > index cedf12b19..be277f1a3 100644
> > --- a/lib/portage/package/ebuild/fetch.py
> > +++ b/lib/portage/package/ebuild/fetch.py
> > @@ -11,6 +11,7 @@ import io
> >  import itertools
> >  import json
> >  import logging
> > +import os as real_os
> >  import random
> >  import re
> >  import stat
> > @@ -270,7 +271,7 @@ class FlatLayout(object):
> > return filename
> >  
> > def get_filenames(self, distdir):
> > -   for dirpath, dirnames, filenames in os.walk(distdir,
> > +   for dirpath, dirnames, filenames in real_os.walk(distdir,
> > onerror=_raise_exc):
> > return iter(filenames)
> >  
> > 
> 
> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
> encoded with sys.getfilesystemencoding(). It's an edge case, but
> generally I prefer to handle it.
> 
> We can continue to use portage.os for the os.walk call, and turn
> get_filenames into a generator method like this:
> 
> for filename in filenames:
> try:
> yield portage._unicode_decode(filename, errors='strict')
> except UnicodeDecodeError:
> # Ignore it. Distfiles names must have valid UTF8 encoding.
> pass

Since you've already written it, could you commit it?  I don't wish to
have my name on the implicit module overrides hackery I don't approve
of.

-- 
Best regards,
Michał Górny



signature.asc
Description: This is a digitally signed message part


Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Zac Medico
On 10/21/19 1:43 AM, Michał Górny wrote:
> Use real os.walk() when getting filenames for FlatLayout.  Unlike
> the wrapped Portage module, it return str output for str path parameter,
> so we don't have to recode it back and forth.
> 
> Signed-off-by: Michał Górny 
> ---
>  lib/portage/package/ebuild/fetch.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/portage/package/ebuild/fetch.py 
> b/lib/portage/package/ebuild/fetch.py
> index cedf12b19..be277f1a3 100644
> --- a/lib/portage/package/ebuild/fetch.py
> +++ b/lib/portage/package/ebuild/fetch.py
> @@ -11,6 +11,7 @@ import io
>  import itertools
>  import json
>  import logging
> +import os as real_os
>  import random
>  import re
>  import stat
> @@ -270,7 +271,7 @@ class FlatLayout(object):
>   return filename
>  
>   def get_filenames(self, distdir):
> - for dirpath, dirnames, filenames in os.walk(distdir,
> + for dirpath, dirnames, filenames in real_os.walk(distdir,
>   onerror=_raise_exc):
>   return iter(filenames)
>  
> 

The real_os.walk will trigger UnicodeEncodeError if distdir can't be
encoded with sys.getfilesystemencoding(). It's an edge case, but
generally I prefer to handle it.

We can continue to use portage.os for the os.walk call, and turn
get_filenames into a generator method like this:

for filename in filenames:
try:
yield portage._unicode_decode(filename, errors='strict')
except UnicodeDecodeError:
# Ignore it. Distfiles names must have valid UTF8 encoding.
pass
-- 
Thanks,
Zac



signature.asc
Description: OpenPGP digital signature