Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Zac Medico
On 10/21/19 2:16 AM, Michał Górny wrote:
> On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
>> On 10/21/19 1:43 AM, Michał Górny wrote:
>>> Use real os.walk() when getting filenames for FlatLayout.  Unlike
>>> the wrapped Portage module, it return str output for str path parameter,
>>> so we don't have to recode it back and forth.
>>>
>>> Signed-off-by: Michał Górny 
>>> ---
>>>  lib/portage/package/ebuild/fetch.py | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/portage/package/ebuild/fetch.py 
>>> b/lib/portage/package/ebuild/fetch.py
>>> index cedf12b19..be277f1a3 100644
>>> --- a/lib/portage/package/ebuild/fetch.py
>>> +++ b/lib/portage/package/ebuild/fetch.py
>>> @@ -11,6 +11,7 @@ import io
>>>  import itertools
>>>  import json
>>>  import logging
>>> +import os as real_os
>>>  import random
>>>  import re
>>>  import stat
>>> @@ -270,7 +271,7 @@ class FlatLayout(object):
>>> return filename
>>>  
>>> def get_filenames(self, distdir):
>>> -   for dirpath, dirnames, filenames in os.walk(distdir,
>>> +   for dirpath, dirnames, filenames in real_os.walk(distdir,
>>> onerror=_raise_exc):
>>> return iter(filenames)
>>>  
>>>
>>
>> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
>> encoded with sys.getfilesystemencoding(). It's an edge case, but
>> generally I prefer to handle it.
>>
>> We can continue to use portage.os for the os.walk call, and turn
>> get_filenames into a generator method like this:
>>
>> for filename in filenames:
>> try:
>> yield portage._unicode_decode(filename, errors='strict')
>> except UnicodeDecodeError:
>> # Ignore it. Distfiles names must have valid UTF8 encoding.
>> pass
> 
> Since you've already written it, could you commit it?  I don't wish to
> have my name on the implicit module overrides hackery I don't approve
> of.

Done:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d9855418352398013ae787bb73f70e935ec109ca

I don't really like the portage.os unicode wrapper either, but I'm not
aware of a good alternative to solve the pervasive UnicodeEncodeError
issue that I've mentioned.
-- 
Thanks,
Zac



signature.asc
Description: OpenPGP digital signature


Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Michał Górny
On Mon, 2019-10-21 at 02:10 -0700, Zac Medico wrote:
> On 10/21/19 1:43 AM, Michał Górny wrote:
> > Use real os.walk() when getting filenames for FlatLayout.  Unlike
> > the wrapped Portage module, it return str output for str path parameter,
> > so we don't have to recode it back and forth.
> > 
> > Signed-off-by: Michał Górny 
> > ---
> >  lib/portage/package/ebuild/fetch.py | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/portage/package/ebuild/fetch.py 
> > b/lib/portage/package/ebuild/fetch.py
> > index cedf12b19..be277f1a3 100644
> > --- a/lib/portage/package/ebuild/fetch.py
> > +++ b/lib/portage/package/ebuild/fetch.py
> > @@ -11,6 +11,7 @@ import io
> >  import itertools
> >  import json
> >  import logging
> > +import os as real_os
> >  import random
> >  import re
> >  import stat
> > @@ -270,7 +271,7 @@ class FlatLayout(object):
> > return filename
> >  
> > def get_filenames(self, distdir):
> > -   for dirpath, dirnames, filenames in os.walk(distdir,
> > +   for dirpath, dirnames, filenames in real_os.walk(distdir,
> > onerror=_raise_exc):
> > return iter(filenames)
> >  
> > 
> 
> The real_os.walk will trigger UnicodeEncodeError if distdir can't be
> encoded with sys.getfilesystemencoding(). It's an edge case, but
> generally I prefer to handle it.
> 
> We can continue to use portage.os for the os.walk call, and turn
> get_filenames into a generator method like this:
> 
> for filename in filenames:
> try:
> yield portage._unicode_decode(filename, errors='strict')
> except UnicodeDecodeError:
> # Ignore it. Distfiles names must have valid UTF8 encoding.
> pass

Since you've already written it, could you commit it?  I don't wish to
have my name on the implicit module overrides hackery I don't approve
of.

-- 
Best regards,
Michał Górny



signature.asc
Description: This is a digitally signed message part


Re: [gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Zac Medico
On 10/21/19 1:43 AM, Michał Górny wrote:
> Use real os.walk() when getting filenames for FlatLayout.  Unlike
> the wrapped Portage module, it return str output for str path parameter,
> so we don't have to recode it back and forth.
> 
> Signed-off-by: Michał Górny 
> ---
>  lib/portage/package/ebuild/fetch.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/portage/package/ebuild/fetch.py 
> b/lib/portage/package/ebuild/fetch.py
> index cedf12b19..be277f1a3 100644
> --- a/lib/portage/package/ebuild/fetch.py
> +++ b/lib/portage/package/ebuild/fetch.py
> @@ -11,6 +11,7 @@ import io
>  import itertools
>  import json
>  import logging
> +import os as real_os
>  import random
>  import re
>  import stat
> @@ -270,7 +271,7 @@ class FlatLayout(object):
>   return filename
>  
>   def get_filenames(self, distdir):
> - for dirpath, dirnames, filenames in os.walk(distdir,
> + for dirpath, dirnames, filenames in real_os.walk(distdir,
>   onerror=_raise_exc):
>   return iter(filenames)
>  
> 

The real_os.walk will trigger UnicodeEncodeError if distdir can't be
encoded with sys.getfilesystemencoding(). It's an edge case, but
generally I prefer to handle it.

We can continue to use portage.os for the os.walk call, and turn
get_filenames into a generator method like this:

for filename in filenames:
try:
yield portage._unicode_decode(filename, errors='strict')
except UnicodeDecodeError:
# Ignore it. Distfiles names must have valid UTF8 encoding.
pass
-- 
Thanks,
Zac



signature.asc
Description: OpenPGP digital signature


[gentoo-portage-dev] [PATCH 1/2] fetch: Use real os.walk() to avoid unicode issues with Portage

2019-10-21 Thread Michał Górny
Use real os.walk() when getting filenames for FlatLayout.  Unlike
the wrapped Portage module, it return str output for str path parameter,
so we don't have to recode it back and forth.

Signed-off-by: Michał Górny 
---
 lib/portage/package/ebuild/fetch.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/portage/package/ebuild/fetch.py 
b/lib/portage/package/ebuild/fetch.py
index cedf12b19..be277f1a3 100644
--- a/lib/portage/package/ebuild/fetch.py
+++ b/lib/portage/package/ebuild/fetch.py
@@ -11,6 +11,7 @@ import io
 import itertools
 import json
 import logging
+import os as real_os
 import random
 import re
 import stat
@@ -270,7 +271,7 @@ class FlatLayout(object):
return filename
 
def get_filenames(self, distdir):
-   for dirpath, dirnames, filenames in os.walk(distdir,
+   for dirpath, dirnames, filenames in real_os.walk(distdir,
onerror=_raise_exc):
return iter(filenames)
 
-- 
2.23.0