Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-07-06 Thread Sam Eiderman
On Mon, Jul 6, 2020 at 2:39 PM Richard W.M. Jones  wrote:
>
> Hi Sam,
>
> I was doing some work on the Python bindings, starting with removing
> support for Python 2 since it's EOL.  I thought I would have a look at
> this patch.
>

This is great. I'm currently working on adding Python 3 type hints to
the auto-generated functions; given how libguestfs generates its
bindings, this is easily possible.
(It is a bit tricky, however, if we want to preserve the 79-character
line limit and correct indentation.)


> So firstly I think the last version posted is:
>
>   https://www.redhat.com/archives/libguestfs/2020-April/msg00190.html
>
> My impression of this is that we shouldn't just hack the Python
> bindings to make this apparently work.  But I wanted to ask you a few
> questions about this:
>
>  - Does the SUSE RPM output contain a mix of encodings?  Or is
>    it all latin-1 or utf-8?
>

I'm not sure about that. Only the packages listed in the commit
message failed the UTF-8 conversion function. I don't know whether
the other packages are also Latin-1 encoded but happen to decode
successfully as UTF-8.
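
One way to see why this is hard to tell: bytes that are pure ASCII decode
to the same text under UTF-8 and Latin-1, so a package that decodes fine
as UTF-8 may still have been written as Latin-1. A minimal Python sketch
of that point; the sample byte strings are invented for illustration and
are not taken from the SLES11 RPM database.

```python
# Hypothetical sample descriptions, not real RPM data.
ascii_desc = b"Tools for handling tar archives"
latin1_desc = b"Fran\xe7ois' caf\xe9 utilities"  # 0xE7 and 0xE9 are Latin-1 bytes


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII is valid in both encodings and yields the same text,
# so it gives no hint about which encoding the packager used.
assert decodes_as_utf8(ascii_desc)
assert ascii_desc.decode("utf-8") == ascii_desc.decode("latin-1")

# A Latin-1 accented byte followed by an ASCII byte is not valid UTF-8.
assert not decodes_as_utf8(latin1_desc)
```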

>  - Is there any indication of the correct encoding from RPM?

As far as I know, it must always be UTF-8.

>
>  - Can we not instead escape the bad sequences using whatever is the
>    C-level equivalent of str.encode(..., 'backslashreplace')?
>    Or I guess better, escape them as Unicode compatibility characters
>    https://en.wikipedia.org/wiki/Unicode_compatibility_characters

The v2 of my patch, which was reviewed by Daniel, does exactly that. I used:

  return PyUnicode_Decode(str, size, "utf-8", "replace");

However, see Nir's comment.
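
For comparison, here is how the standard Python error handlers treat a
Latin-1 byte during UTF-8 decoding. This is only a sketch in Python
(3.5 or later, where "backslashreplace" also works for decoding); the
sample bytes are made up. The same handler names can be passed as the
errors argument of PyUnicode_Decode() at the C level. Note that
"replace" substitutes U+FFFD and loses the original byte, whereas
"backslashreplace" and "surrogateescape" keep it recoverable.

```python
raw = b"caf\xe9"  # made-up sample: Latin-1 bytes, not valid UTF-8

# "replace": the bad byte becomes U+FFFD and its value is lost.
replaced = raw.decode("utf-8", "replace")

# "backslashreplace": the bad byte is kept as a visible escape sequence.
escaped = raw.decode("utf-8", "backslashreplace")

# "surrogateescape": the bad byte is carried as a lone surrogate and can
# be turned back into the original bytes when encoding.
smuggled = raw.decode("utf-8", "surrogateescape")
restored = smuggled.encode("utf-8", "surrogateescape")

print(repr(replaced), repr(escaped), repr(smuggled), restored == raw)
```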

>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine.  Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>

Thanks!

___
Libguestfs mailing list
Libguestfs@redhat.com
https://www.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-07-06 Thread Richard W.M. Jones
Hi Sam,

I was doing some work on the Python bindings, starting with removing
support for Python 2 since it's EOL.  I thought I would have a look at
this patch.

So firstly I think the last version posted is:

  https://www.redhat.com/archives/libguestfs/2020-April/msg00190.html

My impression of this is that we shouldn't just hack the Python
bindings to make this apparently work.  But I wanted to ask you a few
questions about this:

 - Does the SUSE RPM output contain a mix of encodings?  Or is
   it all latin-1 or utf-8?

 - Is there any indication of the correct encoding from RPM?

 - Can we not instead escape the bad sequences using whatever is the
   C-level equivalent of str.encode(..., 'backslashreplace')?
   Or I guess better, escape them as Unicode compatibility characters
   https://en.wikipedia.org/wiki/Unicode_compatibility_characters

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Sam Eiderman
I see. The problem is that, for some reason, SUSE11 did not encode
some of their packages as UTF-8 but used Latin-1 instead.

There are multiple possible solutions here:
1. Do not decode the application description as a string, but return it
as a byte array. I am not sure about bindings other than Python; in C
everything is a byte array anyway (as long as it is null-terminated).
This would make the field less usable and require the user to do more
decoding.
2. Fall back to decoding the "application description" specifically as
Latin-1 (closer to the first patch I submitted).
3. Fall back to decoding every string returned to Python from any
libguestfs API as Latin-1 (the current patch).
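
The three options can be sketched in Python roughly as follows. This
illustrates the decoding policies only, not the C code in
python/handle.c; the function names and the "app2_description" field
check are hypothetical.

```python
def option1_bytes(raw: bytes) -> bytes:
    """Option 1: hand the raw bytes to the caller undecoded."""
    return raw


def decode_with_fallback(raw: bytes) -> str:
    """Try UTF-8 first; if that fails, decode as Latin-1 (never fails)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def option2_field_fallback(field: str, raw: bytes) -> str:
    """Option 2: only the description field gets the Latin-1 fallback."""
    if field == "app2_description":  # hypothetical field check
        return decode_with_fallback(raw)
    return raw.decode("utf-8")  # other fields still raise on bad input


def option3_global_fallback(raw: bytes) -> str:
    """Option 3: every string returned to Python gets the fallback."""
    return decode_with_fallback(raw)
```

With option 1 the caller decides how to decode; with options 2 and 3 the
caller always receives a str, but cannot tell whether the fallback was
used.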

Sam


On Tue, Jun 30, 2020 at 12:00 PM Pino Toscano  wrote:
>
> On Tuesday, 30 June 2020 10:53:54 CEST Sam Eiderman wrote:
> > Hey Pino,
> >
> > Can you search for the previous patches I submitted? I had some discussions
> > regarding this with Daniel and Nir.
>
> Sure, I did read those, and I took it into account. What I said does not
> invalidate nor contradict that.
>
> --
> Pino Toscano




Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Sam Eiderman
Regarding reproducing this: if possible, simply install SLES11 SP4 from CD.
I'm not sure how easy that will be for you nowadays, since SUSE just removed
SLES11 from their Downloads page.

Sam

On Tue, Jun 30, 2020 at 11:53 AM Sam Eiderman  wrote:

> Hey Pino,
>
> Can you search for the previous patches I submitted? I had some
> discussions regarding this with Daniel and Nir.
>
> Thanks!
>
> On Tue, Jun 30, 2020 at 11:43 AM Pino Toscano  wrote:
>
>> On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
>> > The python3 bindings create PyUnicode objects from application strings
>> > on the guest (i.e. installed rpm, deb packages).
>> > It is documented that rpm package fields such as description should be
>> > utf8 encoded - however in some cases they are not a valid unicode
>> > string, on SLES11 SP4 the encoding of the description of the following
>> > packages is latin1 and they fail to be converted to unicode using
>> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>>
>> Sorry, I wanted to reach our resident Python maintainers to get their
>> feedback, and so far had no time for it. Will do it shortly.
>>
>> BTW do you have a reproducer I can actually try freely?
>>
>> > diff --git a/python/handle.c b/python/handle.c
>> > index 2fb8c18f0..fe89dc58a 100644
>> > --- a/python/handle.c
>> > +++ b/python/handle.c
>> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>> >  #if PY_MAJOR_VERSION < 3
>> >    return PyString_FromString (str);
>> >  #else
>> > -  return PyUnicode_FromString (str);
>> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
>> >  #endif
>> >  }
>> >
>> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>> >  #if PY_MAJOR_VERSION < 3
>> >    return PyString_FromStringAndSize (str, size);
>> >  #else
>> > -  return PyUnicode_FromStringAndSize (str, size);
>> > +  PyObject *s = PyUnicode_FromString (str);
>> > +  if (s == NULL) {
>> > +    PyErr_Clear ();
>> > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
>>
>> Minor nit: space between "strlen" and the opening bracket.
>>
>> Also, isn't there any error we can check as a way to detect this
>> situation, rather than always attempting to decode it as latin1?
>>
>> Thanks,
>> --
>> Pino Toscano
>
>

Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Sam Eiderman
Hey Pino,

Can you search for the previous patches I submitted? I had some discussions
regarding this with Daniel and Nir.

Thanks!

On Tue, Jun 30, 2020 at 11:43 AM Pino Toscano  wrote:

> On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
> > The python3 bindings create PyUnicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string, on SLES11 SP4 the encoding of the description of the following
> > packages is latin1 and they fail to be converted to unicode using
> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>
> Sorry, I wanted to reach our resident Python maintainers to get their
> feedback, and so far had no time for it. Will do it shortly.
>
> BTW do you have a reproducer I can actually try freely?
>
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> > >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> >  #endif
> >  }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str,
> size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  PyObject *s = PyUnicode_FromString (str);
> > +  if (s == NULL) {
> > > +    PyErr_Clear ();
> > > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
>
> Minor nit: space between "strlen" and the opening bracket.
>
> Also, isn't there any error we can check as a way to detect this
> situation, rather than always attempting to decode it as latin1?
>
> Thanks,
> --
> Pino Toscano

Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Sam Eiderman
gentle ping

On Wed, Jun 3, 2020 at 2:52 PM Sam Eiderman  wrote:

> On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones wrote:
> >
> > On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> > > The python3 bindings create PyUnicode objects from application strings
> > > on the guest (i.e. installed rpm, deb packages).
> > > It is documented that rpm package fields such as description should be
> > > utf8 encoded - however in some cases they are not a valid unicode
> > > string, on SLES11 SP4 the encoding of the description of the following
> > > packages is latin1 and they fail to be converted to unicode using
> > > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> > >
> > >  PackageKit
> > >  aaa_base
> > >  coreutils
> > >  dejavu
> > >  desktop-data-SLED
> > >  gnome-utils
> > >  hunspell
> > >  hunspell-32bit
> > >  hunspell-tools
> > >  libblocxx6
> > >  libexif
> > >  libgphoto2
> > >  libgtksourceview-2_0-0
> > >  libmpfr1
> > >  libopensc2
> > >  libopensc2-32bit
> > >  liborc-0_4-0
> > >  libpackagekit-glib10
> > >  libpixman-1-0
> > >  libpixman-1-0-32bit
> > >  libpoppler-glib4
> > >  libpoppler5
> > >  libsensors3
> > >  libtelepathy-glib0
> > >  m4
> > >  opensc
> > >  opensc-32bit
> > >  permissions
> > >  pinentry
> > >  poppler-tools
> > >  python-gtksourceview
> > >  splashy
> > >  syslog-ng
> > >  tar
> > >  tightvnc
> > >  xorg-x11
> > >  xorg-x11-xauth
> > >  yast2-mouse
> > >
> > > Fix this by globally changing guestfs_int_py_fromstring()
> > > and guestfs_int_py_fromstringsize() to fallback to latin1 decoding if
> > > utf-8 decoding fails.
> > >
> > > Using the "strict" error handler doesn't matter in the case of latin1
> > > and has the same effect of "replace":
> > >
> > >  https://docs.python.org/3/library/codecs.html#error-handlers
> > >
> > > Signed-off-by: Sam Eiderman 
> > > ---
> > >  python/handle.c | 9 +++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/python/handle.c b/python/handle.c
> > > index 2fb8c18f0..fe89dc58a 100644
> > > --- a/python/handle.c
> > > +++ b/python/handle.c
> > > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> > >  #if PY_MAJOR_VERSION < 3
> > > >    return PyString_FromString (str);
> > >  #else
> > > -  return PyUnicode_FromString (str);
> > > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> > >  #endif
> > >  }
> > >
> > > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str,
> size_t size)
> > >  #if PY_MAJOR_VERSION < 3
> > >return PyString_FromStringAndSize (str, size);
> > >  #else
> > > -  return PyUnicode_FromStringAndSize (str, size);
> > > +  PyObject *s = PyUnicode_FromString (str);
> > > +  if (s == NULL) {
> > > > +    PyErr_Clear ();
> > > > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
> > > +  }
> > > +  return s;
> > >  #endif
> > >  }
> >
> > Looks OK to me.  Pino - any objections to merging this?
> >
> > Rich.
> >
> > --
> > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> > Read my programming and virtualization blog: http://rwmj.wordpress.com
> > virt-df lists disk usage of guests without needing to install any
> > software inside the virtual machine.  Supports Linux and Windows.
> > http://people.redhat.com/~rjones/virt-df/
> >
>

Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Pino Toscano
On Tuesday, 30 June 2020 10:53:54 CEST Sam Eiderman wrote:
> Hey Pino,
> 
> Can you search for the previous patches I submitted? I had some discussions
> regarding this with Daniel and Nir.

Sure, I did read those, and I took it into account. What I said does not
invalidate nor contradict that.

-- 
Pino Toscano


Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-30 Thread Pino Toscano
On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
> The python3 bindings create PyUnicode objects from application strings
> on the guest (i.e. installed rpm, deb packages).
> It is documented that rpm package fields such as description should be
> utf8 encoded - however in some cases they are not a valid unicode
> string, on SLES11 SP4 the encoding of the description of the following
> packages is latin1 and they fail to be converted to unicode using
> guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):

Sorry, I wanted to reach our resident Python maintainers to get their
feedback, and so far had no time for it. Will do it shortly.

BTW do you have a reproducer I can actually try freely?

> diff --git a/python/handle.c b/python/handle.c
> index 2fb8c18f0..fe89dc58a 100644
> --- a/python/handle.c
> +++ b/python/handle.c
> @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromString (str);
>  #else
> -  return PyUnicode_FromString (str);
> +  return guestfs_int_py_fromstringsize (str, strlen (str));
>  #endif
>  }
>  
> @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromStringAndSize (str, size);
>  #else
> -  return PyUnicode_FromStringAndSize (str, size);
> +  PyObject *s = PyUnicode_FromString (str);
> +  if (s == NULL) {
> +    PyErr_Clear ();
> +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");

Minor nit: space between "strlen" and the opening bracket.

Also, isn't there any error we can check as a way to detect this
situation, rather than always attempting to decode it as latin1?
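
In Python terms, the failure is reported as a UnicodeDecodeError, which
records where the bad bytes are, so the fallback can be limited to that
one error and everything else left to propagate. A rough sketch; the
function name is hypothetical, and at the C level the corresponding
check would presumably be PyErr_ExceptionMatches (PyExc_UnicodeDecodeError)
before calling PyErr_Clear ().

```python
from typing import Tuple


def decode_guest_string(raw: bytes) -> Tuple[str, bool]:
    """Decode raw as UTF-8; fall back to Latin-1 only on a decode error.

    Returns the text and a flag saying whether the fallback was used.
    Any other exception (for example MemoryError) is not caught.
    """
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the offending bytes in exc.object.
        bad = exc.object[exc.start:exc.end]
        print(f"invalid UTF-8 at offset {exc.start}: {bad!r} ({exc.reason})")
        return raw.decode("latin-1"), True
```

Returning the flag lets a caller log or report which strings were not
valid UTF-8 instead of silently accepting the Latin-1 reading.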

Thanks,
-- 
Pino Toscano


Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-06-03 Thread Sam Eiderman
On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones  wrote:
>
> On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> > The python3 bindings create PyUnicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string, on SLES11 SP4 the encoding of the description of the following
> > packages is latin1 and they fail to be converted to unicode using
> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> >
> >  PackageKit
> >  aaa_base
> >  coreutils
> >  dejavu
> >  desktop-data-SLED
> >  gnome-utils
> >  hunspell
> >  hunspell-32bit
> >  hunspell-tools
> >  libblocxx6
> >  libexif
> >  libgphoto2
> >  libgtksourceview-2_0-0
> >  libmpfr1
> >  libopensc2
> >  libopensc2-32bit
> >  liborc-0_4-0
> >  libpackagekit-glib10
> >  libpixman-1-0
> >  libpixman-1-0-32bit
> >  libpoppler-glib4
> >  libpoppler5
> >  libsensors3
> >  libtelepathy-glib0
> >  m4
> >  opensc
> >  opensc-32bit
> >  permissions
> >  pinentry
> >  poppler-tools
> >  python-gtksourceview
> >  splashy
> >  syslog-ng
> >  tar
> >  tightvnc
> >  xorg-x11
> >  xorg-x11-xauth
> >  yast2-mouse
> >
> > Fix this by globally changing guestfs_int_py_fromstring()
> > and guestfs_int_py_fromstringsize() to fallback to latin1 decoding if
> > utf-8 decoding fails.
> >
> > Using the "strict" error handler doesn't matter in the case of latin1
> > and has the same effect of "replace":
> >
> >  https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > Signed-off-by: Sam Eiderman 
> > ---
> >  python/handle.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> > >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> >  #endif
> >  }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t 
> > size)
> >  #if PY_MAJOR_VERSION < 3
> >return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  PyObject *s = PyUnicode_FromString (str);
> > +  if (s == NULL) {
> > > +    PyErr_Clear ();
> > > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
> > +  }
> > +  return s;
> >  #endif
> >  }
>
> Looks OK to me.  Pino - any objections to merging this?
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine.  Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>




Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-05-13 Thread Sam Eiderman
bump

On Sun, Apr 26, 2020 at 9:14 PM Sam Eiderman  wrote:
>
> The python3 bindings create PyUnicode objects from application strings
> on the guest (i.e. installed rpm, deb packages).
> It is documented that rpm package fields such as description should be
> utf8 encoded - however in some cases they are not a valid unicode
> string, on SLES11 SP4 the encoding of the description of the following
> packages is latin1 and they fail to be converted to unicode using
> guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>
>  PackageKit
>  aaa_base
>  coreutils
>  dejavu
>  desktop-data-SLED
>  gnome-utils
>  hunspell
>  hunspell-32bit
>  hunspell-tools
>  libblocxx6
>  libexif
>  libgphoto2
>  libgtksourceview-2_0-0
>  libmpfr1
>  libopensc2
>  libopensc2-32bit
>  liborc-0_4-0
>  libpackagekit-glib10
>  libpixman-1-0
>  libpixman-1-0-32bit
>  libpoppler-glib4
>  libpoppler5
>  libsensors3
>  libtelepathy-glib0
>  m4
>  opensc
>  opensc-32bit
>  permissions
>  pinentry
>  poppler-tools
>  python-gtksourceview
>  splashy
>  syslog-ng
>  tar
>  tightvnc
>  xorg-x11
>  xorg-x11-xauth
>  yast2-mouse
>
> Fix this by globally changing guestfs_int_py_fromstring()
> and guestfs_int_py_fromstringsize() to fallback to latin1 decoding if
> utf-8 decoding fails.
>
> Using the "strict" error handler doesn't matter in the case of latin1
> and has the same effect of "replace":
>
>  https://docs.python.org/3/library/codecs.html#error-handlers
>
> Signed-off-by: Sam Eiderman 
> ---
>  python/handle.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/python/handle.c b/python/handle.c
> index 2fb8c18f0..fe89dc58a 100644
> --- a/python/handle.c
> +++ b/python/handle.c
> @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromString (str);
>  #else
> -  return PyUnicode_FromString (str);
> +  return guestfs_int_py_fromstringsize (str, strlen (str));
>  #endif
>  }
>
> @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromStringAndSize (str, size);
>  #else
> -  return PyUnicode_FromStringAndSize (str, size);
> +  PyObject *s = PyUnicode_FromString (str);
> +  if (s == NULL) {
> +    PyErr_Clear ();
> +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
> +  }
> +  return s;
>  #endif
>  }
>
> --
> 2.26.2.303.gf8c07b1a785-goog
>




Re: [Libguestfs] [PATCH v3] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

2020-05-13 Thread Richard W.M. Jones
On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> The python3 bindings create PyUnicode objects from application strings
> on the guest (i.e. installed rpm, deb packages).
> It is documented that rpm package fields such as description should be
> utf8 encoded - however in some cases they are not a valid unicode
> string, on SLES11 SP4 the encoding of the description of the following
> packages is latin1 and they fail to be converted to unicode using
> guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> 
>  PackageKit
>  aaa_base
>  coreutils
>  dejavu
>  desktop-data-SLED
>  gnome-utils
>  hunspell
>  hunspell-32bit
>  hunspell-tools
>  libblocxx6
>  libexif
>  libgphoto2
>  libgtksourceview-2_0-0
>  libmpfr1
>  libopensc2
>  libopensc2-32bit
>  liborc-0_4-0
>  libpackagekit-glib10
>  libpixman-1-0
>  libpixman-1-0-32bit
>  libpoppler-glib4
>  libpoppler5
>  libsensors3
>  libtelepathy-glib0
>  m4
>  opensc
>  opensc-32bit
>  permissions
>  pinentry
>  poppler-tools
>  python-gtksourceview
>  splashy
>  syslog-ng
>  tar
>  tightvnc
>  xorg-x11
>  xorg-x11-xauth
>  yast2-mouse
> 
> Fix this by globally changing guestfs_int_py_fromstring()
> and guestfs_int_py_fromstringsize() to fallback to latin1 decoding if
> utf-8 decoding fails.
> 
> Using the "strict" error handler doesn't matter in the case of latin1
> and has the same effect of "replace":
> 
>  https://docs.python.org/3/library/codecs.html#error-handlers
> 
> Signed-off-by: Sam Eiderman 
> ---
>  python/handle.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/python/handle.c b/python/handle.c
> index 2fb8c18f0..fe89dc58a 100644
> --- a/python/handle.c
> +++ b/python/handle.c
> @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromString (str);
>  #else
> -  return PyUnicode_FromString (str);
> +  return guestfs_int_py_fromstringsize (str, strlen (str));
>  #endif
>  }
>  
> @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromStringAndSize (str, size);
>  #else
> -  return PyUnicode_FromStringAndSize (str, size);
> +  PyObject *s = PyUnicode_FromString (str);
> +  if (s == NULL) {
> +    PyErr_Clear ();
> +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
> +  }
> +  return s;
>  #endif
>  }
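
Two properties of the patch above can be checked quickly from Python.
First, the commit message's claim that "strict" and "replace" behave the
same for latin1 holds because every byte value maps to a code point.
Second, the patched guestfs_int_py_fromstringsize() as posted calls
PyUnicode_FromString (str) and strlen(str) rather than using its size
argument, so a buffer with an embedded NUL would be cut short; whether
any caller passes such a buffer is not established in this thread. The
c_strlen helper below is a hypothetical Python model of C strlen(), not
part of libguestfs.

```python
# Every byte value 0..255 is a valid Latin-1 character, so decoding can
# never fail and the error handler is never consulted.
all_bytes = bytes(range(256))
strict = all_bytes.decode("latin-1", "strict")
replace = all_bytes.decode("latin-1", "replace")
assert strict == replace
assert [ord(ch) for ch in strict] == list(range(256))


def c_strlen(buf: bytes) -> int:
    """Model of C strlen(): number of bytes before the first NUL."""
    nul = buf.find(b"\0")
    return len(buf) if nul == -1 else nul


# Made-up buffer with an embedded NUL and an explicit size.
buf = b"abc\0def"
size = len(buf)

by_size = buf[:size].decode("latin-1")             # honours the size argument
by_strlen = buf[:c_strlen(buf)].decode("latin-1")  # stops at the first NUL

assert by_size == "abc\x00def"
assert by_strlen == "abc"
```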

Looks OK to me.  Pino - any objections to merging this?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
