On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones <rjo...@redhat.com> wrote:
>
> On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> > The python3 bindings create PyUnicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string, on SLES11 SP4 the encoding of the description of the following
> > packages is latin1 and they fail to be converted to unicode using
> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> >
> > PackageKit
> > aaa_base
> > coreutils
> > dejavu
> > desktop-data-SLED
> > gnome-utils
> > hunspell
> > hunspell-32bit
> > hunspell-tools
> > libblocxx6
> > libexif
> > libgphoto2
> > libgtksourceview-2_0-0
> > libmpfr1
> > libopensc2
> > libopensc2-32bit
> > liborc-0_4-0
> > libpackagekit-glib10
> > libpixman-1-0
> > libpixman-1-0-32bit
> > libpoppler-glib4
> > libpoppler5
> > libsensors3
> > libtelepathy-glib0
> > m4
> > opensc
> > opensc-32bit
> > permissions
> > pinentry
> > poppler-tools
> > python-gtksourceview
> > splashy
> > syslog-ng
> > tar
> > tightvnc
> > xorg-x11
> > xorg-x11-xauth
> > yast2-mouse
> >
> > Fix this by globally changing guestfs_int_py_fromstring()
> > and guestfs_int_py_fromstringsize() to fallback to latin1 decoding if
> > utf-8 decoding fails.
> >
> > Using the "strict" error handler doesn't matter in the case of latin1
> > and has the same effect of "replace":
> >
> > https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > Signed-off-by: Sam Eiderman <sam...@google.com>
> > ---
> >  python/handle.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> >  #endif
> >  }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  PyObject *s = PyUnicode_FromString (str);
> > +  if (s == NULL) {
> > +    PyErr_Clear ();
> > +    s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
> > +  }
> > +  return s;
> >  #endif
> >  }
>
> Looks OK to me.  Pino - any objections to merging this?
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine.  Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>
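For reference, the fallback described in the quoted patch can be sketched
at the Python level. This is only an illustration of the decoding
behaviour, not the code in python/handle.c: the function name
decode_guest_string is hypothetical, and bytes.decode() stands in for the
PyUnicode_FromString() / PyUnicode_Decode() calls used in the C bindings.

```python
def decode_guest_string(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails.

    Hypothetical helper mirroring the idea of the quoted patch.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so the
        # "strict" handler can never raise here.
        return raw.decode("latin1", "strict")


if __name__ == "__main__":
    utf8_bytes = "Grüße".encode("utf-8")
    latin1_bytes = "Grüße".encode("latin1")  # not valid UTF-8

    print(decode_guest_string(utf8_bytes))
    print(decode_guest_string(latin1_bytes))

    # "strict" and "replace" agree for Latin-1 on every possible byte.
    every_byte = bytes(range(256))
    assert every_byte.decode("latin1", "strict") == \
        every_byte.decode("latin1", "replace")
```

One caveat with this approach: a Latin-1 byte sequence that happens to
also be valid UTF-8 will be decoded as UTF-8, so the fallback only kicks
in for strings that UTF-8 decoding rejects outright.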
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://www.redhat.com/mailman/listinfo/libguestfs