Bug#932491: python3-apt: segfault reading from lzma stream

2023-11-02 Thread Julian Andres Klode
Control: clone -1 -2
Control: retitle -2 python3-apt: add support for non-gzip compressed file objects
Control: severity -2 wishlist

On Thu, Nov 02, 2023 at 01:18:23PM +0100, Cyril Brulebois wrote:
> Cyril Brulebois  (2023-11-02):
> > Today I had a few more minutes to spend on this, so here's a little
> > debugging session. My main system is still bullseye, but the same tests
> in a bookworm chroot fail the same way.
> 
> “But maybe it's a bug in the lzma library?” one might ask.
> 
> Adding a bzip2 test between gzip and lzma leads to the following, again
> on both bullseye and bookworm (after creating a Test.bz2/Packages.bz2
> from one of the other files):
> 
> With bug-932491-aa.py (bug-932491-a.py + bzip2):
> 
> $ ./bug-932491-aa.py Test
> gz == bz: True
> gz == xz: True
> gz: section 1 size: 29
> gz: section 1 keys: ['Package', 'Desc']
> gz: section 2 size: 47
> gz: section 2 keys: ['Package', 'Desc']
> Traceback (most recent call last):
>   File "/home/kibi/tmp/./bug-932491-c.py", line 37, in <module>
>     tf_bz.step()
> apt_pkg.Error: E:Unable to parse package file  (1)
> 
> $ ./bug-932491-aa.py Packages
> gz == bz: True
> gz == xz: True
> gz: section 1 size: 1281
> gz: section 1 keys: ['Package', 'Version', 'Installed-Size', 
> 'Maintainer', 'Architecture', 'Depends', 'Pre-Depends', 'Description', 
> 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 
> 'Size', 'MD5sum', 'SHA256']
> gz: section 2 size: 585
> gz: section 2 keys: ['Package', 'Version', 'Installed-Size', 
> 'Maintainer', 'Architecture', 'Pre-Depends', 'Suggests', 'Description', 
> 'Homepage', 'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 
> 'Size', 'MD5sum', 'SHA256']
> bz: section 1 size: 1410
> Segmentation fault
> 
> With bug-932491-bb.py (bug-932491-b.py + bzip2):
> 
> $ ./bug-932491-bb.py Test
> gz packages: 2
> Traceback (most recent call last):
>   File "/home/kibi/tmp/./bug-932491-bb.py", line 26, in <module>
>     for stanza in tf_bz:
> apt_pkg.Error: E:Unable to parse package file  (1)
> 
> $ ./bug-932491-bb.py Packages
> gz packages: 50771
> Traceback (most recent call last):
>   File "/home/kibi/tmp/./bug-932491-bb.py", line 27, in <module>
>     bz_packages.append(stanza['Package'])
>                        ~~~~~~^^^^^^^^^^^
> KeyError: 'Package'
> 
> 
> It looks like we might be getting chunks of different sizes depending on
> the underlying file objects, and some buffering/seeking code is buggy on
> the apt_pkg side?

You are literally just fuzzing the tagfile parser with compressed
streams, there is no decompression going on.

We don't talk to the file-like object you pass in at all; we just
call its fileno() method to get the underlying file descriptor, and
then apt's gzip support reads from that, and that works automagically
because zlib just passes through uncompressed content.
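
A minimal standard-library sketch of why the descriptor route goes wrong
for xz. The file name and payload are made up for illustration, and apt_pkg
is not involved: the point is only that reading through the Python file
object and reading from its fileno() give different bytes.

```python
import lzma
import os
import tempfile

# Made-up stanza, for illustration only.
payload = b"Package: demo\nVersion: 1.0\n\n"

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "Packages.xz")
    with lzma.open(path, "wb") as out:
        out.write(payload)

    with lzma.open(path, "rb") as f:
        # What Python code sees through the file object: decompressed data.
        decoded = f.read()
        # What a reader of the descriptor sees: the raw xz container.
        # os.pread() reads at an absolute offset and leaves the position alone.
        raw = os.pread(f.fileno(), 6, 0)

print(decoded == payload)
print(raw)
```

The six raw bytes are the xz magic number, not the start of a stanza, so
anything that parses the descriptor's content directly is looking at
compressed data.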

If you want it to automatically guess the compressor, you can do that
by passing a filename with the right file extension.
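
A sketch of that filename-based approach, assuming python3-apt is installed
and that apt_pkg.TagFile accepts a path string as described above. The
helper name package_names is hypothetical, and apt_pkg is imported lazily
so the definition itself loads without it.

```python
def package_names(path):
    """Yield the Package field of every stanza in the index file at path.

    Assumption: apt_pkg.TagFile picks the decompressor from the file
    extension when it is given a filename instead of a file object.
    """
    import apt_pkg  # provided by python3-apt

    tagfile = apt_pkg.TagFile(path)
    for stanza in tagfile:
        yield stanza["Package"]
```

Something like len(list(package_names("Packages.xz"))) would then be
expected to match the count obtained from Packages.gz, provided the
assumption holds.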

For existing open files, the right way to approach this arguably is
to provide apt_pkg.FileFd bindings to the FileFd class such that you
can specify a decompressor, and then pass the FileFd to TagFile.

But I think this is a different issue than the segfault, because we
probably still should not be segfaulting on fuzzing with random
data like you do; we probably ought to error out at some point.
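
Until the parser errors out on its own, a caller can refuse such input up
front. The following is a caller-side sketch only, not something apt or
python3-apt does; sniff_compression and check_descriptor are hypothetical
names, and the check simply compares the first bytes behind fileno()
against the well-known gzip, bzip2 and xz magic numbers.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Leading bytes of the three container formats.
MAGIC = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def sniff_compression(head):
    """Return the compressor name matching the leading bytes, or None."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


def check_descriptor(fileobj):
    """Raise ValueError unless the descriptor holds plain or gzip data."""
    head = os.pread(fileobj.fileno(), 6, 0)
    kind = sniff_compression(head)
    if kind not in (None, "gzip"):
        raise ValueError(f"descriptor holds {kind} data")
    return kind


# Demo with a made-up stanza written through each stdlib opener.
sample = b"Package: demo\nVersion: 1.0\n\n"
openers = {"plain": open, "gz": gzip.open, "bz2": bz2.open, "xz": lzma.open}
results = {}
with tempfile.TemporaryDirectory() as workdir:
    for suffix, opener in openers.items():
        path = os.path.join(workdir, f"Packages.{suffix}")
        with opener(path, "wb") as out:
            out.write(sample)
        with opener(path, "rb") as f:
            try:
                results[suffix] = check_descriptor(f)
            except ValueError as exc:
                results[suffix] = f"rejected: {exc}"

print(results)
```

The plain and gzip files pass, while the bzip2 and xz ones are rejected
before any parser gets to see them.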


-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en




Bug#932491: python3-apt: segfault reading from lzma stream

2023-11-02 Thread Cyril Brulebois
Cyril Brulebois  (2023-11-02):
> Today I had a few more minutes to spend on this, so here's a little
> debugging session. My main system is still bullseye, but the same tests
> in a bookworm chroot fail the same way.

“But maybe it's a bug in the lzma library?” one might ask.

Adding a bzip2 test between gzip and lzma leads to the following, again
on both bullseye and bookworm (after creating a Test.bz2/Packages.bz2
from one of the other files):

With bug-932491-aa.py (bug-932491-a.py + bzip2):

$ ./bug-932491-aa.py Test
gz == bz: True
gz == xz: True
gz: section 1 size: 29
gz: section 1 keys: ['Package', 'Desc']
gz: section 2 size: 47
gz: section 2 keys: ['Package', 'Desc']
Traceback (most recent call last):
  File "/home/kibi/tmp/./bug-932491-c.py", line 37, in <module>
    tf_bz.step()
apt_pkg.Error: E:Unable to parse package file  (1)

$ ./bug-932491-aa.py Packages
gz == bz: True
gz == xz: True
gz: section 1 size: 1281
gz: section 1 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 
'Architecture', 'Depends', 'Pre-Depends', 'Description', 'Homepage', 
'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 
'SHA256']
gz: section 2 size: 585
gz: section 2 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 
'Architecture', 'Pre-Depends', 'Suggests', 'Description', 'Homepage', 
'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 
'SHA256']
bz: section 1 size: 1410
Segmentation fault

With bug-932491-bb.py (bug-932491-b.py + bzip2):

$ ./bug-932491-bb.py Test
gz packages: 2
Traceback (most recent call last):
  File "/home/kibi/tmp/./bug-932491-bb.py", line 26, in <module>
    for stanza in tf_bz:
apt_pkg.Error: E:Unable to parse package file  (1)

$ ./bug-932491-bb.py Packages
gz packages: 50771
Traceback (most recent call last):
  File "/home/kibi/tmp/./bug-932491-bb.py", line 27, in <module>
    bz_packages.append(stanza['Package'])
                       ~~~~~~^^^^^^^^^^^
KeyError: 'Package'


It looks like we might be getting chunks of different sizes depending on
the underlying file objects, and some buffering/seeking code is buggy on
the apt_pkg side?


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant
#!/usr/bin/python3
"""
Test case for #932491, version a+bz2
"""
import bz2
import gzip
import lzma
import sys

import apt_pkg

root = sys.argv[1]

# Check data decompression works fine:
with gzip.open(f'{root}.gz') as gz:
    gz_text = gz.read()
with bz2.open(f'{root}.bz2') as bz:
    bz_text = bz.read()
with lzma.open(f'{root}.xz') as xz:
    xz_text = xz.read()
print(f'gz == bz: {gz_text == bz_text}')
print(f'gz == xz: {gz_text == xz_text}')

# Perform 2 manual steps with gz:
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    tf_gz.step()
    print(f'gz: section 1 size: {tf_gz.section.bytes()}')
    print(f'gz: section 1 keys: {tf_gz.section.keys()}')
    tf_gz.step()
    print(f'gz: section 2 size: {tf_gz.section.bytes()}')
    print(f'gz: section 2 keys: {tf_gz.section.keys()}')

# Perform 2 manual steps with bz:
with bz2.open(f'{root}.bz2') as bz:
    tf_bz = apt_pkg.TagFile(bz)
    tf_bz.step()
    print(f'bz: section 1 size: {tf_bz.section.bytes()}')
    print(f'bz: section 1 keys: {tf_bz.section.keys()}')
    tf_bz.step()
    print(f'bz: section 2 size: {tf_bz.section.bytes()}')
    print(f'bz: section 2 keys: {tf_bz.section.keys()}')

# Perform 2 manual steps with xz:
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    tf_xz.step()
    print(f'xz: section 1 size: {tf_xz.section.bytes()}')
    print(f'xz: section 1 keys: {tf_xz.section.keys()}')
    tf_xz.step()
    print(f'xz: section 2 size: {tf_xz.section.bytes()}')
    print(f'xz: section 2 keys: {tf_xz.section.keys()}')

#!/usr/bin/python3
"""
Test case for #932491: version b+bz2
"""
import bz2
import gzip
import lzma
import sys

import apt_pkg

root = sys.argv[1]

# Start a loop:
gz_packages = []
with gzip.open(f'{root}.gz') as gz:
    tf_gz = apt_pkg.TagFile(gz)
    for stanza in tf_gz:
        gz_packages.append(stanza['Package'])
print(f'gz packages: {len(gz_packages)}')

# Start a loop:
bz_packages = []
with bz2.open(f'{root}.bz2') as bz:
    tf_bz = apt_pkg.TagFile(bz)
    for stanza in tf_bz:
        bz_packages.append(stanza['Package'])
print(f'bz packages: {len(bz_packages)}')

# Start a loop:
xz_packages = []
with lzma.open(f'{root}.xz') as xz:
    tf_xz = apt_pkg.TagFile(xz)
    for stanza in tf_xz:
        print('.', end='')
        xz_packages.append(stanza['Package'])
print()
print(f'xz packages: {len(xz_packages)}')




Bug#932491: python3-apt: segfault reading from lzma stream

2023-11-01 Thread Cyril Brulebois
Control: severity -1 important

Hi,

David Bremner  (2019-07-19):
> The following script segfaults if python3-apt is installed, but
> completes if not. Replacing lzma.open with open (and replacing
> Sources.xz with Sources) also makes the segfault go away.  It seems to
> be the same with python3-apt 1.8.4. I didn't check the python2 version
> because lzma is (afaik) python3 only.
> 
> #!/usr/bin/python3
> from debian.deb822 import Sources
> import lzma
> 
> with lzma.open('Sources.xz', mode='rb') as f:
>     for src in Sources.iter_paragraphs(f):
>         package_name = src.get('Package')
>         version = src.get('Version')

This isn't my first attempt at dealing with .xz files using python3-apt,
and I've never managed to get something to work without resorting to
temporary, uncompressed files…

Initial code was:

import apt_pkg
import gzip
with gzip.open('Packages.gz') as f:
    tf = apt_pkg.TagFile(f)
    for stanza in tf:
        do_something_with(stanza)

which should be replaceable with the following given the documentation
of all relevant modules:

import apt_pkg
import lzma
with lzma.open('Packages.xz') as f:
    tf = apt_pkg.TagFile(f)
    for stanza in tf:
        do_something_with(stanza)

Using lzma.LZMAFile(), toying with text vs. binary mode, encoding, bytes
flag, etc. didn't help…
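
For reference, the temporary-file workaround mentioned above can be
sketched with the standard library alone. The helper name
decompressed_copy and the sample stanza are hypothetical; with python3-apt
installed, the temporary path is what would be handed to apt_pkg.TagFile.

```python
import contextlib
import lzma
import os
import shutil
import tempfile


@contextlib.contextmanager
def decompressed_copy(xz_path):
    """Yield the path of a temporary uncompressed copy of xz_path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        with lzma.open(xz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
    try:
        yield tmp.name
    finally:
        # Remove the copy once the caller is done with it.
        os.unlink(tmp.name)


# Demo with a made-up stanza.
sample = b"Package: demo\nVersion: 1.0\n\n"
with tempfile.TemporaryDirectory() as workdir:
    xz_path = os.path.join(workdir, "Packages.xz")
    with lzma.open(xz_path, "wb") as out:
        out.write(sample)

    with decompressed_copy(xz_path) as plain_path:
        # apt_pkg.TagFile(plain_path) would go here.
        with open(plain_path, "rb") as plain:
            roundtrip = plain.read()

print(roundtrip == sample)
```

This costs an extra copy on disk, which is exactly what one would like to
avoid.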


Today I had a few more minutes to spend on this, so here's a little
debugging session. My main system is still bullseye, but the same tests
in a bookworm chroot fail the same way.

Depending on the input data, I'm seeing various expressions of the same
bug, some include a SIGSEGV, some don't.

Here's some sample data:

# Real files, SIGSEGV (archived suite == those files won't
# change over time, other indices would do just fine):
wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.gz
wget http://archive.debian.org/debian/dists/stretch/main/binary-amd64/Packages.xz

# Smaller stanzas, different errors
printf "Key1: Short1\nKey2: Short2\n\nKey3: SlightlyLonger1\nKey4: SlightlyLonger2\n\n" > Test
gzip -k -f Test
xz -k -f Test

Trying to understand why the lzma case was failing, I tried digging into
apt_pkg.TagFile's internal data, leading to the bug-932491-a.py test
case you'll find attached.

Running it against the Test{.gz,.xz} pair gives:

$ ./bug-932491-a.py Test
gz == xz: True
gz: section 1 size: 26
gz: section 1 keys: ['Key1', 'Key2']
gz: section 2 size: 44
gz: section 2 keys: ['Key3', 'Key4']
Traceback (most recent call last):
  File "/path/to/bug-932491-a.py", line 33, in <module>
    tf_xz.step()
apt_pkg.Error: E:Unable to parse package file  (1)

Running it against the Packages{.gz,.xz} pair gives:

$ ./bug-932491-a.py Packages
gz == xz: True
gz: section 1 size: 1281
gz: section 1 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 
'Architecture', 'Depends', 'Pre-Depends', 'Description', 'Homepage', 
'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 
'SHA256']
gz: section 2 size: 585
gz: section 2 keys: ['Package', 'Version', 'Installed-Size', 'Maintainer', 
'Architecture', 'Pre-Depends', 'Suggests', 'Description', 'Homepage', 
'Description-md5', 'Tag', 'Section', 'Priority', 'Filename', 'Size', 'MD5sum', 
'SHA256']
xz: section 1 size: 163530
Segmentation fault

See how crazy the size of the first section is…

The stacktrace can be huge, and this should be easily reproducible so
I'm not attaching anything else, but here's where things explode:

Program received signal SIGSEGV, Segmentation fault.
TagSecKeys (Self=, 
Args=Args@entry=()) at python/tag.cc:284
284         Py_DECREF(Obj);
(gdb) l
279         const char *End = Start;
280         for (; End < Stop && *End != ':'; End++);
281
282         PyObject *Obj;
283         PyList_Append(List,Obj = PyString_FromStringAndSize(Start,End-Start));
284         Py_DECREF(Obj);
285      }
286      return List;
287   }
288
288 
(gdb) p List
$1 = []
(gdb) p Obj
$2 = 0x0
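
The 0x0 above means PyString_FromStringAndSize() returned NULL, and
Py_DECREF() on a NULL pointer is what crashes. One plausible reason, given
here as an assumption rather than something confirmed in this report: on
Python 3 that call has to decode the key bytes as text, and raw xz data is
generally not valid UTF-8. The sketch below shows only that decoding
failure at the Python level; it does not call into apt_pkg.

```python
# The xz magic number, i.e. the first bytes a parser would see when it
# reads an .xz file without decompressing it.
xz_magic = b"\xfd7zXZ\x00"

try:
    xz_magic.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    reason = exc.reason

print(decoded_ok)
```

If that is indeed the failure mode, checking the returned pointer before
the PyList_Append()/Py_DECREF() pair would turn the crash into a regular
Python exception.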


I was mentioning different expressions… Let's see what happens with the
approach I was starting from, using a for loop on the TagFile object,
against the Packages{.gz,.xz} pair again. The bug-932491-b.py test case
implements a demo using gzip then lzma, printing a dot for each
iteration, showing that the lzma problem shows up on the very first
iteration:

$ ./bin/bug-932491-b.py Packages
gz packages: 50771
.Traceback (most recent call last):
  File "/path/to/bug-932491-b.py", line 27, in <module>
    xz_packages.append(stanza['Package'])
                       ~~~~~~^^^^^^^^^^^
KeyError: 'Package'

Since we're only getting xz files for some suites already, it would be
best if they would be manageable through python3-apt…


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- 

Bug#932491: python3-apt: segfault reading from lzma stream

2019-07-19 Thread David Bremner
Package: python3-apt
Version: 1.9.0
Severity: normal

The following script segfaults if python3-apt is installed, but
completes if not. Replacing lzma.open with open (and replacing
Sources.xz with Sources) also makes the segfault go away.  It seems to
be the same with python3-apt 1.8.4. I didn't check the python2 version
because lzma is (afaik) python3 only.

#!/usr/bin/python3
from debian.deb822 import Sources
import lzma

with lzma.open('Sources.xz', mode='rb') as f:
    for src in Sources.iter_paragraphs(f):
        package_name = src.get('Package')
        version = src.get('Version')


-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-5-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8), LANGUAGE=en_CA:en (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages python3-apt depends on:
ii  libapt-pkg5.90     1.9.1
ii  libc6              2.28-10
ii  libgcc1            1:8.3.0-6
ii  libstdc++6         8.3.0-6
ii  python-apt-common  1.8.4
ii  python3            3.7.3-1

Versions of packages python3-apt recommends:
ii  iso-codes    4.3-1
ii  lsb-release  10.2019051400

Versions of packages python3-apt suggests:
ii  apt              1.9.1
pn  python-apt-doc   <none>
ii  python3-apt-dbg  1.9.0

-- no debconf information