[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-18 Thread Massimo Sala


Massimo Sala  added the comment:

I choosed to use the internal variable *concat* because
- if I recollect correctly, it is calculated before successive routines;
- I didn't see your solution (!), there is a very nice computed variable in
front of my eyes.

Mmh
1) Reliability
Cannot be sure this always run with malformed files :
for zinfo in zf.infolist():

We can try / except but we loose the computation.
If *concat* is already computed (unless completely damaged files), IMHO my
solution is better.

2) Performance
What are the performance for big files?
Are there file seeks due to traversing zf.infolist() ?

> Daniel wrote:
> the advantage is that it already works in python 2.7 so there is no need
to patch Python

Yes, indeed.

If I am right about the pros of my patch, I stand for it.

Many thanks for you attention.

On Sat, 18 Apr 2020 at 15:45, Daniel Hillier  wrote:

>
> Daniel Hillier  added the comment:
>
> Hi Massimo,
>
> Unless I'm missing something about your requirements, the advantage is that
> it already works in python 2.7 so there is no need to patch Python. Just
> bundle the above function with your analysis tool and you're good to go.
>
> Cheers,
> Dan
>
> On Sat, Apr 18, 2020 at 11:36 PM Massimo Sala 
> wrote:
>
> >
> > Massimo Sala  added the comment:
> >
> > Hi Daniel
> >
> > Could you please elaborate the advantages of your loop versus my two
> lines
> > of code?
> > I don't grasp...
> >
> > Thanks, Massimo
> >
> > On Sat, 18 Apr 2020 at 03:26, Daniel Hillier 
> > wrote:
> >
> > >
> > > Daniel Hillier  added the comment:
> > >
> > > Could something similar be achieved by looking for the earliest file
> > > header offset?
> > >
> > > def find_earliest_header_offset(zf):
> > > earliest_offset = None
> > > for zinfo in zf.infolist():
> > > if earliest_offset is None:
> > > earliest_offset = zinfo.header_offset
> > > else:
> > > earliest_offset = min(zinfo.header_offset, earliest_offset)
> > > return earliest_offset
> > >
> > >
> > > You could also adapt this using
> > >
> > > zinfo.compress_size + len(zinfo.FileHeader())
> > >
> > > to see if there were any sections inside the archive which were not
> > > referenced from the central directory. Not sure if zip files with
> > arbitrary
> > > bytes inside the archive would be valid everywhere, but I think they
> are
> > > using zipfile.
> > >
> > > You can also have zipped content inside an archive which has a valid
> > > fileheader but no reference from the central directory. Those entries
> are
> > > discoverable by implementations which process content serially from the
> > > start of the file but not implementations which rely on the central
> > > directory.
> > >
> > > --
> > > nosy: +dhillier
> > >
> > > ___
> > > Python tracker 
> > > <https://bugs.python.org/issue40301>
> > > ___
> > >
> >
> > --
> >
> > ___
> > Python tracker 
> > <https://bugs.python.org/issue40301>
> > ___
> >
>
> --
>
> ___
> Python tracker 
> <https://bugs.python.org/issue40301>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-18 Thread Massimo Sala


Massimo Sala  added the comment:

Sorry I forgot to mention one specific case.
We have valid archives with a starting "blob": digitally signed zip files,
their filename extension is ".zip.p7m".

I agree your tip can be useful to other readers.
Best regards, Sala

On Sat, 18 Apr 2020 at 15:45, Serhiy Storchaka 
wrote:

>
> Serhiy Storchaka  added the comment:
>
> Just check the first 4 bytes of the file. In "normal" ZIP archive they are
> b'PK\3\4' (or b'PK\5\6' if it is empty). It is so reliable as checking the
> offset, and more efficient. It is even more reliable, because a malware can
> have zero ZIP archive offset, but it cannot start with b'PK\3\4'.
>
> --
>
> ___
> Python tracker 
> <https://bugs.python.org/issue40301>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-18 Thread Massimo Sala


Massimo Sala  added the comment:

Hi Daniel

Could you please elaborate the advantages of your loop versus my two lines
of code?
I don't grasp...

Thanks, Massimo

On Sat, 18 Apr 2020 at 03:26, Daniel Hillier  wrote:

>
> Daniel Hillier  added the comment:
>
> Could something similar be achieved by looking for the earliest file
> header offset?
>
> def find_earliest_header_offset(zf):
> earliest_offset = None
> for zinfo in zf.infolist():
> if earliest_offset is None:
> earliest_offset = zinfo.header_offset
> else:
> earliest_offset = min(zinfo.header_offset, earliest_offset)
> return earliest_offset
>
>
> You could also adapt this using
>
> zinfo.compress_size + len(zinfo.FileHeader())
>
> to see if there were any sections inside the archive which were not
> referenced from the central directory. Not sure if zip files with arbitrary
> bytes inside the archive would be valid everywhere, but I think they are
> using zipfile.
>
> You can also have zipped content inside an archive which has a valid
> fileheader but no reference from the central directory. Those entries are
> discoverable by implementations which process content serially from the
> start of the file but not implementations which rely on the central
> directory.
>
> --
> nosy: +dhillier
>
> ___
> Python tracker 
> <https://bugs.python.org/issue40301>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-18 Thread Massimo Sala


Massimo Sala  added the comment:

Hi Serhiy

Thanks for the suggestion but I don't need to analyse different
self-extraction payloads (and I think it is always unreliable, there are
too many self-extractors in the wild).

I spend two words about my work.

I analyze ZIP archives because they are the "incarnation" also of microsoft
OOXML and openoffice OASIS ODF documents.

I always find these kind of files with not zero offset aren't strictly
compliant documents (by their respective file formats specifications).
Sometimes there is a self-extrator, sometimes there are pieces of malware
blobs (outside the ZIP structure or inside it, into the compressed files),
sometimes other errors.

For us checking the offset is very effective: we discard "bad" documents at
maximum speed before any other checks and it is more reliable than
antivirus (checking against specific blobs signatures, everytime changing).
With just a single test we have a 100% go/nogo result. Every colleague
grasp this check, there aren't hard to read and maintain routines.

Massimo

On Sat, 18 Apr 2020 at 09:36, Serhiy Storchaka 
wrote:

>
> Serhiy Storchaka  added the comment:
>
> I am not sure it would help you. There are legitimate files which contain
> a payload followed by the ZIP archive (self-extracting archives, programs
> with embedded ZIP archives). And the malware can make the offset of the ZIP
> archive be zero.
>
> If you want to check whether the file looks like an executable, analyze
> first few bytes of the file. All executable files should start by one of
> well recognized signatures, otherwise the OS would not know how to execute
> them and they would not be malware.
>
> --
>
> ___
> Python tracker 
> <https://bugs.python.org/issue40301>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-18 Thread Massimo Sala


Massimo Sala  added the comment:

On Sat, 18 Apr 2020 at 04:37, Steven D'Aprano 
wrote:
If we made an exception for you, then people using Python 2.7 still
couldn't use this feature:
`myzipfile.offset` would fail on code using Python 2.7, 2.7.1, 2.7.2,
2.7.3, ... 2.7.17 and only work with 2.7.18.
Nobody could use it unless their application required 2.7.18.

Yes, it seems to me obvious it will work only with Python 2.7.18, and I see
no problem.
If you need new features, you have always to update (to a new MINOR version
or, like you said, MAJOR version).

I am used to other softwares where some features are backported to older
versions and IMHO it is very useful.
Sometimes you just need a specific feature and it isn't possible to update
to a MAJOR version.
You have to consider there are many legacy softwares, also in business, and
a version leap means a lot of work and tests.

Speaking in general, not only python: if the maintainers backport that
specific feature, bingo! you have only to update to the same MAJOR new
MINOR version. And this is good for the user base, there isn't "one size
fits all".
I shot my bullet but I cannot change python.org way of life.

Steven many thanks for your answers and patience to explain.
BTW yes I will patch python 2.7 sources and compile it... also on legacy,
intranet, centos 5 servers we cannot update :-)

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-17 Thread Massimo Sala


Massimo Sala  added the comment:

Hi Steven

Every software "ecosystem" has its guidelines and I am a newbie about
python development.

Mmh I see your concerns. I agree about your deletions of all py 3 versions
before the latest 3.9.

About Py 2, I remark these facts:
- there are a lot of forensics tools still written for py 2;
- python 2.7.18 will be forever the last python 2 and I think is it fine to
end-users to have zipfile with this feature both in py 2.7 and py 3.9;
- in the code there isn't any new routine to test: the change is just to
expose one internal variable.

I agree my request is an exception but I think you have to agree this
situation is exceptional.
IMHO rules must exist to help us and I think this request doesn't carry any
burden.

I ask you please
- to reconsider my request
- anyway, to put me in contact with zipfile mainteners, I don't know how to
reach them but I want to hear them about this.

Many thanks, Massimo

--

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40271] Allow shell like paths in

2020-04-16 Thread Massimo Sala


Massimo Sala  added the comment:

Gavin, zipfile works on all the operating systems where python runs.
Your request is OS dependent... BSD? linux?

The tilde isn't into the ZIP file specifications.
I have to agree with Serhiy: the correct solution is
os.path.expanduser("~/stuff")

--
nosy: +massimosala

___
Python tracker 
<https://bugs.python.org/issue40271>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

2020-04-16 Thread Massimo Sala


Change by Massimo Sala :


--
title: zipfile module: new feature (two lines of code) -> zipfile module: new 
feature (two lines of code), useful for test, security and forensics

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40301] zipfile module: new feature (two lines of code)

2020-04-16 Thread Massimo Sala


New submission from Massimo Sala :

module zipfile

Tag "Components": I am not sure "Library (Lib)" is the correct one. If it 
isn't, please fix.

I use python to check zip files against malware.
In these files the are binary blobs outside the ZIP archive.
The malware payload isn't inside the ZIP file structure.
Example: a file "openme.zip" with this content :
[blob from offset 0 to offset 5678] 
[ZIP archive from offset 5679 to end of file]

zipfile already handles this, finding the ZIP structure inside the file.

My change is just to add a new public property, to expose an internal variable: 
the file offset of the ZIP structure.

I know, I am after the code freeze of Python 2.7.18.
But the change is really trivial, see the diff.
I hope you can approve this patch for all the Python versions, also for 2.7, to 
have consistency. For 2.7 this is the last call.

--
components: Library (Lib)
files: py27_zipfile.patch
keywords: patch
messages: 366597
nosy: massimosala
priority: normal
severity: normal
status: open
title: zipfile module: new feature (two lines of code)
type: enhancement
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file49067/py27_zipfile.patch

___
Python tracker 
<https://bugs.python.org/issue40301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com