Re: Dealing with renamed source packages during CVE triaging

2018-06-21 Thread Brian May
Antoine Beaupré  writes:

> bam: do you want me to start working on that script or were you working
> on this already?

See
https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/8

I personally find this easier to understand as we use the existing CVE
list parser, although I have not considered how to write changes (as
this wasn't a requirement when I wrote this).
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-20 Thread Antoine Beaupré
On 2018-06-15 10:27:45, Moritz Muehlenhoff wrote:
> On Fri, Jun 15, 2018 at 04:34:14PM +1000, Brian May wrote:
>> Moritz Muehlenhoff  writes:
>> 
>> > On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:

[...]

>> That generates a report of all packages that we need to check. I assume
>> we would need some way of marking packages that we have checked and
>> found to be not affected, so we can get a list of packages that need
>> immediate attention and don't repeatedly check the same package multiple
>> times. How should we do this? Maybe another file in the security tracker
>> repository?
>
> Maybe start with the script initially and see whether it's useful as an
> approach in general. State tracking can be discussed/added later.

Maybe the same principle applies as with the approach I considered. We
could have a --stop argument that would consider entries up to a certain
CVE number and ignore the rest of the file.
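
To make that concrete, here is a rough sketch of what I have in mind.
The --stop option and both function names are made up for illustration;
nothing like this exists in the tracker yet. It walks the identifiers in
file order and stops once it has seen the given CVE, so entries added
out of order further down the file would be missed:

```python
import argparse


def parse_args(argv):
    """Parse the command line; --stop is a hypothetical option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--stop", metavar="CVE-ID", default=None,
                        help="last CVE entry to consider; later lines are ignored")
    return parser.parse_args(argv)


def entries_until(cve_ids, stop):
    """Yield CVE IDs in file order, ending with `stop` (inclusive)."""
    for cve_id in cve_ids:
        yield cve_id
        if cve_id == stop:
            # Everything after this point in the file is ignored.
            break
```

With no --stop given, `stop` is None and the whole file is considered.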

> Lots of the false positives will result from crappy/outdated entries
> in embedded-code-copies, so fixing those up will drastically reduce
> false positives.

If the embedded-code-copies file is used more systematically in the
triaging process, through a semi-automated script, we'll be more
inclined to keep it up to date, so I think the script would actually
help with that too...

bam: do you want me to start working on that script or were you working
on this already?

Thanks for the feedback,

A.

-- 
They pour a poor honey on their rotten words and talk to you of scarcity
And on your hunger, on your friends, they sharpen their appetite
- Richard Desjardins, La maison est ouverte



Re: Dealing with renamed source packages during CVE triaging

2018-06-18 Thread Brian May
Brian May  writes:

> I will look at making a pull request tomorrow. The changes should be
> reasonably straightforward syntax changes (e.g. use "!=" instead of
> "<>" for the "does not equal" operator), work with Python3 in stretch,
> and not require any additional dependencies (I think it only depends on
> Python3).

Python3 support:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/7

This one implements
bin/list-potential-packages-affected-by-code-copies:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/8

At present I have written this one to work with Python 2.7 and
Python 3.6, but it won't work with Python 3.6 unless the other pull
request is merged first.
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-17 Thread Brian May
Salvatore Bonaccorso  writes:

>> Feel free to make a pull request, I don't think we have a specific 
>> dependency 
>> on Python 2 modules anywhere. But it might take a bit to get 
>> reviewed/deployed
>> as it's not a high priority issue.
>
> To be kept in mind: whatever change is proposed for the code part of
> the security tracker needs potentially to be able to run on the
> security-tracker host soriano (running on stretch), preferably without
> introducing new dependencies if they are not needed. Merge/pull requests
> for those parts are preferred.

I will look at making a pull request tomorrow. The changes should be
reasonably straightforward syntax changes (e.g. use "!=" instead of
"<>" for the "does not equal" operator), work with Python3 in stretch,
and not require any additional dependencies (I think it only depends on
Python3).

Perhaps the most intrusive change is deleting the py file with the
definition of namedtuple; it is not needed now that Python has the
collections module with a built-in namedtuple.
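
For illustration, the two kinds of change look roughly like this. The
record type and its fields are invented for the example; they are not
the ones used in lib/python/sectracker:

```python
from collections import namedtuple

# Hypothetical record type, standing in for whatever the removed py file
# used to define by hand.
PackageNote = namedtuple("PackageNote", ["package", "release", "status"])


def needs_attention(note):
    """Return True unless the note is marked as fixed."""
    # "!=" works in Python 2 and Python 3; "<>" is a syntax error in Python 3.
    return note.status != "fixed"
```

Fields are then accessed by name, e.g. note.package, as with the old
hand-written class.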
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-17 Thread Salvatore Bonaccorso
Hi,

On Fri, Jun 15, 2018 at 10:23:15AM +0200, Moritz Muehlenhoff wrote:
> On Fri, Jun 15, 2018 at 05:21:55PM +1000, Brian May wrote:
> > Brian May  writes:
> > 
> > > So we could write a script, let's say:
> > > bin/list-potential-packages-affected-by-code-copies
> > 
> > In investigating the possibility of this, I noticed the scripts in
> > lib/python/sectracker use legacy python coding standards.
> > 
> > I have updated these files on my local box to work with Python 3, but
> > am refraining from pushing for now, because of the possibility I might
> > break something important.
> 
> When the Debian Security Tracker was created, Python 3 didn't even exist
> yet :-)
> 
> Feel free to make a pull request, I don't think we have a specific dependency 
> on Python 2 modules anywhere. But it might take a bit to get reviewed/deployed
> as it's not a high priority issue.

To be kept in mind: whatever change is proposed for the code part of
the security tracker needs potentially to be able to run on the
security-tracker host soriano (running on stretch), preferably without
introducing new dependencies if they are not needed. Merge/pull requests
for those parts are preferred.

Regards,
Salvatore



Re: Dealing with renamed source packages during CVE triaging

2018-06-15 Thread Moritz Muehlenhoff
On Fri, Jun 15, 2018 at 04:34:14PM +1000, Brian May wrote:
> Moritz Muehlenhoff  writes:
> 
> > On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:
> >> "as I said in the mailing list discussion, I don't like the usage of the
> >> undetermined tag... we use it to hide stuff we can't investigate under
> >> the carpet, I would much prefer that we put it as  directly
> >> when it's the case, or  otherwise."
> >
> > Of course, those can be resolved; it just needs someone to do the analysis 
> > work.
> > Switching to some other tags (and incorrect ones!) doesn't change anything.
> 
> Seems like this is a moot point anyway, as from the comments you left in
> the pull request, you don't like this approach of automatically adding
> entries in data/CVE/list. Fair enough.
>
> So we could write a script, let's say:
> bin/list-potential-packages-affected-by-code-copies

You're mixing two things; my comment above refers to <undetermined>; those
are one-off investigations and don't need any particular tooling.

> That generates a report of all packages that we need to check. I assume
> we would need some way of marking packages that we have checked and
> found to be not affected, so we can get a list of packages that need
> immediate attention and don't repeatedly check the same package multiple
> times. How should we do this? Maybe another file in the security tracker
> repository?

Maybe start with the script initially and see whether it's useful as an
approach in general. State tracking can be discussed/added later.

Lots of the false positives will result from crappy/outdated entries
in embedded-code-copies, so fixing those up will drastically reduce
false positives.

Cheers,
Moritz



Re: Dealing with renamed source packages during CVE triaging

2018-06-15 Thread Moritz Muehlenhoff
On Fri, Jun 15, 2018 at 05:21:55PM +1000, Brian May wrote:
> Brian May  writes:
> 
> > So we could write a script, let's say:
> > bin/list-potential-packages-affected-by-code-copies
> 
> In investigating the possibility of this, I noticed the scripts in
> lib/python/sectracker use legacy python coding standards.
> 
> I have updated these files on my local box to work with Python 3, but
> am refraining from pushing for now, because of the possibility I might
> break something important.

When the Debian Security Tracker was created, Python 3 didn't even exist
yet :-)

Feel free to make a pull request, I don't think we have a specific dependency 
on Python 2 modules anywhere. But it might take a bit to get reviewed/deployed
as it's not a high priority issue.

Cheers,
Moritz



Re: Dealing with renamed source packages during CVE triaging

2018-06-15 Thread Brian May
Brian May  writes:

> So we could write a script, let's say:
> bin/list-potential-packages-affected-by-code-copies

In investigating the possibility of this, I noticed the scripts in
lib/python/sectracker use legacy python coding standards.

I have updated these files on my local box to work with Python 3, but
am refraining from pushing for now, because of the possibility I might
break something important.

Is Python 2 compatibility still required?
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-15 Thread Brian May
Moritz Muehlenhoff  writes:

> On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:
>> "as I said in the mailing list discussion, I don't like the usage of the
>> undetermined tag... we use it to hide stuff we can't investigate under
>> the carpet, I would much prefer that we put it as  directly
>> when it's the case, or  otherwise."
>
> Of course, those can be resolved; it just needs someone to do the analysis 
> work.
> Switching to some other tags (and incorrect ones!) doesn't change anything.

Seems like this is a moot point anyway, as from the comments you left in
the pull request, you don't like this approach of automatically adding
entries in data/CVE/list. Fair enough.

So we could write a script, let's say:
bin/list-potential-packages-affected-by-code-copies

That generates a report of all packages that we need to check. I assume
we would need some way of marking packages that we have checked and
found to be not affected, so we can get a list of packages that need
immediate attention and don't repeatedly check the same package multiple
times. How should we do this? Maybe another file in the security tracker
repository?
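
As a rough sketch of what such a file could look like: one "CVE-ID
package" pair per line, with "#" comments. The file format and the
function names are only assumptions for the sake of discussion:

```python
def load_checked(lines):
    """Parse lines of 'CVE-ID package' into a set of (cve, package) pairs."""
    checked = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        cve_id, package = line.split()
        checked.add((cve_id, package))
    return checked


def pending(candidates, checked):
    """Return the candidate pairs nobody has marked as checked yet."""
    return [pair for pair in candidates if pair not in checked]
```

The report would then only list what pending() returns, so the same
package is not checked over and over.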

Would anybody object to this approach?
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-13 Thread Moritz Muehlenhoff
On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:
> "as I said in the mailing list discussion, I don't like the usage of the
> undetermined tag... we use it to hide stuff we can't investigate under
> the carpet, I would much prefer that we put it as  directly
> when it's the case, or  otherwise."

Of course, those can be resolved; it just needs someone to do the analysis work.
Switching to some other tags (and incorrect ones!) doesn't change anything.
 
Cheers,
Moritz



Re: Dealing with renamed source packages during CVE triaging

2018-06-13 Thread Brian May
Antoine Beaupré  writes:

> https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4
>
> Comments are welcome there or here.

Here are the current comments on the merge request, copied and pasted,
as I think they are relevant to the discussion here:

Moritz Muehlenhoff @jmm commented 4 days ago Owner
Strong nack, the data quality of embedded code copies isn't useful for
this. When you've verified a certain package to be affected, add it
manually (with references), but don't dump lots of unactionable data
into the tracker.

Brian May @bam commented 2 minutes ago Developer
@jmm The problem, I believe, is how do we keep track of packages that
might be affected but aren't listed in the security tracker? Do we maybe
need to keep track of this information outside the security tracker?
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-13 Thread Brian May
Brian May  writes:

> In any case, possibly better to leave feedback on the pull request:

s/pull request/issue/

Sorry for any confusion.
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-13 Thread Brian May
Moritz Muehlenhoff  writes:

> On Tue, Jun 12, 2018 at 05:40:34PM +1000, Brian May wrote:
>> 1. Tagging with / instead of .
>
> None of those can be automated. The basic point of <undetermined> is that
> we lack data to make a proper assessment.
>
> The correct way to handle these is to triage 
> https://security-tracker.debian.org/tracker/status/undetermined by contacting
> e.g. upstream developers or the reporters of the vulnerability and then amend
> CVE/list with the necessary information, i.e. either converting them to
>  if it has been confirmed to be an issue or to
> .

From an email sent to a Freexian list:

"as I said in the mailing list discussion, I don't like the usage of the
undetermined tag... we use it to hide stuff we can't investigate under
the carpet, I would much prefer that we put it as  directly
when it's the case, or  otherwise."

Having said that, not sure I personally understand this concern. It
would simplify things if we could just use .

>> 3. Resolve general issue regarding CVE/list, and if it should be split up.
>
> That has been proposed and nacked several times before. There's simply
> no practical reason for it. It would add multiple complications (starting
> with the MITRE sync, syncing with external parties, changes to the tracker)
> for no measurable gain. Quite the contrary; it's extremely useful to have 
> 20 years of vulnerability data easily available in a single emacs buffer.

The concerns (from reading the PR) were that:

* git can't cope efficiently with such large files.
* emacs can't cope efficiently with such large files.

In any case, possibly better to leave feedback on the pull request:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-12 Thread Moritz Muehlenhoff
On Tue, Jun 12, 2018 at 05:40:34PM +1000, Brian May wrote:
> 1. Tagging with / instead of .

None of those can be automated. The basic point of <undetermined> is that
we lack data to make a proper assessment.

The correct way to handle these is to triage 
https://security-tracker.debian.org/tracker/status/undetermined by contacting
e.g. upstream developers or the reporters of the vulnerability and then amend
CVE/list with the necessary information, i.e. either converting them to
 if it has been confirmed to be an issue or to .

> 3. Resolve general issue regarding CVE/list, and if it should be split up.

That has been proposed and nacked several times before. There's simply
no practical reason for it. It would add multiple complications (starting
with the MITRE sync, syncing with external parties, changes to the tracker)
for no measurable gain. Quite the contrary; it's extremely useful to have 
20 years of vulnerability data easily available in a single emacs buffer.

Cheers,
Moritz



Re: Dealing with renamed source packages during CVE triaging

2018-06-12 Thread Brian May
Antoine Beaupré  writes:

> I've finalized a prototype during my research on this problem, which I
> have detailed on GitLab, as it's really code that should be merged. It
> would also benefit from wider attention considering it affects more than
> LTS now. Anyways, the MR is here:
>
> https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4
>
> Comments are welcome there or here.
>
> For what it's worth, I reused Lamby's crude parser because I wanted to
> get the prototype out the door. I am also uncertain that a full parser
> can create the CVE/list file as is reliably without introducing
> inconsistent diffs...
>
> I also drifted into the core data structures of the security tracker, and
> wondered if it would be better to split up our large CVE/list file now
> that we're using git. I had mixed results. For those interested, it is
> documented here:
>
> https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

So if I understand correctly, the parts that aren't done yet are:

1. Tagging with / instead of .
2. Not processing old entries that we don't care about anymore.
3. Resolve general issue regarding CVE/list, and if it should be split up.

For these:

1. We need to be able to tell whether the package still exists in a given
distribution. This information is not available from the security-tracker
database; we would need to get it using online JSON calls for each and
every package we look at, which is likely to be very slow, although
incremental processing might help ().

2. For incremental updates, coming up with a definition of old entries
that is easy to check seems to be the stumbling point here. Particularly
as entries in CVE/list can be created not in order, and old CVEs might
still be very relevant.

Maybe we need to create/update a list of all CVEs we have processed
before?  Would this work, or is there some problem I haven't thought of?

Ideally for this to work properly we would also need to ensure that it
updates all entries in one run, as one run would be all we get. Not
multiple runs as can be the case now.

3. I have not noticed git operations being slow, but then again I don't
often update this file. As a potential compromise, maybe instead of one
file per CVE, one file per year?
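
For point 2, the "list of all CVEs we have processed before" idea could
be sketched like this. The function names are invented, and a real
script would have to persist the set between runs:

```python
def unprocessed(cve_ids, processed):
    """Return IDs not seen in earlier runs, keeping file order."""
    return [cve_id for cve_id in cve_ids if cve_id not in processed]


def run_once(cve_ids, processed):
    """Handle every new ID in a single run and remember it."""
    new_ids = unprocessed(cve_ids, processed)
    # Record everything in one go, since one run is all we get.
    processed.update(new_ids)
    return new_ids
```

Because membership is tested per ID rather than by position in the file,
an old CVE that is added late still gets picked up.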
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-08 Thread Antoine Beaupré
I've finalized a prototype during my research on this problem, which I
have detailed on GitLab, as it's really code that should be merged. It
would also benefit from wider attention considering it affects more than
LTS now. Anyways, the MR is here:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4

Comments are welcome there or here.

For what it's worth, I reused Lamby's crude parser because I wanted to
get the prototype out the door. I am also uncertain that a full parser
can create the CVE/list file as is reliably without introducing
inconsistent diffs...

I also drifted into the core data structures of the security tracker, and
wondered if it would be better to split up our large CVE/list file now
that we're using git. I had mixed results. For those interested, it is
documented here:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

Cheers!

a.
-- 
If it's important for you, you'll find a way.
If it's not, you'll find an excuse.
- Unknown



Re: Dealing with renamed source packages during CVE triaging

2018-06-08 Thread Antoine Beaupré
On 2018-06-08 03:29:38, Brian May wrote:
> Antoine Beaupré  writes:
>
>> Right now, it seems that all scripts that hammer at those files do so
>> with their own ad-hoc parsing code. Is that the recommended way of
>> chopping those files up? Or is there a better parsing library out there?
>
> It sounds like we really could do with a good parsing library. Maybe one
> that supports making changes too.
>
> I could make a start on this.

As I mentioned in the other thread, I am uncertain where to go from
here. Some scripts use JSON, others parse the files by hand... I also
found out yesterday after writing this that there is *already* a parsing
library in the security tracker. It can parse {CVE,DSA,DLA}/list files
and lives in lib/python/bugs.py, but it's somewhat coupled with the
sqlite database - I'm not sure it's usable standalone.

But yeah, maybe clarifying all this stuff would help, for sure... I
would recommend not writing yet another library from scratch however, as
we probably have a dozen such parsers already and it's confusing enough
as it is. ;)

a.
-- 
The trouble with the great human family is that everyone wants to be
its father.
- Mafalda



Re: Dealing with renamed source packages during CVE triaging

2018-06-08 Thread Brian May
Antoine Beaupré  writes:

> Right now, it seems that all scripts that hammer at those files do so
> with their own ad-hoc parsing code. Is that the recommended way of
> chopping those files up? Or is there a better parsing library out there?

It sounds like we really could do with a good parsing library. Maybe one
that supports making changes too.

I could make a start on this.

Obligatory XKCD:
https://xkcd.com/927/
-- 
Brian May 



Re: Dealing with renamed source packages during CVE triaging

2018-06-07 Thread Antoine Beaupré
Sorry for resurrecting this old thread, but I've been looking at how to
deal with renamed packages in CVE triaging again. When we last talked
about this, we observed how we were sometimes missing packages during
triage, e.g. `tiff3` that was present in wheezy. That's not an issue
anymore since wheezy is gone, but the problem occurs more broadly in
other packages.

In fact, it seems to me this is similar to the broader problem of
embedded code copies. We could generalize renamed packages to the
embedded code copies problem. We have a database of those in
data/embedded-code-copies already, although I'm not sure how up to date
that file actually is, nor how it is currently used in the workflow.

It seems to me any database of renames we could build would clearly
overlap with the embedded-code-copies file, so I figured I would write a
parser (in Python; we already have Perl and bash ones...) to start with.
I have tried to upload this in a fork on salsa but gave up as the push
(of a single commit!) was stuck "resolving deltas"... Anyways, here's
the snippet:

https://salsa.debian.org/anarcat/security-tracker/snippets/70
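
Independently of that snippet, the general idea can be sketched as
follows. The mapping here is hand-written and partly made up ("libfoo"
and "bar" do not exist); it is not the real content or format of
data/embedded-code-copies:

```python
# Illustrative mapping: original source package -> renamed or embedding
# packages. A real script would build this from data/embedded-code-copies.
COPIES = {
    "tiff": ["tiff3"],
    "libfoo": ["bar", "tiff3"],
}


def potentially_affected(packages, copies):
    """List other packages that may share code with `packages`."""
    result = []
    for package in packages:
        for other in copies.get(package, []):
            # Skip packages already named in the CVE and duplicates.
            if other not in packages and other not in result:
                result.append(other)
    return result
```

The result is only a list of candidates to review by hand, not a
statement that those packages are affected.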

The next step is to figure out how to actually modify the data/CVE/list
file to introduce the changes. Considering the large number of packages
in the embedded-code-copies file, I am not sure we want to retroactively
change all previous entries. jmm suggested we run a cronjob that would
keep track of where it is in history which would resolve this nicely.

One question that remains is what, exactly, to add in the CVE
metadata. One problem we faced the last time we looked at this is that
we needed to add an entry like:

   SOURCEPACKAGE 

... which would (e.g.) get triaged to:

   SOURCEPACKAGE 
   [wheezy] SOURCEPACKAGE  (or whatever)

... later on. This requires inside knowledge of the suites and their
packages, something I find surprisingly hard to do in the security
tracker.

With embedded-code-copies, we will have to add something for all the
other source packages, e.g.:

   OTHERSOURCE 

Right now, it seems that all scripts that hammer at those files do so
with their own ad-hoc parsing code. Is that the recommended way of
chopping those files up? Or is there a better parsing library out there?
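
For reference, the kind of ad-hoc parsing I mean looks roughly like the
sketch below. It assumes a simplified view of the CVE/list format (a
header line starting with the CVE ID, followed by indented annotation
lines such as "- package <status>" or "[release] - package <status>")
and ignores everything else, so it is certainly not a complete parser:

```python
import re

HEADER_RE = re.compile(r"^(CVE-\d{4}-[0-9X]+)")
# Optional "[release]" prefix, then "- package", then an optional status
# or version token.
PACKAGE_RE = re.compile(r"^\s+(?:\[(\w+)\]\s+)?-\s+(\S+)(?:\s+(\S+))?")


def parse(lines):
    """Return a list of {'id': ..., 'packages': [(release, name, status)]}."""
    entries = []
    current = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            current = {"id": header.group(1), "packages": []}
            entries.append(current)
            continue
        package = PACKAGE_RE.match(line)
        if package and current is not None:
            current["packages"].append(package.groups())
    return entries
```

Note that this only reads the file; writing changes back without
producing noisy diffs is a separate problem.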

Thanks for any advice,

A.