Re: [Fedora-legal-list] Trivy for licenses

2024-03-04 Thread Maxwell G
On Tue Mar 5, 2024 at 04:06 +, Maxwell G wrote:
> On Mon Mar 4, 2024 at 22:35 +0100, Sandro wrote:
> > On 04-03-2024 07:59, Miroslav Suchý wrote:
> > > It would be welcome if anyone can help Robert here:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=2235055
> >
> > I had a look and it seems the package is currently stuck on broken 
> > python-pymaven-patch, which requires python-lxml < 5~~. In rawhide and 
> > f40 python-lxml was updated to 5.1.0 two months ago.
> >
> > For about as long there has been a PR open for python-pymaven-patch
> > removing that version constraint. Notably, the maintainer of
> > python-pymaven-patch is the same person who submitted the
> > scancode-toolkit review request.
> >
> > There may be more trouble down the road. But for the moment, I don't see
> > how others can help drive this forward. A proven packager could merge
> > the PR. But I don't know how eclipseo, who's a proven packager, would
> > feel about that.
>
> Fixing FTI bugs that are unaddressed by a package's maintainer
> definitely falls under the purview of a provenpackager.
> I rebased the PR [1] and will merge it once CI passes.
>
> [1] https://src.fedoraproject.org/rpms/python-pymaven-patch/pull-request/1

It also looks like a bunch of the tests are failing and have been
skipped. That's not super ideal. It looks like [1] has been open
upstream for some time. I have not yet looked closely at the failures,
but Philippe, if you have any pointers to give, that would certainly be
helpful.

[1] https://github.com/nexB/scancode-toolkit/issues/3496
--
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: [Fedora-legal-list] Trivy for licenses

2024-03-04 Thread Maxwell G
On Mon Mar 4, 2024 at 22:35 +0100, Sandro wrote:
> On 04-03-2024 07:59, Miroslav Suchý wrote:
> > It would be welcome if anyone can help Robert here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2235055
>
> I had a look and it seems the package is currently stuck on broken 
> python-pymaven-patch, which requires python-lxml < 5~~. In rawhide and 
> f40 python-lxml was updated to 5.1.0 two months ago.
>
> For about as long there has been a PR open for python-pymaven-patch
> removing that version constraint. Notably, the maintainer of
> python-pymaven-patch is the same person who submitted the
> scancode-toolkit review request.
>
> There may be more trouble down the road. But for the moment, I don't see
> how others can help drive this forward. A proven packager could merge
> the PR. But I don't know how eclipseo, who's a proven packager, would
> feel about that.

Fixing FTI bugs that are unaddressed by a package's maintainer
definitely falls under the purview of a provenpackager.
I rebased the PR [1] and will merge it once CI passes.

[1] https://src.fedoraproject.org/rpms/python-pymaven-patch/pull-request/1


Re: [Fedora-legal-list] Trivy for licenses

2024-03-04 Thread Maxwell G
On Mon Mar 4, 2024 at 07:59 +0100, Miroslav Suchý wrote:
> On 03. 03. 24 at 20:22, Philippe Ombredanne wrote:
>
> > If you want robust license detection, consider using ScanCode [2] and
> > Scancode.io [3] for more complex pipelines. Both are tools that I
> > co-maintain and are considered as better tools for this. Do not
> > hesitate to reach out for help!
>
> *nod*
>
> It would be welcome if anyone can help Robert here:
> https://bugzilla.redhat.com/show_bug.cgi?id=2235055

Robert has not been very responsive as of late. It might be a good idea
for someone else to pick it up and start a new review ticket.


Re: [Fedora-legal-list] Trivy for licenses

2024-03-04 Thread Maxwell G
On Sun Mar 3, 2024 at 20:22 +0100, Philippe Ombredanne wrote:
> Hi Maxwell:

Hi Philippe,

> On Sun, Mar 3, 2024, Maxwell G wrote:
> > Has anyone ever used trivy [1] to scan for licenses? It appears more
> > robust and better maintained than askalono-cli and can detect files with
> > multiple licenses and licenses embedded in file headers.  I have been
> > running it with "trivy fs --scanners license --license-full ."
> >
> > [1] https://github.com/aquasecurity/trivy
>
> IMHO, from my experience trying it, trivy is not a robust tool for
> license detection.

I am not necessarily looking for the most robust tool for license
detection. I am just looking for a relatively performant and reasonably
accurate tool to scan a tree of Go modules for license files for the
go-vendor-tools [1] project that I am working on.

I evaluated askalono-cli and trivy for this purpose, and they both
meet those criteria. I implemented support for both of them and added
an option to choose which to use.
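
For illustration only, here is a minimal Python sketch of how a trivy-based
backend could collect results. It is not the go-vendor-tools code. It runs
the command quoted above with "--format json" added, and it assumes the JSON
report has a top-level "Results" list whose entries may carry a "Licenses"
list with "FilePath" and "Name" keys. The function names are hypothetical,
and the report layout should be checked against the trivy version in use.

```python
import json
import subprocess
from collections import defaultdict


def run_trivy(path="."):
    """Run the license scan quoted in this thread and return parsed JSON."""
    cmd = ["trivy", "fs", "--scanners", "license", "--license-full",
           "--format", "json", path]
    proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(proc.stdout)


def licenses_by_file(report):
    """Map each file path to the sorted license names reported for it.

    Assumption: the report layout is Results[] -> Licenses[] with
    "FilePath" and "Name" keys. Entries missing either key are skipped.
    """
    found = defaultdict(set)
    for result in report.get("Results") or []:
        for lic in result.get("Licenses") or []:
            path = lic.get("FilePath")
            name = lic.get("Name")
            if path and name:
                found[path].add(name)
    return {path: sorted(names) for path, names in found.items()}
```

A caller would use licenses_by_file(run_trivy("vendor")) and then compare
the detected names against whatever the package declares.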

The Fedora legal docs describe askalono as:

 It is most useful for quick analysis of packages coming out of
 ecosystems featuring projects known to have (1) highly standardized
 approaches to layout of license information (it specifically looks
 only for files that are named LICENSE or COPYING or some obvious
 variant on those), (2) generally simple license makeup, and (3)
 cultural preferences for a highly limited set of licenses (for
 example, Rust crates that don’t bundle legacy C code, Go modules,
 or Node.js npm packages).

That is exactly what I am using it for. Trivy does a better job at
detecting license file paths than askalono and can also handle files
with multiple licenses and some license headers. My code already checks
that there is at least one license file in each Go module, so if one is
missing, the go-vendor-tools license checker will fail and require the
user to take manual action.
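
The "at least one license file per module" check can be sketched roughly as
follows. This is a simplified illustration and not the actual
go-vendor-tools implementation. It assumes module paths can be read from the
"# <module> <version>" lines of vendor/modules.txt, and the file-name
pattern and function names are hypothetical.

```python
import re
from pathlib import Path

# Illustrative pattern: LICENSE, LICENCE, COPYING, and variants such as
# LICENSE.txt or COPYING.LESSER. A real checker may accept more names.
LICENSE_RE = re.compile(r"^(LICEN[CS]E|COPYING)([.-].*)?$", re.IGNORECASE)


def vendored_modules(vendor_dir):
    """Yield module paths from the "# <path> <version>" lines of modules.txt."""
    text = (Path(vendor_dir) / "modules.txt").read_text()
    for line in text.splitlines():
        # "## explicit" annotation lines and package lines are ignored.
        if line.startswith("# "):
            yield line.split()[1]


def modules_missing_license(vendor_dir):
    """Return module paths whose vendored directory has no license file."""
    missing = []
    for module in dict.fromkeys(vendored_modules(vendor_dir)):  # de-duplicate
        mod_dir = Path(vendor_dir) / module
        if not mod_dir.is_dir():
            continue  # listed in modules.txt, but nothing was vendored
        names = [p.name for p in mod_dir.iterdir() if p.is_file()]
        if not any(LICENSE_RE.match(name) for name in names):
            missing.append(module)
    return missing
```

A non-empty return value is the case where the checker would fail and ask
the user to take manual action.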

The Go ecosystem is relatively standardized in terms of licensing, so I
do not feel the need to use a tool like scancode which analyzes every
single file and takes a very long time to run.

> It is mostly based on google/licenseclassifier which had a single
> commit in the last 17 months, and this means this is not more
> maintained than askalono (and frankly both are fairly lightweight
> tools for license detection).
>
> I would not rely on these for anything serious and certainly not to
> scan code for license prior to its inclusion in Fedora.

I am striving to be "reasonably sure" that all license texts are
accounted for, as opposed to spending 45 minutes performing a detailed
license scan of the files in each package.

> If you want robust license detection, consider using ScanCode [2] and
> Scancode.io [3] for more complex pipelines. Both are tools that I
> co-maintain and are considered as better tools for this. Do not
> hesitate to reach out for help!

I will definitely spend more time playing with scancode-toolkit, but I
worry about the amount of time it takes to run on a large go vendor tree
and that it has not been packaged for Fedora yet---it has a lot of
Python dependencies. I opened [2] to track implementing a scancode
backend for go-vendor-tools. I will be sure to let you know if I have
any questions!

[1] https://gitlab.com/gotmax23/go-vendor-tools/
[2] https://gitlab.com/gotmax23/go-vendor-tools/-/issues/15

Thanks,
Maxwell


Re: [Fedora-legal-list] Trivy for licenses

2024-03-04 Thread Sandro

On 04-03-2024 07:59, Miroslav Suchý wrote:
> It would be welcome if anyone can help Robert here:
> https://bugzilla.redhat.com/show_bug.cgi?id=2235055

I had a look and it seems the package is currently stuck on broken
python-pymaven-patch, which requires python-lxml < 5~~. In rawhide and
f40 python-lxml was updated to 5.1.0 two months ago.

For about as long there has been a PR open for python-pymaven-patch
removing that version constraint. Notably, the maintainer of
python-pymaven-patch is the same person who submitted the
scancode-toolkit review request.

There may be more trouble down the road. But for the moment, I don't see
how others can help drive this forward. A proven packager could merge
the PR. But I don't know how eclipseo, who's a proven packager, would
feel about that.


-- Sandro


Re: [Fedora-legal-list] Trivy for licenses

2024-03-03 Thread Miroslav Suchý

On 03. 03. 24 at 20:22, Philippe Ombredanne wrote:
> It is mostly based on google/licenseclassifier which had a single
> commit in the last 17 months, and this means this is not more
> maintained than askalono (and frankly both are fairly lightweight
> tools for license detection). Trivy adds SPDX expression parsing on
> top of the google/licenseclassifier and that's it. I would not rely on
> these for anything serious and certainly not to scan code for license
> prior to its inclusion in Fedora.


On the other hand, you can have a custom config:

https://aquasecurity.github.io/trivy/v0.49/docs/scanner/license/#custom-classification

and we can easily generate a config for trivy from fedora-license-data. So
you will have a classification specifically for Fedora.
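
A rough Python sketch of such a generator is below. It is an illustration
under several assumptions. The input is a simplified, hypothetical mapping
of SPDX identifier to a single status string, not the real
fedora-license-data file layout. The status-to-category mapping is a guess
that would need review. The output follows the "license:" section with
category lists described on the custom-classification page linked above.

```python
# Hypothetical mapping from a Fedora license status to a trivy category.
STATUS_TO_CATEGORY = {
    "allowed": "permissive",
    "not-allowed": "forbidden",
}


def trivy_license_config(licenses):
    """Render a trivy.yaml "license" section from {spdx_id: status}."""
    buckets = {}
    for spdx_id, status in licenses.items():
        category = STATUS_TO_CATEGORY.get(status)
        if category is None:
            continue  # statuses without a mapping are skipped
        buckets.setdefault(category, []).append(spdx_id)
    lines = ["license:"]
    for category in sorted(buckets):
        lines.append(f"  {category}:")
        lines.extend(f"    - {spdx_id}" for spdx_id in sorted(buckets[category]))
    return "\n".join(lines) + "\n"
```

The returned text could be written to a trivy.yaml file and passed to trivy
with its --config option.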




> If you want robust license detection, consider using ScanCode [2] and
> Scancode.io [3] for more complex pipelines. Both are tools that I
> co-maintain and are considered as better tools for this. Do not
> hesitate to reach out for help!


*nod*

It would be welcome if anyone can help Robert here:
https://bugzilla.redhat.com/show_bug.cgi?id=2235055

--
Miroslav Suchy, RHCA
Red Hat, Manager, Packit and CPT, #brno, #fedora-buildsys


Re: [Fedora-legal-list] Trivy for licenses

2024-03-03 Thread Philippe Ombredanne
Hi Maxwell:

On Sun, Mar 3, 2024, Maxwell G wrote:
> Has anyone ever used trivy [1] to scan for licenses? It appears more
> robust and better maintained than askalono-cli and can detect files with
> multiple licenses and licenses embedded in file headers.  I have been
> running it with "trivy fs --scanners license --license-full ."
>
> [1] https://github.com/aquasecurity/trivy

IMHO, from my experience trying it, trivy is not a robust tool for
license detection.

It is mostly based on google/licenseclassifier which had a single
commit in the last 17 months, and this means this is not more
maintained than askalono (and frankly both are fairly lightweight
tools for license detection). Trivy adds SPDX expression parsing on
top of the google/licenseclassifier and that's it. I would not rely on
these for anything serious and certainly not to scan code for license
prior to its inclusion in Fedora.

If you want robust license detection, consider using ScanCode [2] and
Scancode.io [3] for more complex pipelines. Both are tools that I
co-maintain and are considered as better tools for this. Do not
hesitate to reach out for help!

Not directly related, but I just found out that ScanCode has been used
for building large code LLMs [4].

[1] https://github.com/google/licenseclassifier
[2] https://github.com/nexB/scancode-toolkit
[3] https://github.com/nexB/scancode.io
[4] https://huggingface.co/papers/2402.19173

--
Cordially
Philippe Ombredanne

+1 650 799 0949 | pombreda...@nexb.com
AboutCode - Open source for open source - https://www.aboutcode.org
VulnerableCode - the open code and open data vulnerability database -
https://github.com/nexb/vulnerablecode
ScanCode - scan your code, for origin/license/vulnerabilities, report
SBOMs - https://github.com/nexB/scancode-toolkit
https://github.com/nexB/scancode.io
package-url - the mostly universal SBOM identifier for packages -
https://github.com/package-url
DejaCode - What's in your code?! - http://www.dejacode.com