From this perspective, to get our source files, we'll get the list of MIME types
of all programming languages, and for each MIME type we'll list the extensions we
want to consider.
Yes. If tools/scripts that already do this exist and are compatible with
Python, it might make sense to use them in the initial version.

Besides looking at filename extensions, you can also use python-magic (the 
Python interface to libmagic) to get information on the contents of files.
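A minimal sketch of combining the two checks, assuming the python-magic package
is installed (its `magic.from_file(path, mime=True)` returns the MIME type that
libmagic detects from the file's contents). If the package is missing, the
sketch falls back to the filename extension alone. The helper `guess_mime` and
the small extension table are hypothetical, for illustration only.

```python
import os

try:
    import magic  # python-magic: Python bindings for libmagic
except ImportError:
    magic = None  # fall back to extension-only detection

# Hypothetical, deliberately tiny table used when libmagic is unavailable.
EXTENSION_MIME = {
    ".py": "text/x-python",
    ".c": "text/x-c",
    ".h": "text/x-c",
    ".java": "text/x-java",
}


def guess_mime(path, use_magic=True):
    """Return a best-guess MIME type for path, or None if unknown."""
    if (use_magic and magic is not None and hasattr(magic, "from_file")
            and os.path.isfile(path)):
        # Inspect the file contents rather than trusting the name.
        return magic.from_file(path, mime=True)
    # Otherwise classify by extension, case-insensitively.
    _, ext = os.path.splitext(path)
    return EXTENSION_MIME.get(ext.lower())
```

Note that the content-based result can differ between libmagic versions, so the
extension table remains useful as a predictable baseline.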


-- zvr –

From: [email protected] 
[mailto:[email protected]] On Behalf Of Kate Stewart
Sent: Tuesday, 30 May, 2017 20:04
To: Krys Nuvadga <[email protected]>
Cc: [email protected]
Subject: Re: [spdx-tech] GSoC "License Coverage Grader" Project Status Update



On Mon, May 29, 2017 at 9:45 AM, Krys Nuvadga
<[email protected]> wrote:
Hi mentors,
After some brainstorming sessions and research, here are some aspects of the
project that I would like us to be clear on.

For me, it would make sense to consider as source files only those files that
contain program instructions, possibly with comments, written in a
human-readable programming language, usually as ordinary text. For our
purposes, an intermediate file is not real source code and does not count as a
source file, since it is generated by the machine.

As a starting point, yes, it makes sense to go with ignoring generated files 
for right now.

Just as a note: there are some subtle cases where user interfaces are
generated by combining code from source-code-based libraries. We might want to
revisit some of those cases when we have examples, but that's for the future.
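Ignoring generated files needs some way to recognize them. One existing
convention, used by Go tooling, is a header comment of the form
`// Code generated ... DO NOT EDIT.`; the sketch below scans the first few lines
of a file for that and two other hypothetical markers. The marker list and the
name `looks_generated` are assumptions, and a header check like this will miss
generated files that carry no such comment.

```python
import re

# Hypothetical marker list; the first pattern follows Go's
# "// Code generated ... DO NOT EDIT." convention.
GENERATED_MARKERS = [
    re.compile(r"^// Code generated .* DO NOT EDIT\.$"),
    re.compile(r"\bautomatically generated\b", re.IGNORECASE),
    re.compile(r"\bgenerated by\b.*\bdo not edit\b", re.IGNORECASE),
]


def looks_generated(text, header_lines=10):
    """Return True if any of the first header_lines lines carries a marker."""
    for line in text.splitlines()[:header_lines]:
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in GENERATED_MARKERS):
            return True
    return False
```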


From this perspective, to get our source files, we'll get the list of MIME types
of all programming languages, and for each MIME type we'll list the extensions we
want to consider.

Yes. If tools/scripts that already do this exist and are compatible with
Python, it might make sense to use them in the initial version.
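The approach described above (pick the MIME types of programming languages,
then list the extensions to consider for each) can be sketched as a plain
lookup table that is inverted once and used while walking a package tree. The
table contents and the name `find_source_files` are hypothetical placeholders,
not a vetted list.

```python
import os

# Hypothetical mapping: programming-language MIME type -> extensions to consider.
SOURCE_MIME_EXTENSIONS = {
    "text/x-python": [".py"],
    "text/x-c": [".c", ".h"],
    "text/x-c++": [".cc", ".cpp", ".cxx", ".hpp"],
    "text/x-java": [".java"],
    "application/javascript": [".js"],
}

# Invert the table once so each extension lookup is a single dict access.
EXTENSION_TO_MIME = {
    ext: mime
    for mime, exts in SOURCE_MIME_EXTENSIONS.items()
    for ext in exts
}


def find_source_files(root):
    """Yield (path, mime_type) for each file under root with a listed extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            mime = EXTENSION_TO_MIME.get(ext)
            if mime is not None:
                yield os.path.join(dirpath, name), mime
```

For example, `list(find_source_files("."))` lists candidate source files in the
current directory tree; files such as `README` or `logo.png` are skipped because
their extensions are not in the table.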


As for what should be parameterized, I think the grader tool should take just
the SPDX document of the package to be evaluated.

The SPDX document is definitely one of the parameters. It may also make sense
to optionally parameterize the number of lines and bytes, so that there is a
reasonable default for a run but also an override for experimenting, and for
adjusting as we get different input from legal.
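A minimal sketch of that interface using Python's standard `argparse` module,
assuming the SPDX document is passed as a file path and the thresholds are
optional overrides. The option names `--min-lines` and `--min-bytes` and their
default values are hypothetical placeholders.

```python
import argparse

# Placeholder defaults; the real values are still to be settled.
DEFAULT_MIN_LINES = 4
DEFAULT_MIN_BYTES = 100


def build_parser():
    parser = argparse.ArgumentParser(
        description="Grade the license coverage of a package from its SPDX document."
    )
    # Required: the SPDX document describing the package to evaluate.
    parser.add_argument("spdx_document", help="path to the package's SPDX document")
    # Optional overrides for experimenting with the significance thresholds.
    parser.add_argument("--min-lines", type=int, default=DEFAULT_MIN_LINES,
                        help="minimum lines for a file to count (default: %(default)s)")
    parser.add_argument("--min-bytes", type=int, default=DEFAULT_MIN_BYTES,
                        help="minimum bytes for a file to count (default: %(default)s)")
    return parser
```

For example, `build_parser().parse_args(["pkg.spdx", "--min-lines", "3"])`
overrides only the line threshold and keeps the default byte threshold.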

For some languages, 10 lines and a minimum of 100 characters is a reasonable
bar for something worth licensing. But Yev has pointed out in the past, and I
think this agrees with what you're seeing, that some languages need only 3 or 4
lines to express something significant. So making the defaults a bit smaller
may make sense in the tool.
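The threshold check itself might look like the sketch below, where a language
(keyed by MIME type) can override the default. Counting non-blank lines and
total characters is an assumption about what "lines" and "characters" mean
here; the 10/100 default comes from the message above, while the Python entry
is purely illustrative and not a value anyone has agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    min_lines: int
    min_chars: int


# Default taken from the discussion: 10 lines and 100 characters.
DEFAULT_THRESHOLD = Threshold(min_lines=10, min_chars=100)

# Hypothetical per-language overrides for terser languages.
LANGUAGE_THRESHOLDS = {
    "text/x-python": Threshold(min_lines=4, min_chars=100),
}


def is_significant(text, mime_type=None):
    """Return True if text meets the line and character thresholds for its language."""
    t = LANGUAGE_THRESHOLDS.get(mime_type, DEFAULT_THRESHOLD)
    non_blank = [line for line in text.splitlines() if line.strip()]
    return len(non_blank) >= t.min_lines and len(text) >= t.min_chars
```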

Thanks for joining the call today!

Kate
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech