From this perspective, to get our source files, we'll get the list of MIME types of all programming languages, and for each MIME type we list the extensions we want to consider.

Yes. If tools/scripts that can do this already exist and are compatible with Python, it might make sense to use them in the initial version.

Besides looking at filename extensions, you can also use python-magic (the Python interface to libmagic) to get information on the contents of files.

-- zvr

From: [email protected] [mailto:[email protected]] On Behalf Of Kate Stewart
Sent: Tuesday, 30 May, 2017 20:04
To: Krys Nuvadga <[email protected]>
Cc: [email protected]
Subject: Re: [spdx-tech] GSoC "License Coverage Grader" Project Status Update

On Mon, May 29, 2017 at 9:45 AM, Krys Nuvadga <[email protected]<mailto:[email protected]>> wrote:

Hi mentors,

After some brainstorming sessions and research, here are some aspects of the project which I would like us to be clear about.

For me it would make sense for us to consider as source files only files that contain program instructions, possibly with comments, written in a human-readable programming language, usually as ordinary text. For our purpose, an intermediate file is not real source code and does not count as a source file, since it is generated by the machine.

As a starting point, yes, it makes sense to go with ignoring generated files for right now. Just as a note: there are some subtle cases where user interfaces are generated by combining code from source-code-based libraries. In those cases, we might want to revisit this when we have examples, but that's to be considered in the future.

From this perspective, to get our source files, we'll get the list of MIME types of all programming languages, and for each MIME type we list the extensions we want to consider.

Yes. If tools/scripts that can do this already exist and are compatible with Python, it might make sense to use them in the initial version.

In terms of the sketch of what should be parametrized, the grader tool, I think, should take just the SPDX document of the package to be evaluated.

The SPDX document is definitely one of the parameters.
It may also make sense to optionally parameterize the number of lines & bytes, so that there is a reasonable default for a run, but there can be an override to experiment and as we get different input from legal. For some languages 10 lines and minimum 100 characters is reasonable for describing something worth licensing. But Yev has pointed out in the past, and I think this agrees with what you're seeing, that some languages only need to have 3 or 4 lines to express something significant. So making the defaults a bit smaller may make sense in the tool.

Thanks for joining the call today!

Kate

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
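
To make the two ideas in this thread concrete (a MIME-type-to-extension list for picking source files, and overridable line/character thresholds), here is a minimal Python sketch. Everything named in it is an assumption for illustration only: the `MIME_TO_EXTENSIONS` table, the `Thresholds` defaults (4 lines, 100 characters), and the helper names `is_source_file` and `is_significant` are hypothetical and are not part of any existing SPDX tool. Counting non-blank lines and total characters is also just one possible reading of "lines & bytes".

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical table: MIME type of a programming language -> extensions to consider.
# A real tool would need a much longer, reviewed list.
MIME_TO_EXTENSIONS = {
    "text/x-python": [".py"],
    "text/x-c": [".c", ".h"],
    "text/x-java": [".java"],
    "application/javascript": [".js"],
}

# Flatten the table into a set for quick membership checks.
SOURCE_EXTENSIONS = {
    ext for exts in MIME_TO_EXTENSIONS.values() for ext in exts
}


@dataclass
class Thresholds:
    """Overridable limits; the defaults here are assumed, not agreed values."""
    min_lines: int = 4
    min_chars: int = 100


def is_source_file(filename: str) -> bool:
    """Treat a file as source if its extension is in the table above."""
    return PurePath(filename).suffix.lower() in SOURCE_EXTENSIONS


def is_significant(text: str, limits: Thresholds = Thresholds()) -> bool:
    """Return True if the text meets both the line and character minimums."""
    non_blank_lines = [line for line in text.splitlines() if line.strip()]
    return len(non_blank_lines) >= limits.min_lines and len(text) >= limits.min_chars


if __name__ == "__main__":
    snippet = "int main(void)\n{\n    return 0;\n}\n"
    print(is_source_file("src/main.py"))
    print(is_significant(snippet))
    print(is_significant(snippet, Thresholds(min_lines=3, min_chars=20)))
```

Passing a different `Thresholds` instance is the "override to experiment" mentioned above. Extension matching could be complemented by content inspection, for example with python-magic's `magic.from_file(path, mime=True)`, which is left out here so the sketch runs with the standard library alone.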
