In the spirit of “any suggestions and/or modifications will be very much appreciated”, I have inserted comments below.
From: [email protected] <[email protected]> On Behalf Of Smith Tanjong Agbor
Sent: Wednesday, June 17, 2020 12:32
To: [email protected]; [email protected]
Cc: Gary O'Neall <[email protected]>; [email protected]
Subject: Validate license cross references: New fields to be added

Hi all,

I am working on a Google Summer of Code project that emanates from this discussion/issue <https://github.com/spdx/LicenseListPublisher/issues/60#issuecomment-570511697> concerning the validation of license cross references. Here is a link to my GSOC proposal <https://docs.google.com/document/d/10RlmmsnJ7suDudjgugHMZkOOa-1IsY2Bv_Ew_tgzpv4/edit>. The focus is on improving the LicenseListPublisher <https://github.com/spdx/LicenseListPublisher> repository to have generated license data <https://github.com/spdx/license-list-data> updated with fields on the validity of the crossref, among others.
In order to do this, the structure of the crossref shall change in some cases (e.g. JSON), and in others there shall be additional tags. In general, the following fields shall be added to the crossrefs:

"isValid": true/false,
Indicates whether or not the crossref url is a valid url (e.g. not some local file link).

[Comment] Must a valid URL be based on one of only two/three schemes: http, https, and ftp? Is http://localhost/ or https://127.0.0.1 valid?

"isWayBackLink": true/false,
Indicates whether or not the url is a link from a previous version (Wayback Machine) of the site where the license is located.

"extraText": true/false,
Indicates whether or not the license from the url has extra text in its description when compared to the license description in the current file.

"isMatch": true/false,
Indicates whether or not the license from the url link matches (perfectly) the license description in the current file.

[Comment] Rather than true/false, perhaps allow the name of the matched algorithm:
  verbatim
  noassertion - if no test result is available (for invalid links perhaps)
  todo - no match attempted
  “” - no match asserted
  …
  verbatim2 - matches with \r == \r\n == \n
  verbatim3 - matches “ignoring whitespace differences” reflowed text
  verbatim4 - matches ignoring decoration (comments, flower-boxes)
  template - matches template verbatim (see ppalaga’s comment)
  et cetera, as they become available

"url": "http://landley.net/toybox/license.html",
This is the url of the license text/description.

"isDead": true/false
Indicates whether or not the url is a dead link (a link that returns a response other than HTTP_200; it could be bad request HTTP_400, not found HTTP_404, forbidden HTTP_403, etc.).

[Comment] Rather than true/false (since dead sites can be reanimated), how about a date for the most-recent HTTP-200 response? “dateMRHTTP200”: “UTC date”

Please consider this as a proposal; any suggestions and/or modifications will be very much appreciated.

Thanks,
Smith

View/Reply Online (#3886): https://lists.spdx.org/g/Spdx-tech/message/3886
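
To make the comments above concrete, here is a minimal Python sketch of how the scheme question and the named match algorithms might look. It is not taken from LicenseListPublisher: the function names (is_valid_crossref_url, match_kind, crossref_record) are hypothetical, the scheme list simply repeats the three schemes mentioned in the question, and treating localhost and loopback/private addresses as invalid is one possible answer to that question, not a settled decision. Only the first three match levels (verbatim, verbatim2, verbatim3) are sketched.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: only the three schemes named in the comment count as valid.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_valid_crossref_url(url: str) -> bool:
    """Return True for an absolute http/https/ftp URL with a non-local host."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost":  # assumption: local hosts are rejected
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # host is a DNS name, not an IP literal
    return not (address.is_loopback or address.is_private)


def _normalize_newlines(text: str) -> str:
    """Treat \\r, \\r\\n and \\n as the same line ending."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def match_kind(license_text: str, fetched_text: str) -> str:
    """Return the name of the strictest comparison that succeeds, or ""."""
    if license_text == fetched_text:
        return "verbatim"
    ref = _normalize_newlines(license_text)
    got = _normalize_newlines(fetched_text)
    if ref == got:
        return "verbatim2"
    # Collapse every run of whitespace so reflowed text still compares equal.
    if " ".join(ref.split()) == " ".join(got.split()):
        return "verbatim3"
    return ""  # no match asserted


def crossref_record(url: str, license_text: str, fetched_text=None) -> dict:
    """Build a crossref entry; fetching the page is left to the caller."""
    valid = is_valid_crossref_url(url)
    if not valid:
        match = "noassertion"  # no test result available for an invalid link
    elif fetched_text is None:
        match = "todo"  # no match attempted yet
    else:
        match = match_kind(license_text, fetched_text)
    return {"url": url, "isValid": valid, "isMatch": match}
```

Returning the algorithm name instead of a boolean keeps the stricter results distinguishable from the looser ones, and further levels (verbatim4, template) could be added as extra comparisons in match_kind without changing the shape of the record.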
