Hi all,

1- I don't think http://localhost/ or https://127.0.0.1 should be valid
urls, so I will treat them as exceptions in my code.

2- In addition to https, http and ftp, are there any other protocols you
would like us to consider for license urls?
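
To make points 1 and 2 concrete, here is a rough Python sketch of such a
check. It is only an illustration, not the project's code: the function name
is_valid_crossref_url and the constant ALLOWED_SCHEMES are hypothetical, and
the scheme list is just the three protocols mentioned so far.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: only the three schemes discussed in this thread are accepted.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_valid_crossref_url(url: str) -> bool:
    """Return True if url has an allowed scheme and a non-local host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # e.g. a malformed IPv6 literal
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    # Exceptions from point 1: localhost and loopback addresses.
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal, so treat it as an ordinary host name
    # Private-network ranges are deliberately not rejected here; whether they
    # should count as invalid is still an open question in this thread.
    return not addr.is_loopback
```

With this sketch, http://localhost/ and https://127.0.0.1 are rejected, while
http://landley.net/toybox/license.html is accepted.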

3- I think this suggestion:

> Rather than true/false perhaps allow the name of the matched algorithm:
> verbatim
> noassertion – if no test result is available (for invalid links perhaps)
> todo – no match attempted
> “” – no match asserted
> …
> verbatim2 – matches with \r == \r\n == \n
> verbatim3 – matches “ignoring whitespace differences” reflowed text
> verbatim4 – matches ignoring decoration (comments, flower-boxes)
> template – matches template verbatim (see ppalaga’s comment)
> et cetera as they become available

provides more information than what I suggested previously. It will also
let us add new values without changing the structure of the data.
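
As a minimal sketch of how such named results could be produced (Python, for
illustration only): the function name match_status is hypothetical, only the
first three algorithms are covered, and the order in which they are tried is
my assumption. The returned strings are the names from the list above.

```python
from typing import Optional


def _normalize_eol(text: str) -> str:
    # Treat \r, \r\n and \n as the same line ending.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def _collapse_whitespace(text: str) -> str:
    # Ignore whitespace differences such as reflowed text.
    return " ".join(text.split())


def match_status(reference: Optional[str], fetched: Optional[str]) -> str:
    """Return the name of the first algorithm under which the texts match."""
    if reference is None or fetched is None:
        return "noassertion"  # no test result available
    if fetched == reference:
        return "verbatim"
    if _normalize_eol(fetched) == _normalize_eol(reference):
        return "verbatim2"
    if _collapse_whitespace(fetched) == _collapse_whitespace(reference):
        return "verbatim3"
    return ""  # no match asserted
```

A new algorithm then only adds one more string value; the field itself stays
a plain string.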

4- Concerning the date of the most recent HTTP-200 response, we can store
two values: the date of the most recent response (HTTP-200 or not) and a
true/false flag. I think this will give us a date in every case, as well as
whether the link is dead or not.

Concerning Brad's reply:

1- I would suggest storing the date of the event for every field except the
url.
For instance:
isValid: {val: true/false, date: date_utc},
isDead: {val: true/false, date: date_utc}, etc
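
A small Python sketch of building such a record, for illustration only. The
helper names stamped and build_crossref are hypothetical, and the ISO 8601
date format is my assumption, since the exact format of date_utc has not been
agreed on.

```python
import json
from datetime import datetime, timezone


def stamped(value, when=None):
    """Pair a value with the UTC date at which it was established."""
    # `when` is expected to be a timezone-aware datetime; default is now.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {"val": value, "date": when.strftime("%Y-%m-%dT%H:%M:%SZ")}


def build_crossref(url, is_valid, is_dead, when=None):
    """Build a crossref entry in which every field except url carries a date."""
    return {
        "url": url,
        "isValid": stamped(is_valid, when),
        "isDead": stamped(is_dead, when),
    }


if __name__ == "__main__":
    record = build_crossref("http://landley.net/toybox/license.html", True, False)
    print(json.dumps(record, indent=2))
```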

2- I would really like more input on this. I do not know whether DNS, CDN,
private-network and similar checks should be part of evaluating the validity
of a url. I am more inclined to use a regex, and not to require that a link
be valid before establishing whether it is dead or not. I think that would
help.
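
To illustrate what I mean, here is a rough Python sketch in which the two
fields are computed independently. The pattern URL_RE and the function
evaluate are hypothetical and deliberately simple (the pattern does not
handle IPv6 hosts, user info, or a query string without a path); the status
code is passed in so that the sketch makes no network call.

```python
import re

# Purely syntactic check: allowed scheme, a host that is not localhost or a
# 127.x.x.x address, an optional port and an optional path.
URL_RE = re.compile(
    r"^(?:https?|ftp)://"
    r"(?!(?:localhost|127\.\d+\.\d+\.\d+)(?:[:/]|$))"
    r"[a-z0-9.-]+"
    r"(?::\d+)?"
    r"(?:/\S*)?$",
    re.IGNORECASE,
)


def evaluate(url, status_code):
    """Compute isValid and isDead separately; neither depends on the other."""
    return {
        "isValid": URL_RE.match(url) is not None,
        # status_code is None when no response was received at all.
        "isDead": status_code != 200,
    }
```

Here evaluate("http://localhost/", 200) reports the url as invalid but not
dead, which is the kind of case that requiring validity first would hide.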

Any more comments/suggestions are welcome.

Thanks,
Smith

On Wed, 17 Jun 2020 at 21:14, Kaelbling, Michael <
[email protected]> wrote:

> In the spirit of “any suggestions and/or modifications will be very much
> appreciated”, I have inserted comments below.
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf
> Of *Smith Tanjong Agbor
> *Sent:* Wednesday, June 17, 2020 12:32
> *To:* [email protected]; [email protected]
> *Cc:* Gary O'Neall <[email protected]>; [email protected]
> *Subject:* Validate license cross references: New fields to be added
>
>
>
> Hi all,
>
>
>
> I am working on a Google Summer of Code project that emanates from this
> discussion/issue
> <https://github.com/spdx/LicenseListPublisher/issues/60#issuecomment-570511697>
> concerning the validation of license cross references. Here is a link to
> my GSOC proposal
> <https://docs.google.com/document/d/10RlmmsnJ7suDudjgugHMZkOOa-1IsY2Bv_Ew_tgzpv4/edit>.
>
>
>
> The focus is on improving the LicenseListPublisher
> <https://github.com/spdx/LicenseListPublisher>
> repository to have generated license data
> <https://github.com/spdx/license-list-data>
> updated with fields on the validity of the crossref, among others.
>
>
>
> In order to do this, the structure of the crossref shall change (in some
> cases, e.g. JSON), and in others there shall be additional tags. In general,
> the following fields shall be added to the crossrefs:
>
>
>
> *"isValid": true/false**,*
>
> Indicates whether or not the crossref url is a valid url (ex: not some
> local file link)
>
> Must a valid URL be based on one of only two/three schemes: http, https,
> and ftp? Is http://localhost/ or https://127.0.0.1 valid?
>
>
> *"isWayBackLink": true/false**,*
>
> Indicates whether or not the url is a link from a previous version (Wayback
> Machine) of the site where the license is located
>
>
> *"extraText": true/false**,*
>
> Indicates whether or not the license from the url has extra text in its
> description when compared to the license description in the current file.
>
>
> * "isMatch": true/false,*
>
> Indicates whether or not the license from the url matches (perfectly)
> the license description in the current file.
>
> Rather than true/false perhaps allow the name of the matched algorithm:
> verbatim
> noassertion – if no test result is available (for invalid links perhaps)
> todo – no match attempted
>
> “” – no match asserted
> …
> verbatim2 – matches with \r == \r\n == \n
> verbatim3 – matches “ignoring whitespace differences” reflowed text
>
> verbatim4 – matches ignoring decoration (comments, flower-boxes)
> template – matches template verbatim (see ppalaga’s comment)
> et cetera as they become available
>
>
> *"url": "http://landley.net/toybox/license.html",*
>
> This is the url of the license text/description
>
>
> *"isDead": true/false*
>
> Indicates whether or not the url is a dead link (a link that returns a
> response other than HTTP_200, such as bad request HTTP_400, forbidden
> HTTP_403, or not found HTTP_404)
>
> Rather than true/false (since dead sites can be reanimated), how about a
> date for the most-recent HTTP-200 response? “dateMRHTTP200”: “UTC date”
>
>
>
> Please consider this as a proposal and any suggestions and/or
> modifications will be very much appreciated.
>
>
>
> Thanks,
>
> Smith

View/Reply Online (#3888): https://lists.spdx.org/g/Spdx-tech/message/3888