Hi all,

I'm a bit late to this thread, but wanted to add a few bits to what others have said.

First of all, Gary's brief history of the SPDX License List format reminded me that we ought to document this, as it provides some context on how we got here. It turns out the timeline was mostly documented already in https://github.com/spdx/license-list-XML/blob/main/DOCS/history.md, but I made a few updates to add a bit of color and some links; see https://github.com/spdx/license-list-XML/commit/028424f7567ce9010692c7891905671e5c2e5278 if you want to check out the diff.

Suffice it to say, the conversion from a spreadsheet and text files to the XML format was a very long process with much discussion. I'm obviously not going to recap that here, but if you are interested in getting a general idea, the history links to the old wiki, which captures some of the working discussions. For that reason alone, I'm not much in favor of changing the format again unless someone has a really compelling reason and a complete plan, along with the commitment to lead the work, including any necessary tooling.

That being said, I understand what Richard is talking about in terms of needing to look at the XML to fully determine which variations are allowed for a match, or whether further markup might be warranted. Given how the license list has grown, it's not surprising that we are getting more submissions that are "close matches" to something already on the list. Simply determining whether a submission matches, or whether it is close to an existing license and ripe for additional markup, becomes a very detailed task.

For the vast majority of licenses, this isn't too hard. But for a few, namely the HPND variants and BSD-3-Clause in particular, parsing the allowed variants by looking at the XML files (especially now with the regex space character) is really difficult. As a result of my own investigations into the HPND variants, I created this Google doc as a way to see them all in one place just by scrolling, using blue/red text for easier human readability when comparing them to a new submission: https://docs.google.com/document/d/1xqSwTfJJ7btkhbblrIAZxOxv0iZPmAMGar9rU7DLKC8/edit

I have also updated our documentation with a link to this Google doc here: https://github.com/spdx/license-list-XML/blob/main/DOCS/license-match.md

Suffice it to say, having me maintain this Google doc (unless someone wants to help...) is not optimal. If there were a way to use some kind of XML/regex viewing tool to help visualize these challenging licenses, that would be great. But I'm not the person to know what that tool would be, how it would work, or how it might be implemented.
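As a rough illustration of what such a viewer could do (a sketch only, not existing SPDX tooling), the Python below flattens the text of a license XML file into one annotated line per paragraph: `<optional>` text is wrapped in `[OPTIONAL: ...]`, and each `<alt>` is shown with its original text and its `match` regex. It assumes the license-list-XML element names `optional`, `alt` (with `name` and `match` attributes), `p`, and `item`; the `view` helper and the sample license are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Elements assumed to start a new line in the rendered view.
BLOCKS = {"p", "item", "titleText", "copyrightText"}


def local(tag: str) -> str:
    """Drop a namespace prefix such as '{http://www.spdx.org/license}'."""
    return tag.rsplit("}", 1)[-1]


def squash(s: Optional[str]) -> str:
    """Collapse any run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", s or "")


def render(elem: ET.Element) -> str:
    """Recursively render an element, annotating optional and alt markup."""
    inner = squash(elem.text) + "".join(render(c) + squash(c.tail) for c in elem)
    name = local(elem.tag)
    if name == "optional":
        return f"[OPTIONAL: {inner.strip()}]"
    if name == "alt":
        # Show the original wording next to the regex that may replace it.
        return f'{{ALT {elem.get("name")}: "{inner.strip()}" ~ /{elem.get("match")}/}}'
    if name in BLOCKS:
        return "\n" + inner.strip() + "\n"
    return inner


def view(xml_source: str) -> str:
    """Render the first <text> element of a license file, one block per line."""
    root = ET.fromstring(xml_source)
    text = next(e for e in root.iter() if local(e.tag) == "text")
    lines = (line.strip() for line in render(text).splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample shaped like a license-list-XML file.
SAMPLE = r"""<SPDXLicenseCollection xmlns="http://www.spdx.org/license">
  <license licenseId="Example" name="Example License">
    <text>
      <optional><p>Example License</p></optional>
      <p>Permission is granted to use
         <alt name="software" match="this\s+software|this\s+file">this software</alt>
         <optional>(including the next paragraph)</optional> freely.</p>
    </text>
  </license>
</SPDXLicenseCollection>"""

if __name__ == "__main__":
    print(view(SAMPLE))
```

To look at a real file, one could pass the contents of a license XML file from a local checkout to `view`; a heavily marked-up license would then read as a short list of annotated paragraphs rather than raw markup.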

In the meantime, I hope this helps a bit!

Jilayne


On 1/15/24 12:16 PM, Gary O'Neall wrote:
Just adding a bit of historical context and personal experience to Alexios's 
description below, which I largely agree with.

The XML format is actually the 3rd iteration of formats the legal team has used 
to capture license information.

Iteration 1: spreadsheets (open office format)
Iteration 2: spreadsheets with separate text files with a very proprietary 
format for denoting how to format the files in HTML (e.g. if a line starts with 
3 spaces, it is a bullet and should be indented).
Iteration 3: XML

Iteration 2 came out of limitations in the spreadsheet (length of text in a 
cell) and the inability to format the text for good HTML readability.

Iteration 3 came out of frustration trying to maintain iteration 2.  I wasn't 
the driver of the change, but from my own personal experience in iteration 2, 
we found ourselves re-inventing HTML in the proprietary text formats; 
moving to XML solved that problem.  Having a single spreadsheet with all the 
metadata didn't lend itself well to multiple collaborators; separate files for 
each license's metadata made collaboration much easier.  The move to XML was 
large and painful and involved a lot of effort, but IMO it resulted in a much 
easier to maintain text format and was worth the effort overall.

There are several text formatting alternatives (full HTML, LaTeX, SGML, and 
Markdown, to name just a few).  Based on my past experience, I would not want to 
go back to a proprietary text format for the text portion of the license data.

For the metadata, there are several alternatives, but we would need to somehow 
link them to the text format.  Since moving to a different metadata format 
would involve some effort, I would like to see a strong enough benefit to 
justify the effort AND volunteers to help with necessary changes to the tooling.

So far, I have not seen an alternative to XML with enough benefit to go through 
the significant effort of changing - but I'm willing to listen and discuss.

Gary

-----Original Message-----
From: Spdx-legal@lists.spdx.org <Spdx-legal@lists.spdx.org> On Behalf Of
Alexios Zavras
Sent: Monday, January 15, 2024 7:07 AM
To: Jonas Smedegaard <d...@jones.dk>; Richard Fontana
<rfont...@redhat.com>
Cc: SPDX-legal <spdx-legal@lists.spdx.org>
Subject: Re: XML format is unsatisfactory

Richard, interesting discussion, but I think your reasoning is backwards from a
historical perspective.
Keep in mind that license-list-XML keeps data and metadata about licenses.

The metadata is easy: stuff like short identifier, when it was added, whether
it's OSI approved, links for the original text, etc. etc.
I agree that these can be represented in any format (XML, YAML, JSON, TOML,
text in key-value pairs, ...).

Then we have the data: it started out as pure text, then we wanted to have
some structure (split into paragraphs, or bullet lists), then we had to represent
that some parts of the text are optional (they could be present or not), then
there were also alternatives (something must be there, but its content
is not arbitrary).
XML is the best format for describing text with markup -- the billions of web
pages in HTML (an XML with a specific set of tags) attest to that.

A "simple {{mustache-style}} curly braces markup when needed" will simply
not do -- you want tags to contain other elements (an <optional> that includes
<list> that includes <item> that includes <alt>). There are two ways to achieve
this: by marking the start and end or by nesting structures. XML uses start/end
tags; other markup systems use either one (or both). I have long experience in
every conceivable setup.
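The point that the markup has to nest can be made concrete with a small sketch. The Python below, which is illustrative and not SPDX's matching tooling, walks a nested fragment and builds one regular expression: `<optional>` becomes an optional group, `<alt>` contributes its `match` attribute, and everything else is literal text. The fragment, the `to_regex`/`matches` helpers, and the whitespace and case rules are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def literal(text):
    """Escape literal license text; any whitespace run matches any other."""
    return r"\s+".join(re.escape(word) for word in (text or "").split())


def to_regex(elem):
    """Recursively translate nested markup into one regular expression."""
    tag = elem.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
    if tag == "alt":
        # An <alt> is replaced by the regex in its match attribute.
        return "(?:" + elem.get("match") + ")"
    pieces = [literal(elem.text)]
    for child in elem:
        pieces.append(to_regex(child))  # nesting is handled by recursion
        pieces.append(literal(child.tail))
    body = r"\s*".join(p for p in pieces if p)
    # An <optional> becomes an optional, non-capturing group.
    return f"(?:{body})?" if tag == "optional" else body


def matches(fragment, candidate):
    """True if candidate matches the fragment, ignoring case (an assumption)."""
    pattern = to_regex(ET.fromstring(fragment))
    return re.fullmatch(r"\s*" + pattern + r"\s*", candidate, re.IGNORECASE) is not None


# Hypothetical fragment: an <optional> containing a <list>, an <item>, and an <alt>.
FRAGMENT = r"""<text>Redistribution is permitted
  <optional>provided that
    <list>
      <item>the <alt name="notice" match="copyright\s+notice|notice">copyright notice</alt> is retained</item>
    </list>
  </optional>.
</text>"""

if __name__ == "__main__":
    print(matches(FRAGMENT, "Redistribution is permitted."))  # True
    print(matches(FRAGMENT, "Redistribution is permitted provided that the notice is retained."))  # True
    print(matches(FRAGMENT, "Redistribution is permitted provided that the license is retained."))  # False
```

Because `to_regex` simply recurses over child elements, an `<alt>` inside an `<item>` inside an `<optional>` needs no special handling. Joining pieces with `\s*` is a simplification; a real matcher would need to be stricter about word boundaries.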

Based on the requirements to describe the text with markup, XML was
selected. And since we had it for the data, it made sense to have it for the
metadata as well.
I don't think that "makes it easier to generate usable HTML" was ever high among
the criteria.


So, is this a discussion on how to represent the metadata, how to represent
the data, or both? If people are uncomfortable with having the metadata
expressed in XML, I would not object to splitting the metadata out into a
different format (YAML, TOML, whatever). But finding an acceptable alternative
for the actual text... I would be *very* interested to see a practical proposal.

--
zvr
-----Original Message-----
From: Spdx-legal@lists.spdx.org <Spdx-legal@lists.spdx.org> On Behalf Of
Jonas Smedegaard
Sent: Monday, 15 January, 2024 06:29
To: Richard Fontana <rfont...@redhat.com>
Cc: SPDX-legal <spdx-legal@lists.spdx.org>
Subject: Re: XML format is unsatisfactory

Quoting Richard Fontana (2024-01-15 06:22:47)
On Mon, Jan 15, 2024 at 12:01 AM Jonas Smedegaard <d...@jones.dk> wrote:
Quoting Richard Fontana (2024-01-14 23:41:55)
On Sun, Jan 14, 2024 at 2:47 PM Jonas Smedegaard <d...@jones.dk> wrote:
The XML files are a representation of RDF.

Another, more human-readable and editable RDF representation
exists, which is a *lossless* conversion: Turtle.

On a Debian-based system, you can see how the MIT license looks as
Turtle by installing the package raptor2-utils, and then running this command:

   rapper -i rdfxml -O http://spdx.org/licenses/ -o turtle MIT.rdf
This sounds interesting but:

[ref@charlie ~]$ rapper -i rdfxml -O http://spdx.org/licenses/ -o turtle MIT.rdf
rapper: Parsing URI MIT.rdf with parser rdfxml
rapper: Serializing with serializer turtle and base URI http://spdx.org/licenses/
@base <http://spdx.org/licenses/> .
rapper: Error - URI MIT.rdf - Resolving URI failed: Could not resolve host: MIT.rdf
rapper: Failed to parse URI MIT.rdf rdfxml content
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

rapper: Parsing returned 0 triples
Sorry, I thought it was obvious, but I realize now that I should have been
explicit: you need to `cd` to the directory where MIT.rdf is located,
i.e. you need to check out the git repository for the sources first.
Ah, I see. So:

[ref@charlie rdfxml]$ rapper -i rdfxml -O http://spdx.org/licenses/ -o turtle MIT.rdf
rapper: Parsing URI file:///home/ref/license-list-data/rdfxml/MIT.rdf with parser rdfxml
rapper: Serializing with serializer turtle and base URI http://spdx.org/licenses/
@base <http://spdx.org/licenses/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix spdx: <../rdf/terms#> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix ptr: <http://www.w3.org/2009/pointers#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<MIT>
     spdx:crossRef [
         spdx:isLive false ;
         spdx:isValid true ;
         spdx:isWayBackLink false ;
         spdx:match "N/A" ;
         spdx:order "0"^^<http://www.w3.org/2001/XMLSchema#int> ;
         spdx:timestamp "2024-01-05T20:12:47Z" ;
         spdx:url "https://opensource.org/licenses/MIT" ;
         a spdx:CrossRef
     ] ;
     spdx:isDeprecatedLicenseId false ;
     spdx:isFsfLibre true ;
     spdx:isOsiApproved true ;
     spdx:licenseId "MIT" ;
     spdx:licenseText """MIT License

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
\"Software\"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
""" ;
     spdx:licenseTextHtml """
       <div class=\"optional-license-text\">
          <p>MIT License</p>

       </div>
       <div class=\"replaceable-license-text\">
          <p>Copyright (c) &lt;year&gt; &lt;copyright holders&gt;
          </p>

       </div>

       <p>Permission is hereby granted, free of charge, to any person obtaining a copy of <var class=\"replaceable-license-text\"> this software and
          associated documentation files</var> (the &quot;Software&quot;), to deal in the Software without restriction,
          including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
          and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
          subject to the following conditions:</p>

       <p>The above copyright notice and this permission notice
          <var class=\"optional-license-text\"> (including the next paragraph)</var>
          shall be included in all copies or substantial
          portions of the Software.</p>

       <p>THE SOFTWARE IS PROVIDED &quot;AS IS&quot;, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
          LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
          NO EVENT SHALL <var class=\"replaceable-license-text\"> THE AUTHORS OR COPYRIGHT HOLDERS</var> BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
          WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
          SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</p>

     """ ;
     spdx:name "MIT License" ;
     spdx:standardLicenseTemplate """<<beginOptional>>MIT License

<<endOptional>> <<var;name=\"copyright\";original=\"Copyright (c) <year> <copyright holders> \";match=\".{0,5000}\">>

Permission is hereby granted, free of charge, to any person obtaining
a copy of <<var;name=\"software\";original=\"this software and associated documentation files\";match=\"this\\s+software\\s+and\\s+associated\\s+documentation\\s+files|this\\s+source\\s+file\">>
(the \"Software\"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice<<beginOptional>>
(including the next paragraph)<<endOptional>> shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL <<var;name=\"copyrightHolder\";original=\"THE AUTHORS OR COPYRIGHT HOLDERS\";match=\".+\">> BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

""" ;
     a spdx:ListedLicense ;
     rdfs:seeAlso "https://opensource.org/licenses/MIT" .

rapper: Parsing returned 19 triples
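The `standardLicenseTemplate` literal in this output expresses optional text with start/end markers (`<<beginOptional>>`/`<<endOptional>>`) and replaceable text with `<<var;name=...;original=...;match=...>>` fields. Below is a rough Python sketch of reading such a template into one regex; it is not SPDX's own tooling. It assumes the Turtle escapes (`\"`, `\\s`) have already been undone, that field values never contain a double quote, and that matching ignores case; `template_to_regex`, `template_vars`, and `matches_template` are hypothetical helpers, and the template is a trimmed copy of the MIT one.

```python
import re

# One token: an optional-section marker or a <<var;key="value";...>> field.
# Assumes field values never contain a double quote.
TOKEN = re.compile(r'<<beginOptional>>|<<endOptional>>|<<var((?:;\w+="[^"]*")+)>>')
FIELD = re.compile(r';(\w+)="([^"]*)"')


def literal(text):
    """Escape literal template text; any whitespace run matches any other."""
    return r"\s+".join(re.escape(word) for word in text.split())


def template_vars(template):
    """List the fields of every <<var;...>> in the template as dicts."""
    return [dict(FIELD.findall(m.group(1))) for m in TOKEN.finditer(template) if m.group(1)]


def template_to_regex(template):
    """Translate start/end optional markers and var fields into one regex."""
    pieces, pos = [], 0
    for m in TOKEN.finditer(template):
        pieces.append(literal(template[pos:m.start()]))
        if m.group(0) == "<<beginOptional>>":
            pieces.append("(?:")  # opens an optional group
        elif m.group(0) == "<<endOptional>>":
            pieces.append(")?")   # closes it
        else:
            pieces.append("(?:" + dict(FIELD.findall(m.group(1)))["match"] + ")")
        pos = m.end()
    pieces.append(literal(template[pos:]))
    return r"\s*".join(p for p in pieces if p)


def matches_template(template, candidate):
    """True if the whole candidate text matches the template."""
    regex = r"\s*" + template_to_regex(template) + r"\s*"
    return re.fullmatch(regex, candidate, re.IGNORECASE) is not None


TEMPLATE = (
    '<<beginOptional>>MIT License\n\n<<endOptional>> '
    '<<var;name="copyright";original="Copyright (c) <year> <copyright holders> ";match=".{0,5000}">>\n\n'
    'Permission is hereby granted, free of charge, to any person obtaining\n'
    'a copy of <<var;name="software";original="this software and associated documentation files";'
    r'match="this\s+software\s+and\s+associated\s+documentation\s+files|this\s+source\s+file">>'
    '\n(the "Software")'
)

if __name__ == "__main__":
    print([v["name"] for v in template_vars(TEMPLATE)])  # ['copyright', 'software']
    print(matches_template(TEMPLATE, 'Copyright (c) 2024 Example Author\nPermission is hereby '
          'granted, free of charge, to any person obtaining a copy of this source file (the "Software")'))  # True
```

This tokenizer does not itself check that every `<<beginOptional>>` has a matching `<<endOptional>>`; an unbalanced template simply produces an invalid pattern, and `re` raises `re.error` when it is compiled.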

This doesn't really seem to be any better than (and actually seems
worse than) the license-list-XML file "MIT.xml" for purposes of
readability and maintainability.
How so?

Because it doesn't look like YAML?


  - Jonas

--
  * Jonas Smedegaard - idealist & Internet architect
  * Tel.: +45 40843136  Website: http://dr.jones.dk/
  * Sponsorship: https://ko-fi.com/drjones

  [x] quote me freely  [ ] ask before reusing  [ ] keep private





View/Reply Online (#3519): https://lists.spdx.org/g/Spdx-legal/message/3519