In the process of creating my tutorial I've found some potential issues in the 
SPDX spec.  Below I argue that the license-expression grammar in Appendix IV 
seems to have at least one bug, fails to discuss whitespace, and is overly 
complicated and confusing when read carefully.  These aren't show-stoppers 
(the intent seems clear), but I think these issues should be addressed.

--- David A. Wheeler

==============================

First, I think there's a bug in compound-expression, which currently says 
"1*(....)".  According to the ABNF spec <http://tools.ietf.org/html/rfc5234> 
that means "1 or more".  So sequential expressions are allowed; e.g., this 
would be a legal license expression:
  MIT 0BSD GPL-2.0
I don't think the *intent* was to allow adjacent expressions without 
connectives like "AND" and "OR".
A quick fix would be to declare that the "1*" there was supposed to be "1*1", 
which ABNF reads as "exactly one".
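
To illustrate the difference between "1*" and "1*1", here is a minimal 
Python sketch. It is not the Appendix IV grammar: ELEMENT is a hypothetical 
stand-in that treats a whole expression as one identifier, and the whitespace 
between repeated elements is my assumption, since the spec says nothing 
about whitespace.

```python
import re

# Hypothetical stand-in for one expression: a single identifier.
ELEMENT = r"[A-Za-z0-9.\-]+"

# ABNF "1*(element)": one or more repetitions.
# Whitespace between repetitions is an assumption of this sketch.
ONE_OR_MORE = re.compile(rf"{ELEMENT}(?:\s+{ELEMENT})*")

# ABNF "1*1(element)": exactly one repetition.
EXACTLY_ONE = re.compile(ELEMENT)


def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, accepts(ONE_OR_MORE, "MIT 0BSD GPL-2.0") is true, 
while accepts(EXACTLY_ONE, "MIT 0BSD GPL-2.0") is false.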

Second, the spec currently fails to discuss whitespace. How about this, which I 
think covers the intent:
- There MUST NOT be whitespace within an idstring, license-id, 
license-exception-id, or license-ref.
- There MUST NOT be whitespace between a license-id and any following "+".  
This supports easy parsing and backwards compatibility.
- There MUST be whitespace on either side of the operator "WITH".
- There MUST be whitespace and/or parentheses on either side of the operators 
"AND" and "OR".
- There MAY be one or more whitespace elements elsewhere in a 
license-expression.
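
The whitespace rules above could be enforced in a tokenizer along these 
lines. This is only a sketch under my own assumptions: the function name 
tokenize and the error messages are hypothetical, and it checks only the 
whitespace rules, not which characters are legal in an identifier.

```python
import re

# A token is a run of whitespace, a single parenthesis, or a "word"
# (any run of characters that are neither whitespace nor parentheses).
TOKEN = re.compile(r"\s+|[()]|[^\s()]+")


def tokenize(text):
    """Split text into tokens, enforcing the proposed whitespace rules."""
    parts = TOKEN.findall(text)
    tokens = []
    for i, part in enumerate(parts):
        if part.isspace():
            continue  # whitespace MAY appear between tokens
        if part.startswith("+"):
            # A "+" that starts a word was separated from its license-id.
            raise ValueError('"+" must directly follow a license-id')
        if part == "WITH":
            before = parts[i - 1] if i > 0 else ""
            after = parts[i + 1] if i + 1 < len(parts) else ""
            if not (before.isspace() and after.isspace()):
                raise ValueError('"WITH" needs whitespace on both sides')
        # "AND" and "OR" need no extra check: a word is always bounded
        # by whitespace, a parenthesis, or the end of the text.
        tokens.append(part)
    return tokens
```

For example, tokenize("MIT AND(GPL-2.0+ WITH Classpath-exception-2.0)") 
succeeds, while tokenize("GPL-2.0 +") raises ValueError.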

Third, the current Appendix IV is also overly complex and confusing:
* There's no need to have "compound-expression" as separate from 
"license-expression".  The "license-expression" is defined to be either 
simple or compound, but a simple-expression is also a legal 
compound-expression, so the whole indirection is unnecessary and confusing.
* In simple-expression, the "+" should just optionally follow license-id; 
that's how anyone would parse it, and it's easier to explain too.

So I suggest replacing simple-expression, compound-expression (to be removed), 
and license-expression with this simpler spec:

simple-expression = license-id ["+"] / license-ref
license-expression =   simple-expression [ "WITH" license-exception-id ] /
  license-expression "AND" license-expression /
  license-expression "OR" license-expression /
  "(" license-expression ")"

You could change simple-expression to be:
simple-expression = license-id ["+"] [ "WITH" license-exception-id ] / license-ref
and omit the [ "WITH" ... ] in the following line, but I *like* the idea of 
allowing a license-ref with a standard exception.  Besides, that's currently 
allowed; there's no reason to *remove* this functionality.

Both this and the original description are silent about whether operators 
group left-to-right or right-to-left; I don't think it matters, but if 
someone wants things to be parsed identically, perhaps that should be 
mentioned.
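
As a sanity check on the simpler grammar proposed above, here is a sketch of 
a recognizer for it in Python. It only answers accept or reject, so it does 
not have to pick a grouping direction. Assumptions that are mine, not the 
spec's: the name is_license_expression, the regular expressions for 
identifiers (my reading of idstring and license-ref), and the fact that 
license-ids are not checked against the SPDX license list. The left 
recursion is rewritten as a loop, which accepts the same strings.

```python
import re

ID = r"[A-Za-z0-9.\-]+"  # assumed idstring alphabet
LICENSE_REF = re.compile(rf"(?:DocumentRef-{ID}:)?LicenseRef-{ID}")
# license-id with optional "+"; must not look like a license-ref.
LICENSE_ID = re.compile(rf"(?!LicenseRef-|DocumentRef-){ID}\+?")
EXCEPTION_ID = re.compile(ID)
KEYWORDS = {"AND", "OR", "WITH"}


def is_license_expression(text):
    """Return True if text matches the proposed license-expression."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def simple():
        # simple-expression = license-id ["+"] / license-ref
        tok = peek()
        if tok is None or tok in KEYWORDS:
            return False
        if not (LICENSE_REF.fullmatch(tok) or LICENSE_ID.fullmatch(tok)):
            return False
        advance()
        return True

    def term():
        # "(" license-expression ")" or simple-expression ["WITH" exc-id]
        if peek() == "(":
            advance()
            if not expression() or peek() != ")":
                return False
            advance()
            return True
        if not simple():
            return False
        if peek() == "WITH":
            advance()
            tok = peek()
            if tok is None or tok in KEYWORDS:
                return False
            if not EXCEPTION_ID.fullmatch(tok):
                return False
            advance()
        return True

    def expression():
        # term *( ("AND" / "OR") term )
        if not term():
            return False
        while peek() in ("AND", "OR"):
            advance()
            if not term():
                return False
        return True

    return expression() and pos == len(tokens)
```

With this sketch, "(MIT OR GPL-2.0+) AND LicenseRef-1 WITH 
Classpath-exception-2.0" (written on one line) is accepted, and the 
adjacent form "MIT 0BSD GPL-2.0" is rejected.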

I can imagine adding suffixes like "!" (I'm *sure* it's only this particular 
version of the license) or "?" (I'm *not* sure that it's limited to this 
particular version of the license), in addition to "+".
However, that's a separate discussion.

--- David A. Wheeler
