On Mon, Nov 27, 2017 at 08:49:08PM +0000, Wheeler, David A wrote:
> [email protected]:
> > - Do we agree the "OR-MAYBE" should be added?
>
> I agree…

Philippe's recent points about weighted confidence (e.g. [1]) suggest
that, even if we decide to support incomplete conclusions, an
unweighted list of alternatives may not be sufficient.  In that case,
we may want something like:

  binary-confidence-expression-operator = "AND"
  confidence-expression = license-expression space "CONFIDENCE" space "0." 1*DIGIT
  confidence-list = confidence-expression *(space confidence-expression) [space license-expression]
                  / confidence-list space binary-confidence-expression-operator space confidence-list
                  / license-expression

where license-expression, space, and DIGIT are discussed in [2].  The
confidence weights would have to sum to something ≤ 1.  ‘AND’
would have the same conjunctive semantics as the current
license-expression operator, but we don't want to support disjunctive
OR for confidence lists.

The ‘[space license-expression]’ (optional trailing
license-expression) has an implicit ‘CONFIDENCE {1 -
sum_of_previous_confidences}’, for folks who don't trust their math or
want to save a few characters.

The ‘/ license-expression’ case has an implicit ‘CONFIDENCE 1’ for
backwards compatibility with existing license-expression consumers who
choose to upgrade to confidence-list.

Then folks consuming confidence-list could use:

  GPL-2.0-only CONFIDENCE 0.95 GPL-2.0-or-later

for “I am 95% sure this is GPL-2.0-only but it could be
GPL-2.0-or-later” with the implicit 5% confidence for
GPL-2.0-or-later.
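
As a rough sketch of the implicit-weight rules (not part of any
proposal), here is how a consumer might expand them in Python.
parse_confidence_list is a hypothetical helper; it assumes each
license-expression is a single identifier with no spaces and it
ignores the ‘AND’ operator between confidence lists:

```python
import re
from decimal import Decimal

WEIGHT = re.compile(r"0\.[0-9]+")  # the "0." 1*DIGIT part of the ABNF


def parse_confidence_list(text):
    """Return ([(license, weight), ...], unassigned_weight).

    Hypothetical helper: only handles single-identifier licenses.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty confidence-list")
    entries = []
    total = Decimal(0)
    i = 0
    while i < len(tokens):
        license_id = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "CONFIDENCE":
            # Explicit weight: <license> CONFIDENCE 0.<digits>
            if i + 2 >= len(tokens) or not WEIGHT.fullmatch(tokens[i + 2]):
                raise ValueError("CONFIDENCE needs a weight like 0.95")
            weight = Decimal(tokens[i + 2])
            i += 3
        elif i == len(tokens) - 1:
            # Trailing bare license: implicit 1 - sum of previous weights.
            # A lone license therefore gets an implicit CONFIDENCE 1.
            weight = Decimal(1) - total
            i += 1
        else:
            raise ValueError("expected CONFIDENCE after " + license_id)
        total += weight
        if total > 1:
            raise ValueError("confidence weights sum to more than 1")
        entries.append((license_id, weight))
    return entries, Decimal(1) - total


entries, unassigned = parse_confidence_list(
    "GPL-2.0-only CONFIDENCE 0.95 GPL-2.0-or-later")
for license_id, weight in entries:
    print(license_id, weight)
```

Using Decimal rather than float keeps the implicit remainder exact
(0.05 here, not 0.050000000000000044).  A list with no trailing
license-expression just leaves the remainder unassigned.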

> > - Should we disallow "OR-MAYBE" in declared license fields (it
> >   would only be used in concluded license fields)?
>
> No.  Projects sometimes get inherited from others where the license
> isn't clear to start with, so it needs to be *possible* to declare
> ambiguities.  Of course, a *declaration* using "OR MAYBE" should
> be concerning, but that helps potential users know where to dig in.

Keeping a separate ABNF rule for license-expression allows consumers
to choose between license-expression and confidence-list as they see
fit.  But yeah, the “inherited project” case is a good reason for
allowing confidence-list (or whatever we use for partial conclusions)
in declared-license fields.

> > - What is the exact definition of the "OR-MAYBE" we would include
> >   in the spec?
>
> For "OR MAYBE", in the definition of compound-expression, change:
>                  compound-expression "OR" compound-expression ) /
> to:
>                  compound-expression "OR" ["MAYBE"] compound-expression ) /
>
> If you want a MAYBE prefix to be allowed anywhere, you could change:
>   compound-expression =  1*1(simple-expression /
> to:
>   compound-expression =  ["MAYBE"] 1*1(simple-expression /
>
> The latter allows MAYBE as a prefix in general, in case you have no
> confidence in *anything*.

The CONFIDENCE approach allows you to handle that case with:

  GPL-2.0-only CONFIDENCE 0.90

for “I'm 90% sure this is GPL-2.0-only, and am not expressing an
opinion on the 10% alternatives”.  Using an OR-MAYBE like:

  binary-alternatives-operator = "AND"
  alternatives = license-expression *(OR-MAYBE license-expression)
               / alternatives space binary-alternatives-operator space alternatives

would not support weighting.  But with [3], you could represent that
case with:

  GPL-2.0-only OR-MAYBE NOASSERTION

So I don't see an upside to a separate MAYBE.  It might work with
clear precedence rules, but without them:

  APACHE-2.0 OR GPL-2.0-only OR MAYBE GPL-2.0-or-later

could mean ‘APACHE-2.0 OR GPL-2.0-only OR (MAYBE GPL-2.0-or-later)’:

  A disjunctive choice between ‘APACHE-2.0’, ‘GPL-2.0-only’, and
  something that I haven't been able to figure out yet but which might
  be ‘GPL-2.0-or-later’.

or it could mean ‘(APACHE-2.0 OR GPL-2.0-only) OR MAYBE GPL-2.0-or-later’:

  This might be ‘APACHE-2.0 OR GPL-2.0-only’, but I'm not sure.  It
  might also be ‘GPL-2.0-or-later’.  I haven't been able to figure out
  which yet.

depending on whether MAYBE had a higher precedence than OR or not.

With the former interpretation, you're safe if you want to use the
code under APACHE-2.0 or if you want to use it under GPL-2.0-only.
With the latter interpretation, you're only safe if you want to use
the code under GPL-2.0-only (since that's also a subset of
GPL-2.0-or-later).

Even with OR-MAYBE, precedence for AND is going to be complicated (and
will decide whether a given AND is acting as a license expression
operator or an alternative operator).  But using a hyphenated OR-MAYBE
at least avoids that confusion for OR.

Comparing OR-MAYBE with CONFIDENCE, the only actionable use I can
think of for weighting is a vendor with a report of confidence lists
for various components of their software.  They might decide to
prioritize digging into the component with the least-confident
assertion.  But they might also want to prioritize based on lines of
code under the unclear license, or on the importance of the particular
lines.  For example, say you have a product with:

  10k lines of core code under ‘GPL-3.0-only’
   1k lines of core code under ‘GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only’
  100 lines of build script under ‘MIT CONFIDENCE 0.5 NONE’
   10 lines of build script under ‘MIT CONFIDENCE 0.1 NONE’

where NONE is [4].  What would the project be?

  GPL-3.0-only AND
  (GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only) AND
  (MIT CONFIDENCE 0.5 NONE) AND
  (MIT CONFIDENCE 0.1 NONE)

or would it be:

  GPL-3.0-only AND
  (GPL-2.0-or-later CONFIDENCE 0.9 GPL-2.0-only) AND
  (MIT CONFIDENCE 0.4636 NONE)

using line-count weights (or similar) to combine the two ‘MIT OR-MAYBE
NONE’ cases?
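
To make the 0.4636 above concrete, a small Python sketch of the
line-count weighting.  merge_by_line_count is a hypothetical name, and
weighting by line count is just one of the possible choices (“or
similar”), not something any tool or the spec defines:

```python
def merge_by_line_count(components):
    """Combine per-component confidences into one, weighted by line count.

    components is a list of (line_count, confidence) pairs that all
    share the same pair of alternatives (here MIT vs. NONE).
    """
    total_lines = sum(lines for lines, _ in components)
    if total_lines == 0:
        raise ValueError("no lines to weight by")
    weighted = sum(lines * confidence for lines, confidence in components)
    return weighted / total_lines


# 100 lines at MIT CONFIDENCE 0.5 and 10 lines at MIT CONFIDENCE 0.1
merged = merge_by_line_count([(100, 0.5), (10, 0.1)])
print(round(merged, 4))  # prints 0.4636
```

That is (100 × 0.5 + 10 × 0.1) / 110 = 51 / 110.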

Either way, that's probably going to focus people on the build scripts
(“reasonable chance that this is not open code at all!”), but they may
instead want to focus on the core code (“we think copy/pasting 110
lines could be fair use, but we don't want to waste time on those 1k
lines of possibly GPL-2.0-only code if we can't link them with the 10k
GPL-3.0-only code”).  And we don't weight AND, so it's not clear to me
how actionable CONFIDENCE values would be for product-level
composites.  Still, scancode-toolkit [1,5] and licensee [6] both
decided to set it, so I don't want to drop it without understanding
how it's used.  My impression based on [7,8] is that both of these are
tunables for the tool-user, and that the tool-authors don't expect
them to be passed up the chain to folks reading compound confidence
lists, but it's worth getting more feedback from the tool authors on
that.

And I'm also fine with leaving a partial-conclusion syntax out of the
spec, and punting it to higher levels and third parties.

[1]: https://lists.spdx.org/pipermail/spdx-legal/2017-November/002351.html
     Subject: Re: update on only/or later etc.
     Date: 2017-11-22
     Message-ID: <caofm3ufffitvk-wk_to3zqwpgr6vd+r-26hrucnq8mnbzx2...@mail.gmail.com>
[2]: https://github.com/wking/spdx-spec/blob/922031a89e7f7dca19f20d17005d0f3feeb95af5/chapters/appendix-IV-SPDX-license-expressions.md#IV.2
     https://github.com/spdx/spdx-spec/pull/37
[3]: https://github.com/spdx/spdx-spec/issues/50
     Subject: Add “NOASSERTION” to the license expression syntax
[4]: https://github.com/spdx/spdx-spec/issues/49
     Subject: Add “NONE” to the license expression syntax
[5]: https://github.com/nexB/scancode-toolkit/blame/v2.2.1/src/licensedcode/README.rst#L140-L141
[6]: https://github.com/benbalter/licensee/blob/v9.6.0/docs/usage.md#command-line-usage
[7]: https://github.com/nexB/scancode-toolkit/issues/342
     Subject: Bare CPOL license detection rule detection issue
[8]: https://github.com/benbalter/licensee/pull/212
     Subject: Fix for FCPL false positive

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
