Hi!

> My view is that this tool should be extremely cautious when it sees new data
> structures or fields.  The tool should certainly not continue to output
> facts without some indication that something is suspect, and preferably
> should refuse to produce output under these circumstances.

I don't think I agree. I find tools that are overly picky about details
that don't matter to me hard to use, and I'd much prefer a tool where I
am in control of which information I need and which I don't.
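
To make that concrete, here is a minimal sketch of the kind of consumer
I have in mind: it pulls out only the fields it cares about and silently
ignores everything else, so a new name/value pair somewhere in the
entity doesn't bother it at all. (The file name is made up; the layout -
one entity per line inside a JSON array - is how the full JSON dumps
look today, as far as I know.)

    import json

    def extract_labels(entity):
        """Return {language: label}, ignoring keys we don't know or need."""
        labels = entity.get("labels", {})
        return {lang: val["value"] for lang, val in labels.items()}

    with open("wikidata-dump.json") as f:  # hypothetical file name
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # skip the array brackets and blank lines
            entity = json.loads(line)
            print(entity["id"], extract_labels(entity).get("en"))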

> What can happen if the tool instead continues to operate without complaint
> when new data structures are seen?  Consider what would happen if the tool
> was written for a version of Wikidata that didn't have rank, i.e., claim
> objects did not have a rank name/value pair.  If ranks were then added,
> consumers of the output of the tool would have no way of distinguishing
> deprecated information from other information.

Ranks are a bit unusual, because adding them was not just an
informational change but a semantic one: it introduced the concept of a
statement whose semantics differ from the rest. Of course such a change
needs to be communicated - it's as if I made a format change like "each
string beginning with the letter X must be read backwards" but didn't
tell the clients. A change that alters semantics is, of course, a
breaking change.

What I was talking about are changes that don't break semantics, and
the majority of additions are exactly that.
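
To illustrate with rank itself: a consumer written before ranks existed
would happily keep reading claims, but it would treat deprecated
statements as if they were current ones. That is exactly why such a
change has to be announced - once you know about it, the fix is tiny.
Something like this, with field names as in the current JSON model (to
the best of my knowledge):

    def usable_statements(entity, prop):
        """Keep only statements for `prop` whose rank is not 'deprecated'."""
        statements = entity.get("claims", {}).get(prop, [])
        return [s for s in statements
                if s.get("rank", "normal") != "deprecated"]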

> Of course this is an extreme case.  Most changes to the Wikidata JSON dump
> format will not cause such severe problems.  However, given the current
> situation with how the Wikidata JSON dump format can change, the tool cannot
> determine whether any particular change will affect the meaning of what it
> produces.  Under these circumstances it is dangerous for a tool that
> extracts information from the Wikidata JSON dump to continue to produce
> output when it sees new data structures.

The tool cannot. It's not possible to write a tool that derives
semantics just from the JSON dump, or even detects semantic changes. A
semantic change can be anywhere; it doesn't have to be an additional
field - it can be a change to the meaning of a field, or to its format,
or its datatype, etc. Of course the tool cannot know that - people have
to know it and communicate it. Again, that's why I think we need to
distinguish changes that break semantics from changes that don't, and
make the tools robust against the latter - but not against the former,
because that's impossible. For the former there is a known and widely
used solution: format versioning.

> This does make consuming tools sensitive to changes to the Wikidata JSON
> dump format that are "non-breaking".  To overcome this problem there should
> be a way for tools to distinguish changes to the Wikidata JSON dump format
> that do not change the meaning of existing constructs in the dump from those
> that can.  Consuming tools can then continue to function without problems
> for the former kind of change.

As I said, format versioning - maybe even semver, or some suitable
modification of it. The RDF exports, by the way, already carry a
version; maybe the JSON exports should too.
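
Roughly what I mean, assuming a hypothetical MAJOR.MINOR version string
shipped with the dump (the JSON dumps don't carry one today, so the
name and placement are made up):

    SUPPORTED_MAJOR = 1

    def check_format_version(version_string):
        """Refuse to run on a breaking (major) change;
        tolerate non-breaking (minor) additions."""
        major = int(version_string.split(".")[0])
        if major != SUPPORTED_MAJOR:
            raise SystemExit("Dump format %s has breaking changes; this "
                             "tool only understands major version %d."
                             % (version_string, SUPPORTED_MAJOR))
        # A higher minor version only signals non-breaking additions,
        # so we keep going and simply ignore anything new.
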
-- 
Stas Malyshev
smalys...@wikimedia.org
