The statement-to-reference relation problem also relates to the topic of
metadata reification, which, from what I can gather, is not really addressed
in the current WDQS RDF approach.

In Blazegraph, this could be supported by Quads or RDR (Reification Done
Right).
See http://arxiv.org/pdf/1406.3399.pdf and
https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right

One possible approach using triples for this use case would be to use a
blank node as the reference placeholder and to type it with prov:Entity (the
range class of prov:wasDerivedFrom) and with the canonical reference UUID,
like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom _:refhash .

_:refhash
    a prov:Entity, wikibase:Reference, wdref:referenceUUID ;
    pr:P7 "Some data" ;
    pr:P8 "1976-01-12T00:00:00Z"^^xsd:dateTime ;
    prv:P8 wdv:b74072c03a5ced412a336ff213d69ef1 .
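
A minimal Python sketch of emitting this proposed shape as Turtle text follows. The helper name reference_turtle and the blank node label scheme (_:ref plus the reference ID) are hypothetical choices for illustration; the prefixes are assumed to be declared elsewhere, and the reference ID is simply reused as the extra type, as in the example above.

```python
# Hypothetical helper: emit the proposed blank-node reference shape as Turtle.
# Prefixes (wds:, prov:, wikibase:, wdref:, pr:, ...) are assumed to be
# declared elsewhere in the generated document.

def reference_turtle(statement_id, ref_id, snaks):
    """Return Turtle linking a statement to a blank reference node.

    statement_id: local name of the statement, e.g. "Q3-24bf3704-...".
    ref_id: reference ID, reused as an extra rdf:type per the proposal.
    snaks: list of (predicate, object) pairs already written in Turtle syntax.
    """
    node = "_:ref" + ref_id  # one blank node label per reference (assumed scheme)
    lines = [f"wds:{statement_id} prov:wasDerivedFrom {node} .", "", node]
    # The type line plus one line per snak, joined with ";" and closed with ".".
    body = [f"a prov:Entity, wikibase:Reference, wdref:{ref_id}"]
    body += [f"{pred} {obj}" for pred, obj in snaks]
    lines.append("    " + " ;\n    ".join(body) + " .")
    return "\n".join(lines)


# IDs borrowed from this thread purely for illustration.
print(reference_turtle(
    "Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37",
    "39f3ce979f9d84a0ebf09abe1702bf22326695e9",
    [("pr:P7", '"Some data"'),
     ("pr:P8", '"1976-01-12T00:00:00Z"^^xsd:dateTime')],
))
```

Blank node labels are local to the document they appear in, so the wdref: type would be what lets the same reference be recognised across separately loaded files.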

Introducing an owl:minCardinality restriction on prov:wasDerivedFrom would
mean that, if there were no refhash for a statement, a null object (similar to
wdno) would identify "unreferenced statements", like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom
wikibase:nullRef .
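
The counting idea can be modelled with plain Python tuples standing in for triples. In this sketch (hypothetical function names, placeholder statement IDs such as wds:Q3-a), every statement without a prov:wasDerivedFrom triple gets the proposed wikibase:nullRef object, after which unreferenced statements are found by matching one fixed predicate/object pair instead of computing a set difference at query time. It models the idea only, not WDQS or Blazegraph internals.

```python
# Hypothetical model of the wikibase:nullRef proposal; triples are tuples of
# prefixed names and the statement IDs below are placeholders.
DERIVED = "prov:wasDerivedFrom"
NULL_REF = "wikibase:nullRef"


def add_null_refs(triples, statements):
    """Add (s, prov:wasDerivedFrom, wikibase:nullRef) for each unreferenced s."""
    referenced = {s for s, p, _ in triples if p == DERIVED}
    missing = sorted(set(statements) - referenced)
    return list(triples) + [(s, DERIVED, NULL_REF) for s in missing]


def count_unreferenced(triples):
    """With the sentinel present, counting is a match on one (p, o) pair."""
    return sum(1 for _, p, o in triples if p == DERIVED and o == NULL_REF)


statements = {"wds:Q3-a", "wds:Q3-b", "wds:Q3-c"}
triples = [("wds:Q3-a", DERIVED, "_:ref1"), ("wds:Q3-a", DERIVED, "_:ref2")]
print(count_unreferenced(add_null_refs(triples, statements)))  # 2
```

The set difference in add_null_refs is still computed, but once, when the data is loaded; queries then only need the single pattern in count_unreferenced.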

There are a lot of ways to deal with this issue, I guess. But it seems to me
that having a simple programmatic method to validate statement integrity
(as supported or unsupported claims) is very important for substantiating
the utility of Wikidata for the academic community.


On 28 November 2015 at 11:20, Christopher Johnson <
[email protected]> wrote:

> Thank you for the explanation. Content negotiation for an Item IRI is
> clear. Any request for http://www.wikidata.org/entity/Q... requires an
> "Accept: application/rdf+xml" header in order to get the RDF. The default
> response is JSON, and "Accept: text/html" returns a 200 response delivering
> the UI page.
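
For illustration, a request carrying the Accept header described above can be built with Python's standard library. This sketch builds the request without sending it; the function name is hypothetical, and the media type is the one named in this message rather than a complete list of what the server accepts.

```python
# Sketch: build (without sending) a content-negotiated request for an entity IRI.
# The Accept value is the one described in this message; other RDF media types
# are not assumed here.
import urllib.request


def entity_request(entity_id, accept="application/rdf+xml"):
    """Return a GET request for http://www.wikidata.org/entity/<entity_id>."""
    url = "http://www.wikidata.org/entity/" + entity_id
    return urllib.request.Request(url, headers={"Accept": accept})


req = entity_request("Q16521")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req), which follows redirects.
```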
>
> For statement resolution in the Item RDF, isn't this a fragment? That is, in
> the Item context, the IRI for a statement resource would be
> http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the
> statement IRI http://www.wikidata.org/entity/statement/Statement_UUID
> could just return the statement as a separate entity.
>
> On the topic of references, a use case is to measure data quality by
> counting the number of "unreferenced statements".  At
> https://phabricator.wikimedia.org/T117234#1834728, I propose using blank
> reference nodes to identify these "bad" statements. Having an object to
> count greatly expedites the query process
> because of the estimated cardinality feature of Blazegraph.  The only
> alternative to this is to count distinct statements with the
> prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may
> not be possible without a huge amount of memory).
>
> I do not know what would be involved in implementing blank reference nodes
> and what performance consequences may also occur. It seems to me that the
> pairing of statements and references is a core feature of the data model,
> and it is odd that there can exist statements that have no associated
> reference node in the RDF.
>
> Cheers,
> Christopher
>
> On 27 November 2015 at 13:00, <[email protected]>
> wrote:
>
>> Send Wikidata-tech mailing list submissions to
>>         [email protected]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>> or, via email, send a message with subject or body 'help' to
>>         [email protected]
>>
>> You can reach the person managing the list at
>>         [email protected]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Wikidata-tech digest..."
>>
>>
>> Today's Topics:
>>
>>    1. RDF Item, Statement and Reference IRI Resolution?
>>       (Christopher Johnson)
>>    2. Re: RDF Item, Statement and Reference IRI Resolution?
>>       (Markus Krötzsch)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 27 Nov 2015 07:21:10 +0100
>> From: Christopher Johnson <[email protected]>
>> To: [email protected],  wikimedia-de-tech
>>         <[email protected]>
>> Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
>>         Resolution?
>> Message-ID:
>>         <CACzuuKvGK1dM1+dn4ypocjhO=
>> [email protected]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> After looking at the RDF format closely, I am asking if the item,
>> statement and reference IRIs could/should be directly resolvable to
>> XML/JSON formatted resources.
>>
>> It seems that currently http://www.wikidata.org/entity/.... redirects to
>> the UI at https://www.wikidata.org/wiki/ which is not what a machine
>> reader would expect.
>> Without a simple method to resolve the IRIs (perhaps a RESTful API?),
>> these RDF data objects are opaque for parsers.
>>
>> Of course, with wbgetclaims, it is possible to get the statement like this:
>>
>> https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>>
>> but the API expected GUID format does not match the RDF UUID
>> representation (there is a $ or "%24" after the item instead of a -) and
>> it returns both the statement and the references.
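
The mismatch between the two ID forms is mechanical, so a converter is short. This minimal sketch assumes, as in the example above, that the RDF form is the API GUID with the first "$" replaced by "-", and that item IDs (Q followed by digits) never contain "-"; the function names are hypothetical.

```python
# Sketch: convert an RDF statement ID (item, "-", UUID) to the GUID form the
# MediaWiki API expects (item, "$", UUID) and build a wbgetclaims URL.
# Assumes item IDs (Q followed by digits) never contain "-".
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def rdf_id_to_api_guid(rdf_id):
    """'Q1-ABC-...' -> 'Q1$ABC-...': only the first '-' is the separator."""
    item, sep, rest = rdf_id.partition("-")
    if not sep:
        raise ValueError(f"not a statement ID: {rdf_id!r}")
    return f"{item}${rest}"


def wbgetclaims_url(rdf_id, fmt="xml"):
    params = {"action": "wbgetclaims", "format": fmt,
              "claim": rdf_id_to_api_guid(rdf_id)}
    return API + "?" + urlencode(params)  # urlencode escapes "$" as %24


print(wbgetclaims_url("Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"))
```

The printed URL is the same one quoted above, which confirms that the two forms differ only in that separator.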
>>
>> Since the reference is its own node in the RDF,  it can be queried
>> independently.  For example, to ask "return all of the statements where
>> reference R is bound."  But then, the return value is a list of statement
>> IDs and a subquery or separate query is then required to return the
>> associated statement node.
>>
>> I am also wondering why item, statement and reference "UUIDs" are not in
>> canonical format in the RDF.  This is a question of compliance with IETF
>> guidelines, which may or may not be relevant.
>>
>> Item: Q20913766
>> Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>> Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
>>
>> See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
>> See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
>> and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
>> guidelines.
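
Python's uuid module makes the canonical-format question easy to check. This sketch (hypothetical helper describe_id) relies on RFC 4122 specifying lowercase hex output in 8-4-4-4-12 groups while accepting either case on input; the 40-hex-digit test for the reference is only a length check, not a claim about how Wikibase computes the hash.

```python
# Sketch: are the ID parts quoted above canonical RFC 4122 UUIDs?
# RFC 4122 output form: lowercase hex in 8-4-4-4-12 groups.
import re
import uuid


def describe_id(part):
    """Classify an ID fragment as canonical UUID, other UUID form, or neither."""
    try:
        u = uuid.UUID(part)  # accepts either case, with or without hyphens
    except ValueError:
        # 40 hex digits is the length of a SHA-1 digest; only length is checked.
        if re.fullmatch(r"[0-9a-fA-F]{40}", part):
            return "40 hex digits (SHA-1 length), not a UUID"
        return "not a UUID"
    canonical = str(u)  # lowercase, hyphenated
    if part == canonical:
        return f"canonical UUID (version {u.version})"
    return f"UUID (version {u.version}), non-canonical form of {canonical}"


statement = "Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"
print(describe_id(statement.partition("-")[2]))
print(describe_id("39f3ce979f9d84a0ebf09abe1702bf22326695e9"))
```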
>>
>> Thanks for your feedback,
>> Christopher
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 27 Nov 2015 10:21:22 +0100
>> From: Markus Krötzsch <[email protected]>
>> To: Wikidata technical discussion <[email protected]>,
>>         wikimedia-de-tech <[email protected]>
>> Subject: Re: [Wikidata-tech] RDF Item, Statement and Reference IRI
>>         Resolution?
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>> On 27.11.2015 07:21, Christopher Johnson wrote:
>> > Hi,
>> >
>> > After looking at the RDF format closely, I am asking if the item,
>> > statement and reference IRIs could/should be directly resolvable to
>> > XML/JSON formatted resources.
>> >
>> > It seems that currently http://www.wikidata.org/entity/.... redirects to
>> > the UI at https://www.wikidata.org/wiki/ which is not what a machine
>> > reader would expect.
>>
>> This interface actually supports content negotiation. If you open it in
>> a browser, it redirects to HTML, but an RDF client can request RDF and
>> will get this. There is no RDF/JSON export AFAIK (maybe it was a typo
>> above?).
>>
>> It may also be that auxiliary nodes (such as statements and references)
>> do not resolve, but resolving the items will always return enough RDF
>> context to get all data. Resolving statements would be easy by mapping
>> them to the item data (returning more data is always ok in RDF). This is
>> possible since the statement IDs are prefixed by the item id. For
>> references, it might be harder to implement this, since you cannot
>> reverse the hash to find the item. This might remain open for a while,
>> since it is more implementation effort.
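
The statement-to-item mapping can be sketched in a few lines. This hypothetical helper works on the prefixed names used in this thread (wds: for statements, wdref: for references) and assumes item IRIs of the form http://www.wikidata.org/entity/Q...; it returns None for references, since the hash cannot be reversed.

```python
# Hypothetical helper: which item IRI should be resolved to get the RDF for a
# node? Statement IDs start with the item ID; reference hashes do not.
ITEM_BASE = "http://www.wikidata.org/entity/"


def item_to_resolve(node):
    """Map a prefixed statement/reference name to an item IRI, or None."""
    prefix, _, local = node.partition(":")
    if prefix == "wds":
        # e.g. "Q3-24bf3704-..." -> "Q3": the item ID precedes the first "-".
        return ITEM_BASE + local.split("-", 1)[0]
    if prefix == "wdref":
        # A reference hash cannot be reversed to find the item that uses it.
        return None
    raise ValueError(f"unsupported prefix in {node!r}")


print(item_to_resolve("wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37"))
print(item_to_resolve("wdref:39f3ce979f9d84a0ebf09abe1702bf22326695e9"))
```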
>>
>> > Without a simple method to resolve the IRIs (perhaps a RESTful API?),
>> > these RDF data objects are opaque for parsers.
>> >
>> > Of course, with wbgetclaims, it is possible to get the statement like this:
>> >
>> > https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>> >
>> > but the API expected GUID format does not match the RDF UUID
>> > representation (there is a $ or "%24" after the item instead of a -) and
>> > it returns both the statement and the references.
>>
>> Yes, using the MediaWiki API will not be a suitable alternative to
>> getting linked RDF. Let's not go into this.
>>
>> >
>> > Since the reference is its own node in the RDF,  it can be queried
>> > independently.  For example, to ask "return all of the statements where
>> > reference R is bound."  But then, the return value is a list of
>> > statement IDs and a subquery or separate query is then required to
>> > return the associated statement node.
>>
>> Yes, resolving statement ids has some utility. I hope it works already.
>> Otherwise it can be made to work without too much effort.
>>
>> As a temporary workaround for all of this, note that the SPARQL endpoint
>> can be (ab)used as a linked data source to fetch data for any IRI
>> present in the data.
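
As a sketch of that workaround, the following builds a query URL for an arbitrary IRI. The endpoint URL https://query.wikidata.org/sparql and the choice of a DESCRIBE query are assumptions; SPARQL 1.1 leaves the content of a DESCRIBE result to the service, so what comes back may vary.

```python
# Sketch: build a SPARQL DESCRIBE request URL for any IRI present in the data.
# Assumptions: the endpoint URL below, and that the service answers DESCRIBE.
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def describe_url(iri):
    """Return a GET URL whose 'query' parameter is DESCRIBE <iri>."""
    # Reject characters the SPARQL grammar forbids inside <...> (IRIREF).
    if any(ch in '<>"{}|^`\\' or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not usable as an IRIREF: {iri!r}")
    query = f"DESCRIBE <{iri}>"
    return ENDPOINT + "?" + urlencode({"query": query})


print(describe_url("http://www.wikidata.org/entity/Q20913766"))
```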
>>
>> >
>> > I am also wondering why item, statement and reference "UUIDs" are not in
>> > canonical format in the RDF.  This is a question of compliance with IETF
>> > guidelines, which may or may not be relevant.
>> >
>> > Item: Q20913766
>> > Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>> > Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
>> >
>> > See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
>> > See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
>> > and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
>> > guidelines.
>>
>> The IDs used in RDF are simply the ids used in the database. The RDF
>> export is not aware of the concept of UUID that was an inspiration (but
>> apparently not an exact model) for the way in which the database is
>> generating its ids. If Wikibase internally switches to canonical UUIDs,
>> this will directly show in the RDF.
>>
>> Best regards,
>>
>> Markus
>>
>> >
>> > Thanks for your feedback,
>> > Christopher
>> >
>> >
>> > _______________________________________________
>> > Wikidata-tech mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>> >
>>
>>
>>
>>
>> ------------------------------
>>
>> End of Wikidata-tech Digest, Vol 31, Issue 5
>> ********************************************
>>
>
>