Re: [Wikidata] Scaling Wikidata Query Service

2019-06-26 Thread Eric Prud'hommeaux
On Mon, Jun 17, 2019 at 09:41:51PM +0200, Finn Aarup Nielsen wrote:
> 
> Changing the subject a bit:
> 
> I am surprised to see how many SPARQL requests go to the endpoint when
> performing a ShEx validation with the shex-simple Toolforge tool. They are
> all very simple and quickly complete. For each Wikidata item tested, one of
> our tests [1] requests tens of times. That is, testing 100 Wikidata items
> may yield thousands of requests to the endpoint in rapid succession.
> 
> I suppose that given the simple SPARQL queries, these kinds of requests
> might not load WDQS very much.

It's true; they require no joins and are designed to be answerable by
only looking at the index. That said, given that they offer virtually
no load, running them with API access to the Blaze getStatements() [2]
would make validation thousands of times faster and eliminate parsing
and query planning time on the SPARQL server.
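
To make the request pattern concrete, here is a rough Python sketch of
the kind of join-free queries involved. It assumes the validator fetches
the outgoing and incoming arcs of each node it visits; the function
names and the figure of 30 requests per item are illustrative
assumptions, not taken from the shex-simple code.

```python
from urllib.parse import urlencode

# Endpoint named in the shex-simple link [1].
ENDPOINT = "https://query.wikidata.org/sparql"


def arc_queries(node_iri):
    """Return two single-triple-pattern queries for one node (hypothetical helper).

    Each query has exactly one triple pattern, so no join is needed and
    the store can answer it from an index lookup.
    """
    outgoing = f"SELECT ?p ?o WHERE {{ <{node_iri}> ?p ?o }}"
    incoming = f"SELECT ?s ?p WHERE {{ ?s ?p <{node_iri}> }}"
    return [outgoing, incoming]


def request_urls(node_iri):
    """Build the GET URLs a client would send for one node."""
    return [f"{ENDPOINT}?{urlencode({'query': q})}" for q in arc_queries(node_iri)]


def estimated_requests(items, requests_per_item):
    """Back-of-the-envelope request count for a validation run."""
    return items * requests_per_item


if __name__ == "__main__":
    for url in request_urls("http://www.wikidata.org/entity/Q42"):
        print(url)
    # Assumed figure: 30 requests per item, 100 items tested.
    print(estimated_requests(100, 30))
```

Every one of those URLs costs an HTTP round trip plus SPARQL parsing and
planning on the server, which is the overhead a direct getStatements()
call would avoid.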


> [1] 
> https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql=[]=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE65
[2] 
https://www.programcreek.com/java-api-examples/?class=org.eclipse.rdf4j.repository.RepositoryConnection=getStatements

> Finn
> http://people.compute.dtu.dk/faan/
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



Re: [Wikidata] Shape Expressions arrive on Wikidata on May 28th

2019-06-02 Thread Eric Prud'hommeaux
On Fri, May 31, 2019 at 08:02:03PM +0200, Thomas Tanon wrote:
> Hi James,
> 
> I'm not the best person to answer your questions (I've never actually used
> ShEx or contributed to it) but I hope I can help.
> 
> > I'm not clear whether there is an RDF representation of ShEx that could
> > be added to WDQS -- this is a point that would be useful to clarify;
> > and, if not, whether work is going on in this direction.
> 
> TL;DR: yes but it does not seem much used at the moment.
> 
> ShEx specifies [1] two syntaxes, ShExC that is the "plain text" syntax used
> on Wikidata and ShExJ that is based on JSON-LD. JSON-LD being an RDF
> serialization, there is indeed an RDF representation of ShEx. To get plain
> triples one could use the JSON-LD to RDF triples conversion algorithm that
> is implemented in most JSON-LD libraries. The old ShEx documentation pages
> refer to a ShExR serialization of ShEx to RDF but I believe it has been
> dropped in favor of ShExJ+JSON-LD to RDF conversion.

We actually do test the ShExR with a Turtle representation. We have
~500 round-trip tests between the three representations:
  

It would be pretty easy to add a translation to RDF in the backend of
a schema update. That would give you all the clever meta queries you'd
like for dependency-checking and auditing without making people edit
triples.
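
As an illustration of the sort of dependency check I mean, here is a
small Python sketch. It walks a ShExJ structure directly instead of
running SPARQL over the RDF form, so it is a stand-in for the meta
query, not the backend translation itself. The example schema, the
example.org shape IRIs and the helper names are made up; only the ShExJ
keys (shapes, expression, TripleConstraint, EachOf, OneOf, predicate,
valueExpr) come from the ShEx spec.

```python
def walk_expression(expr, predicates, references):
    """Collect predicates and referenced shapes from one triple expression."""
    if expr is None or isinstance(expr, str):
        # Missing expression, or a reference to a named triple expression.
        return
    kind = expr.get("type")
    if kind == "TripleConstraint":
        predicates.add(expr["predicate"])
        value = expr.get("valueExpr")
        if isinstance(value, str):
            # A string valueExpr is treated here as a reference to another shape.
            references.add(value)
    elif kind in ("EachOf", "OneOf"):
        for child in expr["expressions"]:
            walk_expression(child, predicates, references)


def schema_dependencies(schema):
    """Map each shape id to the predicates and shapes it depends on."""
    result = {}
    for shape in schema.get("shapes", []):
        predicates, references = set(), set()
        walk_expression(shape.get("expression"), predicates, references)
        result[shape["id"]] = {
            "predicates": sorted(predicates),
            "references": sorted(references),
        }
    return result


# Hypothetical two-shape schema, for illustration only.
EXAMPLE = {
    "type": "Schema",
    "shapes": [
        {
            "id": "http://example.org/Human",
            "type": "Shape",
            "expression": {
                "type": "EachOf",
                "expressions": [
                    {"type": "TripleConstraint",
                     "predicate": "http://www.wikidata.org/prop/direct/P31"},
                    {"type": "TripleConstraint",
                     "predicate": "http://www.wikidata.org/prop/direct/P19",
                     "valueExpr": "http://example.org/Place"},
                ],
            },
        },
        {
            "id": "http://example.org/Place",
            "type": "Shape",
            "expression": {
                "type": "TripleConstraint",
                "predicate": "http://www.wikidata.org/prop/direct/P17",
            },
        },
    ],
}

if __name__ == "__main__":
    for shape_id, deps in schema_dependencies(EXAMPLE).items():
        print(shape_id, deps)
```

With the schemas stored as triples, the same question ("which schemas
mention this property, which shapes does this one depend on") becomes an
ordinary SPARQL query, which is the point of doing the translation on
update.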


> > It's not immediately clear to me whether SHACL adapts easily to
> > memberships defined by eg P31 "instance of" or P279 "subclass of"
> > statements, etc; also memberships possibly further defined or limited by
> > other statements.
> 
> Indeed SHACL 1.0 does not seem to be able to express it. There is an
> extension [3] that allows specifying targets using a SPARQL query, just
> like what is done with the ShEx playground using the focus nodes SPARQL
> query.
> 
> Thomas
> 
> [1] http://shex.io/shex-semantics/
> [2]
> https://www.w3.org/2018/jsonld-cg-reports/json-ld-api/#deserialize-json-ld-to-rdf-algorithm
> [3] https://www.w3.org/TR/shacl-af/#SPARQLTarget
> 
> On Fri, 31 May 2019 at 11:06, James Heald wrote:
> 
> > On 30/05/2019 17:45, Benjamin Good wrote:
> > > I'd like to restate the initial question.
> > >
> > > Why did wikidata choose shex instead of other approaches?
> > >
> > > From this very detailed comparison
> > > http://book.validatingrdf.com/bookHtml013.html (thank you Andra!) I could
> > > see arguments in both directions.  I'm curious to know what swayed the
> > > wikidata software team as my group is currently grappling with the same
> > > decision.
> > >
> >
> >
> > One of the key differences would seem to be that SHACL has been
> > deliberately constructed to be directly representable in RDF -- so a SHACL
> > expression could be put straight into WDQS and made queryable for what
> > it pertains to.
> >
> > I'm not clear whether there is an RDF representation of ShEx that could
> > be added to WDQS -- this is a point that would be useful to clarify;
> > and, if not, whether work is going on in this direction.
> >
> > IMO, it would be a very important asset to be able to query the Shape
> > specifications using SPARQL -- querying not for compliance, but for what
> > the specifications actually contain.
> >
> >
> >
> > On the other hand, SHACL seems very strongly based on shapes for members
> > that are connected by an "is a" relationship.
> >
> > It's not immediately clear to me whether SHACL adapts easily to
> > memberships defined by eg P31 "instance of" or P279 "subclass of"
> > statements, etc; also memberships possibly further defined or limited by
> > other statements.
> >
> > It would seem a basic requirement, but I didn't see it on a first quick
> > skim-read.
> >
> > -- James.
> >
> >


