Re: [Wikidata-tech] Thoughts on (not) exposing a SPARQL endpoint

Thomas Tanon Tue, 10 Mar 2015 10:22:27 -0700

I support Magnus's point of view. WDQ is a very good proof of concept but is, I 
think, too limited to be the primary language of the Wikidata query system.


A possible solution might be to support two query languages "as primary":
1. WDQ, at first, in order to have something working quickly.
2. A safe subset of SPARQL (if that is possible), implemented later using the 
experience gained from deploying the first version of the query system. Or, if 
that is not possible, an improved version of WDQ that overcomes its current 
limitations.

With that, I think we get the best of both worlds:
1. A simple language (WDQ) that allows a short road to production and keeps 
compatibility with previous systems.
2. A powerful language for advanced uses.
3. A design that assumes from the start that more than one query language may 
be used, which may be very useful in the future if we want to change again.
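
To illustrate point 3, here is a minimal sketch of a front end that does not assume a single query language. Everything in it is hypothetical: the names `register`, `translate`, and `supported_languages`, and the idea that each language is handled by a function producing the backend's native query string, are assumptions made for the example and not an existing design.

```python
from typing import Callable, Dict, List

# A translator turns a query written in one public language into the
# query string understood by the backend (hypothetical design).
Translator = Callable[[str], str]

_TRANSLATORS: Dict[str, Translator] = {}


def register(language: str, translator: Translator) -> None:
    """Make a query language available under a case-insensitive name."""
    _TRANSLATORS[language.lower()] = translator


def supported_languages() -> List[str]:
    """Return the names of all registered languages, sorted."""
    return sorted(_TRANSLATORS)


def translate(language: str, query: str) -> str:
    """Dispatch a query to the translator registered for its language."""
    try:
        translator = _TRANSLATORS[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported query language: {language!r}") from None
    return translator(query)


def _wdq_placeholder(query: str) -> str:
    # Placeholder only: a real WDQ translator would plug in here.
    raise NotImplementedError("WDQ translation is not part of this sketch")


# Assumption: SPARQL is the backend language, so it passes through unchanged.
register("sparql", lambda query: query)
register("wdq", _wdq_placeholder)
```

Adding or retiring a language then means one `register` call, and callers only ever see `translate`.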

Thomas

> On 10 March 2015 at 18:01, Magnus Manske <[email protected]> wrote:
> 
> Some thoughts:
> * Either way, there will be a WDQ-like wrapper around SPARQL, maybe as the 
> official interface, maybe only at the current WDQ URL (and I'll have to read 
> up on SPARQL to write that, so if someone else writes it for me, all the 
> better!)
> * WDQ syntax is very limited (no references, no variables, etc), but it 
> covers a large amount of use cases at this point in time
> * A WDQ wrapper could add some sought-after functionality quite easily 
> (regular expression label matching comes to mind), but it is probably not a 
> long-term solution, given its limitations
> * A WDQ-syntax interface would be a great proof-of-concept that the new 
> solution can, at the very least, do what the current one does
> 
> 
> On Tue, Mar 10, 2015 at 4:06 PM Daniel Kinzler <[email protected]> wrote:
> On 10.03.2015 at 16:47, Markus Krötzsch wrote:
> > Hi Daniel,
> >
> > I can understand your thoughts to some extent, but they seem to apply to any
> > potential solution. Committing to a primary query interface will always be,
> > well, a commitment. Because of this, I think the big advantage of SPARQL is
> > exactly that it is a technology standard that does not depend on a specific
> > tool. If you want to minimize lock-in and be maximally future-safe, this
> > seems to be a good thing.
> 
> Committing to the broadest possible interface, even if it's a standard, is the
> problem I see, because it makes swapping out the backend close to impossible.
> I propose committing to an interface that is as narrow as it can be for our
> use case. That's general best practice in system design, I believe.
> 
> Note that we are not only committing to a (standardized, but very complex)
> query language, but also to our data mapping. WDQ would abstract from that,
> and give us wiggle room to adjust the mapping later.
> 
> > I would certainly not support the use of a
> > tool-specific query language that is not specified anywhere but in running
> > code.
> 
> Of course the language would need to be well specified, and modified in
> places. We'd want a production grammar, and a decent parser (recursive
> descent, probably).
> 
> > WDQ is great but it is a custom API of a single tool rather
> > than a query language.
> 
> It would be our Domain Specific Language. There's a lot to be said for DSLs,
> if they are well documented.
> 
> > * "WDQ would go away": That's not a worry I have at all. It will be easy to
> > write an adaptor for WDQ to SPARQL and to keep up the service as it is for a
> > long time.
> 
> That is exactly what I'm proposing. I'd just say that the WDQ version would be
> the canonical one, while the SPARQL one would be considered raw/unstable, like
> the SQL databases on labs.
> 
> > * "SPARQL would be too expressive, or could have non-standard extensions
> > that are hard to support in the future": This can be addressed in two ways.
> > The soft way is to document clearly which features are supported, and to
> > maintain backwards compatibility only with respect to these.
> 
> This documentation is unlikely to be complete, and people will use whatever
> "works now", and complain when it breaks. They *will* use vendor-specific
> features and optimizations, even if you tell them they shouldn't. And there
> will be trouble when they break.
> 
> > The hard (as in firm, not as in difficult) way is to restrict queries to
> > use only such a limited set of "safe" features. This is easy to do, since
> > SPARQL query parsing and reformulation is already part of any DBMS that
> > supports such queries, and it would be easy to hook into this process to
> > restrict queries without any notable performance overhead. This would
> > minimize vendor lock-in, since one would only commit to (a subset of) the
> > fully standardized features.
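
A rough sketch of what "restricting queries to a safe feature set" could look like. Note that this is not the approach described above (hooking into the DBMS's own SPARQL parser); it swaps in a much cruder lexical allow-list, only to make the idea concrete. The allowed keyword set and the function names are assumptions chosen for the example, and a production check would need to work on a properly parsed query.

```python
import re

# Hypothetical allow-list: the only bare keywords a public query may use.
ALLOWED_KEYWORDS = {
    "PREFIX", "SELECT", "ASK", "DISTINCT", "WHERE", "OPTIONAL", "UNION",
    "FILTER", "ORDER", "BY", "ASC", "DESC", "LIMIT", "OFFSET",
}

# Remove parts of the query that may contain arbitrary words.
_STRIP = re.compile(
    r"<[^<>\s]*>"           # IRIs such as <http://example.org/x>
    r'|"(?:[^"\\]|\\.)*"'   # double-quoted string literals
    r"|'(?:[^'\\]|\\.)*'"   # single-quoted string literals
    r"|#[^\n]*"             # comments
)

# Bare words: not variables (?x, $x), not either half of a prefixed
# name (wd:Q5), and not language tags (@en).
_WORD = re.compile(r"(?<![?$:@\w])[A-Za-z_]\w*(?![\w:])")


def disallowed_keywords(query: str) -> set:
    """Return the bare words in the query that are not on the allow-list."""
    stripped = _STRIP.sub(" ", query)
    words = {word.upper() for word in _WORD.findall(stripped)}
    return words - ALLOWED_KEYWORDS


def is_safe(query: str) -> bool:
    """True if the query uses only allow-listed keywords."""
    return not disallowed_keywords(query)
```

Because it is an allow-list, anything not explicitly permitted (federated `SERVICE` calls, update operations, vendor extensions, even built-in functions) is rejected until someone decides to support it.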
> 
> That is the plan for sandboxing SPARQL. It's doable, but not easy.
> Implementing "safe" WDQ on top of SPARQL is going to be simpler and quicker,
> I think. It will give us a public query interface *faster*.
> 
> > With both of these in place, your concerns should be addressed without us
> > having to build our own query language from scratch (including parsers,
> > preprocessors, optimizers, user documentation, ...).
> 
> With WDQ on top of SPARQL, we need a parser and a SPARQL emitter, that's it.
> Documentation is already there (well, to a degree), and optimization is
> provided by the SPARQL endpoint.
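
As a minimal sketch of the "parser plus SPARQL emitter" idea, the following handles only a tiny slice of WDQ: `CLAIM[property:item]` conditions joined by `AND`. It uses a small recursive-descent parser, as suggested earlier in the thread. The function names are made up for the example, and the `wd:`/`wdt:` prefixes are placeholders for whatever RDF mapping is eventually chosen (their `PREFIX` declarations are omitted). Keeping the prefixes as parameters is the point: the mapping can change without touching any WDQ query.

```python
import re
from typing import List, Tuple

# One condition: (property id, item id), e.g. (31, 5) for CLAIM[31:5].
Claim = Tuple[int, int]

_TOKEN = re.compile(r"\s*(CLAIM|AND|\[|\]|:|\d+)")


def tokenize(text: str) -> List[str]:
    """Split a WDQ string into tokens, rejecting anything unknown."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser for: query := claim ("AND" claim)*"""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def _peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def _expect(self, expected: str) -> None:
        if self._peek() != expected:
            raise ValueError(f"expected {expected!r}, found {self._peek()!r}")
        self.pos += 1

    def _number(self) -> int:
        if not self._peek().isdigit():
            raise ValueError(f"expected a numeric id, found {self._peek()!r}")
        value = int(self._peek())
        self.pos += 1
        return value

    def parse_claim(self) -> Claim:
        # claim := "CLAIM" "[" number ":" number "]"
        self._expect("CLAIM")
        self._expect("[")
        prop = self._number()
        self._expect(":")
        item = self._number()
        self._expect("]")
        return (prop, item)

    def parse_query(self) -> List[Claim]:
        claims = [self.parse_claim()]
        while self.pos < len(self.tokens):
            self._expect("AND")
            claims.append(self.parse_claim())
        return claims


def emit_sparql(claims: List[Claim], prop_prefix: str = "wdt:P",
                item_prefix: str = "wd:Q") -> str:
    """Emit one triple pattern per claim; the prefixes are assumptions."""
    patterns = [f"  ?item {prop_prefix}{p} {item_prefix}{q} ." for p, q in claims]
    return "SELECT ?item WHERE {\n" + "\n".join(patterns) + "\n}"


def wdq_to_sparql(text: str, **mapping: str) -> str:
    return emit_sparql(Parser(tokenize(text)).parse_query(), **mapping)
```

For example, `wdq_to_sparql("CLAIM[31:5] AND CLAIM[27:183]")` yields a `SELECT` with two triple patterns on `?item`, and passing different `prop_prefix`/`item_prefix` values changes the emitted mapping while the WDQ input stays the same.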
> 
> > Moreover, both of these can be added at any stage of the project, so we
> > are not blocked now by having to decide all of these details. Right now the
> > main priority should be to get something running rather than to go back to
> > the drawing board.
> 
> Yes, absolutely, but what we make available publicly
> 1) has to be safe - I believe this is easier and faster to do with WDQ.
> 2) should be future-proof - again, easier with WDQ, because it's more
> restrictive and domain-specific. It allows us to change the underlying mapping
> or technology. SPARQL doesn't, at least not easily.
> 
> 
> In any case, I'm not saying we shouldn't make a SPARQL endpoint available at
> all. I'm saying it should not be the canonical query interface, but rather a
> "raw" query interface. That would give us a lot more headroom to change things
> later, without breaking a lot of 3rd party code.
> 
> -- daniel
> 
> --
> Daniel Kinzler
> Senior Software Developer
> 
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
> 
> _______________________________________________
> Wikidata-tech mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech