An expandable collection of examples: https://www.wikidata.org/wiki/User:Markus_Kr%C3%B6tzsch/Nemo_examples
The intro from my email: https://www.wikidata.org/wiki/User:Markus_Kr%C3%B6tzsch/Nemo_for_Wikidata

Cheers,

Markus

On 22.10.25 07:40, Markus Krötzsch wrote:
TLDR: We present Nemo as a new Wikidata query tool that can answer queries, extract subsets, and perform analyses in ways that SPARQL alone can't. It also lets you combine Wikidata with other data sources.

Dear all,

Nemo [1] is a graph rule engine that can be used to query and process data (in many forms, online or offline). It's free and open source [2], and there is a no-install Web application to use it: https://tools.iccl.inf.tu-dresden.de/nemo/

As an early birthday present, we have just released Nemo v0.9, which adds features that make Nemo a useful tool for working with Wikidata content in new ways. This email is a short(ish) intro and teaser towards this -- feedback is very welcome.

## What does Nemo do?

Think of it as an upgrade to the SPARQL query service, with the following differences:

- You can do more powerful data transformations that would time out in SPARQL or would not be possible at all
- You can use and combine data from multiple sources (Wikidata SPARQL results, RDF, CSV, local files, or online data)
- Processing happens partly on your computer, avoiding timeouts
- You can run Nemo in a browser (easy) or on the command line (for heavier jobs)

Nemo still lets you focus on the data, hiding technicalities and low-level issues. It's more than SPARQL, but much simpler than Python.

## How does that work?

You write "queries" -- or rather little "programs" -- in a simple language based on if-then rules. Here is an example that uses no external data at all:

https://tinyurl.com/2muju6sy (find common ancestors of two people)

Technically, this is a logic program in (a variant of) Datalog. Using a few more Nemo features, you can apply such rules to Wikidata content:

https://tinyurl.com/2mzfutcj (find common ancestors of Ada and Moby)

Btw, you can share any Nemo program by sharing a link (the URL updates as you type).

## Slow down, I never heard of "Datalog". How do I read this?

It's actually quite simple.
Data is represented in "facts" such as "father(Alice, Bob)", which we could use to say that Alice has father Bob. This is a bit like triples in RDF/SPARQL, but you can have any number of parameters (as in, say, "degree(Alice, MSc, Physics, 2025, TUDresden)").

Facts are used to compute new facts using rules like this:

uncle(?child, ?bro) :- parent(?child, ?p), brother(?p, ?bro) .

The ?... parts are variables, ":-" means "IF", and "," means "AND". So the rule says: ?child has uncle ?bro IF ?child has a parent ?p AND ?p has a brother ?bro.

In a way, rules are like simple SPARQL query patterns whose results you store as new facts. The power of Datalog is that you can use these facts in future rule applications, producing more information step by step rather than in one huge SPARQL query.

## Why not just use SPARQL?

The Ada/Moby example above can also be solved by a SPARQL query, though that query will time out on WDQS. However, Nemo can also do things that are outright impossible even with the most powerful SPARQL services. The "Examples" button on the Web app shows some of the possibilities:

- Query for things that SPARQL cannot do in principle, such as the longest winning streak of your favourite sports team ("Winning streaks in sports")
- Combine third-party data with Wikidata on the fly ("Old trees", "CO2 emitting countries")
- Do multi-step analyses that would be very complex to express in SPARQL ("Empty classes in Wikidata")
- Directly query RDF data without a SPARQL service ("Wikipedia articles vs. labels")

## What's behind it?

At its heart, Nemo is an in-memory data processing engine, written in Rust. The data model is relational, but weakly typed (like RDF, CSV, and JSON) rather than strongly typed (like SQL). The Web app runs locally, in your browser. Your program and any local data you might use (with "Add input files") will not be uploaded anywhere [3].
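For instance, a tiny self-contained program like the following runs entirely in your browser. This is just a toy sketch: the facts are invented for illustration, and the rule is the uncle example from above.

```
% Invented facts: carol's parent is alice, and alice has brother bob.
parent(carol, alice) .
brother(alice, bob) .

% IF ?child has a parent ?p AND ?p has a brother ?bro,
% THEN ?bro is an uncle of ?child.
uncle(?child, ?bro) :- parent(?child, ?p), brother(?p, ?bro) .
```

Pasting this into the Web app should derive the single new fact uncle(carol, bob).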
Even in the browser, it is feasible to work with larger files (millions of facts), but there are limits (don't try to import the whole Wikidata dump there).

For SPARQL, Nemo tries to optimise by querying only for the values that your program needs. This is why some of the examples can import from SPARQL queries like "?s ?p ?o" without actually downloading all of Wikidata.

Nemo runs an extension of Datalog enriched with SPARQL-style datatypes and "filter" functions, aggregates, and negation (both must be stratified, i.e., used in non-recursive ways). As usual in Datalog, the order of rules does not matter at all (although the examples are all ordered following the "natural" processing pipeline). This "declarativity" allows Nemo to automatically optimise rule applications and data imports. More academic documentation can be found on our publication page: https://github.com/knowsys/nemo/wiki/Publications

## Limitations? Future plans?

Loads (of both). Key limitations from a Wikidata perspective include missing support for dates and geocoordinates (workaround: use SPARQL to decompose these into several numbers). You might also find that more data processing functions should be implemented (let us know). The Web app could benefit from richer result display and download options.

In the mid term, we plan to support more data formats, notably JSON, for native import. We are also looking into programming features for structuring longer programs. However, we would also like to hear back from you to decide where to go next. We have a detailed handbook [4], but more Wikidata-related materials and tutorials might be desirable. Again, let us know what you think.

Nemo is a university-based OSS project and still a prototype, so bear with us if you discover bugs. We will try to answer your queries asap, and we also have a public user chatroom [5]. Thanks are due to all contributors [6], and for v0.9.0 especially to Alex Ivliev, Lukas Gerlach, and Maximilian Marx.
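To close with a small taste of the aggregate and negation features mentioned above, here is a sketch that counts children per person and finds persons without recorded children. All facts and predicate names are invented for illustration, and the exact aggregate and negation syntax is described in the handbook [4].

```
% Toy data (invented). parent(?child, ?p) reads: ?child has parent ?p.
person(alice) . person(bob) . person(carol) .
parent(carol, alice) .

% Aggregate in the rule head: count each parent's children.
childCount(?p, #count(?c)) :- parent(?c, ?p) .

% Stratified negation with "~": persons that are nobody's parent.
isParent(?p) :- parent(?c, ?p) .
childless(?p) :- person(?p), ~isParent(?p) .
```

Under the rule reading explained earlier, this should derive childless(bob) and childless(carol), since only alice appears as a parent.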
Cheers,

Markus

[1] https://knowsys.github.io/nemo-doc/
[2] https://github.com/knowsys/nemo
[3] However, if you use Nemo with data from SPARQL, then some data might be sent to the SPARQL endpoint (your SPARQL query for a start, but possibly also specific data values your program needs data for).
[4] https://knowsys.github.io/nemo-doc/
[5] https://gitter.im/nemo/community or simply #nemo_community:gitter.im
[6] https://github.com/knowsys/nemo/graphs/contributors

--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://kbs.inf.tu-dresden.de/

_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/QCJCQAA7XDK34Y5S2U5BQMGDCQYFKEJG/
To unsubscribe send an email to [email protected]
