[Wikidata-bugs] [Maniphest] T270764: Wikidata Truthy dump is missing important metadata triples
mkroetzsch added a comment. @Lydia_Pintscher Are you asking about the discrepancy in the counts, or about the general idea of this issue report? I must admit that I do not get the significance of the SPARQL queries above. The missed properties seem to exist and work as expected on query.wikidata.org, so one does not need expensive GROUP BY queries to compute them (and I think this was the main point of adding them). I don't see a reason why one should not add some such aggregates, which are already available and can be exposed with an established vocabulary, to the truthy dump as well, other than a general concern that the dump would get too big when going down that road. TASK DETAIL https://phabricator.wikimedia.org/T270764 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: Lydia_Pintscher, mkroetzsch, Nicksinch, Aklapper, VladimirAlexiev, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
mkroetzsch added a comment. In T244341#5862287 <https://phabricator.wikimedia.org/T244341#5862287>, @Jheald wrote: > Please don't think or refer to the blank nodes as "unknown values". I fully agree. The use of the word "unknown" in the UI was a mistake that stuck. The intention was always to mean "unspecified" without any epistemic connotation. That is: an unspecified value only makes a positive statement ("there is a value for this property") and no negative one ("we [who exactly?] do not know this value"). TASK DETAIL https://phabricator.wikimedia.org/T244341
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
mkroetzsch added a comment. Hi, Using the same value for "unknown" is a very bad idea and should not be considered. You already found out why. This highlights another general design principle: the RDF data should encode meaning in structure in a direct way. If two triples have the same RDF term as object, then they should represent relationships to the same thing, without any further conditions on the shape of that term. Otherwise, SPARQL does not work well. For example, the property paths you can write with * have no way of performing extra tests on the nodes you traverse, so the meaning of a chain must not be influenced by the shape of the terms along it, if you want to use * in queries in a meaningful way. This principle is also why we chose bnodes in the first place. OWL also has a standard way of encoding the information that some property has an (unspecified) value, but this encoding looks more like what we now have in the case of negation (no value). If we had used it, one would need a completely different query pattern to find people with an unspecified date of death than to find people with a specified one. In contrast, the current bnode encoding allows you to ask a query for everybody with a date of death without having to know whether it is given explicitly or left unspecified (you don't even have to know that the latter is possible). This should be kept in mind: the encoding is not just for "use cases" where you are interested in the special situation (e.g., someone having an unspecified date of death) but also for all other queries dealing with data of this kind. For this reason, the RDF structure for encoding unspecified values should, as much as possible, look like the cases where values are given. I am not aware of any other option for encoding "there is a value but we know nothing more about it" in RDF or OWL besides the two options I mentioned.
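A minimal sketch of this point, with triples modelled as plain Python tuples and bnodes as "_:"-prefixed strings (illustrative assumptions, not the actual WDQS data model): the same query pattern matches whether the date of death is explicit or an unspecified-value bnode.

```python
# Illustrative sketch: triples as (subject, predicate, object) tuples.
# Bnodes are modelled as strings starting with "_:"; this convention is
# an assumption made only for this example.
triples = [
    ("wd:Q1", "wdt:P570", '"1955-04-18"'),   # explicit date of death
    ("wd:Q2", "wdt:P570", "_:b0"),           # unspecified date of death
    ("wd:Q3", "wdt:P19",  "wd:Q64"),         # unrelated statement
]

def match(pattern, data):
    """Return objects of all triples matching a (subject, predicate)
    pattern, regardless of whether the object is a literal, IRI, or
    bnode. None acts as a wildcard for the subject."""
    s, p = pattern
    return [o for (ts, tp, o) in data
            if (s is None or ts == s) and tp == p]

# One query pattern finds everyone with a date of death, specified or not:
print(match((None, "wdt:P570"), triples))
```

The point is that the caller never has to test the shape of the object term, which is exactly what keeps `*` property paths usable.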
The proposal to use a made-up IRI instead of a bnode gives identity to the unknown (even if that identity has no meaning in our data yet). It works in many unspecified-value use cases where bnodes work, but not in all. The three main confusions possible are:
1. confusing a placeholder "unspecified" IRI with a real IRI that is expected in normal cases (imagine using a FILTER on URL-type property values),
2. believing that the data changed when only the placeholder IRI has changed (imagine someone deleting and re-adding a qualifier with "unspecified" -- if it's a bnode, the outcome is the same in terms of RDF semantics, but if you use placeholder IRIs, you need to know their special meaning to compare the two RDF data sets correctly),
3. accidental or deliberate uses of placeholder IRIs in other places (imagine somebody puts your placeholders as a value into a URL-type property).
Case 3 can probably be disallowed by the software (if one thinks of it). Another technical issue with the approach is that you would also need to use placeholder IRIs with datatype properties that normally require RDF literals. RDF engines will tolerate this, and for SPARQL use cases it's not a huge difference from tolerating bnodes there. But it does put the data outside of OWL, which does not allow properties to be for literals and IRIs at the same time. Unfortunately, there is no equivalent of creating a placeholder IRI for things like xsd:int or xsd:string in RDF (in OWL, you can write this with a class expression, but it will be structurally different from other cases where this data is set). For the encoding of OWL negation, I am not sure if switching this (internal, structural) bnode to a (generated, unique) IRI would make any difference. One would have to check with the standard to see if this is allowed. I would imagine that it just works. In this case, sharing the same auxiliary IRI between all negative statements that refer to the same property should also work.
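Point 2 above can be illustrated with a small sketch (graphs as Python sets of triples, bnodes as "_:"-prefixed strings -- assumptions for this example only): graphs that differ only in bnode labels compare equal after canonicalisation, while fresh placeholder IRIs make a plain comparison report a change that is not really there.

```python
# Sketch of point 2: deleting and re-adding an "unspecified" value.

def canonical(graph):
    """Normalise bnode labels (strings starting with '_:') so that
    graphs differing only in bnode names compare equal. This works here
    because each bnode occurs in a single triple -- a simplifying
    assumption; general RDF graph isomorphism is harder."""
    out = set()
    for s, p, o in graph:
        out.add((s, p, "_:x" if o.startswith("_:") else o))
    return out

before_bnode = {("wd:Q2", "wdt:P570", "_:b0")}
after_bnode  = {("wd:Q2", "wdt:P570", "_:b1")}   # re-added, new label
print(canonical(before_bnode) == canonical(after_bnode))  # no spurious diff

before_iri = {("wd:Q2", "wdt:P570", "wds:placeholder-123")}
after_iri  = {("wd:Q2", "wdt:P570", "wds:placeholder-456")}
print(before_iri == after_iri)  # plain comparison reports a change
```

With placeholder IRIs, a correct comparison tool would need to know the special meaning of those IRIs to get the bnode-style result.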
So: dropping in placeholder IRIs is the "second best thing" to encode bnodes, but it gives up several advantages and introduces some problems (and of course inevitably breaks existing queries). Before doing such a change, there should be a clearer argument as to why this would help, and in which cases. The linked PDF that is posted here for motivation does not speak about updates, and indeed if you look at Aidan's work, he has done a lot of interesting analysis with bnodes that would not make any sense without them (e.g., related to comparing RDF datasets; related to my point 2 above). I am not a big fan of bnodes either, but what we try to encode here is what they have genuinely been invented for, and any alternative also has its issues. TASK DETAIL https://phabricator.wikimedia.org/T244341
[Wikidata-bugs] [Maniphest] [Commented On] T216842: Specify license of Wikibase ontology
mkroetzsch added a comment. CC0 seems to be fine. Using the same license as for the rest seems to be the easiest choice for everybody. TASK DETAIL https://phabricator.wikimedia.org/T216842
[Wikidata-bugs] [Maniphest] [Commented On] T112127: [Story] Move RDF ontology from beta to release status
mkroetzsch added a comment. Well, for classes and properties, one would use owl:equivalentClass and owl:equivalentProperty rather than sameAs to encode this point. But I agree that this will hardly be considered by any consumer. TASK DETAIL https://phabricator.wikimedia.org/T112127
[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal
mkroetzsch added a comment. This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that many years after the first SPARQL data releases, there is still no realistic danger to user anonymity known. The footer is still a good idea for general community awareness. People who do have concerns about their anonymity could be encouraged to come forward with scenarios that we should take into account. TASK DETAIL https://phabricator.wikimedia.org/T190875
[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal
mkroetzsch added a comment. Hi, The code is here: https://github.com/Wikidata/QueryAnalysis It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential information. We have a list of query types ordered by frequency. However, there are millions of query types, and the most frequent are those created by bots. I can dig up a pointer to the local file where we have it, if this is what you want. If you are interested in a broader analysis of the data, you could take a look at a recent workshop paper of ours: https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en It has detailed statistics of SPARQL feature distributions and discusses some findings. TASK DETAIL https://phabricator.wikimedia.org/T190875
[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries
mkroetzsch added a comment. I agree with Stas: regular data releases are desirable, but need further thought. The task is easier for our current case since we already know what is in the data. For a regular process, one has to be very careful to monitor potential future issues. By releasing historic data, we avoid exploits that could theoretically be possible based on detailed knowledge of the methodology. Regarding external IDs, one could whitelist unproblematic IDs that can be preserved, and obfuscate others. I agree that authority control IDs might identify humans, since they scope over so many things that are tied to particular humans (books, authors, etc.) that one could have a hypothetical situation where the interest in a particular item would already suggest who asked the query. I don't think something similar is even theoretically plausible for other IDs (e.g., for proteins or stars). Even for book IDs, the lack of user traces makes it very hard to exploit this data further (the certainty you get from a single query being asked can hardly be high, and a query that helps you to guess who asked it will often not be interesting in its own right -- most likely you would want to know what else the identified person has asked). Anyway, we could restrict the "numerical strings are ok" rule to whitelisted properties for our current release. The main reason we have it at all is things like BlazeGraph's "radius" service parameter that have to be a number but are given as a string (I think the gas service might have similar cases). There is a general limitation to potential exploits of SPARQL logs for breaching someone's privacy. If you don't control the software that formulated the query, then you can only connect queries to people if you already knew that only this person would ask this query. But then you learn very little by observing the query!
On the other hand, if you control the software, then it would usually be easy to gather user data more directly, without needing the detour across some SPARQL logs released months later. One exception that might be relevant in the future is the use of SPARQL from Lua built-ins or MediaWiki tags on Wikipedia pages, which could in theory expose some page traffic. This is not relevant for our historic logs, and it would be hard to fully exploit due to parser caches and crawler-based hits, but it might become a theoretical issue nonetheless. To avoid it, one could either filter all Wikipedia servers from the logs, or use a separate SPARQL service for such requests (as discussed in Berlin), whose logs would not be released. Considering our current dataset, it seems that even the obfuscation of strings is more than one would have to do, but in the future one might indeed have to add external URLs if they become more common in queries. TASK DETAIL https://phabricator.wikimedia.org/T183020
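The whitelist-plus-obfuscation idea described above could be sketched as follows. This is only an illustration under assumptions: the property IDs in the whitelist and the hashing scheme are hypothetical, not the actual release pipeline.

```python
import hashlib

# Hypothetical whitelist of external-ID properties whose values are
# considered safe to keep in released query logs (e.g. protein or star
# identifiers); the specific IDs here are illustrative only.
SAFE_ID_PROPERTIES = {"wdt:P352", "wdt:P528"}

def filter_literal(prop, value):
    """Keep string literals used with whitelisted ID properties;
    replace every other literal by a stable, non-reversible token so
    that query structure is preserved but content is obfuscated."""
    if prop in SAFE_ID_PROPERTIES:
        return value
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return "obf:" + digest

print(filter_literal("wdt:P352", "P04637"))     # whitelisted: kept
print(filter_literal("wdt:P214", "118540238"))  # not whitelisted: obfuscated
```

Using a stable hash (rather than a random token) keeps repeated uses of the same literal recognisable across queries, which preserves analytical value without revealing the string.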
[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org
mkroetzsch added a comment. @AndrewSu As I just replied to Benjamin Good in this matter, it is a bit too early for this, since we only have the basic technical access as of very recently. We have not had a chance to extract any community shareable data sets yet, and it is clear that it will require some time to get clearance for such data even after we believe it is ready. In the long run, I would find some collaboration very interesting, but we need to lay the foundations for this first, which will likely take a few more months. TASK DETAIL https://phabricator.wikimedia.org/T143819
[Wikidata-bugs] [Maniphest] [Commented On] T126862: Datatype for chemical formulae on Wikidata
mkroetzsch added a comment. Re parsing strings: You are skipping the first step here. The question is not which format is better for advanced interpretation, but which format is specified at all. Whatever your proposal is, I have not seen any //syntactic// description of it yet. If -- in addition to having a specified syntax -- it can also be parsed for more complex features, that's a nice capability. But let's maybe start by saying what the proposed "structured data" format actually is. Re multilingual text: There is nothing controversial about the technical aspects you refer to. One could make all complex datatypes we have into items and only use primitive datatypes instead. There are many reasons against this, so we decided to have datatypes instead for representing complex value objects. It is a pity that you are not trying to explain what you propose but focus on attacking current proposals instead (how Wikidata editors store chemical formulas now, how the multilingual text datatype was planned). Since you are not providing many details, I have now also reviewed the gerrit patch. There is nothing in there that would enable you to search for "the nucleon masses of atoms which co-occur with at least 6 carbon atoms". The input text is simply sent to a LaTeX formatter like mathematical markup. There is no semantic interpretation or structured data there at all. There is also no syntax specification in this part of the code, so it seems the specification is "whatever the current version of MediaWiki does with text in ce-tags". All the doubts I raised in my first post remain valid.
TASK DETAIL https://phabricator.wikimedia.org/T126862
[Wikidata-bugs] [Maniphest] [Commented On] T127929: [Story] Add a new datatype for linking to creators of artwork and more (smart URI)
mkroetzsch added a comment. +1 sounds like a workable design TASK DETAIL https://phabricator.wikimedia.org/T127929
[Wikidata-bugs] [Maniphest] [Updated] T126862: Datatype for chemical formulae on Wikidata
mkroetzsch added a comment. Re chemical markup for semantics: this is true for Wikitext, where you cannot otherwise know that "C" is carbon. It does not apply to Wikidata, where you already get the same information from the property used. Think of https://phabricator.wikimedia.org/P274 as a way of putting text into "semantic markup" on Wikipedia. In general, one should not confuse the task of adding semantic markup to a wiki text with what we do in Wikidata. We did the former in Semantic MediaWiki, and this approach never made it into Wikipedia. The communities in Wikipedia prefer readability of the source text over semantic markup, and the decision therefore was to move "semantic" information to a separate place, Wikidata. > The advantage of a data type vs. a property is that a service can enhance the > input data with additional information and which can thereafter be used by > third party services. You can do this in any case. Services already run on all kinds of property values on Wikidata. If you are talking about an enhanced UI that provides in-value annotation, then I don't see what exactly you refer to. Is anybody developing/designing/planning such a UI? I don't think this is needed for chemicals, where automated entity recognition would be fairly trivial to do, so I would not spend any effort on this. > Can you provide a like that justifies the mulitlingual text for example? Missing word in sentence? Answers for both possible interpretations: - Use case: needed for all translated texts (e.g., slogans/mottos of organisations, quotes of people, usage notes for properties, ...) - Technical need: The type requires a new value type, since its content is structurally distinct from all datatypes we have. Related to this, it requires a new UI, new JSON structures, and a new RDF encoding. Can't be done in a gadget.
TASK DETAIL https://phabricator.wikimedia.org/T126862
[Wikidata-bugs] [Maniphest] [Commented On] T126862: Datatype for chemical formulae on Wikidata
mkroetzsch added a comment. I really wonder if the introduction of all kinds of specific markup languages in Wikidata is the right way to go. We could just have a Wikitext datatype, since it seems that Wikitext has become the gold standard for all these special data types recently. Mark-up over semantics. By this I mean that the choice of format is focussed on presentation, not on data exchange. I am not an expert in chemical modelling (but then again, is anyone in this thread?), but it seems that this mark-up centric approach is fairly insufficient and naive. I am also missing the requirements analysis. How many infoboxes are currently using any chemical formula with this special markup at all? If you look at a page like https://en.wikipedia.org/wiki/Ethanol, you see there is no such markup in the whole page. Neither is there in https://en.wikipedia.org/wiki/Photosynthesis. Who really needs this in Wikidata? Aren't there many other forms of notation in chemistry (and biology) that would be equally important? There are some really fundamental datatypes currently missing, notably multilingual texts and geo shapes. This is the level on which datatypes are useful. Presentational things can be done by gadgets, as we already have for URL links shown with IDs. There is no need to codify this in the data model. Communities can solve these simple problems already without changing datatypes of existing properties (which is costly since existing applications and tools need to be updated each time). It is also notable that https://www.wikidata.org/wiki/Property:P274 has better format documentation than what is proposed here (at least this thread contains no documentation of what the proposed format actually consists of). They even define a regular language for the possible content. Their direct, text-based formatting is preferable in many cases.
TASK DETAIL https://phabricator.wikimedia.org/T126862
[Wikidata-bugs] [Maniphest] [Commented On] T126349: RDF export for the math data type should not export input texvc string but its MathML representation
mkroetzsch added a comment. > The MathML expression includes the TeX representation, which can be used in > LaTeX documents and also to create new statements. That would address the conversion back from MathML to TeX. With this in place, we could indeed use MathML in JSON and RDF, if we wanted (assuming that this is doable for you, that is, that there is a suitable TeX->MathML conversion available in Wikibase). TASK DETAIL https://phabricator.wikimedia.org/T126349
[Wikidata-bugs] [Maniphest] [Commented On] T126349: RDF export for the math data type should not export input texvc string but its MathML representation
mkroetzsch added a comment. The format should be the same as in JSON. If MathML is preferred there, then this is fine with me. If LaTeX is preferred, we can also use this. It seems that MathML would be a more reasonable data exchange format, but Moritz was suggesting in his emails that he does not think it to be usable enough today, so there might be practical reasons to avoid it. In any case, I am strongly against using a different format in RDF and JSON. Otherwise any tool-chain that uses RDF and JSON (e.g., a bot that uses SPARQL to fetch relevant information) would have to implement this conversion, maybe even back and forth. Tools that do large-scale processing (e.g., Wikidata Toolkit generating custom RDF dumps from JSON) would need to implement this conversion internally, even if there would be a web service (latency). It would be really a lot of work, without a clear benefit. Indeed, whatever format you pick, for whatever reason you pick it, the same reason should apply to all exchange formats alike. If you use MathML but keep the TeX-like input syntax, then external users will also need a web service that can convert back and forth between these representations: - TeX->MathML is needed, e.g., for a query UI where users enter data to search for in SPARQL - MathML -> TeX is needed, e.g., to display the (raw) value of a math property to a user after it was returned from SPARQL The two conversions do not need to be exact inverses, but they should hopefully stabilise after one round-trip. I think using MathML as the main exchange format would be doable, given such tool support exists. In particular, I am not concerned about showing different things to users than we use in our exchange formats. We are doing similar things with other types (dates are also written in a user syntax and then converted into an internal data model). 
The representation of dates is not the same in RDF and in JSON either, but the data structure is the same (same components that make up a date), which is very different from the situation of TeX vs. MathML. TASK DETAIL https://phabricator.wikimedia.org/T126349
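The round-trip stabilisation requirement can be sketched with trivial stand-in converters. These are assumptions for illustration only; real TeX<->MathML conversion is far more involved. The point is only the property being tested: the two directions need not be exact inverses, but a second round trip should change nothing.

```python
# Trivial stand-in converters (illustrative, not real TeX/MathML tools).
def tex_to_mathml(tex):
    # Normalises whitespace as a side effect, like a real converter might.
    return "<math>" + tex.strip() + "</math>"

def mathml_to_tex(mathml):
    return mathml.removeprefix("<math>").removesuffix("</math>")

def round_trip(tex):
    return mathml_to_tex(tex_to_mathml(tex))

tex = "  E = mc^2 "
once = round_trip(tex)    # first round trip may normalise the input
twice = round_trip(once)  # second round trip must be a fixed point
print(once == twice)
```

A check like this could run over a corpus of existing values to verify that a proposed converter pair stabilises after one round trip.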
[Wikidata-bugs] [Maniphest] [Commented On] T99820: [Task] Add reference to ontology.owl to the RDF output
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T99820#1820662, @daniel wrote: > Looking at the link, it seems to me we'd (trivially) meet these requirements. Yes, that's what I meant. :-) > But I'm not sure about the fine details, e.g. regarding the version IRI. But > if you are sure we are meeting the formal requirements, fine with me. The version IRI is an optional aspect that we can include if we have a good one (I guess we might). We should give an ontology IRI and say that it is of type owl:Ontology. Other bits of information about this ontology can be added, but there are not many requirements there. There is also not much said in the standard about how to version ontologies in general, so this is something left to us. > We should then probably explicitly state that the dump is an ontology, > though... Not sure on which level you mean this. In RDF or in the user documentation? I thought that this discussion was about stating this in RDF, and this would already be "explicit". I am not sure it needs much documentation elsewhere. Mainly, we are putting this in to have a place where we can state the license (a user-requested feature) and other meta information (like the export date and the imported version of the Wikibase ontology). I agree that we should mention these features in the documentation. TASK DETAIL https://phabricator.wikimedia.org/T99820
[Wikidata-bugs] [Maniphest] [Commented On] T99820: [Task] Add reference to ontology.owl to the RDF output
mkroetzsch added a comment. > ...and if we consider our data dump to be an ontology, then what isn't an > ontology? The word "ontology" has different meanings in different contexts. Here, we only mean the notion of "ontology" denoted by the term owl:Ontology as used in the W3C OWL standard. As Stas says, this is essentially nothing more than a collection of OWL-compatible statements (called "OWL axioms" in the standard). No deeper meaning involved. See http://www.w3.org/TR/owl2-syntax/#Ontologies for a definition. TASK DETAIL https://phabricator.wikimedia.org/T99820
[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model
mkroetzsch added a comment. I don't want to detail every bit here, but it should be clear that one can easily eliminate the dependency on $db in the formatter code. The Sites object I mentioned is an example. It is *not* static in our implementation. You can make it an interface. You can inject Sites (or a mocked version of it) for testing -- this is what we do. The only dependency you will retain is that the formatting code, or some code that calls the formatting code, must know where to get the URL from. All of this will work in any "sane" architecture -- I don't see where you need a role manager for this. In particular, you can always pull out dependencies by injecting interface-based objects instead, and this has nothing to do with how you represent the "derived data" in memory using objects. The role-based approach discussed here simply seems to be a generalised version of this pattern, with a one-solution-fits-all interface ("Role") instead of task-specific interfaces ("Sites" etc.). The reason why I am not convinced by this here is that the tasks at hand are quite diverse and refer to different objects. So for any particular object (such as SiteLink) you might not have many possible roles available, probably just one, and in such a situation the complexity of the general solution might be avoidable. Maybe it's clearer if I say it in terms of the simpler "hash map of additional data" approach that @adrianheine mentioned: it seems to me you are adding such hashmaps to all objects (using a somewhat complicated way to encode the hashmap), just to have one or two entries per object in the end. In such a case, rather than using a hashmap, you would be better off using a member variable that can be null if the additional data is not there.
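The contrast between a generic "hash map of additional data" and a dedicated nullable member can be sketched as follows (a Python sketch with illustrative class and field names, not the actual Wikibase API):

```python
from typing import Optional

class SiteLinkWithBag:
    """Generic 'hash map of additional data' approach: any derived
    value can be attached under an arbitrary key."""
    def __init__(self, site, title):
        self.site, self.title = site, title
        self.extra = {}        # holds e.g. {"url": "..."} once derived

class SiteLinkWithField:
    """Dedicated nullable member: simpler when only one kind of
    derived data is ever attached to this object."""
    def __init__(self, site, title, url: Optional[str] = None):
        self.site, self.title = site, title
        self.url = url         # None until the URL has been derived

plain = SiteLinkWithField("enwiki", "Berlin")
derived = SiteLinkWithField("enwiki", "Berlin",
                            url="https://en.wikipedia.org/wiki/Berlin")
print(plain.url is None, derived.url is not None)
```

With one or two entries per object, the dedicated field carries the same information as the hash map, with less indirection and a type checker that knows exactly what can be attached.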
[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model
mkroetzsch added a comment. @daniel As long as it works for you, this is all fine by me, but in my experience with PHP this could cost a lot of memory, which could be a problem for the long item pages that have already caused problems in the past.

> But it requires the serialization and formatting code to depend on the lookup services

I know that it is always nice in software architecture to reduce the number of dependencies (for all kinds of reasons). However, I don't see a strong reason why code that formats a sitelink should not depend on the facility that provides the URL to link to. When things have a conceptual dependency, it is not bad design to have a code dependency there as well. I don't think there is any reason to have duplicate code in either solution -- that's just a matter of coordinating work within the team (not saying it is easy, but architecture alone will not solve this ...).

> Relying on global state like that for conveniance is what MediaWiki is doing all over the place, with horrible results for testability and modularity

The state would not be global (I was over-simplifying). Of course you would have a $db object that provides the access to the sites table. Reading from a table is a stateless operation, so there is no state (global or local) involved and you can indeed use static code if you like. But of course you could also use a Sites object like in WDTK. Regarding testing, I guess you already have a mock db connection object anyway (otherwise, how would you test db read operations ...). I don't see a reason why this solution should be any less modular or testable than what you propose. There is also no need to have any duplicate code.
[Wikidata-bugs] [Maniphest] [Commented On] T118860: [Tracking] Represent derived data in the data model
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. Structurally, this would work, but it seems like a very general solution with a lot of overhead. I am not sure that this pattern works well in PHP, where the cost of creating additional objects is huge. I also wonder whether it really is good to manage all those (very different!) types of "derived" information in a uniform way. The examples given belong to very different objects and are based on very different inputs (some requiring external data sources, some not). I find it a bit unmotivated to architecturally unify things that are conceptually and technically so different.

The motivation given for choosing this solution starts from the premise that one has to find a single solution that works for all cases, including some "edge cases". Without this assumption, one would be free to solve the different problems individually, using what is best for each, instead of being forced to go for some least common denominator.

To pick just one example, consider the (article) URL of a SiteLink. To create it, one needs access to the content of the sites table. In WDTK, we encapsulate the sites table in an object (called Sites). To find out the URL of a SiteLink, one calls a method of the Sites object (called getArticleUrl() or something), which takes a SiteLink as an input. This design is simple and efficient, uses no additional memory for storing new objects or values, and clearly locates the responsibility for computing this information (the Sites table). In a single-site setting (like you have in PHP), there is only one sites table, and you can access it statically, so the caller does not even need to have a reference to a Sites object as in WDTK. I therefore don't see any benefit in creating a role object for this simple task. It's just more indirection, without any convenience for the software developer or any gain in performance.
The situation is similar in several of the other cases mentioned. In the end, this is a choice for PHP development, which won't affect my work, so I'll leave it to you, but it seems you are making your life harder than necessary by going for one complicated solution instead of several simple ones. There is only ever going to be a small number of different kinds of "derived data", and there is hardly any place where such data is used in a way that does not need to understand its meaning (mainly serialization).

For JSON exports of such data, I don't think this approach would make sense (but I suppose this is not intended here). There, one would simply use optional keys for including additional data. Likewise for RDF, where one would not want to introduce additional "role" objects that create another layer of indirection to access derived values (of course, RDF has a tradition of dealing with derived values, and nobody expects special structures there). I suppose this proposal has nothing to do with JSON or RDF, but I mention it just to be sure we are on the same page. The term "data model" is a bit over-used in our context -- maybe it would make sense to indicate in this bug report that it is specific to the object model used in the PHP implementation, and has no implications for other implementations or export formats.
[Wikidata-bugs] [Maniphest] [Commented On] T113168: [Story] Make it possible to alter only Statements with a certain property
mkroetzsch added a comment. This was a suggestion we came up with when discussing during WikiCon. People are asking for a way to edit the data they pull into infobox templates. Clearly, doing this in place will be a long-term effort that needs a complicated solution and many more design discussions. Until this is in place, people can only link to Wikidata. Unfortunately, people often feel intimidated by what they see there, because they get a very long page that takes a long time to load and contains all kinds of data that they have not seen in the infobox. The goal of this task is to offer a cheap intermediate solution that improves the editing experience tremendously without much implementation effort. People would be able to link to a limited item page that shows only the statements for one property and that has a button to view everything. The statement editing UI is actually not so bad if you see only one statement (or a smaller group of statements) -- it should be pretty self-explanatory. This might allow more people to make individual changes without oversimplifying things.
[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping
mkroetzsch added a comment. Note that this discussion is no longer just about the wdt property values (called "truthy" above). Simple values are now used on several levels in the RDF encoding. In general, the same argument as for coordinates applies: if we cannot do it right, then better not do it at all (i.e., use a bnode until we have a format). This might always be necessary in some cases (e.g., even if we convert units, there might be cases where conversion is not possible).

I agree with the advantages and disadvantages of using a custom datatype. Without BlazeGraph support for this, one would not be able to do range queries over such data, which would make it pretty useless. We might as well use strings in this case.

The normalisation of units by converting them to a base unit would still leave important problems. If there were a community-controlled way to define conversions, there would be the problem that the "main" unit that the RDF data is normalised to might change. This would change the content and meaning of simple values even though the actual property values have not changed. Declaring this in other triples in the RDF dump would not solve the problem, since we assume many fixed (standing) queries to be in use, which would not be able to adapt automatically to a new unit declaration. The normalisation scheme would also create problems for incremental updates: a single change in the conversion definitions would require changes in millions of simple values that are part of the export of items that have not changed at all.

A possible solution to work around the absence of a datatype, and even the absence of conversion support, would be to create properties like "P1234inCm" and "P1234inInch". They would have plain number values that work in range queries.
This would basically simulate the custom datatype with a very similar effect on query answering (users would need to adjust queries to specify the unit that is queried for, but they would at least be sure that the data they query refers to this unit). The downside is that you need a different property for each unit, and that you therefore still have no good value to use for the simple value properties. However, I think this is how other datasets are doing it (has anybody checked DBpedia?).
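To make the trade-off concrete, here is a small Python sketch of the per-unit-property scheme. The property names and toy data are hypothetical (not any real export format): numeric range queries work, but each query must commit to one unit up front.

```python
# Toy triple list using hypothetical per-unit properties: each unit gets
# its own property with a plain numeric value.
triples = [
    ("Q1", "P1234inCm", 180.0),
    ("Q2", "P1234inCm", 95.5),
    ("Q3", "P1234inInch", 70.9),  # same kind of data, different unit
]

def range_query(prop, low, high):
    """Return subjects whose value for `prop` lies in [low, high]."""
    return [s for (s, p, v) in triples if p == prop and low <= v <= high]

# Range queries work within one unit...
print(range_query("P1234inCm", 100, 200))   # finds Q1
# ...but a query over the cm property silently misses Q3, whose value is
# stored under the inch property; without conversion support, the two
# properties cannot be queried together.
```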
[Wikidata-bugs] [Maniphest] [Commented On] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping
mkroetzsch added a comment. If we could distinguish quantity-type properties that require a unit from those that do not allow units, there would be another option. Then we could use a compound value as the "simple" value for all properties with units, to simulate the missing datatype. On the query level, this would be fully equivalent to having a custom datatype, since one can specify the unit and the (ranged) number individually (whereas the P1234inCm properties support only the number, not queries that refer to the unit).

Using a compound value as a simple value is fine. It's not worse than a bnode if you do not want to look into the inner structure, but it has additional features for those who do. The only problem is that you should not mix number literals with URIs that refer to compound values for the same property -- this is why one would need to fix in the property datatype whether units are required (always there) or forbidden (never there). Mixing them would not work.
[Wikidata-bugs] [Maniphest] [Updated] T111770: Decide how to represent quantities with units in the "truthy" RDF mapping
mkroetzsch added a comment. I think the discussion now lists all the main ideas on how to handle this in RDF, but most of them are not feasible because of the very general way in which Wikibase implements unit support now. Given that there is no special RDF datatype for units, and given that we have neither conversion support nor any way to restrict that a property must/must not have units, only one of the options is actually possible now: **export as string** (no range queries, but minimally more informative than just using a blank node). It would be possible to export data as numbers for unit-specific properties such as "P1234inCm" //in addition//. This can only be an additional feature though, since we still need a simple value in any case. It might not be worth doing this, since one can always use the complex value to access the number. Note that properties like "P1234inCm" would need to have very complicated, lengthy, unreadable names, since units in Wikibase are represented not as "cm", and not even as item ids, but as full URIs. But you cannot use a URI within another URI directly -- you would need to escape certain characters. Moreover, the resulting string might not be allowed as a local name in abbreviations like wdt:P1234, so users would have to type the full URI. Therefore, it seems that using the (already existing) complex values in such queries would actually be more readable.
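The escaping problem can be illustrated in Python. The Q-id and the "P1234in" naming scheme below are placeholders taken from the discussion, not real identifiers:

```python
# Sketch of why per-unit property names would be unreadable: Wikibase
# identifies units by full item URIs, and embedding one URI inside
# another URI's local name requires percent-escaping.
from urllib.parse import quote

unit_uri = "http://www.wikidata.org/entity/Q12345"  # hypothetical unit item
prop_name = "P1234in" + quote(unit_uri, safe="")
print(prop_name)
# P1234inhttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ12345
# A name like this is hardly something users would want to type in a
# query, and it may not even be usable as a prefixed local name.
```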
[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)
mkroetzsch added a comment. Including more data (within reason) will not be a problem (other than a performance/bandwidth problem for your servers). However, if there are further ideas and small improvements that will take time to implement, it would be good to switch to "dump" as the default right now. It is already a big improvement over the current (statement-free) default. Further improvements can then be done in small steps.
[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)
mkroetzsch added a comment. Data on the referenced entities does not have to be included as long as one can get this data by resolving these entities' URIs. However, some basic data (ontology header, license information) should be in each single entity export.
[Wikidata-bugs] [Maniphest] [Commented On] T101837: [Story] switch default rdf format to full (include statements)
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. On the mailing list, Stas brought up the question of "which RDF" should be delivered by the linked data URIs by default. Our dumps contain data in multiple encodings (simple and complex), and the PHP code can now create several variants of RDF based on parameters. I think the default should be to simply return all data that is in the dumps. This would address the BBC's use case of building a linked data crawler that fetches live data rather than using dumps. Such a crawler would not have any way to specify which part of the RDF is needed, since linked data is such an extremely simple, parameter-free API.
[Wikidata-bugs] [Maniphest] [Commented On] T85444: get Wikidata added to LOD cloud
mkroetzsch added a subscriber: mkroetzsch. mkroetzsch added a comment. As another useful feature, this will also allow us to have our SPARQL endpoint monitored at http://sparqles.ai.wu.ac.at/ Basic registration should not be too much work; please look into it (I don't want to create an account for Wikimedia ;-).
[Wikidata-bugs] [Maniphest] [Commented On] T73349: [Bug] Fix empty map serialization behaviour
mkroetzsch added a comment. It seems that the Web API for wbeditentities is also returning empty lists when creating new items (at least on test.wikidata.org). Is this the same bug or a different component?
[Wikidata-bugs] [Maniphest] [Commented On] T105432: Drop wikibase:quantityUnit for now from RDF dump
mkroetzsch added a comment. If not dropped, then it should be fixed. The value of 1 (a string literal) is not correct. Units should be represented by URIs, not by literals.
[Wikidata-bugs] [Maniphest] [Commented On] T102717: https switch changed wdata prefix to https:
mkroetzsch added a comment. While I did say that pretty much all URIs I know use http, I do not have any reason to believe that https would cause problems. It is perhaps not as extensively tested, but in most contexts it should work fine. A bigger issue is that some people are already using our http URIs.
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T95316#1373937, @Lydia_Pintscher wrote:

> Are there any differences we're missing? Are we ok with these differences?

I will do a complete review of the updated RDF mapping in the course of the next week. I will report back then if there is anything missing in the diff. Also, what is the expected outcome of this bug? A table like the one posted by Lucie? Or something with more detail? Some rows in the current table are probably only understood by people who already know both dumps ;-) Is this meant to be only for our internal information?

Another relevant note here might be that the plan is to fully align the WDTK mappings with the updated RDF dumps, so that many of the above differences will go away (the split into several files would remain, though). We just did not do this while we were still discussing the updated RDF mapping.

@Lucie:
- The second row difference is just a consequence of what was already stated in the first row (NTriples vs Turtle). Maybe this can be merged/deleted.
- It seems that the entry in the row labels/aliases/descriptions only refers to labels. The properties skos:prefLabel and schema:name are not used for descriptions or aliases in either dump, AFAIK.
- It would make sense to distinguish differences in distribution/surface syntax (which format, how many files, which compression algorithm, ...) from real differences in the RDF model (= differences that matter for SPARQL users).
[Wikidata-bugs] [Maniphest] [Commented On] T102155: find a way to surface rdf/json representation in item UI
mkroetzsch added a comment.

> we once planned a popup box with links to the various formats. It would be shown when you click on the Q-id in the title.

A pop-up box is a good solution if there are several options, but the Q-id is not a good place to trigger it, since it gives no hint that it can be clicked or what one would get by doing so. There is a standard icon used for "share" that looks very similar to the RDF icon and is sometimes used in this way (google "share icon", or search for "share" on http://marcoceppi.github.io/bootstrap-glyphicons/). This could be used to indicate the popup box. One could put it somewhere in the title region (I would prefer the top-right corner).
[Wikidata-bugs] [Maniphest] [Commented On] T101752: Introduce ExternalEntityId
mkroetzsch added a comment. I think this is a useful change if you want Wikibase sites to be able to refer to other Wikibase sites. In WDTK, all of our EntityId objects are external, of course. A lesson we learned was that it is not enough to know the base URI in all cases. You sometimes need URLs for the API, file path, and page path in addition to the plain URI. MediaWiki already has a solution for this in the form of the sites table. I would suggest using this: store pairs (sitekey, localEntityId) and keep the URI prefix in the sites table. It's cleaner than storing the actual URI string (which might change if an external site is reconfigured!) in the actual values on the page.

A strategy for introducing this without breaking much is to keep the local wiki's sitekey as the default setting in all cases. So callers who are not aware of the external site support can keep sending local ids and will get the right thing. Only when they read data will they have to mind the new information (but that's always the case if you enable linking to external entities). Overall, I think it would be good to make this change. Commons will want to link to Wikidata, for example, but many Wikibase instances outside of WMF will also benefit from the ability to link to Wikidata content.
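The (sitekey, localEntityId) pairing can be sketched as follows. This is a hypothetical Python illustration; the sites-table entries, prefixes, and function names are invented for the example, not real Wikibase code.

```python
# Sketch: the URI prefix lives in the sites table, so reconfiguring a
# site's URI does not invalidate stored (sitekey, localEntityId) pairs.
sites_table = {
    "wikidata": "http://www.wikidata.org/entity/",
    "commons":  "https://commons.example.org/entity/",  # made-up prefix
}

LOCAL_SITE = "wikidata"  # default: callers unaware of external ids keep working

def make_entity_id(local_id, site_key=LOCAL_SITE):
    """Pair a local id with a sitekey; defaults to the local wiki."""
    return (site_key, local_id)

def entity_uri(entity_id):
    """Resolve the pair to a full URI via the sites table."""
    site_key, local_id = entity_id
    return sites_table[site_key] + local_id

print(entity_uri(make_entity_id("Q42")))            # local default
print(entity_uri(make_entity_id("M7", "commons")))  # external reference
```

Because only the sites table stores the prefix, changing a site's URI is a single-row update rather than a rewrite of every stored value.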
[Wikidata-bugs] [Maniphest] [Commented On] T99907: Human-readable serialization of TimeValue precisions in RDF
mkroetzsch added a comment. A big advantage of the numbers is that you can search for values where the precision is at least a certain value (e.g., dates with precision day or above). This would be lost when using URIs.
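For illustration, a minimal Python sketch of the range-query argument, using the numeric precision codes of the Wikibase data model (9 = year, 10 = month, 11 = day); the items and dates are invented:

```python
# Toy data: (item, time string, numeric precision code).
values = [
    ("Q1", "+1879-03-14T00:00:00Z", 11),  # day precision
    ("Q2", "+1900-00-00T00:00:00Z", 9),   # year precision
    ("Q3", "+1955-04-00T00:00:00Z", 10),  # month precision
]

# "dates with precision day or above" is a single numeric comparison:
day_or_better = [item for (item, date, prec) in values if prec >= 11]
print(day_or_better)  # ['Q1']
# With URIs instead of numbers, the same query would have to enumerate
# every acceptable precision URI explicitly.
```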
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment. @Jc3s5h You are right that date conversion only makes sense in a certain range. I think the software should disallow day-precision dates in prehistoric eras (certainly everything before -1). There are no records that could possibly justify this precision, and the question of calendar conversion becomes moot. Do you think 4713 BCE would be enough already, or do you think there could be a reason to find more complex algorithms to get calendar support that extends further into the past?
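A sketch of the proposed validation rule in Python; the cutoff year and the precision code are assumptions taken from the question above (4713 BCE, day precision = 11), not a settled decision:

```python
# Hedged sketch: reject day-precision (or finer) dates before a cutoff
# year where calendar conversion is meaningless. Both constants are
# assumptions from the discussion, not decided values.
DAY_PRECISION = 11   # Wikibase convention: 11 = day
CUTOFF_YEAR = -4713  # hypothetical cutoff discussed above

def is_allowed(year, precision):
    """Return False for day-or-finer precision before the cutoff."""
    if precision >= DAY_PRECISION and year < CUTOFF_YEAR:
        return False
    return True

print(is_allowed(-10000, 11))  # False: no records justify day precision
print(is_allowed(-10000, 9))   # True: year precision is still fine
print(is_allowed(1500, 11))    # True: within the convertible range
```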
[Wikidata-bugs] [Maniphest] [Commented On] T97195: Create real URLs for wikidata ontology
mkroetzsch added a comment. Sounds good. I am not aware of any best practice re http vs. https, but all URIs I know are using http as the protocol.
[Wikidata-bugs] [Maniphest] [Commented On] T94747: Make decision on RDF ontology prefix
mkroetzsch added a comment. I agree with the proposal of @Smalyshev.
[Wikidata-bugs] [Maniphest] [Commented On] T94747: Make decision on RDF ontology prefix
mkroetzsch added a comment. @daniel Changing the base URIs does not work as a way to communicate breaking changes to users of RDF. You can change them, but there is no way to make users notice this change, and it will just break a few more queries. It's just not how RDF works. Most of our test queries do not even mention any wikibase ontology URI, yet they are likely to be broken by changes to come. If you think that we need a way to warn users of such changes, you need to think of another way of doing this. Here is how I would do ontology and RDF model versioning:
- Ontology URIs are never changed. They are all based on the same base URI (prefix).
- If an update would change the meaning of an ontology element in fundamental ways, a new URI (new local name) is used rather than redefining an existing URI.
- The ontology needs to have a file (with OWL property and class declarations, some labels, etc.). The file name of this file (and thus its URL) should include a version number.
- The ontology URIs should redirect to the most recent version of the ontology file. An improved setup would use content negotiation to redirect to an HTML documentation or to the ontology file. One can use URL fragments for local names in both cases.
- The versioned ontology file is imported into the data dump using an OWL import statement.
- In addition to the ontology import triple, the data contains a triple that defines the version of the RDF model used. This triple attaches an annotation property to the dataset ontology element (i.e., the subject of the import triple, not the Wikibase ontology file). Then users can query for the RDF model version to check compatibility.
Like in Daniel's proposal, there is no warning if they don't do this check, but in contrast to Daniel's proposal, users have a way to find out which version of the model is used, and the versioning can refer to the whole RDF model (not just to the Wikibase ontology, which most of our current example queries do not even refer to).
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment. @Smalyshev You comment on my Item 1 by referring to BlazeGraph and Virtuoso. However, my Item 1 is about reading Wikidata, not about exporting to RDF. Your concerns about BlazeGraph compatibility are addressed by my Item 2. I hope this clarifies this part.

As for the wrong dates, I simply say that we do not know how to fix them, since the errors are not sufficiently systematic. At best we can replace one kind of error with another. I agree with you that wrong dates are a bad thing, but a bad thing that is beyond our power to fix here. We should focus on the RDF export and rely on others to do their work, so that everything will run smoothly in the end. At the current stage of the RDF work, the issue is of relatively minor relevance compared to the problems it is causing elsewhere. All of our current RDF exports, and several applications that people are using, suffer from the same errors in Wikidata. We need to fix them at the root, not in each consumer.
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

@Smalyshev Re halting the work on the query engine/producer code now: The WDTK RDF exports are generated based on the original specification. There is no technical issue with this, and doing just that does not block development. The reason we are in a blocker situation is that you want to move forward with an implementation that differs from the RDF model we proposed and that goes against our original specification, with the result that Denny and I fundamentally disagree with your design. If you want to return to the original plan, please do so and move on. If not, then better to wait until Lydia reaches a conclusion on what to do with dates, rather than implementing your point of view without consensus. For me, this is a benchmark of whether or not our current discussion setup is working.

Here is why I am optimistic that we can align with RDF 1.1 and ISO 8601:2000 before the query engine would even go live: basically all calendar-accurate BCE dates will be revised, and many of them changed, as part of the ongoing date review. We may as well fix the year-zero issue at the same time. We can therefore work on the hypothesis that dates are in ISO 8601:2000, as originally intended. From the feedback we got from the SPARQL group, it seems this would be preferable if we can make it work technically. The date review is a great opportunity to get the whole internal representation back on track.

Re deep value model: the core of the issue is that you propose to represent dates as the original string. Denny and I have clarified that we do not find this an acceptable representation for dates. As opposed to the XSD 1.0 issue, this proposal leads to a completely different structure in RDF and queries. There is no upgrade path from this implementation to the one we actually want. If we can agree on getting rid of this first, that would be a good start to move on. Changing from XSD 1.0 to XSD 1.1 is a minor issue in comparison, and one whose implementation can be deferred until we have BlazeGraph support for it.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

> @mkroetzsch I already listed a few of the tools that implement XSD 1.0 style BCE years and I read your answer as to say that you know of no tools that implement XSD 1.1 style BCE years.

Then you misread my answer. Almost all tools that exist today use the 2000 version of the ISO standard. A prominent example is ECMAScript, and thus all JavaScript implementations and virtually every JavaScript timeline implementation. See http://www.ecma-international.org/ecma-262/5.1/#sec-15.9.1.15 and the examples in the following section. RDF tools are an understandable exception, not because people there think one should cling to the old standard, but because RDF 1.1 was only standardised in 2014. It is natural that existing RDF implementations have more pressing upgrade work to do than fixing BCE date handling. I am sure they will all move to the new standard in due course.

It is worth noting that all of the ECMAScript documents use "ISO 8601" to refer to ISO 8601:2000, just as we did in the Wikidata data model specification. It seems that most people are not confused by this. Having dug up ECMA from the Web, I can now also safely say that JSON exports should definitely use ISO 8601:2000 dates.

The new situation therefore is: ISO, W3C, and all JavaScript implementations vs. a subset of the developers in the WMDE office. I am very unhappy about the amount of my time I have to put into digging up for you what the rest of the world is thinking. It's a nice position to put yourself in: asking others to find specific arguments against your position, and assuming you are right if they don't have the time or knowledge to do it. I am myself far from being an expert in JavaScript, or even in all details of SPARQL 1.1, but if I don't know something, I try to find out before taking part in discussions like this.
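For readers unfamiliar with the two conventions being debated, the difference can be stated in a few lines of code (my own illustration, not project code): ISO 8601:2000 and XSD 1.1 use astronomical numbering with a year 0, while XSD 1.0 had none.

```python
# Sketch of the two year-numbering conventions: ISO 8601:2000 / XSD 1.1
# use astronomical numbering, where year 0 is 1 BCE and year -1 is 2 BCE,
# while XSD 1.0 had no year 0 and wrote 1 BCE as -0001.

def bce_to_astronomical(bce_year):
    """Map a historical 'n BCE' year to its ISO 8601:2000 year number."""
    if bce_year < 1:
        raise ValueError("BCE years start at 1")
    return 1 - bce_year  # 1 BCE -> 0, 2 BCE -> -1, ...

def bce_to_xsd10(bce_year):
    """Map 'n BCE' to the XSD 1.0 year number (no year zero)."""
    if bce_year < 1:
        raise ValueError("BCE years start at 1")
    return -bce_year     # 1 BCE -> -1, 2 BCE -> -2, ...

assert bce_to_astronomical(1) == 0
assert bce_to_astronomical(2) == -1
assert bce_to_xsd10(1) == -1
```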
TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

@Smalyshev We really want the same thing: to move on with minimal disturbance as quickly as possible. As you rightly say, the data we generate right now is meant for testing, not for production use. We must make sure that our production environment will understand dates properly, but there is still some time before that. Here is my proposal summed up:

1. Implement the RDF export now as if Wikidata encoded all dates in ISO 8601:2000 (proleptic Gregorian, with year 0 encoding 1 BCE).
2. Have a switch in the RDF export code that allows us to export to RDF 1.1 or to RDF 1.0.

Item 1 will ensure that we can work with the dates as they will most likely be when we enter production. With the discovery that all of JavaScript relies on ISO 8601:2000, there is not much question that we will have this corrected in the end. It would be a waste of programming time to work around issues that others are already fixing as we speak. We can still implement reinterpretations of the internal dates if we find that the internal format is still broken when we want to release this (I hope this won't happen).

Item 2 is a compromise. It will ensure that we can use BlazeGraph even before the xsd:date bug is fixed. Has anyone reported the issue to them yet? It might well be that they fix this more quickly than we finish this discussion. I am sure they would also like to conform to SPARQL 1.1.

I agree with you that some dates will not be interpreted as intended, but this is unavoidable: whatever rule we pick, we will always have some dates that are not as intended, already because of the calendar-model mix-up. We have to rely on the ongoing review to get this fixed. This should not worry us right now, as it affects everyone (including actual production uses of Wikidata data from the API or JSON dumps).
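A minimal sketch of what the switch in item 2 could look like, assuming internal dates use astronomical (ISO 8601:2000) year numbering; the function name and flag below are illustrative only, not actual export code.

```python
# Sketch of the proposed export switch (item 2 above): render an
# astronomical (ISO 8601:2000) year either in XSD 1.1 form, which keeps
# the number as-is, or in XSD 1.0 form, which has no year zero.

def format_year(astronomical_year, xsd11=True):
    year = astronomical_year
    if not xsd11 and year <= 0:
        year -= 1  # shift past the missing year zero: 0 -> -1, -1 -> -2
    sign = "-" if year < 0 else ""
    return "%s%04d" % (sign, abs(year))

assert format_year(0, xsd11=True) == "0000"    # 1 BCE in XSD 1.1
assert format_year(0, xsd11=False) == "-0001"  # the same date in XSD 1.0
assert format_year(-44, xsd11=True) == "-0044" # 45 BCE
```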
TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

@Smalyshev P.S. Your finding of such years in our Virtuoso instance is quite peculiar, given that this endpoint is based on RDF 1.0 dumps as they are currently generated in WDTK using this code: https://github.com/Wikidata/Wikidata-Toolkit/blob/a9f676bfbc2df545d386bfa72e5130fa280521a9/wdtk-rdf/src/main/java/org/wikidata/wdtk/rdf/values/TimeValueConverter.java#L112-L117

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

@Smalyshev @Lydia_Pintscher Dates without years should not be allowed by the time datatype. They are impossible to order, almost impossible to query, and they have no meaning whatsoever in combination with a preferred calendar model. All the arguments @Denny has already given elsewhere for why we should unify dates to proleptic Gregorian internally apply here too. My suspicion is that the existing dates of this form are simply a glitch in the UI, where users got the impression that dates without years are recognized, and pressing save silently set the year to zero without them seeing the change in meaning. If this is an important use case, then we should develop a day-of-year datatype that supports it, or suggest that the community use dedicated properties/qualifiers to encode this. However, other datatype extensions would be much more important than this rare case (e.g., units of measurement).

The above proposal of @Smalyshev is simply to use RDF 1.0 for export and to assume XSD 1.0 (non-ISO) dates in Wikidata. After all the discussion here, I am completely baffled by this proposal. It goes against all current standards, and against the view of the SPARQL working group. The additional proposal to revert to the "dates are just strings" view for deep values ignores the original design and documentation, and dismisses the recommendations that Denny and I have been making via email.

It seems we have reached an impasse here. **I suggest freezing the RDF time-encoding discussions now** until we have established a joint understanding of what dates in Wikidata mean. As soon as we export dates to RDF, we are defining their meaning indirectly via the RDF semantics, and this bug report is not the right place for doing that.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

> @mkroetzsch Do you know of some widely used software that implements XSD 1.1 handling of BCE dates?

Many applications that process dates are based on ISO rather than on XSD. Java's SimpleDateFormat class, for example, is based on ISO and thus interprets year numbers like XSD 1.1. I would assume that most time-processing applications, e.g., JavaScript timelines, do the same. Only XSD-based implementations tend to have legacy handling. For many RDF tools it is really hard to tell without digging into their code (usually they don't document this detail, and they use their own implementations rather than relying on any XSD library). But I think it is fair to assume that ISO has a much larger market share, and that XSD 1.0 implementations will be updated at some point.

I think the best way forward is to leave the lexical year 0 as undefined in Wikidata. Yes, it's used, but AFAIK it was always undefined. Our original specification of Wikidata said: "The calendar model used for saving the data is always the proleptic Gregorian calendar according to ISO 8601." This is the specification that Denny and I support, but there have been changes recently. WMDE is currently reviewing these changes to gauge the impact they have had on the data over time, and to come up with ideas for how to recover a consistent state. We have to await their report and suggestions before deciding what to do in RDF.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed in wikibase but has no meaning as xsd:dateTime
mkroetzsch added a comment.

Note that all current data representation formats assume that +0000-01-01T00:00:00 is a valid representation:

- XML Schema 1.1: http://www.w3.org/TR/xmlschema11-2/#dateTime
- RDF 1.1: http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
- OWL 2: http://www.w3.org/TR/owl2-syntax/#Datatype_Maps

Moreover, XML Schema 1.1 argues that this change was made in order to agree with existing usage, and I would agree: many existing documents used the ISO interpretation of years even before it became official. In other words, if we want to export data to RDF, we should definitely conform with current usage and standards. I imagine it would be easy for BlazeGraph to support either semantics if we asked for it.

Regarding the intention of SPARQL 1.1, I have now sent an enquiry to the former SPARQL WG: http://lists.w3.org/Archives/Public/public-sparql-dev/2015JanMar/0031.html which will hopefully lead to further clarification on this matter.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
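As an aside on the lexical forms referenced in the comment above, here is a rough check of the xsd:dateTime grammar as extended in XSD 1.1, which admits a leading minus and the year 0000. The pattern is my own simplification of the spec's grammar (it ignores, e.g., per-month day limits and fractional seconds).

```python
import re

# Simplified lexical check for xsd:dateTime per XSD 1.1: the year may be
# negative and 0000 is permitted (both were excluded in XSD 1.0).

DATETIME = re.compile(
    r"^-?([1-9][0-9]{3,}|0[0-9]{3})"              # year, 0000 permitted
    r"-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"  # month, day
    r"T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"  # time
    r"(Z|[+-](0[0-9]|1[0-4]):[0-5][0-9])?$"       # optional timezone
)

assert DATETIME.match("-0001-01-01T00:00:00Z")    # allowed in XSD 1.1
assert DATETIME.match("0000-01-01T00:00:00")      # year zero: also allowed
assert not DATETIME.match("0000-13-01T00:00:00")  # month out of range
```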
[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export
mkroetzsch added a comment.

> Don't see why it would be this many. It'd be like 4 additional rows per property.

I was referring to the labels. For some use cases, it could be convenient if each of the property variants also had the rdfs:label of the property item. For example, RDF browsers will not be able to label a property variant such as :P1234q (or whatever we use) if we don't include any label for it. But including all labels (up to 300 languages) for all variants would lead to a lot of triples in the dump.

TASK DETAIL
https://phabricator.wikimedia.org/T93451
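To make the triple-count concern in the comment above concrete, a back-of-envelope sketch; the :P1234q-style variant naming and the helper itself are illustrative only.

```python
# Sketch of the label blow-up: one rdfs:label triple for every
# combination of property, variant, and language. The variant suffixes
# below are placeholders, as in the discussion.

def label_triples(prop_ids, variants, labels):
    """labels: {lang: text}; yields one rdfs:label triple per variant/lang."""
    for pid in prop_ids:
        for suffix in variants:
            for lang, text in labels.items():
                yield ':%s%s rdfs:label "%s"@%s .' % (pid, suffix, text, lang)

triples = list(label_triples(
    ["P1234"], ["", "s", "q", "r"],        # main property + 3 variants
    {"en": "example", "de": "Beispiel"}))  # 2 of up to ~300 languages
assert len(triples) == 4 * 2               # variants x languages
assert ':P1234q rdfs:label "example"@en .' in triples
```

With 300 languages instead of 2, the same property already yields 1200 label triples, which is the scaling concern raised above.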
[Wikidata-bugs] [Maniphest] [Commented On] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a comment.

> we don't know what year it was but it was July 4th

Ouch. Where has this been designed? Can you point to the specification of this? @Denny, is this intended? Dates without a year are extremely hard to handle in queries and don't work at all like the normal dates we have. This should be a different datatype.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export
mkroetzsch added a comment.

All RDF tools should be able to handle resources without labels (no matter whether they are used as subject, predicate, or object). But data browsers and other UIs will simply show the URL (or an automatically created abbreviation of it) to the user. So instead of "instance of" it would read something like http://www.wikidata.org/entity/P31c. Nevertheless, we can accept this for now. AFAIK there are no widely used generic RDF data browsers anyway, and it's much more likely that people will first create Wikidata-aware interfaces.

TASK DETAIL
https://phabricator.wikimedia.org/T93451
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T94019: Generate RDF from JSON
mkroetzsch added a subscriber: mkroetzsch.

TASK DETAIL
https://phabricator.wikimedia.org/T94019
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph
mkroetzsch added a subscriber: Lydia_Pintscher.
mkroetzsch added a comment.

Yes, the discussion on SPARQL has converged surprisingly quickly to the view that XSD 1.1 is both normative and intended in SPARQL 1.1 (by the way, I can only recommend this list if you have SPARQL questions, or the analogous list for RDF -- people are usually very quick and helpful in answering queries, especially if you say why you need it ;-).

Your findings about Virtuoso and BlazeGraph show that it might be hard to find a conforming processor right now. However, I would still hope that it can be done, since the transformation of the values is quite easy after all. In fact, I think that neither of these projects is very likely to have customers who have cared about BCE years so far ;-). Technically, it should not be too hard: even if you use an XSD 1.0 library in most places, you could surely find an XSD 1.1 library to use for the date functions, or you could transform dates internally before passing them to the XSD 1.0 operator functions. If such transformation is not efficient enough, one could also convert all input to XSD 1.0 dates on loading (or before) and then merely translate dates in queries and results accordingly (this should not be a big performance issue, since these datasets are rather small). It should also be easy to take the few affected XPath functions from another library, or to implement them directly. Julian day calculation is a very simple algorithm (https://en.wikipedia.org/wiki/Julian_day#Calculation), and it is all you need for date comparisons, calendar conversion, and time intervals. One way or the other, implementations will most likely have to do some custom extension of internal date handling, unless the standard XSD libraries can cope with the age of the universe.

Data publication is another issue. It's clear that we need to use XSD 1.1 when we publish RDF online, since this is what the current RDF specification requires.
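The Julian day calculation mentioned above is indeed short. Here is a sketch of the Fliegel-Van Flandern variant from the linked Wikipedia page, assuming the proleptic Gregorian calendar with astronomical year numbering (year 0 is 1 BCE).

```python
# Julian day number for a proleptic Gregorian date (Fliegel-Van Flandern
# algorithm, as on the linked Wikipedia page). Years use astronomical
# numbering, so 0 is 1 BCE. Once dates are day numbers, comparison and
# interval arithmetic are plain integer operations.

def julian_day_number(year, month, day):
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

assert julian_day_number(2000, 1, 1) == 2451545
# Interval arithmetic works across the era boundary: astronomical year 0
# (1 BCE) is a leap year in the proleptic Gregorian calendar.
assert julian_day_number(1, 1, 1) - julian_day_number(0, 1, 1) == 366
```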
Applications that find our data on the web cannot know what we discussed, and there is no way of telling them. They can only assume that we are using the current standard. For JSON and Wikidata internally, the main reference is probably ISO 8601, not XSD (I don't actually know what JSON says here, but usually it says nothing about things other than primitive JavaScript types). I'd find it hard to explain why we would choose to deviate from that. Year 0 has been legal in ISO for many years before Wikidata was even started.

@Lydia_Pintscher recently triggered an action to review the dates stored internally in Wikidata, so this issue should probably be part of it (especially since all BCE dates that are exact to the year should use the Julian calendar). As you said, we already have year 0 in values, and it is likely that we have other negative year numbers entered by the same bots (assuming ISO semantics). So we need to change many dates one way or the other in any case. I think most technical consumers will appreciate it if we stick to ISO, since it is easier to do calculations with. Moreover, now that every standard has agreed on the same format, time is working for those who go with it.

TASK DETAIL
https://phabricator.wikimedia.org/T94064
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.

@Smalyshev Yes, this is what I was saying. @hoo was proposing to create a special directory for truthy based on offline discussion in the office.

TASK DETAIL
https://phabricator.wikimedia.org/T72385
[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export
mkroetzsch added a comment.

@Smalyshev Yes, using lower-case local names for properties is a widely used convention, and we should definitely follow it for our ontology. However, I would rather not change the case of our P1234 property ids when they occur in property URIs, since Wikibase ids might be case-sensitive in the future (Commons files will have their filename as id, and even if standard MediaWiki is first-letter case-insensitive for articles, it can be configured otherwise). It would also create some confusion if one had to write p1234 in some interfaces and P1234 in others (maybe even both would occur in RDF, since we have a P1234 entity and several related properties).

TASK DETAIL
https://phabricator.wikimedia.org/T93451
[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology
mkroetzsch added a comment.

@daniel Changing the URIs of the ontology vocabulary silently produces wrong results as well. I understand the problems you are trying to solve. I am just saying that changing the URIs does not actually solve them.

@adrianheine You are right. My example was less suitable than I had thought. The reason is that for an API, you can report an error if somebody uses the wrong actions. This is exactly what you cannot do in RDF. If someone uses the wrong URIs in a SPARQL query, he will just get wrong results (or maybe, coincidentally, the correct results) without any warning.

TASK DETAIL
https://phabricator.wikimedia.org/T93207
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.

@hoo Thanks for the heads up! I do have comments.

(1) I would remove the full/truthy distinction from the path and rather make it part of the dump type (for example, statements and truthy-statements). The reason is that we have many full dumps (terms, sitelinks, statements, properties), which can readily be exported in RDF and JSON, but we have only one truthy dump, and it really is mainly for RDF (at least we did not discuss a JSON format for single-triple statements). Therefore, it does not seem worthwhile to make a top-level distinction in the directory structure for this. For consumers, it is easier if a dump file is addressed with four components (project name, dump type, date, file format). The truthy/full distinction would be another parameter that does not seem to add any functionality.

(2) My comment right at the beginning of this bug report was to have timestamped subdirectories, just like we have for the main dumps. Maybe you have reasons for not having these, but could you explain them here?

TASK DETAIL
https://phabricator.wikimedia.org/T72385
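To illustrate the four-component addressing proposed in (1) above, a hypothetical path builder; the exact directory layout and file naming here are illustrative, not an agreed scheme.

```python
# Sketch: address a dump file by project, dump type, date, and format,
# with timestamped subdirectories as suggested in (2). Layout is a
# hypothetical example only.

def dump_path(project, dumptype, date, fmt):
    return "%s/%s/%s/%s-%s-%s.%s.gz" % (
        project, dumptype, date, project, date, dumptype, fmt)

assert dump_path("wikidatawiki", "truthy-statements", "20150401", "nt") == \
    "wikidatawiki/truthy-statements/20150401/" \
    "wikidatawiki-20150401-truthy-statements.nt.gz"
```

Note how "truthy" is just part of the dump type here, rather than a separate top-level directory level.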
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.

@Lydia_Pintscher I understand this problem, but if you put different dumps for different times all in one directory, won't it become quite big over time and hard to use? Maybe one should group dumps by how often they are created (and have date directories only below that).

For some cases, there does not seem to be any problem. For example, creating all RDF dumps from the JSON dump takes about 3-6 hours in total (on labs). So this is easily doable on the same day as the JSON dump generation. I am sure that we could also generate alternative JSON dumps in comparable time (maybe add an hour to the RDF time if you do it in one batch). The slow part seems to be the DB export that leads to the first JSON dump -- once you have this, the other formats should be relatively quick to produce.

TASK DETAIL
https://phabricator.wikimedia.org/T72385
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.

> All of these dumps will be generated by exporting from the DB.

Why would one want to do this? The JSON dump contains all the information we need for building the other dumps, and generation from the JSON dump is much faster, avoids any load on the DB, and would guarantee a consistent state across all files (same revision status). Moreover, we already have code for doing it now (which will be updated to match any changes in RDF export structure we want).

TASK DETAIL
https://phabricator.wikimedia.org/T72385
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment.

@Smalyshev Re what "consistent" means: being based on the same input data. All dumps are based on Wikidata content. If they are based on the same content, they are consistent; otherwise they are not.

Re discussing RDF dump partitioning in https://phabricator.wikimedia.org/T93488: Agreed. We are not discussing which RDF dumps to have here, only whether they are likely to be well organised by distinguishing full and truthy as a primary categorisation that sits above format (RDF vs. JSON and other matters).

TASK DETAIL
https://phabricator.wikimedia.org/T72385
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. @JanZerebecki: Re using the same code: That's not essential here. All we want is that the dumps are the same. It's also not necessary to develop the code twice, since it is already there twice anyway. It's just the question if we want to use a slow method that keeps people waiting for the dumps for days (as they already do now with many other dumps), or a fast one that you can run anywhere (even without DB access; on a laptop if you like). The fact that we must have the code in PHP too makes it possible to go back to the slow system if it should ever be needed, so there is no lock-in. Dump file generation is also not operation-critical for Wikidata (the internal SPARQL query will likely be based on a live feed, not on dumps). What's not to like? Re consistency: I meant that the dumps would contain the same information, not that they reflect a consistent state of the site. If it is important for you to have a defined state, then the dump-based file generation is also your friend: one can do the same with the full history dump, where one could exactly specify the revision to dump. Probably still as fast as the DB method, but guaranteed to provide a globally consistent snapshot (yes, I know, modulo deletions). Not sure if this type of consistency is relevant though. Having a guarantee that the dump files in various formats are based on the same data, however, would be quite useful (e.g., in SPARQL, where you often mix data from truthy and full dumps in one query). Recall that we are discussing this here since Lydia said that the slowness of the DB-based exports would be a reason for why we cannot have an (otherwise convenient) date-based directory structure. I agree with Lydia that this would be a blocker, but in this case it's really one that we can easily remove. The code I am talking about is at https://github.com/Wikidata/Wikidata-Toolkit, well tested, extensively documented, and partially WMF-funded. 
Why not make this into a community engagement success story? :-) TASK DETAIL https://phabricator.wikimedia.org/T72385 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn, mkroetzsch Cc: Manybubbles, JanZerebecki, Smalyshev, aude, daniel, Wikidata-bugs, Nemo_bis, mkroetzsch, Svick, ArielGlenn, Lydia_Pintscher, hoo, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export
mkroetzsch added a comment. "is there any existing ontology we may want to use to create such links between entity:P1234 and v:P1234 or q:P1234? Or should we just invent our own?" We would have to make new URIs here. This depends on which/how many variants of RDF property URIs we use: we should use a different link for each kind of RDF property variant. For example, we could have :P1234 wikibase:qualifierProperty q:P1234. "Also, if we never use entity:P1234 in statements, to look it up (e.g. for type, etc. if we add type to property export, or for properties) one would have to do additional hop with something like: ?entity wikibase:represents v:P1234 instead of just using it directly. Not sure if it's a big issue." I would say that it is not a big issue, since most of the RDF properties we use will always have that problem. If we use the property entity as an RDF property, it would only replace one of the uses of property variants. In all other places, you would still need the additional hop to get the label. It would be good if the linked data export for all RDF property variants could include the entity labels. I would not add them to the dumps though (1000 x 300 x 5 is a lot of additional triples). For convenient SPARQL-based access, we should provide query interfaces that retrieve labels for IRIs that occur in query results, so that users don't have to SPARQL for the label. Such post-query labelling is done in WDQ. It will be easy to extend this to property variants without even looking at the RDF graph. This will make the SPARQL queries much lighter in general. TASK DETAIL https://phabricator.wikimedia.org/T93451 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. 
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, mkroetzsch Cc: Denny, mkroetzsch, daniel, Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, JanZerebecki ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
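The linking idea sketched above could look as follows in Turtle. The property names wikibase:qualifierProperty and wikibase:valueProperty and the exact namespace URIs are illustrative assumptions, not an established vocabulary:

```turtle
@prefix wikibase: <http://wikiba.se/ontology#> .      # assumed ontology namespace
@prefix entity:   <http://www.wikidata.org/entity/> .
@prefix q:        <http://www.wikidata.org/qualifier/> .
@prefix v:        <http://www.wikidata.org/value/> .

# One linking triple per RDF property variant of the Wikidata property P1234
# (the linking property names are hypothetical):
entity:P1234 wikibase:qualifierProperty q:P1234 ;
             wikibase:valueProperty     v:P1234 .
```

With such links in place, resolving a label for q:P1234 is exactly the "additional hop" discussed above: from the variant back to entity:P1234, which carries the rdfs:label.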
[Wikidata-bugs] [Maniphest] [Commented On] T93451: Data format updates for RDF export
mkroetzsch added a comment. "Also, it was suggested that we may want to change the fact that we use entity:P1234 in link Entity-Statement and give it a distinct URL. However, then it is not clear what would be the link between entity:P1234 and the rest of the data." This is a good point. It affects all property variants (qualifiers, values, ...) that we generate. We should have explicit links from the entity P1234 to every RDF property that we use to model this Wikidata property in different contexts. WDTK already has an open issue on this: https://github.com/Wikidata/Wikidata-Toolkit/issues/84 TASK DETAIL https://phabricator.wikimedia.org/T93451 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, mkroetzsch Cc: Denny, mkroetzsch, daniel, Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, JanZerebecki ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology
mkroetzsch added a comment. @daniel It makes sense to use wikibase rather than wikidata, but I don't think it matters very much at all. We should just define it rather sooner than later. As for the versioning, I don't see how to convince you. Four more attempts: - Try to apply your proposal to the MediaWiki API: Every API action should contain the MW version number. I think from your experience with MW it should be easier for you to see why this would be a bad idea. RDF is the same, but it affects a lot more APIs. - Another argument is that, of course, changing URIs does not give users any warning about the change either. Their queries will just return different results, but there won't be any error message or the like. This behaviour is exactly the same as for other kinds of breaking changes. You just add a new kind of breaking change that is sure to break everybody's usage (not just that of the users who use BCE dates, to stay in your example), but the breakage is still subtle and hard to notice in a running system. URI versioning does not implement any kind of fail-fast principle that you would want for announcing breaking changes. There is no standard way of announcing breaking changes via an RDF or SPARQL API; you need to work on your community communication to get this done (e.g., one could send notes about breaking changes well in advance to wikidata-tech and gather feedback). - You gave an example where a well-informed group of experts decided against your recommendation. I know of many other examples where URIs were initially created to contain a version number that was then never changed even after major updates (FOAF for example), again because experts in the field deemed that this was a sensible way to go. I would also claim some expertise in this area. Your view is natural for somebody who has not worked much with ontologies. Many smart people thought similarly ten years ago (you can see a lot of 0.1 and 1.0 version numbers in vocabulary URIs. 
Even the SMW ontology includes a version number in the URIs; of course it also never changed). Experience shows that the case where you would ever want to do such a drastic thing is the case where you use completely new URIs anyway (and probably give the project another name, too). - You could always decide to do the versioning later on if you must. There is no problem going from URIs http://wikiba.se/ontology#... to URIs http://wikiba.se/ontology-2.0.0#... There is no standard way of encoding version information in URIs, and you would not write a SPARQL query to extract it from there. However, in most cases, if you really change the meaning of one URI, you would rather use a new URI for this one thing only and keep all the other URIs as they are. TASK DETAIL https://phabricator.wikimedia.org/T93207 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: adrianheine, Manybubbles, Smalyshev, mkroetzsch, Denny, Lydia_Pintscher, Aklapper, daniel, Wikidata-bugs, aude, Krenair, Dzahn ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
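The "no warnings" point can be made concrete with a query: a SPARQL query pinned to one namespace does not fail when the ontology moves to a versioned namespace; it just silently stops matching. (wikibase:Statement is used here only as an illustrative class name.)

```sparql
# Written against http://wikiba.se/ontology#. If the vocabulary later
# moved to http://wikiba.se/ontology-2.0.0#, this query would raise no
# error -- it would simply return zero results.
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?s WHERE {
  ?s a wikibase:Statement .
}
```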
[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology
mkroetzsch added a comment. @daniel: Have you wondered why XML Schema decided against changing their URIs? It is by far the most disruptive thing that you could possibly do. Ontologies don't work like software libraries, where you download a new version and build your tool against it, changing identifiers as required. Changing all URIs of an ontology (even if only on a major version increment) will break third-party applications and established usage patterns in a single step. There is no mechanism in place to do this smoothly. You never want to do this. Even changing a single URI can be very costly, and is probably not what you want if the breaking change affects only a diminishing part of your users (how many BCE dates are there in XSD files? How many of those were already assuming the ISO reading anyway?). Having a version number in URLs does not solve the problem of versioning: it just creates an obligation for you to change *all* URIs whenever you update the ontology. If you really ever want to make a breaking change to one URI, then you just create a new URI for this purpose and keep the old one defined as it was. Then you stop using the old one in the data. Clean, easy, and usually without any disruption to 99% of the users (depending on which URI you changed, of course ;-). This introduction of new names is completely independent of your ontology document version. Besides all this, breaking changes are extremely rare. The example you gave (changing the meaning of an XML Schema datatype) does not apply to us, since we cannot do such things in our ontology. In essence, our ontology is just a declaration of technical vocabulary. Most changes you could make do not cause any incompatibility -- the Semantic Web is built on an open-world assumption, so that additions of information to the ontology never break anything. 
The only potentially breaking change to an ontology is when you delete some information, but even there it is hard to see how it should break a specific application. Summing up, the only breaking change to an ontology is to change an important URI that many people rely on. The current proposal is to introduce a mechanism for doing exactly this. TASK DETAIL https://phabricator.wikimedia.org/T93207 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: adrianheine, Manybubbles, Smalyshev, mkroetzsch, Denny, Lydia_Pintscher, Aklapper, daniel, Wikidata-bugs, aude, Krenair, Dzahn ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
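The "new URI for this one thing only" approach has a standard idiom: mark the old term as deprecated and point to its replacement. The two term names below are hypothetical, chosen only to illustrate the pattern:

```turtle
@prefix owl:      <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wikibase: <http://wikiba.se/ontology#> .

# Hypothetical example: retire one term without renaming the namespace.
wikibase:oldTerm a owl:DeprecatedProperty ;
    owl:deprecated true ;
    rdfs:comment "No longer used in new exports; see wikibase:newTerm." .

wikibase:newTerm a owl:ObjectProperty ;
    rdfs:comment "Replacement for wikibase:oldTerm with the corrected meaning." .
```

Existing data and queries that use the old term keep working unchanged, while new exports switch to the new term; the rest of the namespace is untouched.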
[Wikidata-bugs] [Maniphest] [Commented On] T93207: Better namespace URI for the wikibase ontology
mkroetzsch added a comment. Hi Daniel. Good point, I agree that this should change. A URL based on wikiba.se seems to be the best. I don't think we need to worry about domain ownership here (why would anybody sell this domain? Is it not WMF-owned?) I think it is not a good idea to change ontology URLs based on the version of the ontology. External tools will depend on the exact URI, and you will thus soon be locked into that version string forever. You can see this in FOAF, which has always been at http://xmlns.com/foaf/0.1/, although it's not at v0.1 these days. Basically, putting the version into your URIs is like putting the version number into every class name in your program. I would suggest to maintain versioned ontology files (somewhere) and to define the version in the ontology header, but to keep the identifiers as they were. TASK DETAIL https://phabricator.wikimedia.org/T93207 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: adrianheine, Manybubbles, Smalyshev, mkroetzsch, Denny, Lydia_Pintscher, Aklapper, daniel, Wikidata-bugs, aude, Krenair, Dzahn ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T91117: Empty JSON maps serialized as empty lists in XML dumps
mkroetzsch created this task. mkroetzsch added subscribers: mkroetzsch, Lydia_Pintscher. mkroetzsch added a project: Wikibase-DataModel-Serialization. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The XML dumps of Wikidata contain many JSON serialization errors where empty maps {} are wrongly serialized as empty lists []. This affects many different fields, such as claims, aliases, and descriptions, and it seems likely that all empty maps are serialized wrongly. This is a major inconvenience to all users who for some reason or other prefer XML over the JSON dumps (which do not have the problem). In particular, the XML dumps are needed to get access to the full history of the pages. Using the same JSON as used in the JSON dumps would fix the problem. TASK DETAIL https://phabricator.wikimedia.org/T91117 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: Lydia_Pintscher, mkroetzsch, Aklapper, Wikidata-bugs ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
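The likely root cause is PHP's single array type: json_encode(array()) produces [], so an empty map and an empty list are indistinguishable at serialization time. Until the dumps are fixed, a consumer can normalize the affected fields when parsing; the list of map-valued fields below is an assumption based on the fields named in this report plus the other map-valued entity fields:

```python
import json

# Map-valued fields of a Wikibase entity (assumption, based on the fields
# named in this report: claims, aliases, descriptions, plus labels/sitelinks).
MAP_FIELDS = ("claims", "aliases", "descriptions", "labels", "sitelinks")

def normalize(entity):
    """Repair empty maps that were wrongly serialized as empty lists."""
    for field in MAP_FIELDS:
        if entity.get(field) == []:
            entity[field] = {}
    return entity

broken = json.loads('{"id": "Q1", "claims": [], '
                    '"labels": {"en": {"language": "en", "value": "universe"}}}')
fixed = normalize(broken)
print(fixed["claims"])  # {}
```

Note that this repair is safe only because the affected fields are maps by schema; a field that could legitimately hold either type could not be disambiguated after the fact.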
[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T89949#1052731, @daniel wrote: "Nik tells me that the HA features in Virtuoso are only available in the closed source enterprise version. That basically means WMF is not going to use it in production." Yes, I guessed that this would cause issues. I don't know which other tool could deliver the performance you need, though. 4store is free, too, but may not be active enough since 5store became the main (closed) product; it may also lack some features you need. Beyond this, the only free options (beyond research prototypes) are Jena and Sesame (OpenRDF). I think they won't scale to what we need. TASK DETAIL https://phabricator.wikimedia.org/T89949 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, mkroetzsch Cc: Manybubbles, Denny, Smalyshev, mkroetzsch, Aklapper, daniel, Wikidata-bugs, Jdouglas, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item
mkroetzsch added a comment. The RDF should certainly contain information about the entity type of exported data. This is essential to ensure that the RDF data contains all the information that is found in the JSON (other than the ordering). As I read it, things that are of rdf:type Item are things that are described by an item on Wikidata. If this is not obvious to anybody who uses the data (maybe somebody really thinks that Washington himself is an item?!), we can always emphasize this in the documentation of the Item class. I therefore suggest to close this issue as invalid. It's just a matter of how we document our ontology. In particular, it should not be assumed that any triple in RDF has a self-evident ground truth associated with it that one can grasp just by reading the URIs (or their labels), though I think confusion is very unlikely here since we do not export any RDF data about item documents. The comment on rdfs:Resource seems to be a misinterpretation of the spec. In RDF, we certainly distinguish between a thing and its description; it is just that the description itself is yet another thing (in short: everything is a thing). I don't think that this has any bearing on how we want to encode the entity type in our RDF. In general, I would suggest to stick to the RDF encoding that Denny and I have worked out and published, as it is used in the existing dumps. We can always discuss changes if really needed, but we should not start to re-discuss things that are already done. What is needed now is implementation, not design. TASK DETAIL https://phabricator.wikimedia.org/T89949 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. 
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: Smalyshev, mkroetzsch, Aklapper, daniel, Wikidata-bugs, Jdouglas, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item
mkroetzsch added a comment. Our primary goal is to encode the JSON information in RDF, and possibly to enrich this information where it makes sense in an RDF context (e.g., by adding links to other datasets). The JSON data includes the entity type, so it is clear that we want to encode it in RDF in some way. As I said, my understanding is that Q42 *is* an item for a suitable sense of "item", just as P31 is a property in this sense. In neither case are we referring to the HTML page or any other electronic document. The confusion arises from your preconception of the item class referring to a document or description, which in turn is understandable given our lack of up-to-date documentation for this vocabulary. We can just invent new vocabulary as needed for exporting data about the HTML item pages, e.g., by introducing an ItemDocument class to refer to the documents. I think including data in RDF that we do not even have in our JSON exports is secondary for now. In fact, creating RDF exports for data that is collected by MediaWiki for every page seems a much bigger task that is hardly addressed in a satisfactory way by the snippet pasted above. Something like the SIOC vocabulary should probably be used there, and a suitable linked-data interface for accessing all revisions would be needed. I am not in favour of creating a makeshift solution now that mixes data that is special to Wikibase with data that should be there for all MW installations. MW should have its own linked-data export for page metadata, and Wikidata should merely link to the relevant URIs from its data exports. TASK DETAIL https://phabricator.wikimedia.org/T89949 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. 
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: Smalyshev, mkroetzsch, Aklapper, daniel, Wikidata-bugs, Jdouglas, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
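The item/document distinction could be expressed with separate classes along these lines. wikibase:ItemDocument is the hypothetical class proposed above, and schema:about is one plausible choice for the link; neither is taken from an existing export:

```turtle
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix schema:   <http://schema.org/> .
@prefix entity:   <http://www.wikidata.org/entity/> .

# The thing described by the item:
entity:Q42 a wikibase:Item .

# Hypothetical: the page/document about that thing, as a separate resource.
<https://www.wikidata.org/wiki/Q42> a wikibase:ItemDocument ;
    schema:about entity:Q42 .
```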
[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item
mkroetzsch added a comment. Thanks for adding Denny. Long reply, but details matter here. I agree that there are different things one could talk about (document, real thing). However, for now I am mainly interested in talking about the latter, since this should be our primary concern in Wikibase (the document is a thing MediaWiki has to care about). Now you argue that certain triples should not be given for the real thing (whether related to the reified statements or to the item type). However, your arguments do not have an objective foundation: the only reason why you do not want to have certain triples is that you interpret them to mean something that would not be true, whereas I am interpreting the very same triples in a way that would be correct. In other words, we have a dispute about the meaning of triples. Interestingly, the triples we discuss primarily refer to vocabulary that we created for the very purpose of being used in these triples. How could it mean the wrong thing? Only if we define it to mean the wrong thing. Therefore, let's just define those triples to mean the right thing and we are all set. There is no technical discussion to be had here; it's all about desired or undesired interpretations. If you look at the RDF structures that we get if we want to represent all of our data (without even including its order), then it should be clear that these structures simply do not have any self-evident natural interpretation. We need to tell people what they mean. Let's just tell them what we think they should mean. The only thing to keep in mind (and this is also what you are saying) is that we cannot use the same URI to mean different things in different contexts, so we need different URIs for referring to the class of real-world items and for referring to the class of item documents. I do not see a problem with this. An RDF document represents a graph. It is a purely abstract, mathematical model. 
Nothing is said there about the real world or about documents or about truth. It's all in our heads. The reason why we are so careful to distinguish documents from real things etc. is that we want to make sure that the data as a whole (taking all RDF from one site together) still makes sense. Yet, we are free to define this sense. We could define a property that means "is described on a Wikipedia page that was once edited by someone who was born in". Would this say something about the real George Washington? Sure. For the same reason, please do not confuse reification in RDF (which represents a triple without stating that the triple is true) with reification in our export (which simply uses an auxiliary resource to make a statement). Using auxiliary nodes in RDF data is a common technique that does not have any impact on whether you are saying something about the real world or whether you are saying that a document made a certain claim. In particular, it is not any stronger or more direct to use one triple to represent a statement than to use a group of triples around an auxiliary node. You always need to document in your ontology what your RDF structures express. Whether we need to use different subject URIs for simplified and for reified exports I don't know. Maybe it also depends on the exact way in which the simplified export is created. Already in our RDF exports, we are using different property URIs in both cases, so even in the union of the datasets there would never be any doubt as to which triple belongs to which view on the data. Moreover, many triples are the same in both views (e.g., labels). Therefore I am inclined to think that there is no need for different URIs there. (I don't see the connection to your truthy projection if you just use it for answering queries, unless of course you are returning query results in RDF, so that these results would turn into another kind of RDF export that needs to be consistent with those we have now.) 
TASK DETAIL https://phabricator.wikimedia.org/T89949 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, mkroetzsch Cc: Denny, Smalyshev, mkroetzsch, Aklapper, daniel, Wikidata-bugs, Jdouglas, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
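The auxiliary-node encoding discussed here looks roughly like this in Turtle. The prefix layout follows the convention later used by the Wikidata query service, and the item and statement IRIs are made up for illustration:

```turtle
@prefix wd:  <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix p:   <http://www.wikidata.org/prop/> .
@prefix ps:  <http://www.wikidata.org/prop/statement/> .
@prefix pq:  <http://www.wikidata.org/prop/qualifier/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# One Wikidata statement as a group of triples around an auxiliary node.
# This does not "reify" the statement in the RDF sense: the group as a
# whole is defined to assert the statement, qualifiers and all.
wd:Q42 p:P69 wds:Q42-abc123 .                # statement node (IRI made up)
wds:Q42-abc123 ps:P69 wd:Q123456 ;           # the main value (item made up)
               pq:P582 "1980"^^xsd:gYear .   # qualifier: P582 (end time)
```

Note that the qualifier triple uses a distinct property URI (pq:P582), so even in a union with a simplified "truthy" export there is no ambiguity about which view a triple belongs to.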
[Wikidata-bugs] [Maniphest] [Commented On] T89949: RDF mapping should not assert that .../entity/Q123 is-a Wikidata item
mkroetzsch added a comment. Now my reply was so long that the ticket has already been closed in the meantime :-D Anyway, those are my two (or more) cents on this topic ;-) I don't think the paper goes into these topics very much (as they are not so much technical as philosophical). TASK DETAIL https://phabricator.wikimedia.org/T89949 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, mkroetzsch Cc: Denny, Smalyshev, mkroetzsch, Aklapper, daniel, Wikidata-bugs, Jdouglas, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T72385: Wikidata JSON dump: file directory location should follow standard patterns
mkroetzsch added a comment. I think json should be in the path somewhere. It does not have to be at the top-level, but it would not be good if dump files of one type end up in their own directory. The only way for tools to detect and download dumps automatically is to look at the HTML directory listings, and this listing should not change its appearance (again). Note that different types of dumps will be created in different intervals, so a combined directory that contains several types of dumps would look quite messy in the end. We could have wikibase-dumps/wikidatawiki/json if you prefer this over something like other/wikibase-json/wikidatawiki. However, the latter seems to be more consistent with /other/incr/wikidatawiki. I don't care much about the details, but it would be good to have something systematic in the end: either other/projectname/dumptype or other/dumptype/projectname seems most logical. Also, I think that dumptype could already mention wikibase if desired, so that there is no need for an extra directory wikibase-dumps on the path. The thing to avoid is to introduce a new directory structure for every new kind of dump (and wikibase-dumps smells a lot like this, even if there is a faint possibility that there will be more dumps of this kind in the future -- do you actually have any plans to move our RDF dumps from http://tools.wmflabs.org/wikidata-exports/rdf/ to the dumps site? Could be done, but not sure if it is needed.). TASK DETAIL https://phabricator.wikimedia.org/T72385 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn, mkroetzsch Cc: Wikidata-bugs, Nemo_bis, mkroetzsch, Svick, ArielGlenn, Lydia_Pintscher, jeremyb-phone, hoo, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T86524: use data model implementation for import
mkroetzsch added a comment. I don't know about the details of the import task discussed here, but for the record: we are happy to support this use of WDTK by helping to update our implementation where necessary. TASK DETAIL https://phabricator.wikimedia.org/T86524 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: mkroetzsch Cc: Aklapper, JanZerebecki, daniel, mkroetzsch, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store
mkroetzsch added a comment. In https://phabricator.wikimedia.org/T86278#969184, @Multichill wrote: "I would like to turn it around. We should support indexing everything: ... The fact that we're not creative enough to make up queries for everything doesn't mean it isn't useful." I have to disagree with this approach of designing a (query) system. Of course supporting everything would be nice, but indexing always needs to be viewed in the context of a query language. Depending on the query features you support, indexing may mean completely different things. The requirement that "everything" should be indexed is fuzzy and vague. For example, Wikidata Query supports regular path queries (in Wikidata Query, Kleene-star recursion is accessed with an operator called TREE). When you say that references should be indexed, do you mean that it should be possible to navigate through references within regular path expressions? How about qualifiers? I think Wikidata Query only supports TREE in the main statements. On the other hand, Wikidata Query does not support any kind of cyclic queries ("Find people whose parents are married to each other."), even though the relevant data is indexed. The point is that you need adequate index structures for each query feature. You may have indexed everything and still not be able to do the queries you want. Moreover, I am creative enough to come up with queries that would be very hard to implement, yet this does not mean that we should do it. Both "everything" and "everything we are creative enough for" are poor design principles. So how to move forward? The usual approach to design a practical system is to collect use cases and requirements (example queries) and then make a clear decision what should be supported and what shouldn't. One can always revise this decision later, but it's still much better to make it explicit than to just go along and see what we get. In all of this, it must be understood that supporting everything is an impossible task. 
The question is merely where to draw the line(s). TASK DETAIL https://phabricator.wikimedia.org/T86278 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, mkroetzsch Cc: Aklapper, Smalyshev, Lydia_Pintscher, Multichill, Magnus, daniel, JeroenDeDauw, JanZerebecki, aude, mkroetzsch, Denny, Sjoerddebruin, Tobi_WMDE_SW, jkroll, Wikidata-bugs, GWicke, Manybubbles ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
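The "parents married to each other" query is cyclic because the join graph of its triple patterns contains a cycle, which purely tree-shaped query support cannot express. As SPARQL over truthy statements it would read roughly as follows (P22 = father, P25 = mother, P26 = spouse; the wdt: prefix follows the later query-service convention and is assumed here):

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?person WHERE {
  ?person wdt:P22 ?father ;
          wdt:P25 ?mother .
  ?father wdt:P26 ?mother .   # this triple closes the cycle
}
```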
[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store
mkroetzsch added a comment. @Smalyshev My suggestion was just about the surface appearance, not about the inner workings. I am saying that the following two phrases have the same structure: - Find things with a *sitelink* that *has badge* *featured*. - Find things with a *population* that has *point in time* *2014*. If you look at it like this, you use badge like a (special) qualifier property and featured like a value. This does not mean that the query answering will be any less efficient than with another syntax. The query engine would easily parse the queries in any case, without ambiguity, and know that sitelinks are (possibly) stored in a different way internally. Same for labels. The reason I was suggesting this unification here was that it also somehow answers the question what do we mean by 'indexing' this data?: any query that would work over statements would also be expected to work for sitelinks, even if different structures are used internally. TASK DETAIL https://phabricator.wikimedia.org/T86278 REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username. EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, mkroetzsch Cc: Aklapper, Smalyshev, Lydia_Pintscher, Multichill, Magnus, daniel, JeroenDeDauw, JanZerebecki, aude, mkroetzsch, Denny, Sjoerddebruin, Tobi_WMDE_SW, jkroll, Wikidata-bugs, GWicke, Manybubbles ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T86278: Define which data the query service would store
mkroetzsch added a comment.

> This is not correct, original structure can be recovered

Then I misunderstood the transformation that was proposed. My impression was that a statement with three qualifier snaks (P1 V1, P1 V2, P2 V3) would be stored as two statements: one with qualifiers P1 V1, P2 V3, and one with qualifiers P1 V2, P2 V3. In that case, one could not distinguish this from the case where two statements with two qualifiers each had been given originally. Could you explain what kind of transformation you had in mind?

> Scan of the database shows there are no entries generating more than 15 qualifier splits

My point was that an attacker could craft a single statement that makes you index millions of statements. It is clear that such statements are not in the current data, since they are (hopefully) not needed. Again, this depends on my (possibly incorrect) understanding of your intended transformation.
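[Editor's note] The ambiguity described above can be made concrete with a short sketch. The `split_statement` function below is a hypothetical reconstruction of the transformation as read in the comment (one output statement per combination of qualifier values), not the actual implementation; it shows that the split of one statement with a duplicated qualifier property is indistinguishable from two originally separate statements.

```python
from itertools import product

def split_statement(prop, value, qualifiers):
    """Split one statement whose qualifiers map properties to value lists
    into one statement per combination of qualifier values.
    (Hypothetical reconstruction of the proposed transformation.)"""
    props = sorted(qualifiers)
    combos = product(*(qualifiers[p] for p in props))
    return {(prop, value, tuple(zip(props, c))) for c in combos}

# One statement with qualifier snaks P1 V1, P1 V2, P2 V3 ...
split_a = split_statement("P", "V", {"P1": ["V1", "V2"], "P2": ["V3"]})

# ... yields the same index entries as two original statements with
# two single-valued qualifiers each:
split_b = split_statement("P", "V", {"P1": ["V1"], "P2": ["V3"]}) \
        | split_statement("P", "V", {"P1": ["V2"], "P2": ["V3"]})

assert split_a == split_b  # the original structure cannot be recovered
```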
[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store
mkroetzsch added a comment.

@JanZerebecki I understand what you are saying about what indexing means here; that makes sense to me. What you say about my example query sounds as if you are planning to implement query execution manually. I hope this is not the case and you can simply hand the query to Titan to get it answered for you. You mentioned splitting certain statements with duplicate qualifiers. This changes the structure: the original structure is no longer represented and can no longer be faithfully recovered. I do not know whether this is an issue with the current data (which duplicate qualifiers are actually used?), but it is an issue in general. It also means that with a mere 40 qualifiers (20 properties, each with two values) on one forged statement, I could create over a million statements in your index -- a possible DoS attack vector? An alternative technique for handling such cases is to use a special encoding for overflowing values. This is done successfully in some database implementations, but it may mean additional checks during query answering; depending on how low in the query answering process you can do these checks, they may or may not add significant cost (hence it is a pity that Titan does not support this efficiently out of the box).
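[Editor's note] The blow-up figure in the comment is simple combinatorics: a split produces one statement per combination of qualifier values, so the count is the product of the number of values per qualifier property. A quick check of the "40 qualifiers" example (illustrative arithmetic only, not indexer code):

```python
from math import prod

# 20 qualifier properties with 2 values each = 40 qualifier snaks total.
values_per_property = [2] * 20

# One split statement per combination of qualifier values:
splits = prod(values_per_property)

assert splits == 2 ** 20 == 1_048_576  # over a million index entries
```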
[Wikidata-bugs] [Maniphest] [Commented On] T86278: Define which data the query service would store
mkroetzsch added a comment.

@Smalyshev My point is merely that sitelinks and labels //can// be handled like statements. Since statements must be supported anyway, it would be sensible to reuse the data structures and query expressions defined for them. I do not think confusion is likely, since the query language will not use the colloquial names from my examples: properties of Wikidata will always be referred to by their Pid, whereas something like "has badge" would not have an id of this form. So it is not as if a reserved label "has badge" competed with Wikidata property labels. Structurally, however, statements and sitelinks can all be represented in the same data structure as far as querying is concerned. Maybe you would like it better if you viewed it as a separate, independent data structure "qualified triple" that we would use to represent statements, sitelinks, and labels. There is no conceptual mix-up between the high-level terms here, just an exploitation of similar structure at a low level. If you look at common query languages such as SQL, SPARQL, or Cypher, you can see that they are always based on a relatively small set of structural primitives that do not have a domain-specific meaning. One can always build UIs that use domain-specific terms like "sitelink" and make them appear separate, but for implementers and API users it is very useful if some things can be unified.
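[Editor's note] The "qualified triple" idea in the comment above can be sketched as one low-level structure covering statements, sitelinks, and labels alike. The schema and pseudo-property names below are illustrative assumptions, not an actual Wikibase format; the Wikidata ids used (Q64 Berlin, P1082 population, P585 point in time) are real.

```python
def qualified_triple(subject, prop, value, qualifiers=()):
    """One structural primitive: subject, property, value,
    plus an optional set of qualifier property/value pairs."""
    return {"s": subject, "p": prop, "v": value, "q": dict(qualifiers)}

# A Wikidata statement, with the property referred to by its Pid:
stmt = qualified_triple("Q64", "P1082", "3500000", [("P585", "2014")])

# A sitelink, using a non-Pid pseudo-property ("sitelink", "has badge"),
# which therefore cannot clash with Wikidata property ids:
link = qualified_triple("Q64", "sitelink", "enwiki:Berlin",
                        [("has badge", "featured")])

# A label, in the same shape again:
label = qualified_triple("Q64", "label", "Berlin", [("language", "en")])

# One structural primitive covers all three for querying purposes.
assert set(stmt) == set(link) == set(label)
```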