Re: [Wikidata] Big numbers
On 07.10.19 at 09:50, John Erling Blad wrote:
> Found a few references to bcmath, but some weirdness made me wonder if it
> really was bcmath after all. I wonder if the weirdness is the juggling with
> double when bcmath is missing.

I haven't looked at the code in five years or so, but when I wrote it, Number was indeed bcmath with a fallback to float. The limit of 127 characters sounds right, though I'm not sure without looking at the code.

Quantity is based on Number, with quite a bit of added complexity for converting between units while considering the value's precision. E.g. "3 meters" should not turn into "118.11 inch", but "118 inch" or even "120 inch": the default precision of +/- 0.5 meter is 19.685 inch, which means the last digit is insignificant. Had lots of fun and confusion with that. I also implemented rounding on decimal strings for that. And initially screwed up some edge cases, which I only realized when helping my daughter with her homework ;)

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
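[Editor's note: the precision-aware rounding described above can be sketched like this. This is a simplified illustration, not the actual Wikibase code; the conversion factor and the rounding rule are assumptions.]

```python
import math

# Hypothetical sketch of precision-aware unit conversion: convert the
# value, then round so that no insignificant digits are reported.
INCHES_PER_METER = 39.3701  # assumed conversion factor

def convert_with_precision(value, uncertainty, factor=INCHES_PER_METER):
    converted = value * factor
    converted_unc = uncertainty * factor
    # The order of magnitude of the converted uncertainty decides the
    # last significant decimal place of the result.
    step = 10 ** math.floor(math.log10(converted_unc))
    return round(converted / step) * step

# "3 meters" +/- 0.5 m: the uncertainty is ~19.685 inch, so the result
# is rounded to the nearest 10 inches: 120, not 118.11.
print(convert_with_precision(3, 0.5))  # -> 120
```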
Re: [Wikidata] Personal news: a new role
Very cool! Looking forward to seeing more of you!

On 19.09.19 at 18:56, Denny Vrandečić wrote:
> Hello all,
>
> Over the last few years, more and more research teams all around the world
> have started to use Wikidata. Wikidata is becoming a fundamental resource
> [1]. That is also true for research at Google. One advantage of using
> Wikidata as a research resource is that it is available to everyone.
> Results can be reproduced and validated externally. Yay!
>
> I had used my 20% time to support such teams. The requests became more
> frequent, and now I am moving to a new role in Google Research, akin to a
> Wikimedian in Residence [2]: my role is to promote understanding of the
> Wikimedia projects within Google, work with Googlers to share more
> resources with the Wikimedia communities, and to facilitate the
> improvement of Wikimedia content by the Wikimedia communities, all with a
> strong focus on Wikidata.
>
> One deeply satisfying thing for me is that the goals of my new role and
> the goals of the communities are so well aligned: it is really about
> improving the coverage and quality of the content, and about pushing the
> projects closer towards letting everyone share in the sum of all
> knowledge.
>
> Expect to see more from me again - there are already a number of fun ideas
> in the pipeline, and I am looking forward to see them get out of the
> gates! I am looking forward to hearing your ideas and suggestions, and to
> continue contributing to the Wikimedia goals.
>
> Cheers,
> Denny
>
> P.S.: Which also means, incidentally, that my 20% time is opening for new
> shenanigans [3].
>
> [1] https://www.semanticscholar.org/search?q=wikidata&sort=relevance
> [2] https://meta.wikimedia.org/wiki/Wikimedian_in_residence
> [3] https://wikipedia20.pubpub.org/pub/vyf7ksah

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Language for non-logged in users
On 25.01.19 at 13:33, DaB. wrote:
> Hello.
> On 25.01.2019 at 12:13, Daniel Kinzler wrote:
>> Serving different content from the same URL is generally a bad thing.
>
> No, it's not. That's the reason they invented Language headers in the
> first place: so you can view a page in your language and I can view a
> site in my language. Please respect that not everybody can read English
> (fluently).

Headers can solve the caching problem, but this makes it impossible to link to a specific language version of a page. That is bad when discussing specifics of the page, and can cause confusion. It's also bad for search engine indexes, which should index all language versions.

I very much want everyone to be able to see each page in their own language. The idea is to redirect based on the language header when visiting the neutral URL. Please read the proposal.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
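[Editor's note: the header-based redirect idea can be sketched in a few lines of server-side logic. This is a hypothetical illustration; the function name and fallback rule are invented, and real Accept-Language handling also covers wildcards and region fallback.]

```python
def pick_language(accept_language, available, default="en"):
    """Pick the best available content language for a redirect from an
    Accept-Language header. Simplified sketch: ignores wildcards and
    region fallback (e.g. "de-AT" matching "de")."""
    ranked = []
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        quality = 1.0  # q defaults to 1 when absent
        for param in pieces[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if tag:
            ranked.append((quality, tag))
    # Highest q-value first; redirect to the first language we can serve.
    for _, tag in sorted(ranked, reverse=True):
        if tag in available:
            return tag
    return default

print(pick_language("ja,en;q=0.8", {"en", "de", "ja"}))  # -> ja
```

So a request for the neutral URL with "ja,en;q=0.8" would be redirected to the Japanese version, and the per-language URLs stay linkable and cacheable.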
Re: [Wikidata] Language for non-logged in users
The reason this is not trivial is two-fold: 1) caching and 2) the semantics of URLs. Serving different content from the same URL is generally a bad thing. A solution for this is discussed in <https://phabricator.wikimedia.org/T114662>, but work on this is currently not resourced.

On 25.01.19 at 11:44, Darren Cook wrote:
> I wanted to send someone a URL to show them how a data item looks in
> Japanese (so we could see which items have a translation). But am I
> right in thinking there is nothing I can put in the URL to do this?
>
> I also tried changing my accept-language header to put "ja" first, but
> it is ignored. Was this a feature that was discussed and rejected; or
> just an itch that no-one has got around to scratching yet?
>
> Darren
>
> P.S. I realize I can login, change my UI to another language, and see
> the data that way. But that is quite a long-winded process, especially
> if the person has not created an account yet.
>
> It also changes the whole UI, not just the data, which is painful if I
> just want to see what has been translated but cannot read the language.
> I think for a project about data, you should be able to set the UI
> language and the content language separately.
>
> E.g. I just put a page into Greek (I think), and now I can see the few
> items that have been translated, but cannot read the property names! Let
> alone navigate the site. (The "switch back to previous language" link at
> the top was a great idea, though - thank you to whoever thought of that
> shortcut.)

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 06.12.18 at 09:49, Daniel Kinzler wrote:
> On 02.12.18 at 02:28, Erik Paulson wrote:
>> How do these external identifiers work, and how do I get something into
>> one of these namespaces? (I apologize if I have missed them in the
>> documentation)
>
> Hi Erik!

Oh, I forgot an important disclaimer: I used to be on the Wikidata team, and I was involved in discussing and specifying the different levels of federation for Wikibase repos. I am no longer part of the Wikidata team though, and may not be up to date on the latest progress. I cannot in any way speak for the Wikidata team or make any promises.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 02.12.18 at 02:28, Erik Paulson wrote:
> How do these external identifiers work, and how do I get something into
> one of these namespaces? (I apologize if I have missed them in the
> documentation)

Hi Erik!

You got the right idea. Sadly, this feature is not implemented yet. I don't know if there is any public documentation for this by now, but here is a very rough list of the stepping stones towards allowing what you want:

1) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that can access Wikidata's internal database directly, and do not themselves define Items or Properties (but may define other kinds of entities). This is implemented, but not deployed yet. It is scheduled to be deployed soon on Wikimedia Commons, as part of the "Structured Data on Commons" project (aka Wikibase MediaInfo).

2) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and do not themselves define Items or Properties (but may define other kinds of entities). This is relatively simple, but details of the caching mechanisms need to be ironed out. Ask Adam and Lydia about the timeline for this.

3) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and *do* themselves also define Items or Properties which are *distinct* from the ones that Wikidata defines. The spec for this is clear, but some old code needs to be updated to enable this, and some details of the user interface need to be worked out. Ask Adam and Lydia about the timeline for this.

4) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and may "augment" or "override" the descriptions of Items and Properties defined on Wikidata. There seems to be a lot of demand for this, but the details of the semantics are unclear, especially with respect to SPARQL queries. More discussion is needed.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 10:40, Yuri Astrakhan wrote:
> If at some point you decide to add some new area of data, e.g. biological,
> you could add new prefixes for that too, but that would also be a
> "separate" project.

The Q, P, L, M, etc. are used to identify the *type* of entity. They are not for keeping projects separate. That was never their purpose. Wikibase does use prefixes for that, but they go *before* the letter that indicates the type.

>> The prefix can be omitted for local entities, so Q12345 is an item on the
>> local repo (or the default repo of a wikibase client).
>
> I think that was a big mistake -- the "(or the default repo of a wikibase
> client)" -- because wd implies Wikidata, not Wikibase, so it dilutes the
> meaning of "wd:". See my other email on how I fixed it.

I'm confused - yes, "wd:" should ALWAYS imply Wikidata. Your Wikibase instance would have its own prefix (which can be omitted for local use), e.g. "osm:".

For the record, I'm just voicing my opinion here, and telling you what the original intention was. I'm no longer working on Wikidata or Wikibase, and I can't make any decisions on any of this.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 01:00, Lydia Pintscher wrote:
> On Thu, Nov 29, 2018 at 9:46 AM Andra Waagmeester wrote:
>> I fully agree. I would rather see the scarce development resources being
>> focused on fixing this than the p/q business, as you nicely call it.
>> Tbh, I really don't see an issue with multiple p's and q's over different
>> Wikibases. That is what prefixes are for: to distinguish between
>> different resources. Examples of identical identifier (literal) schemes
>> between multiple resources are abundant (e.g. PubMed and NCBI Gene). It
>> really is a matter of getting used to it, or am I missing something?
>
> Are we talking about https://phabricator.wikimedia.org/T194180? I'm
> happy to push that into one of the next sprints if so.

This doesn't fix the hard-coded prefix in the RDF output generated by Wikibase.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 08:21, Imre Samu wrote:
> - What is the real meaning of the Q/P prefix -> Wikidata or Wikibase?

The intention was: P and Q indicate the *type* of the entity ("P" = "Property", "Q" = "Item" for arcane reasons; "L" = Lexeme, "F" = Form, "S" = Sense, "M" = MediaInfo). As you can tell, we'd quickly run out of letters and cause confusion if this became configurable.

Using prefixes to indicate where the entity comes from is indeed useful, and is already part of the model. The prefix for Wikidata is "wd:", so "wd:Q12345" is an item from Wikidata. The prefix can be omitted for local entities, so Q12345 is an item on the local repo (or the default repo of a wikibase client).

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 28.11.18 at 23:53, Olaf Simons wrote:
> I will receive answers in the form of
>
> wd:q25
>
> but they do not link to wd, Wikidata, but into our database:
> https://database.factgrid.de/entity/Q25

Right, that prefix should not be "wd" for your own query service. I'm afraid it's currently hard-coded in the RdfVocabulary class. That should indeed be fixed.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 28.11.18 at 10:15, James Heald wrote:
> It should also be made possible for the local wikibase to use local
> prefixes other than 'P' and 'Q' for its own local properties and items,
> otherwise it makes things needlessly confusing -- but currently I think
> this is not possible.

I think the opposite is the case: ending up with a zoo of prefixes, with items being called A73834 and F0924095 and Q98985 and W094509, would be very confusing.

The current approach is the same one that RDF and XML use: add a kind of namespace identifier in front of "foreign" identifiers. So you would have Q437643 for "local" items, xy:Q8743 for items from xy, foo:Q873287 for items from foo, etc. This is how foreign IDs are currently implemented in Wikibase.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
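[Editor's note: the scheme described here is essentially a namespace lookup. A minimal sketch; the prefix-to-URI map and base URIs below are made up for illustration, and only "wd:" is a real Wikidata prefix.]

```python
# Hypothetical prefix map: "wd" is Wikidata's real prefix, the others
# are invented repo names purely for illustration.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "foo": "http://foo.example.org/entity/",
}
LOCAL_BASE = "http://local.example.org/entity/"  # assumed local repo URI

def expand(entity_id):
    """Expand a possibly prefixed entity ID to a full concept URI.
    Unprefixed IDs belong to the local repo."""
    if ":" in entity_id:
        prefix, local_id = entity_id.split(":", 1)
        return PREFIXES[prefix] + local_id
    return LOCAL_BASE + entity_id

print(expand("foo:Q873287"))  # -> http://foo.example.org/entity/Q873287
print(expand("Q437643"))      # -> http://local.example.org/entity/Q437643
```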
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
Hi Pine, sorry for the misleading wording. Let me clarify below.

On 19.10.18 at 9:51 pm, Pine W wrote:
> Hi Markus, I seem to be missing something. Daniel said, "And I think the
> best way to achieve this is to start using the ontology as an ontology on
> wikimedia projects, and thus expose the fact that the ontology is broken.
> This gives incentive to fix it, and examples as to what things should be
> possible using that ontology (namely, some level of basic inference)." I
> think that I understand the basic idea behind structured data on Commons.
> I also think that I understand your statement above. What I'm not
> understanding is how Daniel's proposal to "start using the ontology as an
> ontology on wikimedia projects, and thus expose the fact that the
> ontology is broken" isn't a proposal to add poor quality information from
> Wikidata onto Wikipedia and, in the process, give Wikipedians more
> problems to fix. Can you or Daniel explain this?

What I meant in concrete terms was: let's start using Wikidata items for tagging on Commons, even though search results based on such tags will currently not yield very good results, due to the messy state of the ontology, and hope people fix the ontology to get better search results. If people use "poodle" to tag an image and it's not found when searching for "dog", this may lead to people investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for Wikipedia articles based on an imperfect ontology, and push them on everyone". I mean, using the ontology to generate navigation boxes for some kinds of articles may be a nice idea, and could indeed have the same effect - that people notice problems in the ontology, and fix them. But that would be something the local wiki communities decide to do, not something that comes from Wikidata or the Structured Data project.

The point I was trying to make is: the wiki communities are rather good at creating structures that serve their purpose, but they do so pragmatically, along the behavior of the existing tools. So, rather than trying to work around the quirks of the ontology in software, the software should use very simple rules (such as following the subclass relation), and let people adapt the data to this behavior, if and when they find it useful to do so. This approach, over time, provides better results in my opinion.

Also, keep in mind that I was referring to an imperfect *improvement* of search, the alternative being to only return things tagged with "dog" when searching for "dog". I was not suggesting degrading the user experience in order to incentivize editors. I'm rather suggesting the opposite: let's NOT give people a reason to tag images that show poodles with "poodle" and "dog" and "mammal" and "animal" and "pet" and...

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
On 18.10.2018 at 19:05, Peter F. Patel-Schneider wrote:
> On 10/17/18 7:04 AM, Daniel Kinzler wrote:
>> My (very belated) thoughts on this issue:
>> [...]
>> I say: let it produce bad results, tell people why the results are bad,
>> and what they can do about it!
>> [...]
>>
>> -- daniel
>
> My view is that there is a big problem with this for industrial use of
> Wikidata.
> [...]
> What is the biggest problem I see in Wikidata? It is the poor organization
> of the Wikidata ontology. To fix the ontology, beyond doing point fixes,
> is going to require some commitment from the Wikidata community.

I agree. And I think the best way to achieve this is to start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference).

--
Daniel Kinzler
Principal Software Engineer, MediaWiki Platform
Wikimedia Foundation
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
> "...the burden of proof has to be placed on authority, and it should be
> dismantled if that burden cannot be met..."
>
> -Thad
> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>
> On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA wrote:
>
>     Hi,
>
>     The Wikidata's ontology is a mess, and I do not see how it could be
>     otherwise. While the creation of new properties is controlled, any
>     fool can decide that a woman <https://www.wikidata.org/wiki/Q467> is
>     no longer a human or is part of family. Maybe I'm a fool too? I wanted
>     to remove the claim that a ship <https://www.wikidata.org/wiki/Q11446>
>     is an instance of "ship type" because it produces weird circular
>     inferences in my application; but maybe that makes sense to someone
>     else.
>
>     There will never be a universal ontology on which everyone agrees. I
>     wonder (sorry to think aloud) if Wikidata should not rather facilitate
>     the use of external classifications. Many external ids are knowledge
>     organization systems (ontologies, thesauri, classifications ...). I
>     dream of a simple query that could search, in Wikidata, "all elements
>     of the same class as 'poodle'" according to the classification of
>     imagenet <http://imagenet.stanford.edu/synset?wnid=n02113335>.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Solve legal uncertainty of Wikidata
Am 18.05.2018 um 21:37 schrieb Amirouche Boubekki: > What wikidata doesn't track the license of each piece of information?! Facts don't *have* licenses. They have sources, and we track those. Which may have licenses, depending on jurisdiction, interpretation, form, content, etc. But the fact itself doesn't, it's not copyrightable. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] RDF: All vs Truthy
Am 03.12.2017 um 15:06 schrieb Fariz Darari: > Current state gives me one result, the Russian ruble, due to its preferred > rank > (notice the wdt prefix): > > https://query.wikidata.org/#select%20%2a%0A%7B%20wd%3AQ159%20wdt%3AP38%20%3Fcurrency%20%7D Ah, right - the current answer would by convention be marked as preferred, so only it counts as "truthy". Sorry for the confusion. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] RDF: All vs Truthy
On 03.12.2017 at 14:49, Imre Samu wrote:
>> All = contains not only the truthy ones, but also the ones with
>> qualifiers
>
> imho: sometimes qualifiers are very important for multiple values (like
> "start time", "end time", "point in time", ...)
> for example: Russia https://www.wikidata.org/wiki/Q159 : Russia -
> P38:"currency" has 2 statements, both with qualifiers:
>
> * Russian ruble - (start time: 1992)
> * Soviet ruble - (end time: September 1993)
>
> My question:
> in this case - what is the "truthy = simple" result for
> Russia-P38:"currency"?

You will simply get two truthy results: Russian ruble and Soviet ruble. Both are Russian currencies. If you want to know when, why, where, etc., you have to check the qualified "full" statements. That's why it's called "truthy": the answer is kind of true, depending on context.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
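[Editor's note: the rank-based selection discussed in this thread can be modeled in a few lines. This is a simplified model of the behavior, not Wikibase's actual code.]

```python
def truthy(statements):
    """Sketch of Wikibase's 'best rank' rule: if any statement for a
    property is ranked preferred, only those are truthy; otherwise all
    normal-rank statements are. Deprecated statements never are."""
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]

currency = [
    {"value": "Russian ruble", "rank": "normal"},
    {"value": "Soviet ruble", "rank": "normal"},
]
print(truthy(currency))  # both currencies are truthy

# Marking the current currency as preferred narrows the truthy result:
currency[0]["rank"] = "preferred"
print(truthy(currency))  # now only the Russian ruble
```

This also explains the other branch of the thread: once the Russian ruble carries the preferred rank, a wdt: query returns only that one value.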
Re: [Wikidata] How to get direct link to image
Am 30.10.2017 um 19:10 schrieb Laura Morales: >> You can also use the Wikimedia Commons API made by Magnus: > https://tools.wmflabs.org/magnus-toolserver/commonsapi.php >> It will also gives you metadata about the image (so you'll be able to cite >> the author of the image when you reuse it). > > Is the same metadata also available in the Turtle/HDT dump? Sadly not. We don't have proper structured meta-data yet. That's what the Structured Data on Commons project is about: <https://commons.wikimedia.org/wiki/Commons:Structured_data> -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata is becoming a proper citizen of the linked open data web
Due to massive performance problems, Wikibase was rolled back to the previous version last night. So the new features currently do not work. We are working hard to find the cause of the problem (which may or may not be related to Wikidata), so we can deploy the latest version again. Sorry for the confusion!

On 27.10.2017 at 11:06, Jakob Voß wrote:
> Hi Lydia and all of you,
>
> Lydia wrote:
>
>> The identifier can often be expanded to a full URI. (For example, LoC
>> ID n81114174 becomes http://id.loc.gov/authorities/names/n81114174.)
>> This full URI can then be used in the linked open data web to match
>> our data with other datasets and use both of them together easily.
>>
>> From today on, Wikidata has full URIs for statements that represent
>> external identifiers in its RDF exports, and thereby becomes a proper
>> citizen of the linked open data web. To make this work the property
>> for the external ID needs to have a statement with property "URI used
>> in RDF" (https://www.wikidata.org/wiki/Property:P1921).
>
> Could you give an example? The RDF of item Q43027 with LoC ID n81114174
> does not include the URI <http://id.loc.gov/authorities/names/n81114174>
> if exported with
> http://www.wikidata.org/wiki/Special:EntityData/Q43027
>
> I also tried with a statement just added to make sure it's not some
> caching issue. Is the feature not enabled yet?
>
> In particular I'm interested how the external URI and Wikidata URI are
> connected:
>
> subject: <http://www.wikidata.org/entity/Q43027>
> property: ???
> object: <http://id.loc.gov/authorities/names/n81114174>
>
> I'm sure the RDF property also depends on the Wikidata property, so this
> feature requires some additional tweaking. At least the property is not
> always owl:sameAs, because we have at least 1-to-n relationships between
> Wikidata items and external ids.
>
> Cheers,
> Jakob
>
> P.S.: Won't have time to cover all these aspects in my WikidataCon
> Lightning talk about https://www.wikidata.org/wiki/Wikidata:Identifiers

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
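[Editor's note: the expansion Lydia describes boils down to placeholder substitution. This sketch assumes that "URI used in RDF" (P1921) values use a $1 placeholder for the external ID, the same convention as formatter URLs (P1630); treat that as an assumption, not a spec.]

```python
def rdf_uri(formatter, external_id):
    """Expand an external identifier into a full URI using a formatter
    pattern with a $1 placeholder (assumed convention, see note above)."""
    return formatter.replace("$1", external_id)

# The LoC example from the announcement quoted above:
print(rdf_uri("http://id.loc.gov/authorities/names/$1", "n81114174"))
# -> http://id.loc.gov/authorities/names/n81114174
```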
Re: [Wikidata] Wikidata prefix search is now Elastic
On 26.10.2017 at 11:36, Marco Fossati wrote:
> Thanks a lot Stas for this present.
> Could you please share any pointers on how to integrate it into other
> tools?

Just keep using wbsearchentities. It now uses Cirrus as a backend, instead of SQL. That should provide better performance, and better ranking.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Navigation to Wikipedia links on Wikidata
If your browser window is wide enough, Sitelinks to Wikipedia should already be close to the top of the page, on the right-hand side. But in any case, you can always add #sitelinks-wikipedia to the URL, like in <https://www.wikidata.org/wiki/Q1#sitelinks-wikipedia>. That will make the browser jump right to the wikipedia section. Am 05.09.2017 um 16:47 schrieb Tito Dutta: > Hello, > If I am on a Wikidata item page (QX), what's the easiest way to navigate > to > the Wikipedia links other than manual scrolling? Sometimes (actually a lot of > times) I need to check Wikipedia articles (not only English) before I add > description part. Is there any user script or something that puts Wikipedia > links above statement or any other suggestion? > > Thanks > Tito Dutta > Note: If I don't reply to your email in 2 days, please feel free to remind me > over email or phone call. > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] lib.reviews: Review anything with a Wikidata entry
Thanks for sharing, Erik! Combining search and query capabilities would indeed be useful for quite a few things. We'll probably be working on making this easier soon.

-- daniel

On 26.07.2017 at 07:26, Erik Moeller wrote:
> A small update on this: based on some off-list feedback, I replaced
> the way I exclude disambiguation pages and the like from the
> autocomplete list. The autocomplete widget now performs two queries: a
> MediaWiki API (wbsearchentities) query, and a follow-up WDQS SPARQL
> query to exclude disambiguation pages, Wikinews articles, and other
> content that folks are most likely not interested in reviewing.
>
> I didn't find a good example for this in the examples directory, so I
> figured folks might find the query I'm using useful. Before I add it
> to the examples, please let me know if you see obvious ways in which
> it can be improved.
>
> Here's an example query:
>
> # For a list of items, exclude the ones that have "instance of" set to
> # one from a given set of excluded classes
> SELECT DISTINCT ?item WHERE {
>   ?item ?property ?value
>
>   # Excluded classes: disambiguation pages, Wikinews articles, etc.
>   MINUS { ?item wdt:P31 wd:Q4167410 }
>   MINUS { ?item wdt:P31 wd:Q17633526 }
>   MINUS { ?item wdt:P31 wd:Q11266439 }
>   MINUS { ?item wdt:P31 wd:Q4167836 }
>   MINUS { ?item wdt:P31 wd:Q14204246 }
>
>   # Set of items to check against the above exclusion list
>   # wd:Q355362 is a disambiguation page and will therefore not be in
>   # the result set
>   VALUES ?item { wd:Q23548 wd:Q355362 wd:Q1824521 wd:Q309751 wd:Q6952373 }
> }

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Wiki PageID
Hello Gintautas!

On 21.04.2017 at 17:58, Gintautas Sulskus wrote:
> I have a couple of questions regarding the wiki page ID. Does it always
> stay unique for the page, where the page itself is just a placeholder for
> any kind of information that might change over time?

That is indeed the idea. Content changes, the page ID stays the same. If you need to identify a specific state of the page, use the revision ID (aka permalink).

Note however that page IDs are considered "internal" identifiers. They are stable, but they are not the canonical way to access or identify a page. Use the title for that - or, in the context of Wikidata, use the entity ID.

> Consider the following cases:
> 1. The first time someone creates page "Moon" it is assigned ID=1. If at
> some point the page is renamed to "The_Moon", the ID=1 remains intact. Is
> this correct?

Yes, page IDs survive renaming/moving the page.

> 2. What if we have page "Moon" with ID=1. Someone creates a second page
> "The_Moon" with ID=2. Is it possible that page "Moon" is transformed into
> a redirect? Then, "Moon" would be redirecting to page "The_Moon"?

Yes, pages can become redirects.

> 3. Is it possible for page "Moon" to become a category "Category:Moon"
> with the same ID=1?

Yes, pages can be moved into the category namespace.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 18:12 schrieb Denny Vrandečić: > So assume we enter a new Lexeme in Examplarian (which has a Q-Item), but > Examplarian has no language code for whatever reason. What language code would > they enter in the MultilingualTextValue? My plan is: it will be "mis+Q7654321" internally, which will be exposed in HTML and RDF as "mis". We will want to distinguish "a known language not on this list (mis)" from "an unknown language (und)" and "translingual" (Wiktionary uses "mul" for translingual, but that's not technically correct). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 19:24 schrieb Denny Vrandečić: > Daniel, I agree, but isn't that what Multilingual Text requires? A language > code? Yes. Well, internally, it just has to be *some* unique code. But for interoperability, we want it to be a standard code. So I propose to internally use something like "de+Q1980305", and expose that as "de" externally. This allows us to distinguish however many variants of German we want internally, and tag them all as "de" in HTML and RDF, so standard tools can use the language information. > I assume most of it is hidden behind mini-wizards like "Create a new lexeme", > which actually make sure the multitext language and the language property are > consistently set. In that case I can see this work. Yes, that is exactly the plan for the NewLexeme page. We'll still have to come up with a nifty UI for "add a lemma, select a language, and optionally an item identifying a variant of that language". -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 18:56 schrieb Gerard Meijssen: > Hoi, > The standard for the identification of a language should suffice. I know no standard that would be sufficient for our use case. For instance, we not only need identifiers for German, Swiss and Austrian German. We also need identifiers for German German before and after the spelling reform of 1901, and before and after the spelling reform of 1996. We will also need identifiers for the "language" of mathematical notation. And for various variants of ancient languages: not just Sumerian, but Sumerian from different regions and periods. The only system I know that gives us that flexibility is Wikidata. For interoperability, we should provide a standard language code (aka subtag). But a language code alone is not going to be sufficient to distinguish the different variants we will need. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Tobias' comment made me realize that I did not clarify one very important distinction: there are two kinds of places where a "language" is needed in the Lexeme data model <https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model>: 1) the "lexeme language". This can be any Item, whether or not it has a language code. This is what Tobias would have to use in his query. 2) the language codes used in the MultilingualTextValues (lemma, representation, and gloss). This is where my "hybrid" approach comes in: use a standard language code augmented by an item ID to identify the variant. To make it easy to create new Lexemes, the lexeme language can serve as a default for lemma, representation, and gloss - but only if it has a language code. If it does not have one, the user will have to specify one for use in MultilingualTextValues. Am 06.04.2017 um 19:59 schrieb Tobias Schönberg: > An example using the second suggestion: > > If I would like to query all L-items that contain a combination of letters and > limit those results by getting the Q-items of the language and limit those, to > those that have Latin influences. > > In my imagination this would work better using the second suggestion. Also the > flexibility of "what is a language" and "what is a dialect" would seem easier > if > we can attach statements to the UserLanguageCode or the Q-item of the > language. > > -Tobias -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Disputed territories in WDQS
Hi Andra! As Nicolas pointed out, the map view of WDQS is based on OpenStreetMap. So the territory would have to be marked as disputed there. However, perhaps you can turn this into a positive example for Wikidata's flexibility and NPOV after all: I have added some statements to <https://www.wikidata.org/wiki/Q5671580> to show how a territorial dispute can be modeled on Wikidata. I was lazy and didn't add any sources, though - I didn't know what to make of "Donovan 2003" given in Wikipedia, as it doesn't give the title of a publication. But I suppose sources for these things should be easy to find. HTH daniel Am 09.04.2017 um 14:54 schrieb Andra Waagmeester: > I am currently in Suriname, where I gave a talk on open > data/wikipedia/wikidata. > Next week there will be a handson session, where I hope to get as much > contribution from this country as possible. > > When I demonstrated the WDQS, the audience took offense in the way Suriname is > depicted on the map view used in the WDQS. There is a territorial dispute with > the neighboring country Guyana, called the Tigri > area(https://en.wikipedia.org/wiki/Tigri_Area). In the WDQS this area is > currently being drawn as being part of Guyana. The maps drawn in the WIkipedia > article shows how the issue is dealt with here when drawing maps. i.e. The > area > is explicitly drawn as being a territorial dispute, which is more factual. > > Any idea's on how to get a similar mapview on the WDQS? Thanks to Wikipedia > Zero, where people can have free access to Wikidata (even in remote area's), > there is quite some potential to get people involved in adding local data. > Having the current mapview is counter productive. > > Cheers, > > Andra > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 07.04.2017 um 01:34 schrieb Denny Vrandečić: > I foresee that might be a bit of a problem for external tools consuming > this data - how they would figure out what language it is if it's > doesn't have a code? We could of course generate fake codes like > mis-x-q12345, maybe that would work. > > Q-items for languages already have a property to state their language code. > It's > just an extra hop away. We want ISO codes (or rather, IANA language subtags [1]), so we can use them in HTML lang attributes, and in RDF literals. This allows interoperability with standard tools. For this reason, I also favor a mixed approach, that allows standard language tags to be used whenever possible. I have some ideas on how that could work, but no definite plan yet. Something like de+Q1980305 could work; when generating HTML or RDF, we'd just drop the suffix. For translingual entries (e.g. for the number symbol i), we could use e.g. mis+Q1140046. [1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
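The "drop the suffix" step described above can be sketched in a few lines. This is a minimal illustration, assuming the "+" separator from the proposal; the helper name is hypothetical, not a shipped Wikibase API:

```python
def to_external_code(internal_code: str) -> str:
    """Drop the item-ID suffix from a hybrid code like 'de+Q1980305',
    leaving the plain IANA subtag for use in HTML lang attributes and
    RDF literals. The '+' syntax follows the proposal in this thread;
    this helper is illustrative only."""
    return internal_code.split("+", 1)[0]

print(to_external_code("de+Q1980305"))   # de
print(to_external_code("mis+Q1140046"))  # mis
print(to_external_code("en"))            # en (no suffix, unchanged)
```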
Re: [Wikidata] Significant change: new data type for geoshapes
Am 29.03.2017 um 15:19 schrieb Luca Martinelli: >> One thing to note: We currently do not export statements that use this >> datatype to RDF. They can therefore not be queried in the Wikidata Query >> Service. The reason is that we are still waiting for geoshapes to get stable >> URIs. This is handled in this ticket. This ticket: <https://phabricator.wikimedia.org/T159517>. And more generally <https://phabricator.wikimedia.org/T161527>. The technically inclined of you may be interested in joining the relevant RFC discussion on IRC tonight at 21:00 UTC (2pm PDT, 23:00 CEST) #wikimedia-office. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 27.03.2017 um 23:48 schrieb Kingsley Idehen: > I think we can just agree to disagree for now, since nothing you've > stated is fundamentally contrary to my view of RDF -- as a Language for > describing anything (including statements) :) Yes, that's what RDF is. My point is: just because something can be described in RDF doesn't mean it *is* RDF. As you said, RDF can describe anything. If anything that can be described with RDF *is* RDF, then everything is RDF. Then the term would be meaningless. The Wikibase model "is" an RDF model just as much as it "is" a modal logic system, or any other sufficiently powerful formal language. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 27.03.2017 um 15:13 schrieb Kingsley Idehen: > I see Wikidata is a collection of reified RDF Statements. I don't see how this > model differs from RDF's model. It just so happens (in my eyes) that Wikidata > includes description of statements about things which provides rich metadata, > in > line with the goals of Wikidata. It's a matter of perspective. I agree that Wikidata can be *represented* as a collection of reified RDF Statements. That's what we do for the query service. But I do not agree that this is what Wikidata *is*. RDF and the Wikibase model are quite different conceptually. But they are of equal power and thus formally equivalent: one can be represented using the other. Just because a Turing Machine is computationally equivalent to lambda calculus, that does not mean they are the same thing. Understanding one in terms of the other may be helpful in some context, and irrelevant in another. There is nothing special about the relationship between Wikibase/Wikidata and RDF; Wikibase has an RDF binding, but it is not defined in terms of RDF, its specification does not rely on RDF concepts. The Wikibase model can just as well (or perhaps more easily) be understood and represented in terms of the Topic Maps model (ISO 13250). Academically, the Wikibase model could perhaps be described as an extended modal logic with reasoning rules for provenance. I think W. Stelzner explored related ideas in the 80s. Maybe one day I'll find the time to dig into this some more. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Does Wikidata use a property store or a RDF triplestore?
The primary data storage is document oriented, and very dumb. It's JSON blobs stored as wiki page content, using MediaWiki's standard content blob storage mechanism. We have a live export to a triple store, and an open SPARQL endpoint. These links may be helpful: https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format https://www.wikidata.org/wiki/Wikidata:Data_access If you want to play with the data, try http://query.wikidata.org/ -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] What kind of bot "wiktionary in wikidata" needs?
Am 22.03.2017 um 10:10 schrieb Amirouche: > My understanding is that wiktionary (and wikipedia) CC-BY-SA license is > incompatible with wikidata CC0 license. That is true, for any copyrighted information on Wiktionary. That will mainly be definitions, and maybe example sentences. Facts, such as word type or morphology, are not copyrightable. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 19.03.2017 um 18:21 schrieb Bob DuCharme: > I do have to ask: if the mapping used on wikidata.org has diverged from what > is > described there, is a more up-to-date description of the mapping available > anywhere? The current mapping is the one described at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 23:15 schrieb Daniel Kinzler: > Wikibase Entities are certainly Resources in the RDF sense, but so are some of > the more fine grained components of the Wikibase model, such as Statements and > References. You can find the OWL file for the RDF binding of Wikibase at > <http://wikiba.se/ontology>. If you are interested, there's a paper about mapping Wikidata to RDF: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf Note however that the mapping used on wikidata.org has somewhat diverged from what is described in the paper. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 22:48 schrieb Bob DuCharme: > New question: when I see that https://www.wikidata.org/wiki/Special:EntityData > says "This page provides a linked data interface to entity values", can you > tell > me what "entity" means in the context of Wikidata? If I was going to refer to > something that can be identified with a URI and described by triples in which > it > is the subject, I would just use the term "resource" as described at > https://www.w3.org/TR/rdf11-concepts/#resources-and-statements (and > remembering > what "RDF" stands for!) so I'm guessing that "entity" means something a little > more specific than that here. The Wikidata (or technically, Wikibase) data model is not defined in terms of RDF. Have a look at the primer <https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer> and the spec <https://www.mediawiki.org/wiki/Wikibase/DataModel>. Entities are the top-level elements of Wikidata. There are currently two kinds: Items (things or concepts in the world) and Properties (attributes for describing Items and other entities). Wikibase Entities are certainly Resources in the RDF sense, but so are some of the more fine grained components of the Wikibase model, such as Statements and References. You can find the OWL file for the RDF binding of Wikibase at <http://wikiba.se/ontology>. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 21:27 schrieb Bob DuCharme: > Thanks Daniel! > > How do I find a full statement representation? For example, what would the > full > statement representation be for a triple like > {wd:Q64 wdt:P1376 wd:Q183}? The full representation of the statement in this case is: wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:PreferredRank ; ps:P1376 wd:Q183 ; prov:wasDerivedFrom wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 . wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 a wikibase:Reference ; pr:P248 wd:Q451546 ; pr:P958 "Artikel 2 (1)" . This RDF representation can be found at <https://www.wikidata.org/wiki/Special:EntityData/Q64.ttl>. Content negotiation will take you there from the canonical URI, <https://www.wikidata.org/entity/Q64.ttl> In addition to the actual value, the RDF above also gives the rank, and a source reference (namely, the re-unification treaty). This statement doesn't currently have a qualifier - it should have at least one, stating since when Berlin is the capital of Germany. That qualifier would be represented as: wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C pq:P580 "1990-10-03T00:00:00Z"^^xsd:dateTime ; The Statement ID, Q64$43CCD3D6-F52E-4742-B0E3-BCA671B69D2C, can be found in the HTML source of the page, encoded as a CSS class. These IDs are not exposed nicely anywhere. But usually, one would look at the RDF representation right away, or at least go from HTML to *all* the RDF. HTH -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
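The two spellings of the statement ID above (with "$" in the HTML, with "-" in the wds: namespace) differ only in the separator. A minimal sketch of the mapping; the helper name is illustrative, not part of any Wikibase library:

```python
def wds_local_name(statement_id: str) -> str:
    """Turn a statement ID as found in the page HTML (using '$') into
    the local name used in the wds: RDF namespace (using '-'), matching
    the two forms quoted in the mail above."""
    return statement_id.replace("$", "-")

print(wds_local_name("Q64$43CCD3D6-F52E-4742-B0E3-BCA671B69D2C"))
# Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C
```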
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 19:03 schrieb Bob DuCharme: > What makes a predicate a direct claim predicate? It's a predicate (that's what RDF calls all relationships) that expresses a direct claim (as opposed to a full statement). Direct claims are one of two ways Wikidata Statements are mapped to RDF. In the wikidata query service, each statement is represented twice - once as a full statement, and once as a direct claim. Direct claims represent a "naive projection" of wikidata to RDF: everything that is claimed (by anyone) to be true (under any circumstances) is assumed to be true. So you get simple triples, e.g. one meaning "Berlin - capital-of - Germany". Simple to work with, but incomplete: you also get ("Berlin - capital-of - Kingdom of Prussia"), without an easy way to see that one is current and the other is not. To get all the additional context information, you need to look at the full statement representation, which provides a complex structure of value, qualifiers, and source references. The full mapping uses predicates in the "p" namespace to connect the item (subject) to a statement node (in the "wds" namespace) representing the statement with all its parts. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
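The difference between the two mappings can be seen by comparing two query sketches (held here as Python strings; prefixes as used by the Wikidata Query Service, queries illustrative):

```python
# Direct claim: one triple, no context. For Berlin (Q64) and
# "capital of" (P1376) this returns both Germany and the Kingdom of
# Prussia, with no way to tell which claim is current.
direct_query = """
SELECT ?of WHERE {
  wd:Q64 wdt:P1376 ?of .
}
"""

# Full statement: the p:/ps: path goes through the statement node,
# which exposes rank (and would expose qualifiers and references),
# so current and historical values can be told apart.
full_query = """
SELECT ?of ?rank WHERE {
  wd:Q64 p:P1376 ?stmt .
  ?stmt ps:P1376 ?of ;
        wikibase:rank ?rank .
}
"""

print(direct_query)
print(full_query)
```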
Re: [Wikidata] Label gaps on Wikidata
Am 27.02.2017 um 18:18 schrieb James Heald: > From what Daniel is saying, it seems this may not be possible, because the > template expansion would then depend on the user's preferred language(s), > which > would not be compatible with the template cacheing. > > Is that right? Or is there a way round this? We are currently aiming for a compromise: we render the page with the user's interface language as the target language, and apply fallback accordingly. We do not take into account secondary user languages, as defined e.g. by the Babel or Translate extensions. This means a user with the UI language set to French will see French if available, but will not see Spanish, even if they somehow declared that they also speak Spanish. This way, we split the parser cache once per UI language - a factor of 300, but not the exponential explosion we would get if we would split on every possible permutation of languages (does anyone want to compute 300 factorial?). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
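The difference in magnitude between splitting per UI language and splitting per fallback-chain permutation is easy to check with a back-of-the-envelope sketch:

```python
import math

# One parser cache entry per UI language: growth is linear in the
# number of supported languages.
ui_languages = 300
print(ui_languages)

# Splitting on every possible ordering of fallback languages would be
# factorial: 300! is a number with several hundred decimal digits.
digits = len(str(math.factorial(300)))
print(digits)
```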
Re: [Wikidata] Label gaps on Wikidata
Am 27.02.2017 um 17:01 schrieb James Hare: > One option is to allow users to define their own ranked preferences for > language > beyond just first place. (I personally would enjoy having French as a fallback > to English.) That would badly fragment the parser cache. I don't think it's viable. > This has the downside of only really working for people with > accounts, which I suspect might be a minority of overall traffic. Currently, we only support English for anonymous visitors (yes, this is very sad; the reason is, again, caching - Varnish, this time). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Label gaps on Wikidata
Am 19.02.2017 um 17:00 schrieb Romaine Wiki: > Hi all, > > If you look in the recent changes, most items have labels in English and those > are shown in the recent changes and elsewhere (so we know what the item is > about > without opening first). Wikidata actually tries to show you the labels in your preferred interface language. And if your user language is not available, it uses a fallback mechanism to show the next-best language, which may even include automated transcriptions. When all else fails, it will show the English label. If that doesn't exist, it shows the ID. > But not all items have labels, and these items without > English label are often items with only a label in Chinese, Arabic, Cyrillic > script, Hebrew, etc. This forms a significant gap. The fallback mechanism works OK, but is not great for English-speaking users who see a lot of items that have no English label. For English, we just don't know what to fall back to. Just anything? Or try European languages first? What should the rule be? If we can decide on a good rule, it should actually be pretty simple to add such a fallback for English. > Is there a way to easily make a transcription from one language to another? We have such rules for some languages/variants, e.g. between the Cyrillic and the Roman representations of Kazakh or Uzbek. But transliteration rules can be complex, and covering every permutation of the 300 languages we support would mean we'd need about 45000 rule sets... > Or alternatively if there is a database that has such transcriptions? Not yet. One of the goals of Wikidata is to be that database. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
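The "about 45000" figure matches the number of unordered pairs of 300 languages, which a quick arithmetic sketch confirms:

```python
import math

# One transliteration rule set per unordered pair of the ~300
# supported languages: C(300, 2) = 300 * 299 / 2.
rule_sets = math.comb(300, 2)
print(rule_sets)  # 44850
```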
Re: [Wikidata] Full Text Search in Query Service
Am 17.02.2017 um 22:02 schrieb James Heald: > Quick question on this Stas: > > * Why do the suggestions that come up when typing in the search box seem so > much > more on-point (ie better at presenting the most likely option first) than the > ones that come up in the results list? The reason is that the "search box" on wikidata.org is fake: it is not the search box you see on Wikipedia, it does not use the search infrastructure that Special:Search uses (Cirrus). It uses a custom API module (wbsearchentities) which relies on a custom database table (wb_terms). We need this because Cirrus did not have support for structured data or multilingual fields. That is changing now, and we want to use Cirrus for everything. But until then, wikidata is using two completely different search mechanisms, both of which work well for some things, and really badly for others. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
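A sketch of what a request to the wbsearchentities module looks like (the search term is just an example; only the URL is constructed here, no request is made):

```python
from urllib.parse import urlencode

# Parameters of the custom wbsearchentities API module mentioned
# above; "Berlin" stands in for whatever prefix is typed into the box.
params = {
    "action": "wbsearchentities",
    "search": "Berlin",
    "language": "en",
    "type": "item",
    "format": "json",
}
url = "https://www.wikidata.org/w/api.php?" + urlencode(params)
print(url)
```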
Re: [Wikidata] Wikidata ontology
Am 09.01.2017 um 11:16 schrieb Peter F. Patel-Schneider: > Although there is no formal problem here, care does have to be taken when > modelling entities that are to be considered as both classes and non-classes > (or, and especially, metaclasses and non-metaclass classes). It is all too > easy for even experienced modellers to make mistakes. The problem is worse > when the modelling formalism is weak (as the Wikidata formalism is) and thus > does not itself provide much support to detect mistakes. The problem is even > worse when the modelling methodology often does not provide much description > of the entities (as is the case in Wikidata). That's what I meant by "problematic". I did not mean to say it's *wrong* per se. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata ontology
Am 09.01.2017 um 04:36 schrieb Markus Kroetzsch: > Only the "current king of Iberia" is a single person, but Wikidata is about > all > of history, so there are many such kings. The office of "King of Iberia" is > still singular (it is a singular class) and it can have its own properties > etc. > I would therefore say (without having checked the page): > > King of Iberiainstance of office > King of Iberiasubclass of king To be semantically strict, you would need to have two separate items, one for the office, and one for the class. Because the individual kings have not been instances of the office - they have been holders of the office. And they have been instances of the class, but not holders of the class. On Wikidata, we often conflate these things for sake of simplicity. But when you try to write queries, this does not make things simpler, it makes it harder. Anything that is a subclass of X, and at the same time an instance of Y, where Y is not "class", is problematic. I think this is the root of the confusion Gerard speaks of. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata ontology
Am 04.01.2017 um 11:00 schrieb Léa Lacroix: > Hello, > > You can find it here: http://wikiba.se/ontology-1.0.owl > > If you have questions regarding the ontology, feel free to ask. Please note that this is the *wikibase* ontology, which defines the meta-model for the information on Wikidata. It models statements, sitelinks, source references, etc. This ontology does not model "real world" concepts or properties like location or color or children, etc. Modeling on this level is done on Wikidata itself, there is no fixed RDF or OWL schema or ontology. The best you can get in terms of "downloading the Wikidata ontology" would be to download all properties and all the items representing classes. We currently don't have a separate dump for these. Also, do not expect this to be a concise or consistent model that can be used for reasoning. You are bound to find contradictions and loose ends. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata Redirects in dumps
Am 12.12.2016 um 20:53 schrieb Praveen Balaji: > When using JSON dumps, how can I tell a redirected entity from the JSON dumps If you look at the ID of the entity you get when you ask for <https://www.wikidata.org/wiki/Special:EntityData/Q6703218.json>, you will notice that it does not have the ID you requested. This way, you know that you have been redirected. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
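A minimal sketch of that check. The helper name and the trimmed sample document are illustrative, and the redirect target ID in the example is hypothetical:

```python
def was_redirected(requested_id: str, entity_json: dict) -> bool:
    """The entity document returned by Special:EntityData carries its
    canonical ID; if it differs from the ID that was requested, the
    request was redirected to another entity."""
    return entity_json.get("id") != requested_id

# Trimmed sample document, pretending Q6703218 redirects to some Q42:
doc = {"id": "Q42", "labels": {}}
print(was_redirected("Q6703218", doc))  # True
print(was_redirected("Q42", doc))       # False
```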
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 28.11.2016 um 17:34 schrieb gnosygnu: >> The datatype is implicit, it can be derived from the property ID. You can >> find >> it by looking at the Property page's JSON. >> ... > > Thanks for all the info. I see my error. I didn't realize that > mainsnak.datatype was inferred. I assumed it would have to be embedded > directly in the XML's JSON (partly because it is embedded directly in > the JSON's dump JSON) > > The rest of your points make sense. Thanks again for taking the time to > clarify. If you have problems accessing the datatype from Lua or elsewhere, let me know. There may be issues with the import process. It's always cool to see that people use our data and our software! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 28.11.2016 um 16:31 schrieb gnosygnu: >> If you are also using the same software (Wikibase on MediaWiki), the XML >> dumps >> should Just Work (tm). The idea of the XML dumps is that the "text" blobs are >> opaque to 3rd parties, but will continue to work with future versions of >> MediaWiki & friends (with a compatible configuration - which is rather >> tricky). > > Not sure I follow. Even from a Wikibase on MediaWiki perspective, the > XML dumps are still incomplete (since they're missing > mainsnak.datatype). The datatype is implicit, it can be derived from the property ID. You can find it by looking at the Property page's JSON. The XML dumps are complete by definition, since they contain a raw copy of the primary data blob. All other data is derived from this. However, since they are "raw", they are not easy to process by consumers, and we make no guarantees regarding the raw data format. We include the data type in the statements of the canonical JSON dumps for convenience. We are planning to add more things to the JSON output for convenience. That does not make the XML dumps incomplete. Your use case is special since you want canonical JSON *and* wikitext. I'm afraid you will have to process both kinds of dumps. > One line of the file specifically checks for datatype: "if datatype > and datatype == 'commonsMedia' then". This line always evaluates to > false, even though you are looking at an entity (Q38: Italy) and > property (P41: flag image) which does have a datatype for > "commonsMedia" (since the XML dump does not have "mainsnak.datatype"). That is incorrect. datatype will always be set in Lua, even if it is not present in the XML. Remember that it is not present in the primary blob on Wikidata either. Wikibase will look it up internally, from the wb_property_info table, and make that information available to Lua. When loading the XML file, a lot of secondary information is extracted into database tables for this kind of use, e.g. 
all the labels and descriptions go into the wb_terms table, property types go into wb_property_info, links to other items go to page_links, etc. Actually, you may have to run refreshLinks.php or rebuildall.php after doing the XML import, I'm not sure which is needed when any more. But the point is: the XML dump contains all information needed to reconstruct the content. This is true for wikitext as well as for Wikibase JSON data. All derived information is extracted upon import, and is made available via the respective APIs, including Lua, just like on Wikidata. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
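The "derive the datatype from the property ID" lookup described above can be sketched against a trimmed Property document; the P41/commonsMedia pairing is taken from this thread, and the sample dict stands in for what the property's JSON would contain:

```python
def datatype_of(property_json: dict) -> str:
    """Read the datatype from a Property page's JSON document, where it
    is a top-level field (it is not repeated in every mainsnak of the
    XML dump's raw blobs)."""
    return property_json["datatype"]

# Trimmed sample of a property document (illustrative):
p41 = {"id": "P41", "type": "property", "datatype": "commonsMedia"}
print(datatype_of(p41))  # commonsMedia
```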
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 27.11.2016 um 01:15 schrieb gnosygnu: > This is useful, but unfortunately it won't suffice. Wikidata also has > pages which are wikitext (for example, > https://www.wikidata.org/wiki/Wikidata:WikiProject_Names). These > wikitext pages are in the XML dumps, but aren't in the stub dumps nor > the JSON dumps. I actually do use these Wikidata wikitext entries to > try to reproduce Wikidata in its entirety. If you are also using the same software (Wikibase on MediaWiki), the XML dumps should Just Work (tm). The idea of the XML dumps is that the "text" blobs are opaque to 3rd parties, but will continue to work with future versions of MediaWiki & friends (with a compatible configuration - which is rather tricky). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] [wikicite-discuss] Entity tagging and fact extraction (from a scholarly publisher perspective)
Am 18.11.2016 um 22:12 schrieb Ruben Verborgh: > In case you consider scenarios where clients perform federation, > you might be interested to see that lightweight interfaces > can outperform full SPARQL interfaces: > http://linkeddatafragments.org/publications/jws2016.pdf#page=26 We are indeed planning to experiment with LDF, see https://phabricator.wikimedia.org/T136358 -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages
Am 26.11.2016 um 23:33 schrieb Andrew Hall: > 1. In the “Wikidata entities used in this page” section, are the entities > used > dependent on, for example, the logic of the templates through which they > are > referenced? If entities are listed in this section, are they for sure > always > coming from Wikidata? Yes, *any* use is tracked and recorded, including accessing some part of the entity from a conditional somewhere in the Lua code. And all entities come from Wikidata -- we don't have any other Wikibase repo yet, and when we do, usage will be tracked separately for that. > 2. Sometimes “other (statements)” is specified in the “Wikidata entities used > in this page” section. Is it possible to determine what those statements > are? No, that information is not recorded. There is no way to find out without tracing all templates, parameters, and Lua code. We may start tracking this in the future, but it's a lot of data. I'm sure we had a ticket for changing this, but couldn't find it, so I made a new one: https://phabricator.wikimedia.org/T151717 -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Hi gnosygnu! The JSON in the XML dumps is the raw contents of the storage backend. It can't be changed retroactively, and re-encoding everything on the fly would be too expensive. Also, the JSON embedded in the XML files is not officially supported as a stable interface of Wikibase. The JSON format in the XML files can change without notice, and you may encounter different representations even within the same dump. I recommend using the JSON dumps; they contain our data in canonical form. To avoid downloading redundant information, you can use one of the wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't contain the actual page content, just meta-data. Caveat: there is currently no dump that contains the JSON of old revisions of entities in canonical form. You can only get them individually from Special:EntityData, e.g. <https://www.wikidata.org/wiki/Special:EntityData/Q23.json?oldid=30279> HTH -- daniel Am 26.11.2016 um 02:13 schrieb gnosygnu: > Hi everyone. I have a question about the Wikidata xml dump, but I'm > posting this question here, because it looks more related to Wikidata. > > In short, it seems that the "pages-articles.xml" does not include the > datatype property for snaks. For example, the xml dump does not list a > datatype for Q38 (Italy) and P41 (flag image). In contrast, the json > dump does list a datatype of "commonsMedia". > > Can this datatype property be included in future xml dumps? The > alternative would be to download two large and redundant dumps (xml > and json) in order to reconstruct a Wikidata instance. > > More information is provided below the break. Let me know if you need > anything else. > > Thanks. > > > > Here's an excerpt from the xml data dump for Q38 (Italy) and P41 (flag > image).
Notice that there is no "datatype" property > // > https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2 > "mainsnak": { > "snaktype": "value", > "property": "P41", > "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863", > "datavalue": { > "value": "Flag of Italy.svg", > "type": "string" > } > }, > > Meanwhile, the API and the JSON dump lists a datatype property of > "commonsMedia": > // https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q38 > // > https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2 > "P41": [{ > "mainsnak": { > "snaktype": "value", > "property": "P41", > "datavalue": { > "value": "Flag of Italy.svg", > "type": "string" > }, > "datatype": "commonsMedia" > }, > > As far as I can tell, the Turtle (ttl) dump does not list a datatype > property either, but this may be because I don't understand its > format. > wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D . > wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement, > wikibase:BestRank ; > wikibase:rank wikibase:NormalRank ; > ps:P41 > <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg> > ; > pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ; > pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc . > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
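To illustrate the "raw blob" point above: in the pages-articles XML, the entity JSON sits opaquely inside the revision's text element. A minimal sketch of extracting and decoding it (the page record below is a heavily trimmed, hypothetical example; real dumps put an XML namespace on the root element and carry much more metadata):

```python
# Sketch: pulling the opaque "text" blob out of a pages-articles XML dump
# entry and decoding it as JSON. Element layout follows the MediaWiki export
# format; the sample record is hand-made and abbreviated.
import json
import xml.etree.ElementTree as ET

SAMPLE = """<page>
  <title>Q38</title>
  <revision>
    <text>{"id": "Q38", "claims": {"P41": []}}</text>
  </revision>
</page>"""

def entity_from_page_xml(xml_text):
    page = ET.fromstring(xml_text)
    blob = page.find("./revision/text").text
    return json.loads(blob)  # raw (non-canonical) entity JSON

entity = entity_from_page_xml(SAMPLE)
```

Note that this yields the raw storage format, not the canonical JSON of the entity dumps — as stated above, the two can differ without notice.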
Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages
Am 23.11.2016 um 21:33 schrieb Andrew Hall: > Hi, > > I’m a PhD student/researcher at the University of Minnesota who (along with > Max > Klein and another grad student/researcher) has been interested in > understanding > the extent to which Wikidata is used in (English, for now) Wikipedia. > > There seems to be no easy way to determine Wikidata usage in Wikipedia pages > so > I’ll describe two approaches we’ve considered as our best attempts at solving > this problem. I’ll also describe shortcomings of each approach. There are two pretty easy ways, which you may not have found because they were added only a couple of months ago: You can look at the "page information" (action=info, linked from the sidebar), e.g. <https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info>. Near the bottom you can find "Wikidata entities used in this page". The same information is available via an API module, <https://en.wikipedia.org/w/api.php?action=query&prop=wbentityusage&titles=South_Pole_Telescope>. See <https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage> for documentation. These URLs will list all direct and indirect usages, and also indicate what part or aspect of the entity was used. HTH -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
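A sketch of how a client might drive the wbentityusage module mentioned above. The response dict is a hand-made, abbreviated example rather than actual API output, and the aspect codes in it are illustrative:

```python
# Sketch: building a prop=wbentityusage query URL and listing the entities
# used by a page from a (hypothetical, abbreviated) API response.
from urllib.parse import urlencode

def entity_usage_url(title):
    params = {"action": "query", "prop": "wbentityusage",
              "titles": title, "format": "json"}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

def used_entities(response):
    """Map entity ID -> list of usage aspects, across all returned pages."""
    pages = response["query"]["pages"]
    return {eid: info["aspects"]
            for page in pages.values()
            for eid, info in page.get("wbentityusage", {}).items()}

# hand-made sample response, not real API output
sample = {"query": {"pages": {"123": {
    "title": "South Pole Telescope",
    "wbentityusage": {"Q1513315": {"aspects": ["S", "O"]}}}}}}
```

The same information backs the "Wikidata entities used in this page" section on action=info, so either interface can be scripted against.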
[Wikidata] BREAKING CHANGE: Quantity Bounds Become Optional
Hi all! This is an announcement for a breaking change to the Wikidata API, JSON and RDF binding, to go live on 2016-11-15. It affects all clients that process quantity values. As Lydia explained in the mail she just sent to the Wikidata list, we have been working on improving our handling of quantity values. In particular, we are making upper and lower bounds optional: When the uncertainty of a quantity measurement is not explicitly known, we no longer require the bounds to somehow be specified anyway, but allow them to be omitted. This means that the upperBound and lowerBound fields of quantity values become optional in all API input and output, as well as the JSON dumps and the RDF mapping. Clients that import quantities should now omit the bounds if they do not have explicit information on the uncertainty of a quantity value. Clients that process quantity values must be prepared to process such values without any upper and lower bound set. That is, instead of this "datavalue":{ "value":{ "amount":"+700", "unit":"1", "upperBound":"+710", "lowerBound":"+690" }, "type":"quantity" }, clients may now also encounter this: "datavalue":{ "value":{ "amount":"+700", "unit":"1" }, "type":"quantity" }, The intended semantics is that the uncertainty is unspecified if no bounds are present in the XML, JSON or RDF representation. If they are given, the interpretation is as before. For more information, see the JSON model documentation [1]. Note that quantity bounds have been marked as optional in the documentation since August. The RDF mapping spec [2] has been adjusted accordingly. This change is scheduled for deployment on November 15. Please let us know if you have any comments or objections.
-- daniel [1] https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON [2] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Quantity Relevant tickets: * <https://phabricator.wikimedia.org/T115269> Relevant patches: * <https://gerrit.wikimedia.org/r/#/c/302248> * <https://github.com/DataValues/Number/commit/2e126eee1c0067c6c0f35b4fae0388ff11725307> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
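A client-side sketch of the required behaviour: treat missing bounds as "uncertainty unspecified" rather than as zero. The helper name is mine, not part of any Wikibase library; the sample datavalues are the two from the announcement above:

```python
# Sketch: reading quantity datavalues while tolerating the now-optional
# upperBound/lowerBound fields. Returns (amount, uncertainty), where
# uncertainty is None when no bounds are given (unspecified, NOT zero).
from decimal import Decimal

def read_quantity(datavalue):
    value = datavalue["value"]
    amount = Decimal(value["amount"])
    if "upperBound" in value and "lowerBound" in value:
        # bounds present: interpretation unchanged from before this change
        uncertainty = (Decimal(value["upperBound"])
                       - Decimal(value["lowerBound"])) / 2
    else:
        uncertainty = None  # uncertainty unspecified
    return amount, uncertainty

bounded = {"value": {"amount": "+700", "unit": "1",
                     "upperBound": "+710", "lowerBound": "+690"},
           "type": "quantity"}
unbounded = {"value": {"amount": "+700", "unit": "1"},
             "type": "quantity"}
```

Importers should do the inverse: only emit the bound fields when the uncertainty is actually known.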
Re: [Wikidata] Stable Interface Policy: Database Schema as a stable API
I have updated the Stable Interface Policy according to the discussion at <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> The diff is here: <https://www.wikidata.org/w/index.php?title=Wikidata%3AStable_Interface_Policy&type=revision&diff=400924854&oldid=382163118> -- daniel Am 28.10.2016 um 17:59 schrieb Daniel Kinzler: > Hi all! > > I plan to add the wikibase (SQL) database schema as a stable interface. > > Typically, a database schema is considered internal, but since we have tools > on > labs that may rely on the current schema, breaking changes to the schema > should > be announced as such. To address this, I plan to add the following paragraph > to > the Stable Public APIs section: > > The database schema as exposed on Wikimedia Labs is considered a stable > interface. Changes to the available tables and fields are subject to the > above notification policy. > > In addition, I plan to add the following paragraph to the Extensibility > section: > > In a tabular data representation, such as a relational database schema, > the > addition of fields is not considered a breaking change. Any change to the > interpretation of a field, as well as the removal of fields, are > considered > breaking. Changes to existing unique indexes or primary keys are breaking > changes; changes to other indexes as well as the addition of new unique > indexes are not breaking changes. > > If you have any thoughts or objections, please let me know at > <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Stable Interface Policy: Database Schema as a stable API
Hi all! I plan to add the wikibase (SQL) database schema as a stable interface. Typically, a database schema is considered internal, but since we have tools on labs that may rely on the current schema, breaking changes to the schema should be announced as such. To address this, I plan to add the following paragraph to the Stable Public APIs section: The database schema as exposed on Wikimedia Labs is considered a stable interface. Changes to the available tables and fields are subject to the above notification policy. In addition, I plan to add the following paragraph to the Extensibility section: In a tabular data representation, such as a relational database schema, the addition of fields is not considered a breaking change. Any change to the interpretation of a field, as well as the removal of fields, are considered breaking. Changes to existing unique indexes or primary keys are breaking changes; changes to other indexes as well as the addition of new unique indexes are not breaking changes. If you have any thoughts or objections, please let me know at <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Acquiring general knowledge from Wikidata
Am 25.10.2016 um 17:27 schrieb Federico Leva (Nemo): > As far as I know, an axiom by definition can't be false. What definition are > you > using? Maybe some jargon specific to this research field? An axiom is always true in the context of the formal model it helps define. But if that model corresponds to something in the real world, the axiom may well be found to be "false" when applied there. Say you have an axiom that says "all humans are born with two legs"; this is then (by definition) true in your model, but may not be an accurate modelling of the real world, since, very rarely, humans are born with more or fewer than two legs. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Greater than 400 char limit for Wikidata string data types
That was discussed and declined a while ago, see <https://phabricator.wikimedia.org/T126862>. Though I think the proposed realization was presentational rather than functional. I'll have to re-read the discussion, though. Am 08.10.2016 um 12:07 schrieb Thomas Douillard: > Probably a silly question but ... did you all consider creating a datatype for > molecue representation ? This seem to be a very similar usecase than > mathematica > formula. Essentially we're not dealing with a raw string but a representation > of > molecule formulas, with its own encoding ... > > Changing the limit seem to be a poor workaround to a dedicated datatype - > nobody > seems to have found a relevant usecase and it seem to me that we're > essentially > abusing strings for storing blobs ... > > 2016-10-08 11:33 GMT+02:00 Egon Willighagen <mailto:egon.willigha...@gmail.com>>: > > > > On Sat, Oct 8, 2016 at 11:28 AM, Lydia Pintscher > mailto:lydia.pintsc...@wikimedia.de>> > wrote: > > On Sat, Oct 8, 2016 at 11:23 AM, Egon Willighagen > mailto:egon.willigha...@gmail.com>> > wrote: > > Ah, those numbers are for > https://www.wikidata.org/wiki/Property:P234 > <https://www.wikidata.org/wiki/Property:P234> ... > > External identifier then. Cool. And for string like in > https://www.wikidata.org/wiki/Property:P233 > <https://www.wikidata.org/wiki/Property:P233>? Sebastian's initial > email > > says 1500 to 2000. Is this still a good number after this discussion? > > > Yes, that would cover more than 99.9% of all InChIs in PubChem. (See > Sebastian's reply earlier in this thread.) > > Egon > > -- > E.L. 
Willighagen > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > <http://se.linkedin.com/in/egonw> > Blog: http://chem-bla-ics.blogspot.com/ > <http://chem-bla-ics.blogspot.com/> > PubList: http://www.citeulike.org/user/egonw/tag/papers > <http://www.citeulike.org/user/egonw/tag/papers> > ORCID: -0001-7542-0286 > ImpactStory: https://impactstory.org/u/egonwillighagen > <https://impactstory.org/u/egonwillighagen> > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org> > https://lists.wikimedia.org/mailman/listinfo/wikidata > <https://lists.wikimedia.org/mailman/listinfo/wikidata> > > > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 28.09.2016 um 14:13 schrieb Markus Bärlocher: > Since this is about fundamental modelling questions - who can help here? "The community"... > I need a system for modelling geographic elevations in WD. > > A geographic elevation statement consists of: > 1. a number (127.53) > 2. a unit (meters, feet) > 3. a height reference level (NN, NHN, LAT, MSL, MHWS, ...) > > If any of the three is missing, the statement is useless. As I said, the reference level can be given as a qualifier. It would make sense to redefine or replace the property "elevation above sea level" accordingly. I can't think of another solution. Unless you mean "clearance" ("Lichte Höhe"), in which case you can use P2793. But you would still need a property for "reference level". I don't think that exists yet. > In addition, it would be useful to state: > 4. the accuracy > > Do I understand you correctly? > You suggest writing the accuracy after the number, > and merging both into one string? > That is, putting 1., 2. and 4. into one field? > > Example: 123.53±0.005m Yes, exactly like that. Or something similar - at the moment, the unit still has to be selected separately when entering the value. > Then one would first have to pick each number apart > in order to display it in a table and sort it numerically? No, it's not a text field. Value, accuracy, and unit are stored separately; that's what we have "data types" for. You can find the details here: <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#quantity> and here <https://www.mediawiki.org/wiki/Wikibase/DataModel#Quantities>. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
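To illustrate the point that value, accuracy, and unit are stored separately: here is roughly what "123.53±0.005 m" looks like as a quantity datavalue, following the JSON model linked above (the sort helper is my own, and the unit URI assumes Q11573 is the metre item):

```python
# Sketch: a quantity datavalue per the Wikibase JSON model -- number,
# accuracy (as bounds), and unit as separate fields, so no string parsing
# is needed for display or numeric sorting.
elevation = {
    "value": {
        "amount": "+123.53",
        "upperBound": "+123.535",
        "lowerBound": "+123.525",
        "unit": "http://www.wikidata.org/entity/Q11573",  # metre
    },
    "type": "quantity",
}

def numeric_sort_key(datavalue):
    # numeric sorting works directly on the stored amount
    return float(datavalue["value"]["amount"])
```

A table of such values can be sorted with `sorted(values, key=numeric_sort_key)` without ever picking apart a formatted string.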
Re: [Wikidata] Elevation
Am 27.09.2016 um 23:14 schrieb Info WorldUniversity: > Hi Daniel, Markus and Wikidatans, > > Thanks for your interesting "modeling elevation with Wikidata" conversation. > > Daniel, in a related vein and conceptually, how would you model elevation > change > over time (e.g. in a Google Street View/Maps/Earth with TIME SLIDER, > conceptually, for example) with Wikidata, building on the example you've > already > shared? You would use the "point in time" qualifier. We use this a lot with population data, see for instance <https://www.wikidata.org/wiki/Q64#P1082>. > Would there be a wikidata Q-item for all 46 sub levels, for example? That's a question of desirable modelling granularity. I would suppose that for Troy, we would have one item per sub-level, since it's such a famous site. But we would probably not have every sub-level of every archeological excavation. This is always a question of balance, and always a matter of debate. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 27.09.2016 um 22:21 schrieb Markus Bärlocher: >> The "elevation" property we have (P2044) is defined to refer to NN > > It is not a good idea, to define 'elevation' > like it is "defined" in P2044: > there are hundreds of reference-levels (not only NN)... Yes, I agree. But that's how it currently is. You can start a discussion about it on the property's talk page, or on the project chat page, or some other appropriate place. >> Then you could express something like "elevation: 28.3m; > > In WD there is a confusion between altitude and elevation? > (may be in American and British English? > or geographic and aviation and astronomy?) As far as I know, WD uses "altitude" only as an alias of "elevation". I'm not a native speaker of English, but I believe you can use "altitude" as well as "elevation" when describing a geographical point. The definitions of the corresponding items (Q190200 and Q2633778) reflect that, and so do the definitions in Merriam Webster. However, "elevation" seems to be used only for fixed places - a plane has altitude, not elevation - so that's a reason not to merge the two items. However, if I understood correctly, what you are looking for is actually not elevation, but "clearance" ("Lichte Höhe"): <https://www.wikidata.org/wiki/Q1823312>. Interestingly, there is also Q2446632... Oh, we actually do have a property for that! P2793 is the "distance between surface and bottom of a bridge deck". That's exactly what you need, no? > But this is a combination of unit and reference-level: > 'm ü.M.' > > We should not mix or confound this modellings... > > What will be the WD-way? > (you should discuss this with a geodetic specialist...!) Indeed :) And a civil engineer. But for starters, maybe Aude has some thoughts on this.
> > Additionally we need an expression for 'accuracy' and 'source': > If the hight unit is 'meter' and the source value is in 'feet', > the new value could have a lot more/less digits than the source, > but no better/worse accuracy... Sources can be given for any statement. Accuracy can be given for any quantity value; just enter 32+-2m. If the source gives the number in feet, please enter it in feet in Wikidata, and leave the conversion to the software (we are just in the process of adding support for unit conversion). HTH, Daniel PS: I'm a software guy. I know how Wikibase and MediaWiki work, and I know the underlying data model of Wikidata quite well. But I do not know all the properties and conventions, and I may not be aware of the best place to discuss these things. So please don't rely on my opinion about modeling on Wikidata too much. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
So you want to e.g. give the height of a bridge above the "mean high water spring" level of the river it crosses? You wouldn't use a unit for that, but a qualifier. The unit would be meter (or feet or whatever). The "elevation" property we have (P2044) is defined to refer to NN, so it's no good for your purpose. To model what you want nicely, you would need a more general "elevation" property, and a "reference level" property to use as a qualifier. Then you could express something like "elevation: 28.3m; reference-level: Q6803625". I'm sure there are other options, but I see no good option that would be possible with the properties I know. Anyway, this is really a modelling question, and it can't really be solved with units. Am 27.09.2016 um 20:26 schrieb Markus Bärlocher: > Hello Daniel, > > no, I am not looking for a WP article about MHWS > (I only linked that as an explanation), > > but for a unit > to describe MHWS as a reference level for geographic elevations. > > MHWS is used to define bridge clearance heights above water, > as well as for the geographic elevation of lighthouses. > > Best regards, > Markus > > > Am 27.09.2016 um 19:28 schrieb Daniel Kinzler: >> Am 27.09.2016 um 19:10 schrieb Markus Bärlocher: >>> I look for this: >>> "Elevation in metres above 'mean high water spring' level." >>> >>> Which means the geographic hight above MHWS: >>> https://en.wikipedia.org/wiki/Mean_high_water_spring >> >> By clicking on "Wikidata Item" in the sidebar of that page, I get to >> https://www.wikidata.org/wiki/Q6803625 ("highest level that spring tides >> reach >> on average over a period of time") >> >> Is that what you need? >> > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 27.09.2016 um 19:10 schrieb Markus Bärlocher: > I look for this: > "Elevation in metres above 'mean high water spring' level." > > Which means the geographic hight above MHWS: > https://en.wikipedia.org/wiki/Mean_high_water_spring By clicking on "Wikidata Item" in the sidebar of that page, I get to https://www.wikidata.org/wiki/Q6803625 ("highest level that spring tides reach on average over a period of time") Is that what you need? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 21.09.2016 um 19:23 schrieb Eric Scott: > A substantial amount of work in the LOD community seems to have gone into > Ontolex: > > https://www.w3.org/community/ontolex/wiki/Final_Model_Specification > > Is there any concern with aligning WD's model to this standard? Thanks for pointing to this! From a first look, the models seem to roughly align: What we call a "Lexeme" corresponds to a "Lexical Entry" in ontolex. What we call a "Form" corresponds to a "Form" in ontolex. What we call a "Sense" corresponds to a "Lexical Sense & Reference" in ontolex, although in ontolex, a reference to a Concept is required, while in our model that reference would be optional, but a natural language gloss is required. So the models seem to match fine on a conceptual level. Perhaps someone with more expertise in RDF modeling can provide a more detailed analysis. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 20:46 schrieb Thad Guidry: > Daniel, > > I wasn't trying to help solve the issues - I'll be quite now :) > > I was helping to expose one of your test cases :) Ha, sorry for sounding harsh, and thanks for pointing me to "product"! It's a good test case indeed. > 'product' is a lexeme - a headword - a basic unit of meaning that has a 'set > of > forms' and those have 'a set of definitions' In the current model, a Lexeme has forms and senses. Forms don't have senses directly; the meanings should apply to all forms. This means lexemes have to be split with higher granularity: * product (English noun) would be one lexeme, with "products" being the plural form, "product's" the genitive, and "products'" the plural genitive. Senses include the ones you mentioned. * (to) produce (English verb) would be another lexeme, with forms like "produces", "produced", "producing", etc, and senses meaning "to create", "to show", "to make available", etc * production (English noun) would be another lexeme, with other forms and senses. * produce (English noun) would be another * producer (English noun) would be another * produced (English adjective) another etc... These lexemes can be linked using some kind of "derived from" statements. > But a thought just occured to me... > A. In order to model this perhaps would be to have those headwords stored in > Wikidata. Those headwords ideally would not actually be a Q or a P ... but > what > about instead ... L ? Wrapping the graph structure itself ? Pros / Cons ? That's the plan, yes: Have lexemes (L...) on Wikidata, which wrap the structure of forms and senses, and have statements for the lexeme, as well as for each form and each sense. We don't currently plan a "super-structure" for wrapping derived/related lexemes (product, produce, production, etc). They would just be inter-linked by statements. > B. or do we go with Daniel's suggestion of linking out to headwords and not > actually storing them in Wikidata ?
Pros / Cons ? The link I suggest is between items (Q...) and lexemes (L...), both on Wikidata. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
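The lexeme split sketched above, in an illustrative (non-normative) data layout — the IDs, field names, and property labels here are mine, not the final Wikibase format:

```python
# Sketch: one Lexeme per word class, with its Forms and Senses attached,
# and related lexemes linked by "derived from"-style statements rather
# than nested in a super-structure. All IDs are hypothetical.
product_noun = {
    "id": "L1",
    "lemma": "product",
    "language": "English",
    "lexicalCategory": "noun",
    "forms": ["product", "products", "product's", "products'"],
    "senses": [
        "a commodity offered for sale",
        "the result of a multiplication",
    ],
    "statements": [("derived from", "L2")],  # link to the verb lexeme
}

produce_verb = {
    "id": "L2",
    "lemma": "produce",
    "language": "English",
    "lexicalCategory": "verb",
    "forms": ["produce", "produces", "produced", "producing"],
    "senses": ["to create", "to show", "to make available"],
    "statements": [],
}
```

Note how the inter-lexeme relation lives in statement space, so the family "product / produce / production / producer" stays a flat set of linked lexemes.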
Re: [Wikidata] Let's move forward with support for Wiktionary
Quick clarification: Am 15.09.2016 um 17:40 schrieb Jan Berkel: > The pdf mentions 4 new entity types: Lexeme, Statement, Form, Embedded (?). "Embedded" isn't a separate type; it refers to the fact that Senses and Forms are stored on the same page as "their" Lexeme. "Statement" isn't an entity; I assume you meant to write "Sense". > Curious, was the existing data model not flexible enough? It was not expressive enough, no; it would be possible to use items to model lexemes, but it would be very annoying and complicated. You would need separate items for each form and sense, and need to keep track of them for deletion, undeletion, etc. > Will these new entities be restricted to the usage in a lexicographical > context, > i.e. Wiktionary? They will also be accessible from Wikipedia and other wikis. > How will they fit into the existing data model, will there be > links from existing Wikidata items to the new entities? (i.e. how will > Wikidata > benefit from the new data?) Yes, there will be cross-linking. > I imagine in an integrated Wikidata/Wiktionary world "content" and code lives > in > various places, and we'll have a range of automated processes to copy things > back and forth, and to automatically create new entries derived from existing > ones? It would be transcluded and generated, like with templates and Lua. Not so much copied by bots. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 20:11 schrieb Thad Guidry: > Denny, > > I would suggest to use https://en.wiktionary.org/wiki/product as that strawman > proposal. Because it has 2 levels of Senses. > 3. Anything that is produced (contains 6 sub-senses) Modelling sub-senses is a completely different can of worms. The proposed model doesn't allow this directly (we try to avoid recursive structures), but it can be done using statements. Your example doesn't really say anything about how lexemes could be connected to items as labels/aliases, which is, I believe, what Gerard and Denny were discussing. My usage of "Sense" and "Form" follows <https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013-08> which in turn follows the LEMON model <http://lemon-model.net/>. Synsets are not directly modeled, but it's possible to construct them via statements. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Greater than 400 char limit for Wikidata string data types
Am 16.09.2016 um 19:38 schrieb Denny Vrandečić: > Markus' description of the decision for the limit corresponds with mine. I > also > think that this decision can be revisited. I would still advice for caution, > due > to technical issues, but I am sure that the development team will make a > well-informed decision on this. It would be sad if valid usecases could not be > supported due to that. I agree, but re-considering this will have to wait until we have a better solution for storing terms. The current mechanism, the wb_terms table, is a massive performance bottleneck, and stuffing more data in there makes me very uncomfortable. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 19:41 schrieb Denny Vrandečić: > Yes, there should be some connection between items and lexemes, but I am still > hazy about details on how exactly this should look like. If someone could > actually make a strawman proposal, that would be great. > > I think the connection should live in the statement space, and not be on the > level of labels, but that is just a hunch. I'd be happy to see proposals > incoming. My thinking is this: On some Sense of a Lexeme, there is a Statement saying that this Sense refers to a given concept (Item). If the property for stating this is well-known, we can track the Sense-to-Item relationship in the database. We can then automatically show the lexeme's lemma as a (pseudo-)alias on the Item, and perhaps also use it (and maybe all forms of the lexeme!) for indexing the item for search. So: from ( Lexeme - Sense - Statement -> Item ) we can derive ( Item -> Lexeme - Forms ) In the beginning of Wikidata, I was very reluctant about the software knowing about "magic" properties. Now I feel better about this, since wikidata properties are established as a permanent vocabulary that can be used by any software, including our own. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
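The derivation Daniel describes ( Lexeme - Sense - Statement -> Item, inverted to Item -> Lexeme - Forms ) can be sketched in a few lines. Everything below is illustrative: the property name, the dictionary layout, and the function are invented for this sketch and are not part of the Wikibase data model.

```python
# Placeholder for the "well-known property" linking a Sense to the Item it
# refers to; this is NOT a real Wikidata property ID.
P_REFERS_TO = "P_REFERS_TO"

# A hypothetical in-memory lexeme store.
lexemes = {
    "L1": {
        "lemma": "bumblebee",
        "forms": ["bumblebee", "bumblebees"],
        "senses": {
            "L1-S1": {"statements": {P_REFERS_TO: "Q25407"}},  # genus Bombus
        },
    },
}

def derive_item_index(lexemes):
    """Build the inverse index Item -> [(lexeme id, lemma, forms)] by
    scanning the sense statements, as described in the mail above."""
    index = {}
    for lex_id, lex in lexemes.items():
        for sense in lex["senses"].values():
            item = sense["statements"].get(P_REFERS_TO)
            if item is not None:
                index.setdefault(item, []).append(
                    (lex_id, lex["lemma"], lex["forms"])
                )
    return index
```

With such an index, Q25407 could show "bumblebee" as a pseudo-alias, and all forms of the lexeme could feed the item's search index.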
[Wikidata] Stable interfaces policy updated
The stable interface policy has now been updated, see <https://www.wikidata.org/w/index.php?title=Wikidata%3AStable_Interface_Policy&type=revision&diff=376348194&oldid=369006368> Am 13.09.2016 um 16:58 schrieb Daniel Kinzler: > Tomorrow I plan to apply the following update to the Stable Interface Policy: > > https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section > > Please comment there if you have any objections. > > Thanks! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 14.09.2016 um 10:51 schrieb Léa Lacroix: > /- What else can provide wikidata to wiktionary?/ > Machine-readable data will allow users to create new tools, useful for > editors, > based on the communities' needs. By helping the different communities > (Wiktionaries and Wikidata) working together on the same project, we expect a > growth of the number of people editing the lexicographical data, providing > more > review and a better quality of the data. Finally, when centralized and > structured, the data will be easily reusable by third parties, other websites > or > applications... and give a better visibility of the volunteers' work. Here are some examples of things that will become possible with the new structure: * the fact that the English word "sleeper" may refer to a railway tie, and in which regions this is the case, only has to be entered once, not separately in each Wiktionary. * the fact that "Stuhl" is the German translation of (a specific sense of) the English word "chair" only has to be entered once, not separately in each Wiktionary. * by connecting lexeme-sense to concepts (items), it will become possible to automatically search for potential synonyms and translations to other languages. * by providing a statement defining the morphological class of a lexeme, it becomes possible to automatically generate derived forms for display and search * different representations (spellings, scripts) of a lexeme can be covered by a single entry, information about word senses does not have to be repeated. * the search interface will know about languages and word types, so you can search specifically for "french verb dormir" (or perhaps more technical "lang:fr a:Q24905 dormir") * Similarly, you can search for or filter by epoch, region, linguistic convention or methodology, etc. > - Will editing wiktionary change? > Yes, changes will happen, but we're working on making editing Wiktionary > easier. 
Soon as we can provide some mockups, we will share them with you for > collecting feedbacks. The question is whether you consider editing wikitext with complex nested templates "easy" or not. With wikidata, editing would be form-based, with input fields and suggestions. This makes it a lot easier, especially for new editors. And even for experienced editors, I think it's more convenient for editing individual bits of information. The form-based approach is less convenient when you want to enter a lot of information at once. The solution is to identify the use cases for this, and provide a specialized interface for that use case. This does not have to depend on Wikibase developers, it can also be done by wiki users using gadgets, Labs-based tools, or even bots. > Because Wikidata is a multilingual project, we already have to deal with the > language issue, and we hope that with the increase of the numbers of editors > coming from Wikidata and Wiktionaries, it will become easier to find people > with > at least one common language to communicate between the different projects. Interestingly, we found that on wikidata there is rarely a conflict about whether a statement about an item should say X or Y, e.g. whether Chelsea Manning's gender should be given as "transgender female" or just "female" or even "male". The conflict does not arise because you can and should simply add all three, and use qualifiers and source references to specify who claimed which of these, and for which period of time. Long discussions do take place about the overall organization of information on wikidata, about which properties to have and how to use them, about whether substances like "ethanol" should be considered subclasses or instances of classes like "alcohol". I agree, however, that cross-lingual discussions are indeed an issue, and finding techniques and strategies to help with communication between the speakers of different languages will be a challenge.
But isn't the Wiktionary community perfectly equipped for just that challenge? Isn't it just the crowd you would ask if you had to solve a problem like this? I would (along perhaps with the folks from translatewiki.net). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
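One of the points in the list above - a statement giving a lexeme's morphological class, from which derived forms can be generated automatically - can be sketched as follows. The class table and all names are invented for illustration; real inflection data would live in Wikidata statements and be far richer.

```python
# Hypothetical table mapping a morphological class to an inflection rule.
# "en-noun-regular" is an invented class name, not a real Wikidata entity.
MORPH_CLASSES = {
    "en-noun-regular": lambda stem: {"singular": stem, "plural": stem + "s"},
}

def generate_forms(lemma, morph_class):
    """Generate derived forms for display and search from a lexeme's
    stated morphological class."""
    rule = MORPH_CLASSES.get(morph_class)
    if rule is None:
        raise ValueError(f"unknown morphological class: {morph_class}")
    return rule(lemma)

# generate_forms("sleeper", "en-noun-regular")
# -> {"singular": "sleeper", "plural": "sleepers"}
```

The point of the sketch is only the direction of the data flow: one statement on the lexeme, many forms derived from it instead of entered by hand in each Wiktionary.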
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 13.09.2016 um 15:37 schrieb Gerard Meijssen: > Hoi, > You assume that it is not good to have lexicological information in our > existing > items. With Wiktionary support you bring such information on board. It would > be > really awkward when for every concept there has to be an item in two > databases. It will be two namespaces in the same project. But we will not duplicate items. The proposed structure is not concept-centered like Omegawiki is. It will be centered on lexemes, like Wiktionary is, but with a higher level of granularity (a lexeme corresponds to one "morphological" section on a Wiktionary page). > Why is there this problem with lexicologival information and how will the > current data be linked to the future "Wiktionary-data" information if there > are > to be two databases? Because "bumblebee" "noun" conflicts with "bumblebee" "insect". They can't both be true for the same thing, because nouns are not insects. One is true for the word, the other is true for the concept. So they need to be treated separately. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 13.09.2016 um 17:16 schrieb Gerard Meijssen: > Hoi, > The database design for OmegaWiki had a distinction between the concept and > all > the derivatives for them. Wikidata will have Lexemes and their Forms and Senses. > So bumblebee is more complex than just "instance of" noun. It is an English > noun. "Hommel" is connected as a Dutch noun for the same concept and "hommels" > is the Dutch plural... Wikidata would have a Lexeme for "bumblebee" (English noun) and one for "Hommel" (Dutch noun). Both would have a sense that would describe them as a flying insect (and perhaps other word senses, such as Q1626135, a crater on the moon). The senses that refer to the flying insect would be considered translations of each other, and both senses would refer to the same concept. So "bumblebee" (insect) is a translation of "Hommel" (insect), and both refer to the genus Bombus (Q25407). "Hommel" (crater) would share the morphology of "Hommel" (insect), as it has the same forms (I assume), but it won't share the translations. Having lexeme-specific word-senses avoids the loss of connotation and nuance that you get when you force words of different languages on a shared meaning. The effect of referring to the same concept can still be achieved via the reference to a concept (item). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Proposed update to the stable interfaces policy
Tomorrow I plan to apply the following update to the Stable Interface Policy: https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section Please comment there if you have any objections. Thanks! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Announcing the Wikidata Stable Interface Policy
Hello all! After a brief period for final comments (thanks everyone for your input!), the Stable Interface Policy is now official. You can read it here: <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy> This policy is intended to give authors of software that accesses Wikidata a guide to what interfaces and formats they can rely on, and which things can change without warning. The policy is a statement of intent given by us, the Wikidata development team, regarding the software running on the site. It does not apply to any content maintained by the Wikidata community. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 11.08.2016 um 23:12 schrieb Peter F. Patel-Schneider: > Until suitable versioning is part of the Wikidata JSON dump format and > contract, however, I don't think that consumers of the dumps should just > ignore new fields. Full versioning is still in the future, but I'm happy that we are in the process of finalizing a policy on stable interfaces, including a contract regarding adding fields: <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>. Please comment on the talk page. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Policy on Interface Stability: final feedback wanted
Hello all, repeated discussions about what constitutes a breaking change have prompted us, the Wikidata development team, to draft a policy on interface stability. The policy is intended to clearly define what kind of change will be announced when and where. A draft of the policy can be found at <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy> Please comment on the talk page. Note that this policy is not about the content of the Wikidata site, it's a commitment by the development team regarding the behavior of the software running on wikidata.org. It is intended as a reference for bot authors, data consumers, and other users of our APIs. We plan to announce this as the development team's official policy on Monday, August 22. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Render sparql queries using the Histropedia timeline engine
Hi Navino! Thank you for your awesome work! Since this has caused some confusion again recently, I want to caution you about a major gotcha regarding dates in RDF and JSON: they use different conventions to represent years BCE. I just updated our JSON spec to reflect that reality, see <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#time>. There is a lot of confusion about this issue throughout the linked data web, since the convention changed between XSD 1.0 (which uses -0044 to represent 44 BCE, and -0001 to represent 1 BCE) and XSD 1.1 (which uses -0043 to represent 44 BCE, and +0000 to represent 1 BCE). Our JSON uses the traditional numbering (1 BCE is -0001), while RDF uses the astronomical numbering (1 BCE is +0000). Yay, fun. Am 10.08.2016 um 21:49 schrieb Navino Evans: > Hi all, > > > > At long last, we’re delighted to announce you can now render sparql queries > using the Histropedia timeline engine \o/ > > > Histropedia WikidataQuery Viewer > <http://histropedia.com/showcase/wikidata-viewer.html> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
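The two numbering schemes differ by exactly one for all years BCE, since astronomical numbering has a year 0 and traditional numbering does not. A small conversion sketch (the function names are mine, not part of any Wikibase API):

```python
def traditional_to_astronomical(year):
    """Traditional numbering (used in Wikibase JSON): 1 BCE = -1, no year 0.
    Astronomical numbering (used in RDF / XSD 1.1): 1 BCE = 0, 44 BCE = -43."""
    if year == 0:
        raise ValueError("traditional numbering has no year 0")
    return year + 1 if year < 0 else year

def astronomical_to_traditional(year):
    """Inverse conversion: astronomical year 0 becomes traditional -1 (1 BCE)."""
    return year - 1 if year <= 0 else year

assert traditional_to_astronomical(-44) == -43    # 44 BCE
assert traditional_to_astronomical(-1) == 0       # 1 BCE
assert astronomical_to_traditional(0) == -1       # 1 BCE
assert astronomical_to_traditional(2016) == 2016  # CE years are unchanged
```

Anything consuming both the JSON dumps and the RDF exports needs to apply exactly one such conversion, or BCE dates will be off by one year.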
Re: [Wikidata] Discussion on graph databases for WIkipedia: applications, volunteers, and stack design recommendations
I recommend you have a look at the SWEBLE project <http://sweble.org/>, at least for the parsing. They basically represent all of Wikipedia (and potentially all Wikipedias together) as one huge parse tree, using an XML database. The website doesn't have much detail, but they are building some interesting projects on top of this. Best to contact Dirk Riehle directly, <https://osr.cs.fau.de/people/members/riehle-dirk/>. Am 01.08.2016 um 20:38 schrieb Ian Seyer: > Full disclosure: I am the creator of the Project Grant application > for Arc.heolo.gy <http://arc.heolo.gy/>, located > here: https://meta.wikimedia.org/wiki/Grants:Project/Arc.heolo.gy > > I hope for this to be a general discussion on potential applications, > criticisms, questions, technological recommendations, and community discussion > about a graph representation of Wikipedia. > > Currently, the project has a live Neo4j Graph database built and parsed from a > download of the English language Wikipedia from April. I have temporarily > hosted > the database instance both on my local machine and a SoftLayer server provided > under a temporary entrepreneur credit. > > My goal is two fold. > On the backend: refine the parsing algorithm (I am getting some incorrect > relationships in the database), automate the parsing so that it updates the > database frequently, expand language support, and perform semantic parsing to > weight individual relationships to strengthen the ability to filter out > extraneous relationships. > On the frontend: I have done little to zero work here beyond pure > conceptualization. I would hope to use an asynchronous front-end javascript > framework to build both a 2d (d3) and 3d (webGL) interface to be able to > explore > the database with a high amount of control and ease. > > If any of you would like to access the database for exploration, please > contact > me privately and I will give you credentials.
> > Any recommendations on parsing, hosting, visualization, or otherwise are > appreciated. Endorsements and Volunteers are also highly appreciated! > > p.s. I am new to directly engaging with the Wiki community, and if I committed > some faux pas in starting this thread please let me know and I will do my best > to correct it. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata query performance paper
Hi Aidan! Thank you for this very interesting research! Query performance was of course one of the key factors for selecting the technology to use for the query services. However, it was only one among several. The Wikidata use case is different from most common scenarios in some ways, for instance: * We cannot optimize for specific queries, since users are free to submit any query they like. * The data representation needs to be intuitive enough for (technically inclined) casual users to grasp and write queries. * The data doesn't hold still; it needs to be updated continuously, multiple times per second. * Our data types are more complex than usual - for instance, we support multiple calendar models for dates, and not only values but also different accuracies up to billions of years; we use "quantities" with unit and uncertainty instead of plain numbers, etc. My point is that, if we had a static data set and a handful of known queries to optimize for, we could have set up a relational or graph database that would be far more performant than what we have now. The big advantage of Blazegraph is its flexibility, not raw performance. It might be interesting to you to know that we initially started to implement the query service against a graph database, Titan - which was discontinued while we were still getting up to speed. Luckily this happened early on; it would have been quite painful to switch after we had gone live. -- daniel Am 06.08.2016 um 18:19 schrieb Aidan Hogan: > Hey all, > > Recently we wrote a paper discussing the query performance for Wikidata, > comparing different possible representations of the knowledge-base in Postgres > (a relational database), Neo4J (a graph database), Virtuoso (a SPARQL > database) > and BlazeGraph (the SPARQL database currently in use) for a set of equivalent > benchmark queries. > > The paper was recently accepted for presentation at the International Semantic > Web Conference (ISWC) 2016.
A pre-print is available here: > > http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf > > Of course there are some caveats with these results in the sense that perhaps > other engines would perform better on different hardware, or different styles > of > queries: for this reason we tried to use the most general types of queries > possible and tried to test different representations in different engines (we > did not vary the hardware). Also in the discussion of results, we tried to > give > a more general explanation of the trends, highlighting some > strengths/weaknesses > for each engine independently of the particular queries/data. > > I think it's worth a glance for anyone who is interested in the > technology/techniques needed to query Wikidata. > > Cheers, > Aidan > > > P.S., the paper above is a follow-up to a previous work with Markus Krötzsch > that focussed purely on RDF/SPARQL: > > http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf > > (I'm not sure if it was previously mentioned on the list.) > > P.P.S., as someone who's somewhat of an outsider but who's been watching on > for > a few years now, I'd like to congratulate the community for making Wikidata > what > it is today. It's awesome work. Keep going. :) > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 05.08.2016 um 17:34 schrieb Peter F. Patel-Schneider: > So some additions are breaking changes then. What is a system that consumes > this information supposed to do? If the system doesn't monitor announcements > then it has to assume that any new field can be a breaking change and thus > should not accept data that has any new fields. The only way to avoid breakage is to monitor announcements. The format is not final, so changes can happen (not just additions, but also removals), and then things will break if they are unaware. We tend to be careful and conservative, and announce any breaking changes in advance, but do not guarantee full backwards compatibility forever. The only alternative is a fully versioned interface, which we don't currently have for JSON, though it has been proposed, see <https://phabricator.wikimedia.org/T92961>. > I assume that you are referring to the common practice of adding extra fields > in HTTP and email transport and header structures under the assumption that > these extra fields will just be passed on to downstream systems and then > silently ignored when content is displayed. Indeed. > I view these as special cases > where there is at least an implicit contract that no additional field will > change the meaning of the existing fields and data. In the name of the Robustness Principle, I would consider this the normal case, not the exception. > When such contracts are > in place systems can indeed expect to see additional fields, and are permitted > to ignore these extra fields. Does this count? <https://mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html> > Because XML specifically states that the order of attributes is not > significant. Therefore changes to the order of XML attributes is not changing > the encoding. That's why I'm proposing to formalize the same kind of contract for us, see <https://phabricator.wikimedia.org/T142084>. > Here is where I disagree. 
As there is no contract that new fields in the > Wikidata JSON dumps are not breaking, clients need to treat all new fields as > potentially breaking and thus should not accept data with unknown fields. While you are correct that there is no formal contract yet, the topic had been explicitly discussed before, in particular with Markus. > I say this for any data, except where there is a contract that such additional > fields are not meaning-changing. Quote me on it: For wikibase serializations, additional fields are not meaning-changing. Changes to the format or interpretation of fields will be announced as a breaking change. >> Clients need to be prepared to encounter entity types and data types they >> don't >> know. But they should also allow additional fields in any JSON object. We >> guarantee that extra fields do not impact the interpretation of fields they >> know >> about - unless we have announced and documented a breaking change. > > Is this the contract that is going to be put forward? At some time in the not > too distant future I hope that my company will be using Wikidata information > in its products. This contract is likely to be problematic for development > groups, who want some notion how long they have to prepare for changes that > can silently break their products. This is indeed the gist of what I want to establish as a stability policy. Please comment on <https://phabricator.wikimedia.org/T142084>. I'm not sure how this could be made less problematic. Even with a fully versioned JSON interface, available data types etc. are a matter of configuration. All we can do is announce such changes, and advise consumers that they can safely ignore unknown things. You raise a valid point about due notice. What do you think would be a good notice period? Two weeks? A month? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 05.08.2016 um 15:02 schrieb Peter F. Patel-Schneider: > I side firmly with Markus here. > > Consumers of data generally cannot tell whether the addition of a new field to > a data encoding is a breaking change or not. Without additional information, they cannot know, though for "mix and match" formats like JSON and XML, it's common practice to assume that ignoring additions is harmless. In any case, we had communicated before that we do not consider the addition of a field a breaking change. It only becomes a breaking change when it impacts the interpretation of other fields, in which case we would announce it well in advance. > Given this, code that consumes > encoded data should at least produce warnings when it encounters encodings > that it is not expecting and preferably should refuse to produce output in > such circumstances. Depends on the circumstances. For a web browser for example, this would be very annoying behavior. Nearly all websites would be unusable. Similarly, most email would become unreadable if mail clients were that strict. > Producers of data thus should signal in advance any > changes to the encoding, even if they know that the changes can be safely > ignored. I disagree on "any". For example, do you want announcements about changes to the order of attributes in XML tags? Why? In case someone uses a regex to process the XML? Should you not be able to rely on your clients conforming to the XML spec, which says that the order of attributes is undefined? In the case at hand (adding a field), it would have been good to communicate it in advance. But since it wasn't tagged as "breaking", it slipped through. We are sorry for that. Clients should still not choke on an addition like this. > I would view software that consumes Wikidata information and silently ignores > fields that it is not expecting as deficient and would counsel against using > such software. Is this just for Wikidata, or does that extend to other kinds of data too? 
Why, or why not? By definition, any extensible format or protocol (HTTP, SMTP, HTML, XML, XMPP, IRC, etc) can contain parts (headers, elements, attributes) that the client does not know about, and should ignore. Of course, the spec will tell clients where to expect and allow extra bits. That's why I'm planning to put up a document saying clearly what kinds of changes clients should be prepared to see in Wikidata output: Clients need to be prepared to encounter entity types and data types they don't know. But they should also allow additional fields in any JSON object. We guarantee that extra fields do not impact the interpretation of fields they know about - unless we have announced and documented a breaking change. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
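The guarantee stated above - read the fields you know, tolerate the ones you don't - can be illustrated with a minimal sketch. The field names follow the documented wikibase-entityid datavalue; the function itself is invented for illustration.

```python
import json

def read_entity_ref(datavalue):
    """Extract only the known parts of a wikibase-entityid value,
    silently tolerating any additional fields added later
    (Robustness Principle: be liberal in what you accept)."""
    return {
        "entity-type": datavalue["entity-type"],
        "numeric-id": datavalue["numeric-id"],
    }

# An older serialization, and one with a field added later.
old = json.loads('{"entity-type": "item", "numeric-id": 42}')
new = json.loads('{"entity-type": "item", "numeric-id": 42, "id": "Q42"}')

# Both yield the same result: the extra field does not change the
# interpretation of the known fields.
assert read_entity_ref(old) == read_entity_ref(new)
```

A strict parser that rejects any unknown key would have refused the second input outright, which is exactly the failure mode discussed in this thread.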
Re: [Wikidata] Breaking change in JSON serialization?
Hi Markus! You are asking us to better communicate changes to our serialization, even if it's not a breaking change according to the spec. I agree we should do that. We are trying to improve our processes to achieve this. Can we ask you in return to try to make your software more robust, by not making unwarranted assumptions about the serialization format? With regards to communicating more - it's very hard to tell which changes might break something for someone. For instance, some software might rely on the order of fields in a JSON object, even though JSON says this is unspecified, just like you rely on no fields being added, even though there is no guarantee about this. Similarly, some software might rely on non-ASCII characters being represented as Unicode escape sequences, and will break if we use the more compact UTF-8. Or they may break on changes to whitespace. Who knows. We cannot possibly know what kind of change will break some 3rd party software. I don't think announcing any and all changes is feasible. So I think an official policy about what we announce can be useful. Something like "This is what we consider a breaking change, and we will definitely announce it. And these are some kinds of changes we will also communicate ahead of time. And these are some things that can happen unannounced." You are right that policies don't change the behavior of software. But perhaps they can change the behavior of programmers, by telling them what they can (and can't) safely rely on. It boils down to this: we can try to be more verbose, but if you make assumptions beyond the spec, things will break sooner or later. Writing robust software requires more time and thought initially, but it saves a lot of headaches later. -- daniel Am 04.08.2016 um 21:49 schrieb Markus Kroetzsch: > Daniel, > > You present arguments on issues that I would never even bring up. I think we > fully agree on many things here. 
Main points of misunderstanding: > > * I was not talking about the WMDE definition of "breaking change". I just > meant > "a change that breaks things". You can define this term for yourself as you > like > and I won't argue with this. > > * I would never say that it is "right" that things break in this case. It's > annoying. However, it is the standard behaviour of widely used JSON parsing > libraries. We won't discuss it away. > > * I am not arguing that the change as such is bad. I just need to know about > it > to fix things before they break. > > * I am fully aware of many places where my software should be improved, but I > cannot fix all of them just to be prepared if a change should eventually > happen > (if it ever happens). I need to know about the next thing that breaks so I can > prioritize this. > > * The best way to fix this problem is to annotate all Jackson classes with the > respective switch individually. The global approach you linked to requires > that > all users of the classes implement the fix, which is not working in a library. > > * When I asked for announcements, I did not mean an information of the type > "we > plan to add more optional bits soonish". This ancient wiki page of yours that > mentions that some kind of change should happen at some point is even more > vague. It is more helpful to learn about changes when you know how they will > look and when they will happen. My assumption is that this is a "low cost" > improvement that is not too much to ask for. > > * I did not follow what you want to make an "official policy" for. Software > won't behave any differently just because there is a policy saying that it > should. > > Markus > > > On 04.08.2016 16:48, Daniel Kinzler wrote: >> Hi Markus! >> >> I would like to elaborate a little on what Lydia said. 
>> >> Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch: >>> It seems that some changes have been made to the JSON serialization >>> recently: >>> >>> https://github.com/Wikidata/Wikidata-Toolkit/issues/237 >> >> This specific change has been announced in our JSON spec for as long as the >> document exists. >> <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> >> sais: >> >>> WARNING: wikibase-entityid may in the future change to be represented as a >>> single string literal, or may even be dropped in favor of using the string >>> value type to reference entities. >>> >>> NOTE: There is currently no reliable mechanism for clients to generate a >>> prefixed ID or a URL from the information in the data value. >> >> That was the problem: With the current form
Re: [Wikidata] Breaking change in JSON serialization?
Hi Markus! I would like to elaborate a little on what Lydia said. Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch: > It seems that some changes have been made to the JSON serialization recently: > > https://github.com/Wikidata/Wikidata-Toolkit/issues/237 This specific change has been announced in our JSON spec for as long as the document exists. <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> says: > WARNING: wikibase-entityid may in the future change to be represented as a > single string literal, or may even be dropped in favor of using the string > value type to reference entities. > > NOTE: There is currently no reliable mechanism for clients to generate a > prefixed ID or a URL from the information in the data value. That was the problem: With the current format, all clients needed a hard-coded mapping of entity types to prefixes in order to construct ID strings from the JSON serialization of ID values. That means no entity types can be added without breaking clients. This has now been fixed. Of course, it would have been good to announce this in advance. However, it is not a breaking change, and we do not plan to treat additions as breaking changes. Adding something to a public interface is not a breaking change. Adding a method to an API isn't, adding an element to XML isn't, and adding a key to JSON isn't - unless there is a spec that explicitly states otherwise. These are "mix and match" formats, in which anything that isn't forbidden is allowed. It's the responsibility of the client to accommodate such changes. This is simple best practice - an HTTP client shouldn't choke on header fields it doesn't know, etc. See <https://en.wikipedia.org/wiki/Robustness_principle>. If you use a library that is touchy about extra data by default, configure it to be more accommodating, see for instance <https://stackoverflow.com/questions/14343477/how-do-you-globally-set-jackson-to-ignore-unknown-properties-within-spring>. 
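A tolerant client along the lines Daniel describes might read entity-ID values like this (a sketch in Python; the field names follow the JSON spec quoted above, and the exact shape of the change — an added "id" field alongside the old entity-type/numeric-id pair — is my reading of the linked issue):

```python
import json

# The hard-coded entity-type-to-prefix mapping that older clients needed.
PREFIXES = {"item": "Q", "property": "P"}

def entity_id(value):
    """Extract an entity ID string from a wikibase-entityid data value.

    Prefer an explicit "id" field if the serialization carries one, and
    fall back to reconstructing the ID from entity-type and numeric-id.
    Any extra, unknown keys are simply ignored (robustness principle).
    """
    if "id" in value:
        return value["id"]
    prefix = PREFIXES.get(value["entity-type"])
    if prefix is None:
        raise ValueError("unknown entity type: %r" % value["entity-type"])
    return prefix + str(value["numeric-id"])

old_form = json.loads('{"entity-type": "item", "numeric-id": 42}')
new_form = json.loads('{"entity-type": "item", "numeric-id": 42, "id": "Q42"}')
```

A client written this way keeps working when a new optional key appears, and only fails on a genuinely unknown entity type that arrives without an "id" field.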
> Could somebody from the dev team please comment on this? Is this going to be > in > the dumps as well or just in the API? Yes, we use the same basic serialization for the API and the dumps. For the future, note that some parts (such as sitelink URLs) are optional, and we plan to add more optional bits (such as normalized quantities) soonish. > Are further changes coming up? Yes. The next one in the pipeline is Quantities without upperBound and lowerBound, see <https://phabricator.wikimedia.org/T115270>. That IS a breaking change, and the implementation is thus blocked on announcing it, see <https://gerrit.wikimedia.org/r/#/c/302248/>. Furthermore, we will probably remove the entity-type and numeric-id fields from the serialization of EntityIdValues eventually. But there is no concrete plan for that at the moment. When we remove the old fields for ItemId and PropertyId, that IS a breaking change, and will be announced as such. > Are we ever > going to get email notifications of API changes implemented by the team rather > than having to fix the damage after they happened? We aspire to communicate early, and we are sorry we did not announce this change ahead of time. However, this is not a breaking change by the common understanding of the term, and will not be treated as such. We have argued about that on this list before, see <https://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html>. I have made it clear back then what we consider a breaking change and what not, and I have advised you that being accommodating in what your client code accepts will avoid headaches in the future. To make this even more clear, we will enact and document something similar to my email from February as official policy soon. Watch for an announcement on this list. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
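Since upperBound and lowerBound are slated to become optional, a client can treat them as such already. A minimal sketch (field names as in the documented JSON format; the exact future shape is my assumption based on T115270):

```python
def parse_quantity(value):
    """Parse a Wikibase quantity data value defensively.

    "amount" is a signed decimal string like "+3"; "unit" is "1" for
    dimensionless values. upperBound and lowerBound are read as optional,
    anticipating serializations that omit them when no explicit
    uncertainty is given.
    """
    return {
        "amount": float(value["amount"]),
        "unit": value.get("unit", "1"),
        "upperBound": float(value["upperBound"]) if "upperBound" in value else None,
        "lowerBound": float(value["lowerBound"]) if "lowerBound" in value else None,
    }

bounded = parse_quantity(
    {"amount": "+3", "unit": "1", "upperBound": "+3.5", "lowerBound": "+2.5"})
unbounded = parse_quantity({"amount": "+3", "unit": "1"})
```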
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] An attribute for "famous person"
Am 02.08.2016 um 20:19 schrieb Markus Kroetzsch: > Oh, there is a little misunderstanding here. I have not suggested to create a > property "number of sitelinks in this document". What I propose instead is to > create a property "number of sitelinks for the document associated with this > entity". The domain of this suggested property is entity. The advantage of > this > proposal over the thing that you understood is that it makes queries much > simpler, since you usually want to sort items by this value, not documents. > One > could also have a property for number of sitelinks per document, but I don't > think it has such a clear use case. "number of sitelinks for the document associated with this entity" strikes me as semantically odd, which was the point of my earlier mail. I'd much rather have "number of sitelinks in this document". You are right that the primary use would be to "rank" items, and that it would be more convenient to have the count associated directly with the item (the entity), but I fear it will lead to a blurring of the line between information about the entity and information about the document. That is already a common point of confusion, and I'd rather keep that separation very clear. I also don't think that one level of indirection would be horribly complicated. To me it's just natural to include the sitelink info on the same level as we provide a timestamp or revision id: for the document. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] An attribute for "famous person"
Am 02.08.2016 um 18:41 schrieb Andrew Gray: > I'd agree with both interpretations - the majority of people in Wikidata are > Using the existence of Wikipedia articles as a threshold, as suggested, seems > a > pretty good test - it's flawed, of course, but it's easy to check for and > works > as a first approximation of "probably is actually famous". If we want to have the number of sitelinks in RDF, let's please make sure that this number is associated with the item *document* URI, not with the concept URI. After all, the person doesn't have links, the item document does. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
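The document/concept distinction can be made concrete: in Wikidata's RDF, the item document and the thing it describes have different URIs. A small sketch of where such a count would attach (the sitelink-count predicate and the number are made-up placeholders, not an existing vocabulary term):

```python
# Concept URI: the person. Data URI: the item document about the person.
CONCEPT = "http://www.wikidata.org/entity/Q42"
DOCUMENT = "https://www.wikidata.org/wiki/Special:EntityData/Q42"

# The count describes the document, on the same level as its timestamp
# or revision ID:
triples = [
    (DOCUMENT, "schema:about", CONCEPT),
    (DOCUMENT, "ex:sitelinkCount", 123),  # placeholder predicate and value
    # not (CONCEPT, "ex:sitelinkCount", 123) -- a person has no sitelinks
]
```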
Re: [Wikidata] Grammatical display of units
Am 28.07.2016 um 12:26 schrieb Lydia Pintscher: > The discussion about how to do this is happening in > https://phabricator.wikimedia.org/T86528 The basic problem is that we > do use items for the units. I think this is the right thing to do but > it does make this particular part a bit tricky. Well, I think we could sidestep the grammar issue by using unit symbols. We would have to get them from statements, and they would have to be multilingual values (or multiple mono-lingual values), but that is still much less complicated than trying to apply plural rules. An alternative is to use MediaWiki i18n messages instead of entity labels. E.g. if the unit is Q11573, we could check if MediaWiki:wikibase-unit-Q11573 exists, and if it does, use it. We'd get internationalization including support for plurals for free. We could actually combine all of these approaches: first check for a system message, then check for a symbol statement, then use the label, and if all fails, use the ID. I'll comment on the ticket. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
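The combined lookup order at the end of the mail could be sketched like this (Python; the three dictionaries stand in for the real MediaWiki message, statement, and label lookups, and the "wikibase-unit-Qnnn" message key is the hypothetical naming scheme from the mail, not an existing message):

```python
def unit_label(unit_id, system_messages, symbol_statements, labels):
    """Resolve display text for a unit item, trying in order:
    a MediaWiki system message, a unit-symbol statement, the item's
    plain label, and finally the bare item ID."""
    message_key = "wikibase-unit-" + unit_id
    for candidate in (
        system_messages.get(message_key),  # i18n message, plural-aware
        symbol_statements.get(unit_id),    # symbol from a statement
        labels.get(unit_id),               # plain entity label
    ):
        if candidate is not None:
            return candidate
    return unit_id  # last resort: show the ID itself
```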
Re: [Wikidata] Controversy around Wikimania talks
Am 31.07.2016 um 17:04 schrieb Gerard Meijssen: > Hoi, > I am not to judge what conferences will be deemed relevant for an item in > Wikidata. When a conference is relevant, it is the talks and particularly the > registrations of the talks, the papers and the presentations that make the > conference relevant after the fact. So you think that for every relevant conference, all talks and speakers should automatically be considered relevant? Does the same argument apply to all courses and teachers at all relevant universities and schools? I'm trying to understand your point. To me it's a question of granularity. We can't manage arbitrarily fine-grained information, so we have to stop at some point. What do you think, where should that point be for Wikimania, for other (relevant) conferences, for universities, for schools? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Controversy around Wikimania talks
Am 31.07.2016 um 16:28 schrieb Gerard Meijssen: > Hoi, > Really? It is a source for the talks that were given. It contains the papers > that were the basis for granting a spot on the program. To clarify - would the same apply for any talk at any conference? Or do you think Wikimania should be especially relevant to Wikidata, because it's a Wikimedia thing? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Machine-readable Wikidata ontology/schema?
Am 23.06.2016 um 21:34 schrieb Nicolas Torzec: > Thanks Stas and Markus. > > I'm interested in computing various stats about Wikidata. For example, I want > to > compute the degree of interlinking between Wikidata and external databases, > per > entity type, per databases, etc. So I need a way to know which properties have > an external identifier as range, along with the name of the external database > they point to. For example P345 is an external identifier to IMDB ; P2639 is > an > external identifier to Filmportal, etc. The machine-readable description of P2639 can be found at <http://wikidata.org/entity/P2639.json> or, if you prefer, <http://wikidata.org/entity/P2639.ttl>. Similarly, the class "Film" is described at <http://wikidata.org/entity/Q11424.json> or <http://wikidata.org/entity/Q11424.ttl>. Since these are regular "entities" (items or properties), they are themselves described in terms of the Wikibase data model and the Wikidata vocabulary, not in terms of RDFS/OWL. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
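The URL pattern above generalizes to any entity and either format. A trivial helper (pure string construction; fetching the result is left to whatever HTTP client you prefer):

```python
def entity_data_url(entity_id, fmt="json"):
    """Build the machine-readable data URL for a Wikidata entity,
    following the http://wikidata.org/entity/<ID>.<format> pattern
    used in the links above (fmt is "json" or "ttl")."""
    return "http://wikidata.org/entity/%s.%s" % (entity_id, fmt)
```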
Re: [Wikidata] language fallbacks on Wikipedia and co
Am 15.06.2016 um 23:53 schrieb Gerard Meijssen: > Hoi, > Will it work using the #babel templates? No, because that would be inconsistent with the fallback that is applied when using Lua or {{#property}} in wikitext. The fallback is based on the fallback that is defined by MediaWiki for the interface languages. In wikitext, we cannot use the Babel templates, because that would break caching. The rendering can depend on a few user-specific settings, but caching a rendered version of every page for every possible combination of babel templates is not feasible. We could in theory use a different fallback mechanism on Special:AboutTopic, but that would be quite confusing - why does it look different in articles? Also, when talking to others about the output of Special:AboutTopic, this might get confusing: if someone complains that e.g. some label they see there is wrong, and you go to the page but what you see is different, it becomes hard to discuss the issue. There would be no way to link to the page as you see it. Everyone would potentially see different output. -- daniel ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Fwd: Using sparql to query for labels with matching regex
Hi Mike! I'm no SPARQL expert, but regular expressions in queries are often not optimized using indexes. So *all* labels would need to be checked against the regular expression, which of course times out. But there are other options. Perhaps instead of FILTER regex(?label, "^apparel") try FILTER (STRSTARTS(?label,"apparel")) See <https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-strstarts> Another option would be Blazegraph's full-text index: WHERE { ?label bds:search "apparel*" . } This would match any label that contains a word that starts with apparel. See <https://wiki.blazegraph.com/wiki/index.php/FullTextSearch> HTH Am 29.03.2016 um 22:47 schrieb mike white: > > Hi all > > I am trying to query the wiki data for entities with labels that matches a > regex. I am new in the sparql world. So could you please help me with it. Here > is what I have for now. > > https://gist.github.com/anonymous/2810eb5747e51a9ae746183a43f20771 > > But I don't think it is the right way. Any help will be much appreciate. > Thanks > > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
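For completeness, here is the STRSTARTS variant assembled into a full query (a sketch; %-substitution like this is fine for a fixed prefix, but a real client should escape or validate user-supplied input before embedding it):

```python
def prefix_label_query(prefix, language="en", limit=20):
    """Compose a SPARQL query for labels starting with a prefix, using
    STRSTARTS rather than regex() so the engine is not forced to run a
    regular-expression match over every label."""
    return '''SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER (LANG(?label) = "%s")
  FILTER (STRSTARTS(?label, "%s"))
} LIMIT %d''' % (language, prefix, limit)
```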
Re: [Wikidata] Wordnet mappings
Am 12.04.2016 um 08:42 schrieb Stas Malyshev: > Hi! > >> Is there a property for WordnetId? More mappings are always good. The case of WordNet is a bit tricky though, since WordNet is about words, not concepts. Wikidata items can perhaps be mapped to SynSets, but we still have to be careful not to get confused about the semantics. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
nical as well as the product level, which in turn is informed from community interaction, among other things. As is often the case, solutions that have to be maintainable and scalable are not quite as nice as one-off solutions for a special case. MediaWiki is conservative about adding special case features for good reasons: it's quite complex as it is, if it had tried to cater to every special case, it would have collapsed under its own weight a long time ago. The idea is to generalize from special cases, and implement something that will work for many more cases, even though it perhaps covers only 90% of what you could do by catering to the special case directly. Of course, overly generic multi-option multi-purpose mechanisms should also be avoided, because they are hard to understand and hard to maintain. So a balance needs to be found. Trying to strike that balance, in 2012 we (in this case including you, iirc) designed data types to be a simple yet sufficiently generic mechanism for associating behavior with values. So now we use it to associate behavior with values (like mapping to URLs and URIs), and I am very reluctant to introduce another mechanism for associating behavior with values. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
Am 10.03.2016 um 20:08 schrieb Young,Jeff (OR): > Then perhaps umbel:isLike instead of owl:sameAs? > > http://wiki.opensemanticframework.org/index.php/UMBEL_Vocabulary#isLike_Property In some cases owl:equivalentProperty may be appropriate https://www.w3.org/TR/owl-ref/#equivalentProperty-def -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
Am 10.03.2016 um 10:26 schrieb Markus Kroetzsch: > I am surprised by the amount of confusion in this discussion. There is > absolutely no relationship between mapping of Wikidata values to URIs and the > external id datatype. You are correct that such a relationship does not necessarily follow from first principles. You are however incorrect in saying that there is no relationship in Wikibase: The way the data model is currently defined and the way mappings are implemented, we made a conscious decision to support such mappings only for ExternalId values. I think it would help the discussion if we could keep apart: - what follows from formal principles - what you (or I) consider best - what the software currently does > (3) The external id datatype does not provide any mapping and the criteria > used > for it by the community do not imply that such mappings should exist for these > cases, or that they should not exist for other cases. That is incorrect from the way Wikibase defines and uses the ExternalId datatype: the intent is indeed to say that something is an identifier that can be mapped, and that such a (direct) mapping is not supported for other data types. (That doesn't mean we will not offer different mappings for other data types, perhaps URLs for looking up coordinates, etc). Modeling this explicitly is indeed the reason to have this datatype. > I am most worried about Daniel's remark. He says that we wants to use external > ids to identify properties with "values that identify a resource", but does > not > mention the existing, community-supported mechanism for doing just that (2), > and > instead proposes another mechanism (3), which the community is clearly not > using > for this purpose at all. That's a misunderstanding. The plan is to support P1921 for URI mappings, and we already do support P1630 for URL mappings. But we intentionally do this only for ExternalId values, not for plain strings or other types. 
So, the technical implementation does follow the community convention, with the restriction that properties that should use this kind of mapping need to explicitly be declared to be identifiers. We are also considering implementing validation and normalization for ExternalId values, but it's not clear yet how we can safely apply community supplied validation and normalization patterns. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
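The P1630-style mapping, together with the kind of validation being considered, amounts to little more than this (Python sketch; the IMDb-like template and regex used in the test are illustrative assumptions, not values taken from Wikidata):

```python
import re

def external_id_to_url(external_id, formatter_url, pattern=None):
    """Map an external-ID value to a URL via a formatter template in
    which "$1" marks the spot for the identifier (the P1630 convention),
    optionally validating the identifier against a regex first."""
    if pattern is not None and re.fullmatch(pattern, external_id) is None:
        raise ValueError("malformed identifier: %r" % external_id)
    return formatter_url.replace("$1", external_id)
```

The open question Daniel mentions — safely applying community-supplied patterns — lives in the pattern argument here: a buggy or hostile regex must neither hang the service (catastrophic backtracking) nor wave through garbage.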
Re: [Wikidata] Status and ETA External ID conversion
Am 07.03.2016 um 11:54 schrieb Markus Kroetzsch: > In general, the community uses several classes for properties that could have > been used for UI organisation, rather than introducing new datatypes. Technically, the main purpose of having a separate datatype was to explicitly model values that identify a resource (in the RDF sense, where resource means "anything that can be identified unambiguously"), so we can apply mappings (e.g. to URIs and URLs) when exporting and displaying them. Using the datatype for the UI structure is an attempt to kill two birds with one stone. I think it's a pretty good start, but I agree that we should revisit this once we have gathered some feedback. It would not be too hard to base the structure on different criteria (well, depends on the criteria). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] nice
"They found 12,703 battles which had an exact location and date, 2,657 of them are from Wikidata, the others are from DBpedia." Maybe we can do better? Am 02.03.2016 um 22:14 schrieb Lydia Pintscher: > On Wed, Mar 2, 2016 at 8:14 PM Gerard Meijssen <mailto:gerard.meijs...@gmail.com>> wrote: > > Hoi, > Yup I missed that one.. this [1] was my source :) > Gerard > > [1] http://www.bbc.com/news/magazine-35685889 > > > This is really great. I am thrilled about this because this isn't coverage > about > Wikidata but coverage _with_ Wikidata on major news sites for the second time > this week > (http://www.faz.net/aktuell/feuilleton/kino/academy-awards-die-oscars-von-1929-bis-heute-12820119.html > being > the other one). They're using Wikidata data to do meaningful reporting. Our > data > and the project as a whole got (at the very least) good enough for this. It > feels to me like we've broken through a wall. > High5 everyone! :D > > Cheers > Lydia > -- > Lydia Pintscher - http://about.me/lydia.pintscher > Product Manager for Wikidata > > Wikimedia Deutschland e.V. > Tempelhofer Ufer 23-24 > 10963 Berlin > www.wikimedia.de <http://www.wikimedia.de> > > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. > > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter > der > Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für > Körperschaften I Berlin, Steuernummer 27/029/42207. > > > _______ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] SPARQL CONSTRUCT results truncated
Am 11.02.2016 um 10:17 schrieb Gerard Meijssen: > Your response is technical and seriously, query is a tool and it should > function > for people. When the tool is not good enough fix it. What I hear: "A hammer is a tool, it should work for people. Tearing down a building with it takes forever, so fix the hammer!" The query service was never intended to run arbitrarily large or complex queries. Sure, would be nice, but that also means committing an arbitrary amount of resources to a single request. We don't have arbitrary amounts of resources. We basically have two choices: either we offer a limited interface that only allows for a narrow range of queries to be run at all. Or we offer a very general interface that can run arbitrary queries, but we impose limits on time and memory consumption. I would actually prefer the first option, because it's more predictable, and doesn't get people's hopes up too far. What do you think? Oh, and +1 for making it easy to use WDT on labs. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata