Re: [Wikidata] Big numbers
On 07.10.19 at 09:50, John Erling Blad wrote:
> Found a few references to bcmath, but some weirdness made me wonder if it
> really was bcmath after all. I wonder if the weirdness is the juggling with
> double when bcmath is missing.

I haven't looked at the code in five years or so, but when I wrote it, Number was indeed bcmath with a fallback to float. The limit of 127 characters sounds right, though I'm not sure without looking at the code.

Quantity is based on Number, with quite a bit of added complexity for converting between units while considering the value's precision. E.g. "3 meters" should not turn into "118.11 inch", but "118 inch" or even "120 inch": the default precision of +/- 0.5 meter is 19.685 inch, which means the last digit is insignificant. Had lots of fun and confusion with that. I also implemented rounding on decimal strings for that. And initially screwed up some edge cases, which I only realized when helping my daughter with her homework ;)

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
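[Editor's note: the precision-aware rounding described above can be sketched like this. This is a simplified illustration, not the actual Wikibase code; the conversion factor and the rounding rule are assumptions.]

```python
import math

# Hypothetical sketch of precision-aware unit conversion: convert the
# value, then round so that no insignificant digits are reported.
INCHES_PER_METER = 39.3701  # assumed conversion factor

def convert_with_precision(value, uncertainty, factor=INCHES_PER_METER):
    converted = value * factor
    converted_unc = uncertainty * factor
    # The order of magnitude of the converted uncertainty decides the
    # last significant decimal place of the result.
    step = 10 ** math.floor(math.log10(converted_unc))
    return round(converted / step) * step

# "3 meters" +/- 0.5 m: the uncertainty is ~19.685 inch, so the result
# is rounded to the nearest 10 inches: 120, not 118.11.
print(convert_with_precision(3, 0.5))  # -> 120
```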
Re: [Wikidata] Personal news: a new role
Very cool! Looking forward to seeing more of you!

On 19.09.19 at 18:56, Denny Vrandečić wrote:
> Hello all,
>
> Over the last few years, more and more research teams all around the world
> have started to use Wikidata. Wikidata is becoming a fundamental resource
> [1]. That is also true for research at Google. One advantage of using
> Wikidata as a research resource is that it is available to everyone.
> Results can be reproduced and validated externally. Yay!
>
> I had used my 20% time to support such teams. The requests became more
> frequent, and now I am moving to a new role in Google Research, akin to a
> Wikimedian in Residence [2]: my role is to promote understanding of the
> Wikimedia projects within Google, work with Googlers to share more
> resources with the Wikimedia communities, and to facilitate the
> improvement of Wikimedia content by the Wikimedia communities, all with a
> strong focus on Wikidata.
>
> One deeply satisfying thing for me is that the goals of my new role and
> the goals of the communities are so well aligned: it is really about
> improving the coverage and quality of the content, and about pushing the
> projects closer towards letting everyone share in the sum of all
> knowledge.
>
> Expect to see more from me again - there are already a number of fun ideas
> in the pipeline, and I am looking forward to see them get out of the
> gates! I am looking forward to hearing your ideas and suggestions, and to
> continue contributing to the Wikimedia goals.
>
> Cheers,
> Denny
>
> P.S.: Which also means, incidentally, that my 20% time is opening for new
> shenanigans [3].
>
> [1] https://www.semanticscholar.org/search?q=wikidata&sort=relevance
> [2] https://meta.wikimedia.org/wiki/Wikimedian_in_residence
> [3] https://wikipedia20.pubpub.org/pub/vyf7ksah

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Language for non-logged in users
On 25.01.19 at 13:33, DaB. wrote:
> Hello.
> On 25.01.2019 at 12:13, Daniel Kinzler wrote:
>> Serving different content from the same URL is generally a bad thing.
>
> No, it's not. That's the reason they invented Language headers in the
> first place: so you can view a page in your language and I can view a
> site in my language. Please respect that not everybody can read English
> (fluently).

Headers can solve the caching problem, but this makes it impossible to link to a specific language version of a page. That is bad when discussing specifics of the page, and can cause confusion. It's also bad for search engine indexes, which should index all language versions.

I very much want everyone to be able to see each page in their own language. The idea is to redirect based on the language header when visiting the neutral URL. Please read the proposal.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
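[Editor's note: the header-based redirect idea can be sketched in a few lines of server-side logic. This is a hypothetical illustration; the function name and fallback rule are invented, and real Accept-Language handling also covers wildcards and region fallback.]

```python
def pick_language(accept_language, available, default="en"):
    """Pick the best available content language for a redirect from an
    Accept-Language header. Simplified sketch: ignores wildcards and
    region fallback (e.g. "de-AT" matching "de")."""
    ranked = []
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        quality = 1.0  # q defaults to 1 when absent
        for param in pieces[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if tag:
            ranked.append((quality, tag))
    # Highest q-value first; redirect to the first language we can serve.
    for _, tag in sorted(ranked, reverse=True):
        if tag in available:
            return tag
    return default

print(pick_language("ja,en;q=0.8", {"en", "de", "ja"}))  # -> ja
```

So a request for the neutral URL with "ja,en;q=0.8" would be redirected to the Japanese version, and the per-language URLs stay linkable and cacheable.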
Re: [Wikidata] Language for non-logged in users
The reason this is not trivial is two-fold: 1) caching and 2) the semantics of URLs. Serving different content from the same URL is generally a bad thing. A solution for this is discussed in <https://phabricator.wikimedia.org/T114662>, but work on this is currently not resourced.

On 25.01.19 at 11:44, Darren Cook wrote:
> I wanted to send someone a URL to show them how a data item looks in
> Japanese (so we could see which items have a translation). But am I
> right in thinking there is nothing I can put in the URL to do this?
>
> I also tried changing my accept-language header to put "ja" first, but
> it is ignored. Was this a feature that was discussed and rejected; or
> just an itch that no-one has got around to scratching yet?
>
> Darren
>
> P.S. I realize I can login, change my UI to another language, and see
> the data that way. But that is quite a long-winded process, especially
> if the person has not created an account yet.
>
> It also changes the whole UI, not just the data, which is painful if I
> just want to see what has been translated but cannot read the language.
> I think for a project about data, you should be able to set the UI
> language and the content language separately.
>
> E.g. I just put a page into Greek (I think), and now I can see the few
> items that have been translated, but cannot read the property names! Let
> alone navigate the site. (The "switch back to previous language" link at
> the top was a great idea, though - thank you to whoever thought of that
> shortcut.)

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 06.12.18 at 09:49, Daniel Kinzler wrote:
> On 02.12.18 at 02:28, Erik Paulson wrote:
>> How do these external identifiers work, and how do I get something into
>> one of these namespaces? (I apologize if I have missed them in the
>> documentation)
>
> Hi Erik!

Oh, I forgot an important disclaimer: I used to be on the Wikidata team, and I was involved in discussing and specifying the different levels of federation for Wikibase repos. I am no longer part of the Wikidata team though, and may not be up to date on the latest progress. I cannot in any way speak for the Wikidata team or make any promises.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 02.12.18 at 02:28, Erik Paulson wrote:
> How do these external identifiers work, and how do I get something into
> one of these namespaces? (I apologize if I have missed them in the
> documentation)

Hi Erik!

You got the right idea. Sadly, this feature is not implemented yet. I don't know if there is any public documentation for this by now, but here is a very rough list of the stepping stones towards allowing what you want:

1) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that can access Wikidata's internal database directly, and do not themselves define Items or Properties (but may define other kinds of entities). This is implemented, but not deployed yet. It is scheduled to be deployed soon on Wikimedia Commons, as part of the "Structured Data on Commons" project (aka Wikibase MediaInfo).

2) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and do not themselves define Items or Properties (but may define other kinds of entities). This is relatively simple, but details of the caching mechanisms need to be ironed out. Ask Adam and Lydia about the timeline for this.

3) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and *do* themselves also define Items or Properties which are *distinct* from the ones that Wikidata defines. The spec for this is clear, but some old code needs to be updated to enable this, and some details of the user interface need to be worked out. Ask Adam and Lydia about the timeline for this.

4) Enable Items and Properties that exist on Wikidata to be referenced from other Wikibase instances (repo or client) that call Wikidata's web API, and may "augment" or "override" the descriptions of Items and Properties defined on Wikidata. There seems to be a lot of demand for this, but the details of the semantics are unclear, especially with respect to SPARQL queries. More discussion is needed.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 10:40, Yuri Astrakhan wrote:
> If at some point you decide to add some new area of data, e.g. biological,
> you could add new prefixes for that too, but that would also be a
> "separate" project.

The Q, P, L, M, etc. are used to identify the *type* of entity. They are not for keeping projects separate. That was never their purpose. Wikibase does use prefixes for that, but they go *before* the letter that indicates the type.

>> The prefix can be omitted for local entities, so Q12345 is an item on the
>> local repo (or the default repo of a wikibase client).
>
> I think that was a big mistake -- the "(or the default repo of a wikibase
> client)" -- because wd implies Wikidata, not Wikibase, so it dilutes the
> meaning of "wd:". See my other email on how I fixed it.

I'm confused - yes, "wd:" should ALWAYS imply Wikidata. Your Wikibase instance would have its own prefix (which can be omitted for local use), e.g. "osm:".

For the record, I'm just voicing my opinion here, and telling you what the original intention was. I'm no longer working on Wikidata or Wikibase, and I can't make any decisions on any of this.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 01:00, Lydia Pintscher wrote:
> On Thu, Nov 29, 2018 at 9:46 AM Andra Waagmeester wrote:
>> I fully agree. I would rather see the scarce development resources being
>> focused on fixing this than the p/q business, as you nicely call it.
>> Tbh, I really don't see an issue with multiple p's and q's over different
>> Wikibases. That is what prefixes are for: to distinguish between
>> different resources. Examples of identical identifier (literal) schemes
>> between multiple resources are abundant (e.g. PubMed and NCBI Gene). It
>> really is a matter of getting used to it, or am I missing something?
>
> Are we talking about https://phabricator.wikimedia.org/T194180? I'm
> happy to push that into one of the next sprints if so.

This doesn't fix the hard-coded prefix in the RDF output generated by Wikibase.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 29.11.18 at 08:21, Imre Samu wrote:
> - What is the real meaning of the Q/P prefix -> Wikidata or Wikibase?

The intention was: P and Q indicate the *type* of the entity ("P" = "Property", "Q" = "Item" for arcane reasons; "L" = Lexeme, "F" = Form, "S" = Sense, "M" = MediaInfo). As you can tell, we'd quickly run out of letters and cause confusion if this became configurable.

Using prefixes to indicate where the entity comes from is indeed useful, and is already part of the model. The prefix for Wikidata is "wd:", so "wd:Q12345" is an item from Wikidata. The prefix can be omitted for local entities, so Q12345 is an item on the local repo (or the default repo of a wikibase client).

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 28.11.18 at 23:53, Olaf Simons wrote:
> I will receive answers in the form of
>
> wd:q25
>
> but they do not link to wd, Wikidata, but into our database:
> https://database.factgrid.de/entity/Q25

Right, that prefix should not be "wd" for your own query service. I'm afraid it's currently hard-coded in the RdfVocabulary class. That should indeed be fixed.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata
On 28.11.18 at 10:15, James Heald wrote:
> It should also be made possible for the local wikibase to use local
> prefixes other than 'P' and 'Q' for its own local properties and items,
> otherwise it makes things needlessly confusing -- but currently I think
> this is not possible.

I think the opposite is the case: ending up with a zoo of prefixes, with items being called A73834 and F0924095 and Q98985 and W094509, would be very confusing.

The current approach is the same one that RDF and XML use: add a kind of namespace identifier in front of "foreign" identifiers. So you would have Q437643 for "local" items, xy:Q8743 for items from xy, foo:Q873287 for items from foo, etc. This is how foreign IDs are currently implemented in Wikibase.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
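[Editor's note: the scheme described here is essentially a namespace lookup. A minimal sketch; the prefix-to-URI map and base URIs below are made up for illustration, and only "wd:" is a real Wikidata prefix.]

```python
# Hypothetical prefix map: "wd" is Wikidata's real prefix, the others
# are invented repo names purely for illustration.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "foo": "http://foo.example.org/entity/",
}
LOCAL_BASE = "http://local.example.org/entity/"  # assumed local repo URI

def expand(entity_id):
    """Expand a possibly prefixed entity ID to a full concept URI.
    Unprefixed IDs belong to the local repo."""
    if ":" in entity_id:
        prefix, local_id = entity_id.split(":", 1)
        return PREFIXES[prefix] + local_id
    return LOCAL_BASE + entity_id

print(expand("foo:Q873287"))  # -> http://foo.example.org/entity/Q873287
print(expand("Q437643"))      # -> http://local.example.org/entity/Q437643
```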
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
Hi Pine, sorry for the misleading wording. Let me clarify below.

On 19.10.18 at 9:51 pm, Pine W wrote:
> Hi Markus, I seem to be missing something. Daniel said, "And I think the
> best way to achieve this is to start using the ontology as an ontology on
> wikimedia projects, and thus expose the fact that the ontology is broken.
> This gives incentive to fix it, and examples as to what things should be
> possible using that ontology (namely, some level of basic inference)." I
> think that I understand the basic idea behind structured data on Commons.
> I also think that I understand your statement above. What I'm not
> understanding is how Daniel's proposal to "start using the ontology as an
> ontology on wikimedia projects, and thus expose the fact that the
> ontology is broken" isn't a proposal to add poor quality information from
> Wikidata onto Wikipedia and, in the process, give Wikipedians more
> problems to fix. Can you or Daniel explain this?

What I meant in concrete terms was: let's start using Wikidata items for tagging on Commons, even though search results based on such tags will currently not yield very good results, due to the messy state of the ontology, and hope people fix the ontology to get better search results. If people use "poodle" to tag an image and it's not found when searching for "dog", this may lead to people investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for Wikipedia articles based on an imperfect ontology, and push them on everyone". I mean, using the ontology to generate navigation boxes for some kinds of articles may be a nice idea, and could indeed have the same effect - that people notice problems in the ontology, and fix them. But that would be something the local wiki communities decide to do, not something that comes from Wikidata or the Structured Data project.

The point I was trying to make is: the wiki communities are rather good at creating structures that serve their purpose, but they do so pragmatically, along the behavior of the existing tools. So, rather than trying to work around the quirks of the ontology in software, the software should use very simple rules (such as following the subclass relation), and let people adapt the data to this behavior, if and when they find it useful to do so. This approach, over time, provides better results in my opinion.

Also, keep in mind that I was referring to an imperfect *improvement* of search, the alternative being to only return things tagged with "dog" when searching for "dog". I was not suggesting degrading the user experience in order to incentivize editors. I'm rather suggesting the opposite: let's NOT give people a reason to tag images that show poodles with "poodle" and "dog" and "mammal" and "animal" and "pet" and...

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
On 18.10.2018 at 19:05, Peter F. Patel-Schneider wrote:
> On 10/17/18 7:04 AM, Daniel Kinzler wrote:
>> My (very belated) thoughts on this issue:
>> [...]
>> I say: let it produce bad results, tell people why the results are bad,
>> and what they can do about it!
>> [...]
>>
>> -- daniel
>
> My view is that there is a big problem with this for industrial use of
> Wikidata.
> [...]
> What is the biggest problem I see in Wikidata? It is the poor organization
> of the Wikidata ontology. To fix the ontology, beyond doing point fixes,
> is going to require some commitment from the Wikidata community.

I agree. And I think the best way to achieve this is to start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference).

--
Daniel Kinzler
Principal Software Engineer, MediaWiki Platform
Wikimedia Foundation
Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
> "...the burden of proof has to be placed on authority, and it should be
> dismantled if that burden cannot be met..."
>
> -Thad
> +ThadGuidry <https://plus.google.com/+ThadGuidry>
>
> On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA wrote:
>
>     Hi,
>
>     The Wikidata's ontology is a mess, and I do not see how it could be
>     otherwise. While the creation of new properties is controlled, any
>     fool can decide that a woman <https://www.wikidata.org/wiki/Q467> is
>     no longer a human or is part of family. Maybe I'm a fool too? I wanted
>     to remove the claim that a ship <https://www.wikidata.org/wiki/Q11446>
>     is an instance of "ship type" because it produces weird circular
>     inferences in my application; but maybe that makes sense to someone
>     else.
>
>     There will never be a universal ontology on which everyone agrees. I
>     wonder (sorry to think aloud) if Wikidata should not rather facilitate
>     the use of external classifications. Many external ids are knowledge
>     organization systems (ontologies, thesauri, classifications ...). I
>     dream of a simple query that could search, in Wikidata, "all elements
>     of the same class as 'poodle'" according to the classification of
>     imagenet <http://imagenet.stanford.edu/synset?wnid=n02113335>.

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Re: [Wikidata] Solve legal uncertainty of Wikidata
Am 18.05.2018 um 21:37 schrieb Amirouche Boubekki: > What wikidata doesn't track the license of each piece of information?! Facts don't *have* licenses. They have sources, and we track those. Which may have licenses, depending on jurisdiction, interpretation, form, content, etc. But the fact itself doesn't, it's not copyrightable. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] RDF: All vs Truthy
Am 03.12.2017 um 15:06 schrieb Fariz Darari: > Current state gives me one result, the Russian ruble, due to its preferred > rank > (notice the wdt prefix): > > https://query.wikidata.org/#select%20%2a%0A%7B%20wd%3AQ159%20wdt%3AP38%20%3Fcurrency%20%7D Ah, right - the current answer would by convention be marked as preferred, so only it counts as "truthy". Sorry for the confusion. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] RDF: All vs Truthy
On 03.12.2017 at 14:49, Imre Samu wrote:
>> All = contains not only the truthy ones, but also the ones with
>> qualifiers
>
> imho: sometimes qualifiers are very important for multiple values (like
> "start time", "end time", "point in time", ...)
> for example: Russia https://www.wikidata.org/wiki/Q159 : Russia -
> P38:"currency" has 2 statements, both with qualifiers:
>
> * Russian ruble - (start time: 1992)
> * Soviet ruble - (end time: September 1993)
>
> My question:
> in this case - what is the "truthy = simple" result for
> Russia-P38:"currency"?

You will simply get two truthy results: Russian ruble and Soviet ruble. Both are Russian currencies. If you want to know when, why, where, etc., you have to check the qualified "full" statements. That's why it's called "truthy": the answer is kind of true, depending on context.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
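[Editor's note: the rank-based selection discussed in this thread can be modeled in a few lines. This is a simplified model of the behavior, not Wikibase's actual code.]

```python
def truthy(statements):
    """Sketch of Wikibase's 'best rank' rule: if any statement for a
    property is ranked preferred, only those are truthy; otherwise all
    normal-rank statements are. Deprecated statements never are."""
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]

currency = [
    {"value": "Russian ruble", "rank": "normal"},
    {"value": "Soviet ruble", "rank": "normal"},
]
print(truthy(currency))  # both currencies are truthy

# Marking the current currency as preferred narrows the truthy result:
currency[0]["rank"] = "preferred"
print(truthy(currency))  # now only the Russian ruble
```

This also explains the other branch of the thread: once the Russian ruble carries the preferred rank, a wdt: query returns only that one value.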
Re: [Wikidata] How to get direct link to image
Am 30.10.2017 um 19:10 schrieb Laura Morales: >> You can also use the Wikimedia Commons API made by Magnus: > https://tools.wmflabs.org/magnus-toolserver/commonsapi.php >> It will also gives you metadata about the image (so you'll be able to cite >> the author of the image when you reuse it). > > Is the same metadata also available in the Turtle/HDT dump? Sadly not. We don't have proper structured meta-data yet. That's what the Structured Data on Commons project is about: <https://commons.wikimedia.org/wiki/Commons:Structured_data> -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata is becoming a proper citizen of the linked open data web
Due to massive performance problems, Wikibase was rolled back to the previous version last night. So the new features currently do not work. We are working hard to find the cause of the problem (which may or may not be related to Wikidata), so we can deploy the latest version again. Sorry for the confusion!

On 27.10.2017 at 11:06, Jakob Voß wrote:
> Hi Lydia and all of you,
>
> Lydia wrote:
>
>> The identifier can often be expanded to a full URI. (For example, LoC
>> ID n81114174 becomes http://id.loc.gov/authorities/names/n81114174.)
>> This full URI can then be used in the linked open data web to match
>> our data with other datasets and use both of them together easily.
>>
>> From today on, Wikidata has full URIs for statements that represent
>> external identifiers in its RDF exports, and thereby becomes a proper
>> citizen of the linked open data web. To make this work the property
>> for the external ID needs to have a statement with property "URI used
>> in RDF" (https://www.wikidata.org/wiki/Property:P1921).
>
> Could you give an example? The RDF of item Q43027 with LoC ID n81114174
> does not include the URI <http://id.loc.gov/authorities/names/n81114174>
> if exported with
> http://www.wikidata.org/wiki/Special:EntityData/Q43027
>
> I also tried with a statement just added to make sure it's not some
> caching issue. Is the feature not enabled yet?
>
> In particular I'm interested how the external URI and Wikidata URI are
> connected:
>
> subject: <http://www.wikidata.org/entity/Q43027>
> property: ???
> object: <http://id.loc.gov/authorities/names/n81114174>
>
> I'm sure the RDF property also depends on the Wikidata property, so this
> feature requires some additional tweaking. At least the property is not
> always owl:sameAs, because we have at least 1-to-n relationships between
> Wikidata items and external ids.
>
> Cheers,
> Jakob
>
> P.S.: Won't have time to cover all these aspects in my WikidataCon
> Lightning talk about https://www.wikidata.org/wiki/Wikidata:Identifiers

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
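[Editor's note: the expansion Lydia describes boils down to placeholder substitution. This sketch assumes that "URI used in RDF" (P1921) values use a $1 placeholder for the external ID, the same convention as formatter URLs (P1630); treat that as an assumption, not a spec.]

```python
def rdf_uri(formatter, external_id):
    """Expand an external identifier into a full URI using a formatter
    pattern with a $1 placeholder (assumed convention, see note above)."""
    return formatter.replace("$1", external_id)

# The LoC example from the announcement quoted above:
print(rdf_uri("http://id.loc.gov/authorities/names/$1", "n81114174"))
# -> http://id.loc.gov/authorities/names/n81114174
```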
Re: [Wikidata] Wikidata prefix search is now Elastic
On 26.10.2017 at 11:36, Marco Fossati wrote:
> Thanks a lot Stas for this present.
> Could you please share any pointers on how to integrate it into other
> tools?

Just keep using wbsearchentities. It now uses Cirrus as a backend, instead of SQL. That should provide better performance, and better ranking.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Navigation to Wikipedia links on Wikidata
If your browser window is wide enough, Sitelinks to Wikipedia should already be close to the top of the page, on the right-hand side. But in any case, you can always add #sitelinks-wikipedia to the URL, like in <https://www.wikidata.org/wiki/Q1#sitelinks-wikipedia>. That will make the browser jump right to the wikipedia section. Am 05.09.2017 um 16:47 schrieb Tito Dutta: > Hello, > If I am on a Wikidata item page (QX), what's the easiest way to navigate > to > the Wikipedia links other than manual scrolling? Sometimes (actually a lot of > times) I need to check Wikipedia articles (not only English) before I add > description part. Is there any user script or something that puts Wikipedia > links above statement or any other suggestion? > > Thanks > Tito Dutta > Note: If I don't reply to your email in 2 days, please feel free to remind me > over email or phone call. > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] lib.reviews: Review anything with a Wikidata entry
Thanks for sharing, Erik! Combining search and query capabilities would indeed be useful for quite a few things. We'll probably be working on making this easier soon.

-- daniel

On 26.07.2017 at 07:26, Erik Moeller wrote:
> A small update on this: based on some off-list feedback, I replaced
> the way I exclude disambiguation pages and the like from the
> autocomplete list. The autocomplete widget now performs two queries: a
> MediaWiki API (wbsearchentities) query, and a follow-up WDQS SPARQL
> query to exclude disambiguation pages, Wikinews articles, and other
> content that folks are most likely not interested in reviewing.
>
> I didn't find a good example for this in the examples directory, so I
> figured folks might find the query I'm using useful. Before I add it
> to the examples, please let me know if you see obvious ways in which
> it can be improved.
>
> Here's an example query:
>
> # For a list of items, exclude the ones that have "instance of" set to
> # one from a given set of excluded classes
> SELECT DISTINCT ?item WHERE {
>   ?item ?property ?value
>
>   # Excluded classes: disambiguation pages, Wikinews articles, etc.
>   MINUS { ?item wdt:P31 wd:Q4167410 }
>   MINUS { ?item wdt:P31 wd:Q17633526 }
>   MINUS { ?item wdt:P31 wd:Q11266439 }
>   MINUS { ?item wdt:P31 wd:Q4167836 }
>   MINUS { ?item wdt:P31 wd:Q14204246 }
>
>   # Set of items to check against the above exclusion list
>   # wd:Q355362 is a disambiguation page and will therefore not be in
>   # the result set
>   VALUES ?item { wd:Q23548 wd:Q355362 wd:Q1824521 wd:Q309751 wd:Q6952373 }
> }

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Wiki PageID
Hello Gintautas!

On 21.04.2017 at 17:58, Gintautas Sulskus wrote:
> I have a couple of questions regarding the wiki page ID. Does it always
> stay unique for the page, where the page itself is just a placeholder for
> any kind of information that might change over time?

That is indeed the idea. Content changes, the page ID stays the same. If you need to identify a specific state of the page, use the revision ID (aka permalink).

Note however that page IDs are considered "internal" identifiers. They are stable, but they are not the canonical way to access or identify a page. Use the title for that - or, in the context of Wikidata, use the entity ID.

> Consider the following cases:
> 1. The first time someone creates page "Moon" it is assigned ID=1. If at
> some point the page is renamed to "The_Moon", the ID=1 remains intact. Is
> this correct?

Yes, page IDs survive renaming/moving the page.

> 2. What if we have page "Moon" with ID=1. Someone creates a second page
> "The_Moon" with ID=2. Is it possible that page "Moon" is transformed into
> a redirect? Then, "Moon" would be redirecting to page "The_Moon"?

Yes, pages can become redirects.

> 3. Is it possible for page "Moon" to become a category "Category:Moon"
> with the same ID=1?

Yes, pages can be moved into the category namespace.

--
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 18:12 schrieb Denny Vrandečić: > So assume we enter a new Lexeme in Examplarian (which has a Q-Item), but > Examplarian has no language code for whatever reason. What language code would > they enter in the MultilingualTextValue? My plan is: it will be "mis+Q7654321" internally, which will be exposed in HTML and RDF as "mis". We will want to distinguish "a known language not on this list (mis)" from "an unknown language (und)" and "translingual" (Wiktionary uses "mul" for translingual, but that's not technically correct). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 19:24 schrieb Denny Vrandečić: > Daniel, I agree, but isn't that what Multilingual Text requires? A language > code? Yes. Well, internally, it just has to be *some* unique code. But for interoperability, we want it to be a standard code. So I propose to internally use something like "de+Q1980305", and expose that as "de" externally. This allows us to distinguish however many variants of German we want internally, and tag them all as "de" in HTML and RDF, so standard tools can use the language information. > I assume most of it is hidden behind mini-wizards like "Create a new lexeme", > which actually make sure the multitext language and the language property are > consistently set. In that case I can see this work. Yes, that is exactly the plan for the NewLexeme page. We'll still have to come up with a nifty UI for "add a lemma, select a language, and optionally an item identifying a variant of that language". -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 10.04.2017 um 18:56 schrieb Gerard Meijssen: > Hoi, > The standard for the identification of a language should suffice. I know no standard that would be sufficient for our use case. For instance, we not only need identifiers for German, Swiss and Austrian German. We also need identifiers for German German before and after the spelling reform of 1901, and before and after the spelling reform of 1996. We will also need identifiers for the "language" of mathematical notation. And for various variants of ancient languages: not just Sumerian, but Sumerian from different regions and periods. The only system I know that gives us that flexibility is Wikidata. For interoperability, we should provide a standard language code (aka subtag). But a language code alone is not going to be sufficient to distinguish the different variants we will need. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Tobias' comment made me realize that I did not clarify one very important distinction: there are two kinds of places where a "language" is needed in the Lexeme data model <https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model>: 1) the "lexeme language". This can be any Item, whether or not it has a language code. This is what Tobias would have to use in his query. 2) the language codes used in the MultilingualTextValues (lemma, representation, and gloss). This is where my "hybrid" approach comes in: use a standard language code augmented by an item ID to identify the variant. To make it easy to create new Lexemes, the lexeme language can serve as a default for lemma, representation, and gloss - but only if it has a language code. If it does not have one, the user will have to specify one for use in MultilingualTextValues. Am 06.04.2017 um 19:59 schrieb Tobias Schönberg: > An example using the second suggestion: > > If I would like to query all L-items that contain a combination of letters and > limit those results by getting the Q-items of the language and limit those, to > those that have Latin influences. > > In my imagination this would work better using the second suggestion. Also the > flexibility of "what is a language" and "what is a dialect" would seem easier > if > we can attach statements to the UserLanguageCode or the Q-item of the > language. > > -Tobias -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Disputed territories in WDQS
Hi Andra! As Nicolas pointed out, the map view of WDQS is based on OpenStreetMap. So the territory would have to be marked as disputed there. However, perhaps you can turn this into a positive example for Wikidata's flexibility and NPOV after all: I have added some statements to <https://www.wikidata.org/wiki/Q5671580> to show how a territorial dispute can be modeled on Wikidata. I was lazy and didn't add any sources, though - I didn't know what to make of "Donovan 2003" given in Wikipedia, as it doesn't give the title of a publication. But I suppose sources for these things should be easy to find. HTH daniel Am 09.04.2017 um 14:54 schrieb Andra Waagmeester: > I am currently in Suriname, where I gave a talk on open > data/wikipedia/wikidata. > Next week there will be a handson session, where I hope to get as much > contribution from this country as possible. > > When I demonstrated the WDQS, the audience took offense in the way Suriname is > depicted on the map view used in the WDQS. There is a territorial dispute with > the neighboring country Guyana, called the Tigri > area(https://en.wikipedia.org/wiki/Tigri_Area). In the WDQS this area is > currently being drawn as being part of Guyana. The maps drawn in the WIkipedia > article shows how the issue is dealt with here when drawing maps. i.e. The > area > is explicitly drawn as being a territorial dispute, which is more factual. > > Any idea's on how to get a similar mapview on the WDQS? Thanks to Wikipedia > Zero, where people can have free access to Wikidata (even in remote area's), > there is quite some potential to get people involved in adding local data. > Having the current mapview is counter productive. > > Cheers, > > Andra > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Languages in Wikidata4Wiktionary
Am 07.04.2017 um 01:34 schrieb Denny Vrandečić: > I foresee that might be a bit of a problem for external tools consuming > this data - how they would figure out what language it is if it's > doesn't have a code? We could of course generate fake codes like > mis-x-q12345, maybe that would work. > > Q-items for languages already have a property to state their language code. > It's > just an extra hop away. We want ISO codes (or rather, IANA language subtags [1]), so we can use them in HTML lang attributes, and in RDF literals. This allows interoperability with standard tools. For this reason, I also favor a mixed approach, that allows standard language tags to be used whenever possible. I have some ideas on how that could work, but no definite plan yet. Something like de+Q1980305 could work; when generating HTML or RDF, we'd just drop the suffix. For translingual entries (e.g. for the number symbol i), we could use e.g. mis+Q1140046. [1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
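The "drop the suffix" step described above can be sketched in a few lines. This is a minimal illustration, assuming the "+" separator from the proposal; the helper name is hypothetical, not a shipped Wikibase API:

```python
def to_external_code(internal_code: str) -> str:
    """Drop the item-ID suffix from a hybrid code like 'de+Q1980305',
    leaving the plain IANA subtag for use in HTML lang attributes and
    RDF literals. The '+' syntax follows the proposal in this thread;
    this helper is illustrative only."""
    return internal_code.split("+", 1)[0]

print(to_external_code("de+Q1980305"))   # de
print(to_external_code("mis+Q1140046"))  # mis
print(to_external_code("en"))            # en (no suffix, unchanged)
```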
Re: [Wikidata] Significant change: new data type for geoshapes
Am 29.03.2017 um 15:19 schrieb Luca Martinelli: >> One thing to note: We currently do not export statements that use this >> datatype to RDF. They can therefore not be queried in the Wikidata Query >> Service. The reason is that we are still waiting for geoshapes to get stable >> URIs. This is handled in this ticket. This ticket: <https://phabricator.wikimedia.org/T159517>. And more generally <https://phabricator.wikimedia.org/T161527>. The technically inclined of you may be interested in joining the relevant RFC discussion on IRC tonight at 21:00 UTC (2pm PDT, 23:00 CEST) #wikimedia-office. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 27.03.2017 um 23:48 schrieb Kingsley Idehen: > I think we can just agree to disagree for now, since nothing you've > stated is fundamentally contrary to my view of RDF -- as a Language for > describing anything (including statements) :) Yes, that's what RDF is. My point is: just because something can be described in RDF doesn't mean it *is* RDF. As you said, RDF can describe anything. If anything that can be described with RDF *is* RDF, then everything is RDF. Then the term would be meaningless. The Wikibase model "is" an RDF model just as much as it "is" a modal logic system, or any other sufficiently powerful formal language. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 27.03.2017 um 15:13 schrieb Kingsley Idehen: > I see Wikidata is a collection of reified RDF Statements. I don't see how this > model differs from RDF's model. It just so happens (in my eyes) that Wikidata > includes description of statements about things which provides rich metadata, > in > line with the goals of Wikidata. It's a matter of perspective. I agree that Wikidata can be *represented* as a collection of reified RDF Statements. That's what we do for the query service. But I do not agree that this is what Wikidata *is*. RDF and the Wikibase model are quite different conceptually. But they are of equal power and thus formally equivalent: one can be represented using the other. Just because a Turing Machine is computationally equivalent to lambda calculus, that does not mean they are the same thing. Understanding one in terms of the other may be helpful in some context, and irrelevant in another. There is nothing special about the relationship between Wikibase/Wikidata and RDF; Wikibase has an RDF binding, but it is not defined in terms of RDF, its specification does not rely on RDF concepts. The Wikibase model can just as well (or perhaps more easily) be understood and represented in terms of the Topic Maps model (ISO 13250). Academically, the Wikibase model could perhaps be described as an extended modal logic with reasoning rules for provenance. I think W. Stelzner explored related ideas in the 80s. Maybe one day I'll find the time to dig into this some more. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Does Wikidata use a property store or a RDF triplestore?
The primary data storage is document oriented, and very dumb. It's JSON blobs stored as wiki page content, using MediaWiki's standard content blob storage mechanism. We have a live export to a triple store, and an open SPARQL endpoint. These links may be helpful: https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format https://www.wikidata.org/wiki/Wikidata:Data_access If you want to play with the data, try http://query.wikidata.org/ -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] What kind of bot "wiktionary in wikidata" needs?
Am 22.03.2017 um 10:10 schrieb Amirouche: > My understanding is that wiktionary (and wikipedia) CC-BY-SA license is > incompatible with wikidata CC0 license. That is true, for any copyrighted information on Wiktionary. That will mainly be definitions, and maybe example sentences. Facts, such as word type or morphology, are not copyrightable. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 19.03.2017 um 18:21 schrieb Bob DuCharme: > I do have to ask: if the mapping used on wikidata.org has diverged from what > is > described there, is a more up-to-date description of the mapping available > anywhere? The current mapping is the one described at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 23:15 schrieb Daniel Kinzler: > Wikibase Entities are certainly Resources in the RDF sense, but so are some of > the more fine grained components of the Wikibase model, such as Statements and > References. You can find the OWL file for the RDF binding of Wikibase at > <http://wikiba.se/ontology>. If you are interested, there's a paper about mapping Wikidata to RDF: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf Note however that the mapping used on wikidata.org has somewhat diverged from what is described in the paper. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 22:48 schrieb Bob DuCharme: > New question: when I see that https://www.wikidata.org/wiki/Special:EntityData > says "This page provides a linked data interface to entity values", can you > tell > me what "entity" means in the context of Wikidata? If I was going to refer to > something that can be identified with a URI and described by triples in which > it > is the subject, I would just use the term "resource" as described at > https://www.w3.org/TR/rdf11-concepts/#resources-and-statements (and > remembering > what "RDF" stands for!) so I'm guessing that "entity" means something a little > more specific than that here. The Wikidata (or technically, Wikibase) data model is not defined in terms of RDF. Have a look at the primer <https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer> and the spec <https://www.mediawiki.org/wiki/Wikibase/DataModel>. Entities are the top-level elements of Wikidata. There are currently two kinds: Items (things or concepts in the world) and Properties (attributes for describing Items and other entities). Wikibase Entities are certainly Resources in the RDF sense, but so are some of the more fine grained components of the Wikibase model, such as Statements and References. You can find the OWL file for the RDF binding of Wikibase at <http://wikiba.se/ontology>. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 21:27 schrieb Bob DuCharme: > Thanks Daniel! > > How do I find a full statement representation? For example, what would the > full > statement representation be for a triple like > {wd:Q64 wdt:P1376 wd:Q183}? The full representation of the statement in this case is: wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:PreferredRank ; ps:P1376 wd:Q183 ; prov:wasDerivedFrom wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 . wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 a wikibase:Reference ; pr:P248 wd:Q451546 ; pr:P958 "Artikel 2 (1)" . This RDF representation can be found at <https://www.wikidata.org/wiki/Special:EntityData/Q64.ttl>. Content negotiation will take you there from the canonical URI, <https://www.wikidata.org/entity/Q64.ttl> In addition to the actual value, the RDF above also gives the rank, and a source reference (namely, the re-unification treaty). This statement doesn't currently have a qualifier - it should have at least one, stating since when Berlin is the capital of Germany. That qualifier would be represented as: wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C pq:P580 "1990-10-03T00:00:00Z"^^xsd:dateTime ; The Statement ID, Q64$43CCD3D6-F52E-4742-B0E3-BCA671B69D2C, can be found in the HTML source of the page, encoded as a CSS class. These IDs are not exposed nicely anywhere. But usually, one would look at the RDF representation right away, or at least go from HTML to *all* the RDF. HTH -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
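The two spellings of the statement ID above (with "$" in the HTML, with "-" in the wds: namespace) differ only in the separator. A minimal sketch of the mapping; the helper name is illustrative, not part of any Wikibase library:

```python
def wds_local_name(statement_id: str) -> str:
    """Turn a statement ID as found in the page HTML (using '$') into
    the local name used in the wds: RDF namespace (using '-'), matching
    the two forms quoted in the mail above."""
    return statement_id.replace("$", "-")

print(wds_local_name("Q64$43CCD3D6-F52E-4742-B0E3-BCA671B69D2C"))
# Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C
```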
Re: [Wikidata] wikibase:directClaim predicate?
Am 18.03.2017 um 19:03 schrieb Bob DuCharme: > What makes a predicate a direct claim predicate? It's a predicate (that's what RDF calls all relationships) that expresses a direct claim (as opposed to a full statement). Direct claims are one of two ways Wikidata Statements are mapped to RDF. In the wikidata query service, each statement is represented twice - once as a full statement, and once as a direct claim. Direct claims represent a "naive projection" of wikidata to RDF: everything that is claimed (by anyone) to be true (under any circumstances) is assumed to be true. So you get simple triples, e.g. one meaning "Berlin - capital-of - Germany". Simple to work with, but incomplete: you also get ("Berlin - capital-of - Kingdom of Prussia"), without an easy way to see that one is current and the other is not. To get all the additional context information, you need to look at the full statement representation, which provides a complex structure of value, qualifiers, and source references. The full mapping uses predicates in the "p" namespace to connect the item (subject) to a statement node (in the "wds" namespace) representing the statement with all its parts. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
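The difference between the two mappings can be seen by comparing two query sketches (held here as Python strings; prefixes as used by the Wikidata Query Service, queries illustrative):

```python
# Direct claim: one triple, no context. For Berlin (Q64) and
# "capital of" (P1376) this returns both Germany and the Kingdom of
# Prussia, with no way to tell which claim is current.
direct_query = """
SELECT ?of WHERE {
  wd:Q64 wdt:P1376 ?of .
}
"""

# Full statement: the p:/ps: path goes through the statement node,
# which exposes rank (and would expose qualifiers and references),
# so current and historical values can be told apart.
full_query = """
SELECT ?of ?rank WHERE {
  wd:Q64 p:P1376 ?stmt .
  ?stmt ps:P1376 ?of ;
        wikibase:rank ?rank .
}
"""

print(direct_query)
print(full_query)
```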
Re: [Wikidata] Label gaps on Wikidata
Am 27.02.2017 um 18:18 schrieb James Heald: > From what Daniel is saying, it seems this may not be possible, because the > template expansion would then depend on the user's preferred language(s), > which > would not be compatible with the template cacheing. > > Is that right? Or is there a way round this? We are currently aiming for a compromise: we render the page with the user's interface language as the target language, and apply fallback accordingly. We do not take into account secondary user languages, as defined e.g. by the Babel or Translate extensions. This means a user with the UI language set to French will see French if available, but will not see Spanish, even if they somehow declared that they also speak Spanish. This way, we split the parser cache once per UI language - a factor of 300, but not the exponential explosion we would get if we would split on every possible permutation of languages (does anyone want to compute 300 factorial?). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
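The difference in magnitude between splitting per UI language and splitting per fallback-chain permutation is easy to check with a back-of-the-envelope sketch:

```python
import math

# One parser cache entry per UI language: growth is linear in the
# number of supported languages.
ui_languages = 300
print(ui_languages)

# Splitting on every possible ordering of fallback languages would be
# factorial: 300! is a number with several hundred decimal digits.
digits = len(str(math.factorial(300)))
print(digits)
```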
Re: [Wikidata] Label gaps on Wikidata
Am 27.02.2017 um 17:01 schrieb James Hare: > One option is to allow users to define their own ranked preferences for > language > beyond just first place. (I personally would enjoy having French as a fallback > to English.) That would badly fragment the parser cache. I don't think it's viable. > This has the downside of only really working for people with > accounts, which I suspect might be a minority of overall traffic. Currently, we only support English for anonymous visitors (yes, this is very sad; the reason is, again, caching - Varnish, this time). -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Label gaps on Wikidata
Am 19.02.2017 um 17:00 schrieb Romaine Wiki: > Hi all, > > If you look in the recent changes, most items have labels in English and those > are shown in the recent changes and elsewhere (so we know what the item is > about > without opening first). Wikidata actually tries to show you the labels in your preferred interface language. And if your user language is not available, it uses a fallback mechanism to show the next-best language, which may even include automated transcriptions. When all else fails, it will show the English label. If that doesn't exist, it shows the ID. > But not all items have labels, and these items without > English label are often items with only a label in Chinese, Arabic, Cyrillic > script, Hebrew, etc. This forms a significant gap. The fallback mechanism works OK, but is not great for English-speaking users who see a lot of items that have no English label. For English, we just don't know what to fall back to. Just anything? Or try European languages first? What should the rule be? If we can decide on a good rule, it should actually be pretty simple to add such a fallback for English. > Is there a way to easily make a transcription from one language to another? We have such rules for some languages/variants, e.g. between the Cyrillic and the Roman representations of Kazakh or Uzbek. But transliteration rules can be complex, and covering every permutation of the 300 languages we support would mean we'd need about 45000 rule sets... > Or alternatively if there is a database that has such transcriptions? Not yet. One of the goals of Wikidata is to be that database. -- Daniel Kinzler Principal Platform Engineer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
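The "about 45000" figure matches the number of unordered pairs of 300 languages, which a quick arithmetic sketch confirms:

```python
import math

# One transliteration rule set per unordered pair of the ~300
# supported languages: C(300, 2) = 300 * 299 / 2.
rule_sets = math.comb(300, 2)
print(rule_sets)  # 44850
```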
Re: [Wikidata] Full Text Search in Query Service
Am 17.02.2017 um 22:02 schrieb James Heald: > Quick question on this Stas: > > * Why do the suggestions that come up when typing in the search box seem so > much > more on-point (ie better at presenting the most likely option first) than the > ones that come up in the results list? The reason is that the "search box" on wikidata.org is fake: it is not the search box you see on Wikipedia, it does not use the search infrastructure that Special:Search uses (Cirrus). It uses a custom API module (wbsearchentities) which relies on a custom database table (wb_terms). We need this because Cirrus did not have support for structured data or multilingual fields. That is changing now, and we want to use Cirrus for everything. But until then, wikidata is using two completely different search mechanisms, both of which work well for some things, and really badly for others. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
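A sketch of what a request to the wbsearchentities module looks like (the search term is just an example; only the URL is constructed here, no request is made):

```python
from urllib.parse import urlencode

# Parameters of the custom wbsearchentities API module mentioned
# above; "Berlin" stands in for whatever prefix is typed into the box.
params = {
    "action": "wbsearchentities",
    "search": "Berlin",
    "language": "en",
    "type": "item",
    "format": "json",
}
url = "https://www.wikidata.org/w/api.php?" + urlencode(params)
print(url)
```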
Re: [Wikidata] Wikidata ontology
Am 09.01.2017 um 11:16 schrieb Peter F. Patel-Schneider: > Although there is no formal problem here, care does have to be taken when > modelling entities that are to be considered as both classes and non-classes > (or, and especially, metaclasses and non-metaclass classes). It is all too > easy for even experienced modellers to make mistakes. The problem is worse > when the modelling formalism is weak (as the Wikidata formalism is) and thus > does not itself provide much support to detect mistakes. The problem is even > worse when the modelling methodology often does not provide much description > of the entities (as is the case in Wikidata). That's what I meant by "problematic". I did not mean to say it's *wrong* per se. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata ontology
Am 09.01.2017 um 04:36 schrieb Markus Kroetzsch: > Only the "current king of Iberia" is a single person, but Wikidata is about > all > of history, so there are many such kings. The office of "King of Iberia" is > still singular (it is a singular class) and it can have its own properties > etc. > I would therefore say (without having checked the page): > > King of Iberiainstance of office > King of Iberiasubclass of king To be semantically strict, you would need to have two separate items, one for the office, and one for the class. Because the individual kings have not been instances of the office - they have been holders of the office. And they have been instances of the class, but not holders of the class. On Wikidata, we often conflate these things for sake of simplicity. But when you try to write queries, this does not make things simpler, it makes it harder. Anything that is a subclass of X, and at the same time an instance of Y, where Y is not "class", is problematic. I think this is the root of the confusion Gerard speaks of. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata ontology
Am 04.01.2017 um 11:00 schrieb Léa Lacroix: > Hello, > > You can find it here: http://wikiba.se/ontology-1.0.owl > > If you have questions regarding the ontology, feel free to ask. Please note that this is the *wikibase* ontology, which defines the meta-model for the information on Wikidata. It models statements, sitelinks, source references, etc. This ontology does not model "real world" concepts or properties like location or color or children, etc. Modeling on this level is done on Wikidata itself, there is no fixed RDF or OWL schema or ontology. The best you can get in terms of "downloading the Wikidata ontology" would be to download all properties and all the items representing classes. We currently don't have a separate dump for these. Also, do not expect this to be a concise or consistent model that can be used for reasoning. You are bound to find contradictions and loose ends. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata Redirects in dumps
Am 12.12.2016 um 20:53 schrieb Praveen Balaji: > When using JSON dumps, how can I tell a redirected entity from the JSON dumps If you look at the ID of the entity you get when you ask for <https://www.wikidata.org/wiki/Special:EntityData/Q6703218.json>, you will notice that it does not have the ID you requested. This way, you know that you have been redirected. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
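A minimal sketch of that check. The helper name and the trimmed sample document are illustrative, and the redirect target ID in the example is hypothetical:

```python
def was_redirected(requested_id: str, entity_json: dict) -> bool:
    """The entity document returned by Special:EntityData carries its
    canonical ID; if it differs from the ID that was requested, the
    request was redirected to another entity."""
    return entity_json.get("id") != requested_id

# Trimmed sample document, pretending Q6703218 redirects to some Q42:
doc = {"id": "Q42", "labels": {}}
print(was_redirected("Q6703218", doc))  # True
print(was_redirected("Q42", doc))       # False
```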
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 28.11.2016 um 17:34 schrieb gnosygnu: >> The datatype is implicit, it can be derived from the property ID. You can >> find >> it by looking at the Property page's JSON. >> ... > > Thanks for all the info. I see my error. I didn't realize that > mainsnak.datatype was inferred. I assumed it would have to be embedded > directly in the XML's JSON (partly because it is embedded directly in > the JSON's dump JSON) > > The rest of your points make sense. Thanks again for taking the time to > clarify. If you have problems accessing the datatype from Lua or elsewhere, let me know. There may be issues with the import process. It's always cool to see that people use our data and our software! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 28.11.2016 um 16:31 schrieb gnosygnu: >> If you are also using the same software (Wikibase on MediaWiki), the XML >> dumps >> should Just Work (tm). The idea of the XML dumps is that the "text" blobs are >> opaque to 3rd parties, but will continue to work with future versions of >> MediaWiki & friends (with a compatible configuration - which is rather >> tricky). > > Not sure I follow. Even from a Wikibase on MediaWiki perspective, the > XML dumps are still incomplete (since they're missing > mainsnak.datatype). The datatype is implicit, it can be derived from the property ID. You can find it by looking at the Property page's JSON. The XML dumps are complete by definition, since they contain a raw copy of the primary data blob. All other data is derived from this. However, since they are "raw", they are not easy to process by consumers, and we make no guarantees regarding the raw data format. We include the data type in the statements of the canonical JSON dumps for convenience. We are planning to add more things to the JSON output for convenience. That does not make the XML dumps incomplete. Your use case is special since you want canonical JSON *and* wikitext. I'm afraid you will have to process both kinds of dumps. > One line of the file specifically checks for datatype: "if datatype > and datatype == 'commonsMedia' then". This line always evaluates to > false, even though you are looking at an entity (Q38: Italy) and > property (P41: flag image) which does have a datatype for > "commonsMedia" (since the XML dump does not have "mainsnak.datatype"). That is incorrect. datatype will always be set in Lua, even if it is not present in the XML. Remember that it is not present in the primary blob on Wikidata either. Wikibase will look it up internally, from the wb_property_info table, and make that information available to Lua. When loading the XML file, a lot of secondary information is extracted into database tables for this kind of use, e.g. 
all the labels and descriptions go into the wb_terms table, property types go into wb_property_info, links to other items go to page_links, etc. Actually, you may have to run refreshLinks.php or rebuildall.php after doing the XML import, I'm not sure which is needed when any more. But the point is: the XML dump contains all information needed to reconstruct the content. This is true for wikitext as well as for Wikibase JSON data. All derived information is extracted upon import, and is made available via the respective APIs, including Lua, just like on Wikidata. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
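The "derive the datatype from the property ID" lookup described above can be sketched against a trimmed Property document; the P41/commonsMedia pairing is taken from this thread, and the sample dict stands in for what the property's JSON would contain:

```python
def datatype_of(property_json: dict) -> str:
    """Read the datatype from a Property page's JSON document, where it
    is a top-level field (it is not repeated in every mainsnak of the
    XML dump's raw blobs)."""
    return property_json["datatype"]

# Trimmed sample of a property document (illustrative):
p41 = {"id": "P41", "type": "property", "datatype": "commonsMedia"}
print(datatype_of(p41))  # commonsMedia
```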
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Am 27.11.2016 um 01:15 schrieb gnosygnu: > This is useful, but unfortunately it won't suffice. Wikidata also has > pages which are wikitext (for example, > https://www.wikidata.org/wiki/Wikidata:WikiProject_Names). These > wikitext pages are in the XML dumps, but aren't in the stub dumps nor > the JSON dumps. I actually do use these Wikidata wikitext entries to > try to reproduce Wikidata in its entirety. If you are also using the same software (Wikibase on MediaWiki), the XML dumps should Just Work (tm). The idea of the XML dumps is that the "text" blobs are opaque to 3rd parties, but will continue to work with future versions of MediaWiki & friends (with a compatible configuration - which is rather tricky). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] [wikicite-discuss] Entity tagging and fact extraction (from a scholarly publisher perspective)
Am 18.11.2016 um 22:12 schrieb Ruben Verborgh: > In case you consider scenarios where clients perform federation, > you might be interested to see that lightweight interfaces > can outperform full SPARQL interfaces: > http://linkeddatafragments.org/publications/jws2016.pdf#page=26 We are indeed planning to experiment with LDF, see https://phabricator.wikimedia.org/T136358 -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages
Am 26.11.2016 um 23:33 schrieb Andrew Hall: > 1. In the “Wikidata entities used in this page” section, are the entities > used > dependent on, for example, the logic of the templates through which they > are > referenced? If entities are listed in this section, are they for sure > always > coming from Wikidata? Yes, *any* use is tracked and recorded, including accessing some part of the entity from a conditional somewhere in the Lua code. And all entities come from Wikidata -- we don't have any other Wikibase repo yet, and when we do, usage will be tracked separately for that. > 2. Sometimes “other (statements)” is specified in the “Wikidata entities used > in this page” section. Is it possible to determine what those statements > are? No, that information is not recorded. There is no way to find out without tracing all templates, parameters, and Lua code. We may start tracking this in the future, but it's a lot of data. I'm sure we had a ticket for changing this, but couldn't find it, so I made a new one: https://phabricator.wikimedia.org/T151717 -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?
Hi gnosygnu! The JSON in the XML dumps is the raw contents of the storage backend. It can't be changed retroactively, and re-encoding everything on the fly would be too expensive. Also, the JSON embedded in the XML files is not officially supported as a stable interface of Wikibase. The JSON format in the XML files can change without notice, and you may encounter different representations even within the same dump. I recommend using the JSON dumps; they contain our data in canonical form. To avoid downloading redundant information, you can use one of the wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't contain the actual page content, just meta-data. Caveat: there is currently no dump that contains the JSON of old revisions of entities in canonical form. You can only get them individually from Special:EntityData, e.g. <https://www.wikidata.org/wiki/Special:EntityData/Q23.json?oldid=30279> HTH -- daniel Am 26.11.2016 um 02:13 schrieb gnosygnu: > Hi everyone. I have a question about the Wikidata xml dump, but I'm > posting this question here, because it looks more related to Wikidata. > > In short, it seems that the "pages-articles.xml" does not include the > datatype property for snaks. For example, the xml dump does not list a > datatype for Q38 (Italy) and P41 (flag image). In contrast, the json > dump does list a datatype of "commonsMedia". > > Can this datatype property be included in future xml dumps? The > alternative would be to download two large and redundant dumps (xml > and json) in order to reconstruct a Wikidata instance. > > More information is provided below the break. Let me know if you need > anything else. > > Thanks. > > > > Here's an excerpt from the xml data dump for Q38 (Italy) and P41 (flag > image).
Notice that there is no "datatype" property > // > https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2 > "mainsnak": { > "snaktype": "value", > "property": "P41", > "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863", > "datavalue": { > "value": "Flag of Italy.svg", > "type": "string" > } > }, > > Meanwhile, the API and the JSON dump lists a datatype property of > "commonsMedia": > // https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q38 > // > https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2 > "P41": [{ > "mainsnak": { > "snaktype": "value", > "property": "P41", > "datavalue": { > "value": "Flag of Italy.svg", > "type": "string" > }, > "datatype": "commonsMedia" > }, > > As far as I can tell, the Turtle (ttl) dump does not list a datatype > property either, but this may be because I don't understand its > format. > wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D . > wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement, > wikibase:BestRank ; > wikibase:rank wikibase:NormalRank ; > ps:P41 > <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg> > ; > pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ; > pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc . > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
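To illustrate the "raw blob" point above: in the pages-articles XML, the entity JSON sits opaquely inside the revision's text element. A minimal sketch of extracting and decoding it (the page record below is a heavily trimmed, hypothetical example; real dumps put an XML namespace on the root element and carry much more metadata):

```python
# Sketch: pulling the opaque "text" blob out of a pages-articles XML dump
# entry and decoding it as JSON. Element layout follows the MediaWiki export
# format; the sample record is hand-made and abbreviated.
import json
import xml.etree.ElementTree as ET

SAMPLE = """<page>
  <title>Q38</title>
  <revision>
    <text>{"id": "Q38", "claims": {"P41": []}}</text>
  </revision>
</page>"""

def entity_from_page_xml(xml_text):
    page = ET.fromstring(xml_text)
    blob = page.find("./revision/text").text
    return json.loads(blob)  # raw (non-canonical) entity JSON

entity = entity_from_page_xml(SAMPLE)
```

Note that this yields the raw storage format, not the canonical JSON of the entity dumps — as stated above, the two can differ without notice.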
Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages
Am 23.11.2016 um 21:33 schrieb Andrew Hall: > Hi, > > I’m a PhD student/researcher at the University of Minnesota who (along with > Max > Klein and another grad student/researcher) has been interested in > understanding > the extent to which Wikidata is used in (English, for now) Wikipedia. > > There seems to be no easy way to determine Wikidata usage in Wikipedia pages > so > I’ll describe two approaches we’ve considered as our best attempts at solving > this problem. I’ll also describe shortcomings of each approach. There are two pretty easy ways, which you may not have found because they were added only a couple of months ago: You can look at the "page information" (action=info, linked from the sidebar), e.g. <https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info>. Near the bottom you can find "Wikidata entities used in this page". The same information is available via an API module, <https://en.wikipedia.org/w/api.php?action=query&prop=wbentityusage&titles=South_Pole_Telescope>. See <https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage> for documentation. These URLs will list all direct and indirect usages, and also indicate what part or aspect of the entity was used. HTH -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
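A sketch of how a client might drive the wbentityusage module mentioned above. The response dict is a hand-made, abbreviated example rather than actual API output, and the aspect codes in it are illustrative:

```python
# Sketch: building a prop=wbentityusage query URL and listing the entities
# used by a page from a (hypothetical, abbreviated) API response.
from urllib.parse import urlencode

def entity_usage_url(title):
    params = {"action": "query", "prop": "wbentityusage",
              "titles": title, "format": "json"}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

def used_entities(response):
    """Map entity ID -> list of usage aspects, across all returned pages."""
    pages = response["query"]["pages"]
    return {eid: info["aspects"]
            for page in pages.values()
            for eid, info in page.get("wbentityusage", {}).items()}

# hand-made sample response, not real API output
sample = {"query": {"pages": {"123": {
    "title": "South Pole Telescope",
    "wbentityusage": {"Q1513315": {"aspects": ["S", "O"]}}}}}}
```

The same information backs the "Wikidata entities used in this page" section on action=info, so either interface can be scripted against.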
[Wikidata] BREAKING CHANGE: Quantity Bounds Become Optional
Hi all! This is an announcement for a breaking change to the Wikidata API, JSON and RDF binding, to go live on 2016-11-15. It affects all clients that process quantity values. As Lydia explained in the mail she just sent to the Wikidata list, we have been working on improving our handling of quantity values. In particular, we are making upper and lower bounds optional: When the uncertainty of a quantity measurement is not explicitly known, we no longer require the bounds to somehow be specified anyway, but allow them to be omitted. This means that the upperBound and lowerBound fields of quantity values become optional in all API input and output, as well as the JSON dumps and the RDF mapping. Clients that import quantities should now omit the bounds if they do not have explicit information on the uncertainty of a quantity value. Clients that process quantity values must be prepared to process such values without any upper and lower bound set. That is, instead of this "datavalue":{ "value":{ "amount":"+700", "unit":"1", "upperBound":"+710", "lowerBound":"+690" }, "type":"quantity" }, clients may now also encounter this: "datavalue":{ "value":{ "amount":"+700", "unit":"1" }, "type":"quantity" }, The intended semantics is that the uncertainty is unspecified if no bounds are present in the XML, JSON or RDF representation. If they are given, the interpretation is as before. For more information, see the JSON model documentation [1]. Note that quantity bounds have been marked as optional in the documentation since August. The RDF mapping spec [2] has been adjusted accordingly. This change is scheduled for deployment on November 15. Please let us know if you have any comments or objections.
-- daniel [1] https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON [2] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Quantity Relevant tickets: * <https://phabricator.wikimedia.org/T115269> Relevant patches: * <https://gerrit.wikimedia.org/r/#/c/302248> * <https://github.com/DataValues/Number/commit/2e126eee1c0067c6c0f35b4fae0388ff11725307> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
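A client-side sketch of the required behaviour: treat missing bounds as "uncertainty unspecified" rather than as zero. The helper name is mine, not part of any Wikibase library; the sample datavalues are the two from the announcement above:

```python
# Sketch: reading quantity datavalues while tolerating the now-optional
# upperBound/lowerBound fields. Returns (amount, uncertainty), where
# uncertainty is None when no bounds are given (unspecified, NOT zero).
from decimal import Decimal

def read_quantity(datavalue):
    value = datavalue["value"]
    amount = Decimal(value["amount"])
    if "upperBound" in value and "lowerBound" in value:
        # bounds present: interpretation unchanged from before this change
        uncertainty = (Decimal(value["upperBound"])
                       - Decimal(value["lowerBound"])) / 2
    else:
        uncertainty = None  # uncertainty unspecified
    return amount, uncertainty

bounded = {"value": {"amount": "+700", "unit": "1",
                     "upperBound": "+710", "lowerBound": "+690"},
           "type": "quantity"}
unbounded = {"value": {"amount": "+700", "unit": "1"},
             "type": "quantity"}
```

Importers should do the inverse: only emit the bound fields when the uncertainty is actually known.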
Re: [Wikidata] Stable Interface Policy: Database Schema as a stable API
I have updated the Stable Interface Policy according to the discussion at <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> The diff is here: <https://www.wikidata.org/w/index.php?title=Wikidata%3AStable_Interface_Policy&type=revision&diff=400924854&oldid=382163118> -- daniel Am 28.10.2016 um 17:59 schrieb Daniel Kinzler: > Hi all! > > I plan to add the wikibase (SQL) database schema as a stable interface. > > Typically, a database schema is considered internal, but since we have tools > on > labs that may rely on the current schema, breaking changes to the schema > should > be announced as such. To address this, I plan to add the following paragraph > to > the Stable Public APIs section: > > The database schema as exposed on Wikimedia Labs is considered a stable > interface. Changes to the available tables and fields are subject to the > above notification policy. > > In addition, I plan to add the following paragraph to the Extensibility > section: > > In a tabular data representation, such as a relational database schema, > the > addition of fields is not considered a breaking change. Any change to the > interpretation of a field, as well as the removal of fields, are > considered > breaking. Changes to existing unique indexes or primary keys are breaking > changes; changes to other indexes as well as the addition of new unique > indexes are not breaking changes. > > If you have any thoughts or objections, please let me know at > <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Stable Interface Policy: Database Schema as a stable API
Hi all! I plan to add the wikibase (SQL) database schema as a stable interface. Typically, a database schema is considered internal, but since we have tools on labs that may rely on the current schema, breaking changes to the schema should be announced as such. To address this, I plan to add the following paragraph to the Stable Public APIs section: The database schema as exposed on Wikimedia Labs is considered a stable interface. Changes to the available tables and fields are subject to the above notification policy. In addition, I plan to add the following paragraph to the Extensibility section: In a tabular data representation, such as a relational database schema, the addition of fields is not considered a breaking change. Any change to the interpretation of a field, as well as the removal of fields, are considered breaking. Changes to existing unique indexes or primary keys are breaking changes; changes to other indexes as well as the addition of new unique indexes are not breaking changes. If you have any thoughts or objections, please let me know at <https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Acquiring general knowledge from Wikidata
Am 25.10.2016 um 17:27 schrieb Federico Leva (Nemo): > As far as I know, an axiom by definition can't be false. What definition are > you > using? Maybe some jargon specific to this research field? An axiom is always true in the context of the formal model it helps define. But if that model corresponds to something in the real world, the axiom may well be found to be "false" when applied there. Say you have an axiom that says "all humans are born with two legs"; this is then (by definition) true in your model, but may not be an accurate modelling of the real world, since, very rarely, humans are born with more or fewer than two legs. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Greater than 400 char limit for Wikidata string data types
That was discussed and declined a while ago, see <https://phabricator.wikimedia.org/T126862>. Though I think the proposed realization was presentational rather than functional. I'll have to re-read the discussion, though. Am 08.10.2016 um 12:07 schrieb Thomas Douillard: > Probably a silly question but ... did you all consider creating a datatype for > molecue representation ? This seem to be a very similar usecase than > mathematica > formula. Essentially we're not dealing with a raw string but a representation > of > molecule formulas, with its own encoding ... > > Changing the limit seem to be a poor workaround to a dedicated datatype - > nobody > seems to have found a relevant usecase and it seem to me that we're > essentially > abusing strings for storing blobs ... > > 2016-10-08 11:33 GMT+02:00 Egon Willighagen <mailto:egon.willigha...@gmail.com>>: > > > > On Sat, Oct 8, 2016 at 11:28 AM, Lydia Pintscher > mailto:lydia.pintsc...@wikimedia.de>> > wrote: > > On Sat, Oct 8, 2016 at 11:23 AM, Egon Willighagen > mailto:egon.willigha...@gmail.com>> > wrote: > > Ah, those numbers are for > https://www.wikidata.org/wiki/Property:P234 > <https://www.wikidata.org/wiki/Property:P234> ... > > External identifier then. Cool. And for string like in > https://www.wikidata.org/wiki/Property:P233 > <https://www.wikidata.org/wiki/Property:P233>? Sebastian's initial > email > > says 1500 to 2000. Is this still a good number after this discussion? > > > Yes, that would cover more than 99.9% of all InChIs in PubChem. (See > Sebastian's reply earlier in this thread.) > > Egon > > -- > E.L. 
Willighagen > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > <http://se.linkedin.com/in/egonw> > Blog: http://chem-bla-ics.blogspot.com/ > <http://chem-bla-ics.blogspot.com/> > PubList: http://www.citeulike.org/user/egonw/tag/papers > <http://www.citeulike.org/user/egonw/tag/papers> > ORCID: -0001-7542-0286 > ImpactStory: https://impactstory.org/u/egonwillighagen > <https://impactstory.org/u/egonwillighagen> > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org> > https://lists.wikimedia.org/mailman/listinfo/wikidata > <https://lists.wikimedia.org/mailman/listinfo/wikidata> > > > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 28.09.2016 um 14:13 schrieb Markus Bärlocher: > Since this is about fundamental modelling questions - who can help here? "The community"... > I need a system for modelling geographic elevations in WD. > > A geographic elevation statement consists of: > 1. a number (127.53) > 2. a unit (meters, feet) > 3. a height reference level (NN, NHN, LAT, MSL, MHWS, ...) > > If any of the three is missing, the statement is useless. As I said, the reference level can be given as a qualifier. It would make sense to redefine or replace the property "elevation above sea level" accordingly. I can't think of another solution. Unless you mean "clearance" ("Lichte Höhe"), in which case you can use P2793. But you would still need a property for "reference level". I don't think that exists yet. > In addition, it would be useful to state: > 4. the accuracy > > Do I understand you correctly? > You suggest writing the accuracy after the number, > and merging both into one string? > That is, putting 1., 2. and 4. into one field? > > Example: 123.53±0.005m Yes, exactly like that. Or something similar - at the moment, the unit still has to be selected separately when entering the value. > Then one would first have to pick each number apart > in order to display it in a table and sort it numerically? No, it's not a text field. Value, accuracy, and unit are stored separately; that's what we have "data types" for. You can find the details here: <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#quantity> and here <https://www.mediawiki.org/wiki/Wikibase/DataModel#Quantities>. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
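To illustrate the point that value, accuracy, and unit are stored separately: here is roughly what "123.53±0.005 m" looks like as a quantity datavalue, following the JSON model linked above (the sort helper is my own, and the unit URI assumes Q11573 is the metre item):

```python
# Sketch: a quantity datavalue per the Wikibase JSON model -- number,
# accuracy (as bounds), and unit as separate fields, so no string parsing
# is needed for display or numeric sorting.
elevation = {
    "value": {
        "amount": "+123.53",
        "upperBound": "+123.535",
        "lowerBound": "+123.525",
        "unit": "http://www.wikidata.org/entity/Q11573",  # metre
    },
    "type": "quantity",
}

def numeric_sort_key(datavalue):
    # numeric sorting works directly on the stored amount
    return float(datavalue["value"]["amount"])
```

A table of such values can be sorted with `sorted(values, key=numeric_sort_key)` without ever picking apart a formatted string.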
Re: [Wikidata] Elevation
Am 27.09.2016 um 23:14 schrieb Info WorldUniversity: > Hi Daniel, Markus and Wikidatans, > > Thanks for your interesting "modeling elevation with Wikidata" conversation. > > Daniel, in a related vein and conceptually, how would you model elevation > change > over time (e.g. in a Google Street View/Maps/Earth with TIME SLIDER, > conceptually, for example) with Wikidata, building on the example you've > already > shared? You would use the "point in time" qualifier. We use this a lot with population data, see for instance <https://www.wikidata.org/wiki/Q64#P1082>. > Would there be a wikidata Q-item for all 46 sub levels, for example? That's a question of desirable modelling granularity. I would suppose that for Troy, we would have one item per sub-level, since it's such a famous site. But we would probably not have every sub-level of every archeological excavation. This is always a question of balance, and always a matter of debate. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 27.09.2016 um 22:21 schrieb Markus Bärlocher: >> The "elevation" property we have (P2044) is defined to refer to NN > > It is not a good idea, to define 'elevation' > like it is "defined" in P2044: > there are hundreds of reference-levels (not only NN)... Yes, I agree. But that's how it currently is. You can start a discussion about it on the property's talk page, or on the project chat page, or some other appropriate place. >> Then you could express something like "elevation: 28.3m; > > In WD there is a confusion between altitude and elevation? > (may be in American and British English? > or geographic and aviation and astronomy?) As far as I know, WD uses "altitude" only as an alias of "elevation". I'm not a native speaker of English, but I believe you can use "altitude" as well as "elevation" when describing a geographical point. The definitions of the corresponding items (Q190200 and Q2633778) reflect that, and so do the definitions in Merriam Webster. However, "elevation" seems to be used only for fixed places - a plane has altitude, not elevation - so that's a reason not to merge the two items. However, if I understood correctly, what you are looking for is actually not elevation, but "clearance" ("Lichte Höhe"): <https://www.wikidata.org/wiki/Q1823312>. Interestingly, there is also Q2446632... Oh, we actually do have a property for that! P2793 is the "distance between surface and bottom of a bridge deck". That's exactly what you need, no? > But this is a combination of unit and reference-level: > 'm ü.M.' > > We should not mix or confound this modellings... > > What will be the WD-way? > (you should discuss this with a geodetic specialist...!) Indeed :) And a civil engineer. But for starters, maybe Aude has some thoughts on this.
> > Additionally we need an expression for 'accuracy' and 'source': > If the hight unit is 'meter' and the source value is in 'feet', > the new value could have a lot more/less digits than the source, > but no better/worse accuracy... Sources can be given for any statement. Accuracy can be given for any quantity value; just enter 32+-2m. If the source gives the number in feet, please enter it in feet in Wikidata, and leave the conversion to the software (we are just in the process of adding support for unit conversion). HTH, Daniel PS: I'm a software guy. I know how Wikibase and MediaWiki work, and I know the underlying data model of Wikidata quite well. But I do not know all the properties and conventions, and I may not be aware of the best place to discuss these things. So please don't rely on my opinion about modeling on Wikidata too much. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
So you want to e.g. give the height of a bridge above the "mean high water spring" level of the river it crosses? You wouldn't use a unit for that, but a qualifier. The unit would be meter (or feet or whatever). The "elevation" property we have (P2044) is defined to refer to NN, so it's no good for your purpose. To model what you want nicely, you would need a more general "elevation" property, and a "reference level" property to use as a qualifier. Then you could express something like "elevation: 28.3m; reference-level: Q6803625". I'm sure there are other options, but I see no good option that would be possible with the properties I know. Anyway, this is really a modelling question, and it can't really be solved with units. Am 27.09.2016 um 20:26 schrieb Markus Bärlocher: > Hello Daniel, > > no, I am not looking for a WP article about MHWS > (I only linked that as an explanation), > > but for a unit > to describe MHWS as a reference level for geographic elevations. > > MHWS is used to define bridge clearance heights above water, > as well as for the geographic elevation of lighthouses. > > Best regards, > Markus > > > Am 27.09.2016 um 19:28 schrieb Daniel Kinzler: >> Am 27.09.2016 um 19:10 schrieb Markus Bärlocher: >>> I look for this: >>> "Elevation in metres above 'mean high water spring' level." >>> >>> Which means the geographic hight above MHWS: >>> https://en.wikipedia.org/wiki/Mean_high_water_spring >> >> By clicking on "Wikidata Item" in the sidebar of that page, I get to >> https://www.wikidata.org/wiki/Q6803625 ("highest level that spring tides >> reach >> on average over a period of time") >> >> Is that what you need? >> > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Elevation
Am 27.09.2016 um 19:10 schrieb Markus Bärlocher: > I look for this: > "Elevation in metres above 'mean high water spring' level." > > Which means the geographic hight above MHWS: > https://en.wikipedia.org/wiki/Mean_high_water_spring By clicking on "Wikidata Item" in the sidebar of that page, I get to https://www.wikidata.org/wiki/Q6803625 ("highest level that spring tides reach on average over a period of time") Is that what you need? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 21.09.2016 um 19:23 schrieb Eric Scott: > A substantial amount of work in the LOD community seems to have gone into > Ontolex: > > https://www.w3.org/community/ontolex/wiki/Final_Model_Specification > > Is there any concern with aligning WD's model to this standard? Thanks for pointing to this! From a first look, the models seem to roughly align: What we call a "Lexeme" corresponds to a "Lexical Entry" in ontolex. What we call a "Form" corresponds to a "Form" in ontolex. What we call a "Sense" corresponds to a "Lexical Sense & Reference" in ontolex, although in ontolex, a reference to a Concept is required, while in our model that reference would be optional, but a natural language gloss is required. So the models seem to match fine on a conceptual level. Perhaps someone with more expertise in RDF modeling can provide a more detailed analysis. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 20:46 schrieb Thad Guidry: > Daniel, > > I wasn't trying to help solve the issues - I'll be quite now :) > > I was helping to expose one of your test cases :) Ha, sorry for sounding harsh, and thanks for pointing me to "product"! It's a good test case indeed. > 'product' is a lexeme - a headword - a basic unit of meaning that has a 'set > of > forms' and those have 'a set of definitions' In the current model, a Lexeme has forms and senses. Forms don't have senses directly; the meanings should apply to all forms. This means lexemes have to be split with higher granularity: * product (English noun) would be one lexeme, with "products" being the plural form, "product's" the genitive, and "products'" the plural genitive. Senses include the ones you mentioned. * (to) produce (English verb) would be another lexeme, with forms like "produces", "produced", "producing", etc, and senses meaning "to create", "to show", "to make available", etc * production (English noun) would be another lexeme, with other forms and senses. * produce (English noun) would be another * producer (English noun) would be another * produced (English adjective) another etc... These lexemes can be linked using some kind of "derived from" statements. > But a thought just occured to me... > A. In order to model this perhaps would be to have those headwords stored in > Wikidata. Those headwords ideally would not actually be a Q or a P ... but > what > about instead ... L ? Wrapping the graph structure itself ? Pros / Cons ? That's the plan, yes: Have lexemes (L...) on Wikidata, which wrap the structure of forms and senses, and have statements for the lexeme, as well as for each form and each sense. We don't currently plan a "super-structure" for wrapping derived/related lexemes (product, produce, production, etc). They would just be inter-linked by statements. > B. or do we go with Daniel's suggestion of linking out to headwords and not > actually storing them in Wikidata ?
Pros / Cons ? The link I suggest is between items (Q...) and lexemes (L...), both on Wikidata. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
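The lexeme split sketched above, in an illustrative (non-normative) data layout — the IDs, field names, and property labels here are mine, not the final Wikibase format:

```python
# Sketch: one Lexeme per word class, with its Forms and Senses attached,
# and related lexemes linked by "derived from"-style statements rather
# than nested in a super-structure. All IDs are hypothetical.
product_noun = {
    "id": "L1",
    "lemma": "product",
    "language": "English",
    "lexicalCategory": "noun",
    "forms": ["product", "products", "product's", "products'"],
    "senses": [
        "a commodity offered for sale",
        "the result of a multiplication",
    ],
    "statements": [("derived from", "L2")],  # link to the verb lexeme
}

produce_verb = {
    "id": "L2",
    "lemma": "produce",
    "language": "English",
    "lexicalCategory": "verb",
    "forms": ["produce", "produces", "produced", "producing"],
    "senses": ["to create", "to show", "to make available"],
    "statements": [],
}
```

Note how the inter-lexeme relation lives in statement space, so the family "product / produce / production / producer" stays a flat set of linked lexemes.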
Re: [Wikidata] Let's move forward with support for Wiktionary
Quick clarification: Am 15.09.2016 um 17:40 schrieb Jan Berkel: > The pdf mentions 4 new entity types: Lexeme, Statement, Form, Embedded (?). "Embedded" isn't a separate type; it refers to the fact that Senses and Forms are stored on the same page as "their" Lexeme. "Statement" isn't an entity; I assume you meant to write "Sense". > Curious, was the existing data model not flexible enough? It was not expressive enough, no; it would be possible to use items to model lexemes, but it would be very annoying and complicated. You would need separate items for each form and sense, and need to keep track of them for deletion, undeletion, etc. > Will these new entities be restricted to the usage in a lexicographical > context, > i.e. Wiktionary? They will also be accessible from Wikipedia and other wikis. > How will they fit into the existing data model, will there be > links from existing Wikidata items to the new entities? (i.e. how will > Wikidata > benefit from the new data?) Yes, there will be cross-linking. > I imagine in an integrated Wikidata/Wiktionary world "content" and code lives > in > various places, and we'll have a range of automated processes to copy things > back and forth, and to automatically create new entries derived from existing > ones? It would be transcluded and generated, like with templates and Lua. Not so much copied by bots. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 20:11 schrieb Thad Guidry: > Denny, > > I would suggest to use https://en.wiktionary.org/wiki/product as that strawman > proposal. Because it has 2 levels of Senses. > 3. Anything that is produced (contains 6 sub-senses) Modelling sub-senses is a completely different can of worms. The proposed model doesn't allow this directly (we try to avoid recursive structures), but it can be done using statements. Your example doesn't really say anything about how lexemes could be connected to items as labels/aliases, which is, I believe, what Gerard and Denny were discussing. My usage of "Sense" and "Form" follows <https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013-08> which in turn follows the LEMON model <http://lemon-model.net/>. Synsets are not directly modeled, but it's possible to construct them via statements. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Greater than 400 char limit for Wikidata string data types
Am 16.09.2016 um 19:38 schrieb Denny Vrandečić: > Markus' description of the decision for the limit corresponds with mine. I > also > think that this decision can be revisited. I would still advice for caution, > due > to technical issues, but I am sure that the development team will make a > well-informed decision on this. It would be sad if valid usecases could not be > supported due to that. I agree, but re-considering this will have to wait until we have a better solution for storing terms. The current mechanism, the wb_terms table, is a massive performance bottleneck, and stuffing more data in there makes me very uncomfortable. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 16.09.2016 um 19:41 schrieb Denny Vrandečić: > Yes, there should be some connection between items and lexemes, but I am still > hazy about details on how exactly this should look like. If someone could > actually make a strawman proposal, that would be great. > > I think the connection should live in the statement space, and not be on the > level of labels, but that is just a hunch. I'd be happy to see proposals > incoming. My thinking is this: On some Sense of a Lexeme, there is a Statement saying that this Sense refers to a given concept (Item). If the property for stating this is well-known, we can track the Sense-to-Item relationship in the database. We can then automatically show the lexeme's lemma as a (pseudo-)alias on the Item, and perhaps also use it (and maybe all forms of the lexeme!) for indexing the item for search. So: from ( Lexeme - Sense - Statement -> Item ) we can derive ( Item -> Lexeme - Forms ) In the beginning of Wikidata, I was very reluctant about the software knowing about "magic" properties. Now I feel better about this, since wikidata properties are established as a permanent vocabulary that can be used by any software, including our own. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
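The derivation Daniel describes ( Lexeme - Sense - Statement -> Item, inverted to Item -> Lexeme - Forms ) can be sketched in a few lines. Everything below is illustrative: the property name, the dictionary layout, and the function are invented for this sketch and are not part of the Wikibase data model.

```python
# Placeholder for the "well-known property" linking a Sense to the Item it
# refers to; this is NOT a real Wikidata property ID.
P_REFERS_TO = "P_REFERS_TO"

# A hypothetical in-memory lexeme store.
lexemes = {
    "L1": {
        "lemma": "bumblebee",
        "forms": ["bumblebee", "bumblebees"],
        "senses": {
            "L1-S1": {"statements": {P_REFERS_TO: "Q25407"}},  # genus Bombus
        },
    },
}

def derive_item_index(lexemes):
    """Build the inverse index Item -> [(lexeme id, lemma, forms)] by
    scanning the sense statements, as described in the mail above."""
    index = {}
    for lex_id, lex in lexemes.items():
        for sense in lex["senses"].values():
            item = sense["statements"].get(P_REFERS_TO)
            if item is not None:
                index.setdefault(item, []).append(
                    (lex_id, lex["lemma"], lex["forms"])
                )
    return index
```

With such an index, Q25407 could show "bumblebee" as a pseudo-alias, and all forms of the lexeme could feed the item's search index.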
[Wikidata] Stable interfaces policy updated
The stable interface policy has now been updated, see <https://www.wikidata.org/w/index.php?title=Wikidata%3AStable_Interface_Policy&type=revision&diff=376348194&oldid=369006368> Am 13.09.2016 um 16:58 schrieb Daniel Kinzler: > Tomorrow I plan to apply the following update to the Stable Interface Policy: > > https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section > > Please comment there if you have any objections. > > Thanks! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 14.09.2016 um 10:51 schrieb Léa Lacroix: > /- What else can provide wikidata to wiktionary?/ > Machine-readable data will allow users to create new tools, useful for > editors, > based on the communities' needs. By helping the different communities > (Wiktionaries and Wikidata) working together on the same project, we expect a > growth of the number of people editing the lexicographical data, providing > more > review and a better quality of the data. Finally, when centralized and > structured, the data will be easily reusable by third parties, other websites > or > applications... and give a better visibility of the volunteers' work. Here are some examples of things that will become possible with the new structure: * the fact that the English word "sleeper" may refer to a railway tie, and in which regions this is the case, only has to be entered once, not separately in each Wiktionary. * the fact that "Stuhl" is the German translation of (a specific sense of) the English word "chair" only has to be entered once, not separately in each Wiktionary. * by connecting lexeme-sense to concepts (items), it will become possible to automatically search for potential synonyms and translations to other languages. * by providing a statement defining the morphological class of a lexeme, it becomes possible to automatically generate derived forms for display and search * different representations (spellings, scripts) of a lexeme can be covered by a single entry, information about word senses does not have to be repeated. * the search interface will know about languages and word types, so you can search specifically for "french verb dormir" (or perhaps more technical "lang:fr a:Q24905 dormir") * Similarly, you can search for or filter by epoch, region, linguistic convention or methodology, etc. > - Will editing wiktionary change? > Yes, changes will happen, but we're working on making editing Wiktionary > easier. 
Soon as we can provide some mockups, we will share them with you for > collecting feedbacks. The question is whether you consider editing wikitext with complex nested templates "easy" or not. With wikidata, editing would be form-based, with input fields and suggestions. This makes it a lot easier, especially for new editors. And even for experienced editors, I think it's more convenient for editing individual bits of information. The form-based approach is less convenient when you want to enter a lot of information at once. The solution is to identify the use cases for this, and provide a specialized interface for that use case. This does not have to depend on Wikibase developers, it can also be done by wiki users using gadgets, Labs-based tools, or even bots. > Because Wikidata is a multilingual project, we already have to deal with the > language issue, and we hope that with the increase of the numbers of editors > coming from Wikidata and Wiktionaries, it will become easier to find people > with > at least one common language to communicate between the different projects. Interestingly, we found that on wikidata there is rarely a conflict about whether a statement about an item should say X or Y, e.g. whether Chelsea Manning's gender should be given as "transgender female" or just "female" or even "male". The conflict does not arise because you can and should simply add all three, and use qualifiers and source references to specify who claimed which of these, and for which period of time. Long discussions do take place about the overall organization of information on wikidata, about which properties to have and how to use them, about whether substances like "ethanol" should be considered subclasses or instances of classes like "alcohol". I agree, however, that cross-lingual discussions are indeed an issue, and finding techniques and strategies to help with communication between the speakers of different languages will be a challenge.
But isn't the Wiktionary community perfectly equipped for just that challenge? Isn't it just the crowd you would ask if you had to solve a problem like this? I would (along perhaps with the folks from translatewiki.net). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
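One of the points in the list above - a statement giving a lexeme's morphological class, from which derived forms can be generated automatically - can be sketched as follows. The class table and all names are invented for illustration; real inflection data would live in Wikidata statements and be far richer.

```python
# Hypothetical table mapping a morphological class to an inflection rule.
# "en-noun-regular" is an invented class name, not a real Wikidata entity.
MORPH_CLASSES = {
    "en-noun-regular": lambda stem: {"singular": stem, "plural": stem + "s"},
}

def generate_forms(lemma, morph_class):
    """Generate derived forms for display and search from a lexeme's
    stated morphological class."""
    rule = MORPH_CLASSES.get(morph_class)
    if rule is None:
        raise ValueError(f"unknown morphological class: {morph_class}")
    return rule(lemma)

# generate_forms("sleeper", "en-noun-regular")
# -> {"singular": "sleeper", "plural": "sleepers"}
```

The point of the sketch is only the direction of the data flow: one statement on the lexeme, many forms derived from it instead of entered by hand in each Wiktionary.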
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 13.09.2016 um 15:37 schrieb Gerard Meijssen: > Hoi, > You assume that it is not good to have lexicological information in our > existing > items. With Wiktionary support you bring such information on board. It would > be > really awkward when for every concept there has to be an item in two > databases. It will be two namespaces in the same project. But we will not duplicate items. The proposed structure is not concept-centered like Omegawiki is. It will be centered on lexemes, like Wiktionary is, but with a higher level of granularity (a lexeme corresponds to one "morphological" section on a Wiktionary page). > Why is there this problem with lexicologival information and how will the > current data be linked to the future "Wiktionary-data" information if there > are > to be two databases? Because "bumblebee" "noun" conflicts with "bumblebee" "insect". They can't both be true for the same thing, because nouns are not insects. One is true for the word, the other is true for the concept. So they need to be treated separately. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Let's move forward with support for Wiktionary
Am 13.09.2016 um 17:16 schrieb Gerard Meijssen: > Hoi, > The database design for OmegaWiki had a distinction between the concept and > all > the derivatives for them. Wikidata will have Lexemes and their Forms and Senses. > So bumblebee is more complex than just "instance of" noun. It is an English > noun. "Hommel" is connected as a Dutch noun for the same concept and "hommels" > is the Dutch plural... Wikidata would have a Lexeme for "bumblebee" (English noun) and one for "Hommel" (Dutch noun). Both would have a sense that would describe them as a flying insect (and perhaps other word senses, such as Q1626135, a crater on the moon). The senses that refer to the flying insect would be considered translations of each other, and both senses would refer to the same concept. So "bumblebee" (insect) is a translation of "Hommel" (insect), and both refer to the genus Bombus (Q25407). "Hommel" (crater) would share the morphology of "Hommel" (insect), as it has the same forms (I assume), but it won't share the translations. Having lexeme-specific word-senses avoids the loss of connotation and nuance that you get when you force words of different languages on a shared meaning. The effect of referring to the same concept can still be achieved via the reference to a concept (item). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Proposed update to the stable interfaces policy
Tomorrow I plan to apply the following update to the Stable Interface Policy: https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section Please comment there if you have any objections. Thanks! -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Announcing the Wikidata Stable Interface Policy
Hello all! After a brief period for final comments (thanks everyone for your input!), the Stable Interface Policy is now official. You can read it here: <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy> This policy is intended to give authors of software that accesses Wikidata a guide to what interfaces and formats they can rely on, and which things can change without warning. The policy is a statement of intent given by us, the Wikidata development team, regarding the software running on the site. It does not apply to any content maintained by the Wikidata community. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 11.08.2016 um 23:12 schrieb Peter F. Patel-Schneider: > Until suitable versioning is part of the Wikidata JSON dump format and > contract, however, I don't think that consumers of the dumps should just > ignore new fields. Full versioning is still in the future, but I'm happy that we are in the process of finalizing a policy on stable interfaces, including a contract regarding adding fields: <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>. Please comment on the talk page. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
[Wikidata] Policy on Interface Stability: final feedback wanted
Hello all, repeated discussions about what constitutes a breaking change have prompted us, the Wikidata development team, to draft a policy on interface stability. The policy is intended to clearly define what kind of change will be announced when and where. A draft of the policy can be found at <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy> Please comment on the talk page. Note that this policy is not about the content of the Wikidata site, it's a commitment by the development team regarding the behavior of the software running on wikidata.org. It is intended as a reference for bot authors, data consumers, and other users of our APIs. We plan to announce this as the development team's official policy on Monday, August 22. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Render sparql queries using the Histropedia timeline engine
Hi Navino! Thank you for your awesome work! Since this has caused some confusion again recently, I want to caution you about a major gotcha regarding dates in RDF and JSON: they use different conventions to represent years BCE. I just updated our JSON spec to reflect that reality, see <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#time>. There is a lot of confusion about this issue throughout the linked data web, since the convention changed between XSD 1.0 (which uses -0044 to represent 44 BCE, and -0001 to represent 1 BCE) and XSD 1.1 (which uses -0043 to represent 44 BCE, and +0000 to represent 1 BCE). Our JSON uses the traditional numbering (1 BCE is -0001), while RDF uses the astronomical numbering (1 BCE is +0000). Yay, fun. Am 10.08.2016 um 21:49 schrieb Navino Evans: > Hi all, > > > > At long last, we’re delighted to announce you can now render sparql queries > using the Histropedia timeline engine \o/ > > > Histropedia WikidataQuery Viewer > <http://histropedia.com/showcase/wikidata-viewer.html> -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
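The two numbering schemes differ by exactly one for all years BCE, since astronomical numbering has a year 0 and traditional numbering does not. A small conversion sketch (the function names are mine, not part of any Wikibase API):

```python
def traditional_to_astronomical(year):
    """Traditional numbering (used in Wikibase JSON): 1 BCE = -1, no year 0.
    Astronomical numbering (used in RDF / XSD 1.1): 1 BCE = 0, 44 BCE = -43."""
    if year == 0:
        raise ValueError("traditional numbering has no year 0")
    return year + 1 if year < 0 else year

def astronomical_to_traditional(year):
    """Inverse conversion: astronomical year 0 becomes traditional -1 (1 BCE)."""
    return year - 1 if year <= 0 else year

assert traditional_to_astronomical(-44) == -43    # 44 BCE
assert traditional_to_astronomical(-1) == 0       # 1 BCE
assert astronomical_to_traditional(0) == -1       # 1 BCE
assert astronomical_to_traditional(2016) == 2016  # CE years are unchanged
```

Anything consuming both the JSON dumps and the RDF exports needs to apply exactly one such conversion, or BCE dates will be off by one year.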
Re: [Wikidata] Discussion on graph databases for WIkipedia: applications, volunteers, and stack design recommendations
I recommend you have a look at the SWEBLE project <http://sweble.org/>, at least for the parsing. They basically represent all of Wikipedia (and potentially all Wikipedias together) as one huge parse tree, using an XML database. The website doesn't have much detail, but they are building some interesting projects on top of this. Best to contact Dirk Riehle directly, <https://osr.cs.fau.de/people/members/riehle-dirk/>. Am 01.08.2016 um 20:38 schrieb Ian Seyer: > Full disclosure: I am the creator of the Project Grant application > for Arc.heolo.gy <http://arc.heolo.gy/>, located > here: https://meta.wikimedia.org/wiki/Grants:Project/Arc.heolo.gy > > I hope for this to be a general discussion on potential applications, > criticisms, questions, technological recommendations, and community discussion > about a graph representation of Wikipedia. > > Currently, the project has a live Neo4j Graph database built and parsed from a > download of the English language Wikipedia from April. I have temporarily > hosted > the database instance both on my local machine and a SoftLayer server provided > under a temporary entrepreneur credit. > > My goal is two fold. > On the backend: refine the parsing algorithm (I am getting some incorrect > relationships in the database), automate the parsing so that it updates the > database frequently, expand language support, and perform semantic parsing to > weight individual relationships to strengthen the ability to filter out > extraneous relationships. > On the frontend: I have done little to zero work here beyond pure > conceptualization. I would hope to use an asynchronous front-end javascript > framework to build both a 2d (d3) and 3d (webGL) interface to be able to > explore > the database with a high amount of control and ease. > > If any of you would like to access the database for exploration, please > contact > me privately and I will give you credentials.
> > Any recommendations on parsing, hosting, visualization, or otherwise are > appreciated. Endorsements and Volunteers are also highly appreciated! > > p.s. I am new to directly engaging with the Wiki community, and if I committed > some faux pas in starting this thread please let me know and I will do my best > to correct it. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Wikidata query performance paper
Hi Aidan! Thank you for this very interesting research! Query performance was of course one of the key factors for selecting the technology to use for the query services. However, it was only one among several. The Wikidata use case is different from most common scenarios in some ways, for instance: * We cannot optimize for specific queries, since users are free to submit any query they like. * The data representation needs to be intuitive enough for (technically inclined) casual users to grasp and write queries. * The data doesn't hold still; it needs to be updated continuously, multiple times per second. * Our data types are more complex than usual - for instance, we support multiple calendar models for dates, and not only values but also different accuracies up to billions of years; we use "quantities" with unit and uncertainty instead of plain numbers, etc. My point is that, if we had a static data set and a handful of known queries to optimize for, we could have set up a relational or graph database that would be far more performant than what we have now. The big advantage of Blazegraph is its flexibility, not raw performance. It might be interesting to you to know that we initially started to implement the query service against a graph database, Titan - which was discontinued while we were still getting up to speed. Luckily this happened early on; it would have been quite painful to switch after we had gone live. -- daniel Am 06.08.2016 um 18:19 schrieb Aidan Hogan: > Hey all, > > Recently we wrote a paper discussing the query performance for Wikidata, > comparing different possible representations of the knowledge-base in Postgres > (a relational database), Neo4J (a graph database), Virtuoso (a SPARQL > database) > and BlazeGraph (the SPARQL database currently in use) for a set of equivalent > benchmark queries. > > The paper was recently accepted for presentation at the International Semantic > Web Conference (ISWC) 2016.
A pre-print is available here: > > http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf > > Of course there are some caveats with these results in the sense that perhaps > other engines would perform better on different hardware, or different styles > of > queries: for this reason we tried to use the most general types of queries > possible and tried to test different representations in different engines (we > did not vary the hardware). Also in the discussion of results, we tried to > give > a more general explanation of the trends, highlighting some > strengths/weaknesses > for each engine independently of the particular queries/data. > > I think it's worth a glance for anyone who is interested in the > technology/techniques needed to query Wikidata. > > Cheers, > Aidan > > > P.S., the paper above is a follow-up to a previous work with Markus Krötzsch > that focussed purely on RDF/SPARQL: > > http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf > > (I'm not sure if it was previously mentioned on the list.) > > P.P.S., as someone who's somewhat of an outsider but who's been watching on > for > a few years now, I'd like to congratulate the community for making Wikidata > what > it is today. It's awesome work. Keep going. :) > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 05.08.2016 um 17:34 schrieb Peter F. Patel-Schneider: > So some additions are breaking changes then. What is a system that consumes > this information supposed to do? If the system doesn't monitor announcements > then it has to assume that any new field can be a breaking change and thus > should not accept data that has any new fields. The only way to avoid breakage is to monitor announcements. The format is not final, so changes can happen (not just additions, but also removals), and then things will break if they are unaware. We tend to be careful and conservative, and announce any breaking changes in advance, but do not guarantee full backwards compatibility forever. The only alternative is a fully versioned interface, which we don't currently have for JSON, though it has been proposed, see <https://phabricator.wikimedia.org/T92961>. > I assume that you are referring to the common practice of adding extra fields > in HTTP and email transport and header structures under the assumption that > these extra fields will just be passed on to downstream systems and then > silently ignored when content is displayed. Indeed. > I view these as special cases > where there is at least an implicit contract that no additional field will > change the meaning of the existing fields and data. In the name of the Robustness Principle, I would consider this the normal case, not the exception. > When such contracts are > in place systems can indeed expect to see additional fields, and are permitted > to ignore these extra fields. Does this count? <https://mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html> > Because XML specifically states that the order of attributes is not > significant. Therefore changes to the order of XML attributes is not changing > the encoding. That's why I'm proposing to formalize the same kind of contract for us, see <https://phabricator.wikimedia.org/T142084>. > Here is where I disagree. 
As there is no contract that new fields in the > Wikidata JSON dumps are not breaking, clients need to treat all new fields as > potentially breaking and thus should not accept data with unknown fields. While you are correct that there is no formal contract yet, the topic had been explicitly discussed before, in particular with Markus. > I say this for any data, except where there is a contract that such additional > fields are not meaning-changing. Quote me on it: For wikibase serializations, additional fields are not meaning-changing. Changes to the format or interpretation of fields will be announced as a breaking change. >> Clients need to be prepared to encounter entity types and data types they >> don't >> know. But they should also allow additional fields in any JSON object. We >> guarantee that extra fields do not impact the interpretation of fields they >> know >> about - unless we have announced and documented a breaking change. > > Is this the contract that is going to be put forward? At some time in the not > too distant future I hope that my company will be using Wikidata information > in its products. This contract is likely to be problematic for development > groups, who want some notion how long they have to prepare for changes that > can silently break their products. This is indeed the gist of what I want to establish as a stability policy. Please comment on <https://phabricator.wikimedia.org/T142084>. I'm not sure how this could be made less problematic. Even with a fully versioned JSON interface, available data types etc. are a matter of configuration. All we can do is announce such changes, and advise consumers that they can safely ignore unknown things. You raise a valid point about due notice. What do you think would be a good notice period? Two weeks? A month? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Breaking change in JSON serialization?
Am 05.08.2016 um 15:02 schrieb Peter F. Patel-Schneider: > I side firmly with Markus here. > > Consumers of data generally cannot tell whether the addition of a new field to > a data encoding is a breaking change or not. Without additional information, they cannot know, though for "mix and match" formats like JSON and XML, it's common practice to assume that ignoring additions is harmless. In any case, we had communicated before that we do not consider the addition of a field a breaking change. It only becomes a breaking change when it impacts the interpretation of other fields, in which case we would announce it well in advance. > Given this, code that consumes > encoded data should at least produce warnings when it encounters encodings > that it is not expecting and preferably should refuse to produce output in > such circumstances. Depends on the circumstances. For a web browser for example, this would be very annoying behavior. Nearly all websites would be unusable. Similarly, most email would become unreadable if mail clients were that strict. > Producers of data thus should signal in advance any > changes to the encoding, even if they know that the changes can be safely > ignored. I disagree on "any". For example, do you want announcements about changes to the order of attributes in XML tags? Why? In case someone uses a regex to process the XML? Should you not be able to rely on your clients conforming to the XML spec, which says that the order of attributes is undefined? In the case at hand (adding a field), it would have been good to communicate it in advance. But since it wasn't tagged as "breaking", it slipped through. We are sorry for that. Clients should still not choke on an addition like this. > I would view software that consumes Wikidata information and silently ignores > fields that it is not expecting as deficient and would counsel against using > such software. Is this just for Wikidata, or does that extend to other kinds of data too? 
Why, or why not? By definition, any extensible format or protocol (HTTP, SMTP, HTML, XML, XMPP, IRC, etc) can contain parts (headers, elements, attributes) that the client does not know about, and should ignore. Of course, the spec will tell clients where to expect and allow extra bits. That's why I'm planning to put up a document saying clearly what kinds of changes clients should be prepared to see in Wikidata output: Clients need to be prepared to encounter entity types and data types they don't know. But they should also allow additional fields in any JSON object. We guarantee that extra fields do not impact the interpretation of fields they know about - unless we have announced and documented a breaking change. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
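The guarantee stated above - read the fields you know, tolerate the ones you don't - can be illustrated with a minimal sketch. The field names follow the documented wikibase-entityid datavalue; the function itself is invented for illustration.

```python
import json

def read_entity_ref(datavalue):
    """Extract only the known parts of a wikibase-entityid value,
    silently tolerating any additional fields added later
    (Robustness Principle: be liberal in what you accept)."""
    return {
        "entity-type": datavalue["entity-type"],
        "numeric-id": datavalue["numeric-id"],
    }

# An older serialization, and one with a field added later.
old = json.loads('{"entity-type": "item", "numeric-id": 42}')
new = json.loads('{"entity-type": "item", "numeric-id": 42, "id": "Q42"}')

# Both yield the same result: the extra field does not change the
# interpretation of the known fields.
assert read_entity_ref(old) == read_entity_ref(new)
```

A strict parser that rejects any unknown key would have refused the second input outright, which is exactly the failure mode discussed in this thread.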
Re: [Wikidata] Breaking change in JSON serialization?
Hi Markus! You are asking us to better communicate changes to our serialization, even if it's not a breaking change according to the spec. I agree we should do that. We are trying to improve our processes to achieve this. Can we ask you in return to try to make your software more robust, by not making unwarranted assumptions about the serialization format? With regards to communicating more - it's very hard to tell which changes might break something for someone. For instance, some software might rely on the order of fields in a JSON object, even though JSON says this is unspecified, just like you rely on no fields being added, even though there is no guarantee about this. Similarly, some software might rely on non-ASCII characters being represented as Unicode escape sequences, and will break if we use the more compact UTF-8. Or they may break on changes to whitespace. Who knows. We cannot possibly know what kind of change will break some 3rd party software. I don't think announcing any and all changes is feasible. So I think an official policy about what we announce can be useful. Something like "This is what we consider a breaking change, and we will definitely announce it. And these are some kinds of changes we will also communicate ahead of time. And these are some things that can happen unannounced." You are right that policies don't change the behavior of software. But perhaps they can change the behavior of programmers, by telling them what they can (and can't) safely rely on. It boils down to this: we can try to be more verbose, but if you make assumptions beyond the spec, things will break sooner or later. Writing robust software requires more time and thought initially, but it saves a lot of headaches later. -- daniel Am 04.08.2016 um 21:49 schrieb Markus Kroetzsch: > Daniel, > > You present arguments on issues that I would never even bring up. I think we > fully agree on many things here. 
Main points of misunderstanding: > > * I was not talking about the WMDE definition of "breaking change". I just > meant > "a change that breaks things". You can define this term for yourself as you > like > and I won't argue with this. > > * I would never say that it is "right" that things break in this case. It's > annoying. However, it is the standard behaviour of widely used JSON parsing > libraries. We won't discuss it away. > > * I am not arguing that the change as such is bad. I just need to know about > it > to fix things before they break. > > * I am fully aware of many places where my software should be improved, but I > cannot fix all of them just to be prepared if a change should eventually > happen > (if it ever happens). I need to know about the next thing that breaks so I can > prioritize this. > > * The best way to fix this problem is to annotate all Jackson classes with the > respective switch individually. The global approach you linked to requires > that > all users of the classes implement the fix, which is not working in a library. > > * When I asked for announcements, I did not mean an information of the type > "we > plan to add more optional bits soonish". This ancient wiki page of yours that > mentions that some kind of change should happen at some point is even more > vague. It is more helpful to learn about changes when you know how they will > look and when they will happen. My assumption is that this is a "low cost" > improvement that is not too much to ask for. > > * I did not follow what you want to make an "official policy" for. Software > won't behave any differently just because there is a policy saying that it > should. > > Markus > > > On 04.08.2016 16:48, Daniel Kinzler wrote: >> Hi Markus! >> >> I would like to elaborate a little on what Lydia said. 
>> >> Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch: >>> It seems that some changes have been made to the JSON serialization >>> recently: >>> >>> https://github.com/Wikidata/Wikidata-Toolkit/issues/237 >> >> This specific change has been announced in our JSON spec for as long as the >> document exists. >> <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> >> sais: >> >>> WARNING: wikibase-entityid may in the future change to be represented as a >>> single string literal, or may even be dropped in favor of using the string >>> value type to reference entities. >>> >>> NOTE: There is currently no reliable mechanism for clients to generate a >>> prefixed ID or a URL from the information in the data value. >> >> That was the problem: With the current form
Re: [Wikidata] Breaking change in JSON serialization?
Hi Markus! I would like to elaborate a little on what Lydia said. Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch: > It seems that some changes have been made to the JSON serialization recently: > > https://github.com/Wikidata/Wikidata-Toolkit/issues/237 This specific change has been announced in our JSON spec for as long as the document exists. <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> says: > WARNING: wikibase-entityid may in the future change to be represented as a > single string literal, or may even be dropped in favor of using the string > value type to reference entities. > > NOTE: There is currently no reliable mechanism for clients to generate a > prefixed ID or a URL from the information in the data value. That was the problem: With the current format, all clients needed a hard-coded mapping of entity types to prefixes in order to construct ID strings from the JSON serialization of ID values. That means no entity types can be added without breaking clients. This has now been fixed. Of course, it would have been good to announce this in advance. However, it is not a breaking change, and we do not plan to treat additions as breaking changes. Adding something to a public interface is not a breaking change. Adding a method to an API isn't, adding an element to XML isn't, and adding a key to JSON isn't - unless there is a spec that explicitly states otherwise. These are "mix and match" formats, in which anything that isn't forbidden is allowed. It's the responsibility of the client to accommodate such changes. This is simple best practice - an HTTP client shouldn't choke on header fields it doesn't know, etc. See <https://en.wikipedia.org/wiki/Robustness_principle>. If you use a library that is touchy about extra data by default, configure it to be more accommodating, see for instance <https://stackoverflow.com/questions/14343477/how-do-you-globally-set-jackson-to-ignore-unknown-properties-within-spring>. 
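A tolerant client along the lines Daniel describes might read entity-ID values like this (a sketch in Python; the field names follow the JSON spec quoted above, and the exact shape of the change — an added "id" field alongside the old entity-type/numeric-id pair — is my reading of the linked issue):

```python
import json

# The hard-coded entity-type-to-prefix mapping that older clients needed.
PREFIXES = {"item": "Q", "property": "P"}

def entity_id(value):
    """Extract an entity ID string from a wikibase-entityid data value.

    Prefer an explicit "id" field if the serialization carries one, and
    fall back to reconstructing the ID from entity-type and numeric-id.
    Any extra, unknown keys are simply ignored (robustness principle).
    """
    if "id" in value:
        return value["id"]
    prefix = PREFIXES.get(value["entity-type"])
    if prefix is None:
        raise ValueError("unknown entity type: %r" % value["entity-type"])
    return prefix + str(value["numeric-id"])

old_form = json.loads('{"entity-type": "item", "numeric-id": 42}')
new_form = json.loads('{"entity-type": "item", "numeric-id": 42, "id": "Q42"}')
```

A client written this way keeps working when a new optional key appears, and only fails on a genuinely unknown entity type that arrives without an "id" field.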
> Could somebody from the dev team please comment on this? Is this going to be > in > the dumps as well or just in the API? Yes, we use the same basic serialization for the API and the dumps. For the future, note that some parts (such as sitelink URLs) are optional, and we plan to add more optional bits (such as normalized quantities) soonish. > Are further changes coming up? Yes. The next one in the pipeline is Quantities without upperBound and lowerBound, see <https://phabricator.wikimedia.org/T115270>. That IS a breaking change, and the implementation is thus blocked on announcing it, see <https://gerrit.wikimedia.org/r/#/c/302248/>. Furthermore, we will probably remove the entity-type and numeric-id fields from the serialization of EntityIdValues eventually. But there is no concrete plan for that at the moment. When we remove the old fields for ItemId and PropertyId, that IS a breaking change, and will be announced as such. > Are we ever > going to get email notifications of API changes implemented by the team rather > than having to fix the damage after they happened? We aspire to communicate early, and we are sorry we did not announce this change ahead of time. However, this is not a breaking change by the common understanding of the term, and will not be treated as such. We have argued about that on this list before, see <https://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html>. I have made it clear back then what we consider a breaking change and what not, and I have advised you that being accommodating in what your client code accepts will avoid headaches in the future. To make this even more clear, we will enact and document something similar to my email from February as official policy soon. Watch for an announcement on this list. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. 
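Since upperBound and lowerBound are slated to become optional, a client can treat them as such already. A minimal sketch (field names as in the documented JSON format; the exact future shape is my assumption based on T115270):

```python
def parse_quantity(value):
    """Parse a Wikibase quantity data value defensively.

    "amount" is a signed decimal string like "+3"; "unit" is "1" for
    dimensionless values. upperBound and lowerBound are read as optional,
    anticipating serializations that omit them when no explicit
    uncertainty is given.
    """
    return {
        "amount": float(value["amount"]),
        "unit": value.get("unit", "1"),
        "upperBound": float(value["upperBound"]) if "upperBound" in value else None,
        "lowerBound": float(value["lowerBound"]) if "lowerBound" in value else None,
    }

bounded = parse_quantity(
    {"amount": "+3", "unit": "1", "upperBound": "+3.5", "lowerBound": "+2.5"})
unbounded = parse_quantity({"amount": "+3", "unit": "1"})
```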
___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] An attribute for "famous person"
Am 02.08.2016 um 20:19 schrieb Markus Kroetzsch: > Oh, there is a little misunderstanding here. I have not suggested to create a > property "number of sitelinks in this document". What I propose instead is to > create a property "number of sitelinks for the document associated with this > entity". The domain of this suggested property is entity. The advantage of > this > proposal over the thing that you understood is that it makes queries much > simpler, since you usually want to sort items by this value, not documents. > One > could also have a property for number of sitelinks per document, but I don't > think it has such a clear use case. "number of sitelinks for the document associated with this entity" strikes me as semantically odd, which was the point of my earlier mail. I'd much rather have "number of sitelinks in this document". You are right that the primary use would be to "rank" items, and that it would be more convenient to have the count associated directly with the item (the entity), but I fear it will lead to a blurring of the line between information about the entity and information about the document. That is already a common point of confusion, and I'd rather keep that separation very clear. I also don't think that one level of indirection would be horribly complicated. To me it's just natural to include the sitelink info on the same level as we provide a timestamp or revision id: for the document. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] An attribute for "famous person"
Am 02.08.2016 um 18:41 schrieb Andrew Gray: > I'd agree with both interpretations - the majority of people in Wikidata are > Using the existence of Wikipedia articles as a threshold, as suggested, seems > a > pretty good test - it's flawed, of course, but it's easy to check for and > works > as a first approximation of "probably is actually famous". If we want to have the number of sitelinks in RDF, let's please make sure that this number is associated with the item *document* URI, not with the concept URI. After all, the person doesn't have links, the item document does. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
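The document/concept distinction can be made concrete: in Wikidata's RDF, the item document and the thing it describes have different URIs. A small sketch of where such a count would attach (the sitelink-count predicate and the number are made-up placeholders, not an existing vocabulary term):

```python
# Concept URI: the person. Data URI: the item document about the person.
CONCEPT = "http://www.wikidata.org/entity/Q42"
DOCUMENT = "https://www.wikidata.org/wiki/Special:EntityData/Q42"

# The count describes the document, on the same level as its timestamp
# or revision ID:
triples = [
    (DOCUMENT, "schema:about", CONCEPT),
    (DOCUMENT, "ex:sitelinkCount", 123),  # placeholder predicate and value
    # not (CONCEPT, "ex:sitelinkCount", 123) -- a person has no sitelinks
]
```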
Re: [Wikidata] Grammatical display of units
Am 28.07.2016 um 12:26 schrieb Lydia Pintscher: > The discussion about how to do this is happening in > https://phabricator.wikimedia.org/T86528 The basic problem is that we > do use items for the units. I think this is the right thing to do but > it does make this particular part a bit tricky. Well, I think we could sidestep the grammar issue by using unit symbols. We would have to get them from statements, and they would have to be multilingual values (or multiple mono-lingual values), but that is still much less complicated than trying to apply plural rules. An alternative is to use MediaWiki i18n messages instead of entity labels. E.g. if the unit is Q11573, we could check if MediaWiki:wikibase-unit-Q11573 exists, and if it does, use it. We'd get internationalization including support for plurals for free. We could actually combine all of these approaches: first check for a system message, then check for a symbol statement, then use the label, and if all fails, use the ID. I'll comment on the ticket. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
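The combined lookup order at the end of the mail could be sketched like this (Python; the three dictionaries stand in for the real MediaWiki message, statement, and label lookups, and the "wikibase-unit-Qnnn" message key is the hypothetical naming scheme from the mail, not an existing message):

```python
def unit_label(unit_id, system_messages, symbol_statements, labels):
    """Resolve display text for a unit item, trying in order:
    a MediaWiki system message, a unit-symbol statement, the item's
    plain label, and finally the bare item ID."""
    message_key = "wikibase-unit-" + unit_id
    for candidate in (
        system_messages.get(message_key),  # i18n message, plural-aware
        symbol_statements.get(unit_id),    # symbol from a statement
        labels.get(unit_id),               # plain entity label
    ):
        if candidate is not None:
            return candidate
    return unit_id  # last resort: show the ID itself
```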
Re: [Wikidata] Controversy around Wikimania talks
Am 31.07.2016 um 17:04 schrieb Gerard Meijssen: > Hoi, > I am not to judge what conferences will be deemed relevant for an item in > Wikidata. When a conference is relevant, it is the talks and particularly the > registrations of the talks, the papers and the presentations that make the > conference relevant after the fact. So you think that for every relevant conference, all talks and speakers should automatically be considered relevant? Does the same argument apply to all courses and teachers at all relevant universities and schools? I'm trying to understand your point. To me it's a question of granularity. We can't manage arbitrarily fine-grained information, so we have to stop at some point. What do you think, where should that point be for Wikimania, for other (relevant) conferences, for universities, for schools? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Controversy around Wikimania talks
Am 31.07.2016 um 16:28 schrieb Gerard Meijssen: > Hoi, > Really? It is a source for the talks that were given. It contains the papers > that were the basis for granting a spot on the program. To clarify - would the same apply for any talk at any conference? Or do you think Wikimania should be especially relevant to Wikidata, because it's a Wikimedia thing? -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Machine-readable Wikidata ontology/schema?
Am 23.06.2016 um 21:34 schrieb Nicolas Torzec: > Thanks Stas and Markus. > > I'm interested in computing various stats about Wikidata. For example, I want > to > compute the degree of interlinking between Wikidata and external databases, > per > entity type, per databases, etc. So I need a way to know which properties have > an external identifier as range, along with the name of the external database > they point to. For example P345 is an external identifier to IMDB ; P2639 is > an > external identifier to Filmportal, etc. The machine-readable description of P2639 can be found at <http://wikidata.org/entity/P2639.json> or, if you prefer, <http://wikidata.org/entity/P2639.ttl>. Similarly, the class "Film" is described at <http://wikidata.org/entity/Q11424.json> or <http://wikidata.org/entity/Q11424.ttl>. Since these are regular "entities" (items or properties), they are themselves described in terms of the Wikibase data model and the Wikidata vocabulary, not in terms of RDFS/OWL. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
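The URL pattern above generalizes to any entity and either format. A trivial helper (pure string construction; fetching the result is left to whatever HTTP client you prefer):

```python
def entity_data_url(entity_id, fmt="json"):
    """Build the machine-readable data URL for a Wikidata entity,
    following the http://wikidata.org/entity/<ID>.<format> pattern
    used in the links above (fmt is "json" or "ttl")."""
    return "http://wikidata.org/entity/%s.%s" % (entity_id, fmt)
```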
Re: [Wikidata] language fallbacks on Wikipedia and co
Am 15.06.2016 um 23:53 schrieb Gerard Meijssen: > Hoi, > Will it work using the #babel templates? No, because that would be inconsistent with the fallback that is applied when using Lua or {{#property}} in wikitext. The fallback is based on the fallback that is defined by MediaWiki for the interface languages. In wikitext, we cannot use the Babel templates, because that would break caching. The rendering can depend on a few user-specific settings, but caching a rendered version of every page for every possible combination of babel templates is not feasible. We could in theory use a different fallback mechanism on Special:AboutTopic, but that would be quite confusing - why does it look different in articles? Also, when talking to others about the output of Special:AboutTopic, this might get confusing: if someone complains that e.g. some label they see there is wrong, and you go to the page but what you see is different, it becomes hard to discuss the issue. There would be no way to link to the page as you see it. Everyone would potentially see different output. -- daniel ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Fwd: Using sparql to query for labels with matching regex
Hi Mike! I'm no SPARQL expert, but regular expressions in queries are often not optimized using indexes. So *all* labels would need to be checked against the regular expression, which of course times out. But there are other options. Perhaps instead of FILTER regex(?label, "^apparel") try FILTER (STRSTARTS(?label,"apparel")) See <https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-strstarts> Another option would be Blazegraph's full-text index: WHERE { ?label bds:search "apparel*" . } This would match any label that contains a word that starts with apparel. See <https://wiki.blazegraph.com/wiki/index.php/FullTextSearch> HTH Am 29.03.2016 um 22:47 schrieb mike white: > > Hi all > > I am trying to query the wiki data for entities with labels that matches a > regex. I am new in the sparql world. So could you please help me with it. Here > is what I have for now. > > https://gist.github.com/anonymous/2810eb5747e51a9ae746183a43f20771 > > But I don't think it is the right way. Any help will be much appreciate. > Thanks > > > > ___ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
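For completeness, here is the STRSTARTS variant assembled into a full query (a sketch; %-substitution like this is fine for a fixed prefix, but a real client should escape or validate user-supplied input before embedding it):

```python
def prefix_label_query(prefix, language="en", limit=20):
    """Compose a SPARQL query for labels starting with a prefix, using
    STRSTARTS rather than regex() so the engine is not forced to run a
    regular-expression match over every label."""
    return '''SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER (LANG(?label) = "%s")
  FILTER (STRSTARTS(?label, "%s"))
} LIMIT %d''' % (language, prefix, limit)
```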
Re: [Wikidata] Wordnet mappings
Am 12.04.2016 um 08:42 schrieb Stas Malyshev: > Hi! > >> Is there a property for WordnetId? More mappings are always good. The case of WordNet is a bit tricky though, since WordNet is about words, not concepts. Wikidata items can perhaps be mapped to SynSets, but we still have to be careful not to get confused about the semantics. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
nical as well as the product level, which in turn is informed from community interaction, among other things. As is often the case, solutions that have to be maintainable and scalable are not quite as nice as one-off solutions for a special case. MediaWiki is conservative about adding special case features for good reasons: it's quite complex as it is, if it had tried to cater to every special case, it would have collapsed under its own weight a long time ago. The idea is to generalize from special cases, and implement something that will work for many more cases, even though it perhaps covers only 90% of what you could do by catering to the special case directly. Of course, overly generic multi-option multi-purpose mechanisms should also be avoided, because they are hard to understand and hard to maintain. So a balance needs to be found. Trying to strike that balance, in 2012 we (in this case including you, iirc) designed data types to be a simple yet sufficiently generic mechanism for associating behavior with values. So now we use it to associate behavior with values (like mapping to URLs and URIs), and I am very reluctant to introduce another mechanism for associating behavior with values. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
Am 10.03.2016 um 20:08 schrieb Young,Jeff (OR): > Then perhaps umbel:isLike instead of owl:sameAs? > > http://wiki.opensemanticframework.org/index.php/UMBEL_Vocabulary#isLike_Property In some cases owl:equivalentProperty may be appropriate https://www.w3.org/TR/owl-ref/#equivalentProperty-def -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] Status and ETA External ID conversion
Am 10.03.2016 um 10:26 schrieb Markus Kroetzsch: > I am surprised by the amount of confusion in this discussion. There is > absolutely no relationship between mapping of Wikidata values to URIs and the > external id datatype. You are correct that such a relationship does not necessarily follow from first principles. You are however incorrect in saying that there is no relationship in Wikibase: The way the data model is currently defined and the way mappings are implemented, we made a conscious decision to support such mappings only for ExternalId values. I think it would help the discussion if we could keep apart: - what follows from formal principles - what you (or I) consider best - what the software currently does > (3) The external id datatype does not provide any mapping and the criteria > used > for it by the community do not imply that such mappings should exist for these > cases, or that they should not exist for other cases. That is incorrect from the way Wikibase defines and uses the ExternalId datatype: the intent is indeed to say that something is an identifier that can be mapped, and that such a (direct) mapping is not supported for other data types. (That doesn't mean we will not offer different mappings for other data types, perhaps URLs for looking up coordinates, etc). Modeling this explicitly is indeed the reason to have this datatype. > I am most worried about Daniel's remark. He says that we wants to use external > ids to identify properties with "values that identify a resource", but does > not > mention the existing, community-supported mechanism for doing just that (2), > and > instead proposes another mechanism (3), which the community is clearly not > using > for this purpose at all. That's a misunderstanding. The plan is to support P1921 for URI mappings, and we already do support P1630 for URL mappings. But we intentionally do this only for ExternalId values, not for plain strings or other types. 
So, the technical implementation does follow the community convention, with the restriction that properties that should use this kind of mapping need to explicitly be declared to be identifiers. We are also considering implementing validation and normalization for ExternalId values, but it's not clear yet how we can safely apply community supplied validation and normalization patterns. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
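The P1630-style mapping, together with the kind of validation being considered, amounts to little more than this (Python sketch; the IMDb-like template and regex used in the test are illustrative assumptions, not values taken from Wikidata):

```python
import re

def external_id_to_url(external_id, formatter_url, pattern=None):
    """Map an external-ID value to a URL via a formatter template in
    which "$1" marks the spot for the identifier (the P1630 convention),
    optionally validating the identifier against a regex first."""
    if pattern is not None and re.fullmatch(pattern, external_id) is None:
        raise ValueError("malformed identifier: %r" % external_id)
    return formatter_url.replace("$1", external_id)
```

The open question Daniel mentions — safely applying community-supplied patterns — lives in the pattern argument here: a buggy or hostile regex must neither hang the service (catastrophic backtracking) nor wave through garbage.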
Re: [Wikidata] Status and ETA External ID conversion
Am 07.03.2016 um 11:54 schrieb Markus Kroetzsch: > In general, the community uses several classes for properties that could have > been used for UI organisation, rather than introducing new datatypes. Technically, the main purpose of having a separate datatype was to explicitly model values that identify a resource (in the RDF sense, where resource means "anything that can be identified unambiguously"), so we can apply mappings (e.g. to URIs and URLs) when exporting and displaying them. Using the datatype for the UI structure is an attempt to kill two birds with one stone. I think it's a pretty good start, but I agree that we should revisit this once we have gathered some feedback. It would not be too hard to base the structure on different criteria (well, depends on the criteria). -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] nice
"They found 12,703 battles which had an exact location and date, 2,657 of them are from Wikidata, the others are from DBpedia." Maybe we can do better? Am 02.03.2016 um 22:14 schrieb Lydia Pintscher: > On Wed, Mar 2, 2016 at 8:14 PM Gerard Meijssen <mailto:gerard.meijs...@gmail.com>> wrote: > > Hoi, > Yup I missed that one.. this [1] was my source :) > Gerard > > [1] http://www.bbc.com/news/magazine-35685889 > > > This is really great. I am thrilled about this because this isn't coverage > about > Wikidata but coverage _with_ Wikidata on major news sites for the second time > this week > (http://www.faz.net/aktuell/feuilleton/kino/academy-awards-die-oscars-von-1929-bis-heute-12820119.html > being > the other one). They're using Wikidata data to do meaningful reporting. Our > data > and the project as a whole got (at the very least) good enough for this. It > feels to me like we've broken through a wall. > High5 everyone! :D > > Cheers > Lydia > -- > Lydia Pintscher - http://about.me/lydia.pintscher > Product Manager for Wikidata > > Wikimedia Deutschland e.V. > Tempelhofer Ufer 23-24 > 10963 Berlin > www.wikimedia.de <http://www.wikimedia.de> > > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. > > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter > der > Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für > Körperschaften I Berlin, Steuernummer 27/029/42207. > > > _______ > Wikidata mailing list > Wikidata@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] SPARQL CONSTRUCT results truncated
Am 11.02.2016 um 10:17 schrieb Gerard Meijssen: > Your response is technical and seriously, query is a tool and it should > function > for people. When the tool is not good enough fix it. What I hear: "A hammer is a tool, it should work for people. Tearing down a building with it takes forever, so fix the hammer!" The query service was never intended to run arbitrarily large or complex queries. Sure, would be nice, but that also means committing an arbitrary amount of resources to a single request. We don't have arbitrary amounts of resources. We basically have two choices: either we offer a limited interface that only allows for a narrow range of queries to be run at all. Or we offer a very general interface that can run arbitrary queries, but we impose limits on time and memory consumption. I would actually prefer the first option, because it's more predictable, and doesn't get people's hopes up too far. What do you think? Oh, and +1 for making it easy to use WDT on labs. -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata