Re: [Wikidata] Big numbers

2019-10-07 Thread Daniel Kinzler
On 07.10.19 at 09:50, John Erling Blad wrote:
> Found a few references to bcmath, but some weirdness made me wonder if it really
> was bcmath after all. I wonder if the weirdness is the juggling with double when
> bcmath is missing.

I haven't looked at the code in five years or so, but when I wrote it, Number
was indeed bcmath with fallback to float. The limit of 127 characters sounds
right, though I'm not sure without looking at the code.

Quantity is based on Number, with quite a bit of added complexity for converting
between units while considering the value's precision. E.g. "3 meters" should
not turn into "118.11 inch", but "118 inch" or even "120 inch": with the
default precision of +/- 0.5 meter (= 19.685 inch), the last digit is
insignificant. Had lots of fun and confusion with that. I also implemented
rounding on decimal strings for that. And initially screwed up some edge cases,
which I only realized when helping my daughter with her homework ;)
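
For illustration, a minimal sketch of that kind of precision-aware conversion
(Python; simplified, and not the actual Wikibase code, which works on decimal
strings via bcmath):

import math

METERS_PER_INCH = 0.0254

def convert_with_precision(value_m, uncertainty_m):
    """Convert meters to inches, dropping digits that the uncertainty
    makes insignificant (illustrative only)."""
    value_in = value_m / METERS_PER_INCH              # 3 m   -> 118.11... inch
    uncertainty_in = uncertainty_m / METERS_PER_INCH  # 0.5 m -> 19.685 inch
    # +/- 19.685 means the tens digit is the last significant one.
    digits = -int(math.floor(math.log10(uncertainty_in)))
    return round(value_in, digits)

print(convert_with_precision(3.0, 0.5))  # -> 120.0, not 118.11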

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Personal news: a new role

2019-09-19 Thread Daniel Kinzler
Very cool! Looking forward to seeing more of you!

On 19.09.19 at 18:56, Denny Vrandečić wrote:
> Hello all,
> 
> Over the last few years, more and more research teams all around the world have
> started to use Wikidata. Wikidata is becoming a fundamental resource [1]. That
> is also true for research at Google. One advantage of using Wikidata as a
> research resource is that it is available to everyone. Results can be reproduced
> and validated externally. Yay!
> 
> I had used my 20% time to support such teams. The requests became more frequent,
> and now I am moving to a new role in Google Research, akin to a Wikimedian in
> Residence [2]: my role is to promote understanding of the Wikimedia projects
> within Google, work with Googlers to share more resources with the Wikimedia
> communities, and to facilitate the improvement of Wikimedia content by the
> Wikimedia communities, all with a strong focus on Wikidata.
> 
> One deeply satisfying thing for me is that the goals of my new role and the
> goals of the communities are so well aligned: it is really about improving the
> coverage and quality of the content, and about pushing the projects closer
> towards letting everyone share in the sum of all knowledge.
> 
> Expect to see more from me again - there are already a number of fun ideas in
> the pipeline, and I am looking forward to see them get out of the gates! I am
> looking forward to hearing your ideas and suggestions, and to continue
> contributing to the Wikimedia goals.
> 
> Cheers,
> Denny
> 
> P.S.: Which also means, incidentally, that my 20% time is opening for new
> shenanigans [3].
> 
> [1] https://www.semanticscholar.org/search?q=wikidata&sort=relevance
> [2] https://meta.wikimedia.org/wiki/Wikimedian_in_residence
> [3] https://wikipedia20.pubpub.org/pub/vyf7ksah
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Language for non-logged in users

2019-01-25 Thread Daniel Kinzler
On 25.01.19 at 13:33, DaB. wrote:
> Hello.
> On 25.01.2019 at 12:13, Daniel Kinzler wrote:
>> Serving different content from the same URL is generally a bad thing.
>
> No, it’s not. That’s the reason they invented Language-headers in the
> first place: So you can view a page in your language and I can view a
> site in my language. Please respect that not everybody can read english
> (fluently).

Headers can solve the caching problem, but this makes it impossible to link to a
specific language version of a page. That is bad when discussing specifics of
the page, and can cause confusion. It's also bad for search engine indexes,
which should index all language versions.

I very much want everyone to be able to see each page in their own language. The
idea is to redirect based on the language header, when visiting the neutral URL.
Please read the proposal.
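
To make the idea concrete, here is a minimal sketch of such a redirect (Python;
the uselang parameter and the URL shape are just for illustration -- the actual
proposal in T114662 may differ):

def parse_accept_language(header):
    """Return language codes from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        pieces = part.strip().split(";q=")
        code = pieces[0].strip().lower()
        q = float(pieces[1]) if len(pieces) > 1 else 1.0
        if code:
            weighted.append((q, code))
    return [code for q, code in sorted(weighted, key=lambda x: -x[0])]

def redirect_target(header, supported, entity_id):
    """Pick the first acceptable language we support and build a
    language-specific URL to redirect to."""
    for code in parse_accept_language(header):
        base = code.split("-")[0]
        if base in supported:
            return f"/wiki/{entity_id}?uselang={base}"
    return f"/wiki/{entity_id}?uselang=en"  # default

print(redirect_target("ja,en;q=0.7", {"en", "ja", "de"}, "Q64"))
# -> /wiki/Q64?uselang=ja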


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Language for non-logged in users

2019-01-25 Thread Daniel Kinzler
The reason this is not trivial is two-fold: 1) caching and 2) the semantics of
URLs. Serving different content from the same URL is generally a bad thing.

A solution for this is discussed in <https://phabricator.wikimedia.org/T114662>,
but work on this is currently not resourced.

On 25.01.19 at 11:44, Darren Cook wrote:
> I wanted to send someone a URL to show them how a data item looks in
> Japanese (so we could see which items have a translation). But am I
> right in thinking there is nothing I can put in the URL to do this?
>
> I also tried changing my accept-language header to put "ja" first, but
> it is ignored. Was this a feature that was discussed and rejected; or
> just an itch that no-one has got around to scratching yet?
>
> Darren
>
> P.S. I realize I can login, change my UI to another language, and see
> the data that way. But that is quite a long-winded process, especially
> if the person has not created an account yet.
>
> It also changes the whole UI, not just the data, which is painful if I
> just want to see what has been translated but cannot read the language.
> I think for a project about data, you should be able to set the UI
> language and the content language separately.
>
> E.g. I just put a page into Greek (I think), and now I can see the few
> items that have been translated, but cannot read the property names! Let
> alone navigate the site.) (The switch back to previous language link at
> the top was a great idea, though - thank-you to whoever thought of that
> shortcut.)
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Missing documentation of Wikibase Lexeme data model

2018-12-11 Thread Daniel Kinzler
On 11.12.18 at 10:38, Antonin Delpeuch (lists) wrote:
> One way to generate a JSON schema would be to use Wikidata-Toolkit's
> implementation, which can generate a JSON schema via Jackson. It could
> be used to validate the entire data model.

While a schema is nice, it's more important to have documentation that defines the
contract - that is, the intended semantics and guarantees.

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Missing documentation of Wikibase Lexeme data model

2018-12-11 Thread Daniel Kinzler
On 11.12.18 at 08:38, Jakob Voß wrote:
> Hi,
>
> I just noted that the official description of the Wikibase data model at
>
> https://www.mediawiki.org/wiki/Wikibase/DataModel
>
> and the description of JSON serialization lack a description of Lexemes,
> Forms, and Senses.

The abstract model for Lexemes is here:
https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model

The RDF binding is here:
https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/RDF_mapping

Looks like documentation for the JSON binding is indeed missing.

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-12-06 Thread Daniel Kinzler
On 06.12.18 at 09:49, Daniel Kinzler wrote:
> On 02.12.18 at 02:28, Erik Paulson wrote:
>> How do these external identifiers work, and how do I get something into one
>> of these namespaces? (I apologize if I have missed them in the documentation)
> 
> Hi Erik!

Oh, I forgot an important disclaimer: I used to be on the Wikidata team and I
was involved in discussing and specifying the different levels of federation
for Wikibase repos. I am no longer part of the Wikidata team though, and may not
be up to date with the latest progress. I cannot in any way speak for the Wikidata
team or make any promises.


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-12-06 Thread Daniel Kinzler
On 02.12.18 at 02:28, Erik Paulson wrote:
> How do these external identifiers work, and how do I get something into one of
> these namespaces? (I apologize if I have missed them in the documentation)

Hi Erik!

You got the right idea. Sadly, this feature is not implemented yet. I don't know
if there is any public documentation for this by now, but here is a very rough
list of the stepping stones towards allowing what you want:

1) Enable Items and Properties that exist on Wikidata to be referenced from
other Wikibase instances (repo or client) that can access Wikidata's
internal database directly, and do not themselves define Items or Properties
(but may define other kinds of entities). This is implemented, but not deployed
yet. It is scheduled to be deployed soon on Wikimedia Commons, as part of the
"Structured Data on Commons" project (aka Wikibase MediaInfo).

2) Enable Items and Properties that exist on Wikidata to be referenced from
other Wikibase instances (repo or client) that call Wikidata's web API, and do
not themselves define Items or Properties (but may define other kinds of
entities). This is relatively simple, but details about the caching mechanisms
need to be ironed out. Ask Adam and Lydia about the timeline for this.

3) Enable Items and Properties that exist on Wikidata to be referenced from
other Wikibase instances (repo or client) that call Wikidata's web API, and *do*
themselves also define Items or Properties which are *distinct* from the ones
that Wikidata defines. The spec for this is clear, but some old code needs to be
updated to enable this, and some details about the user interface need to be
worked out. Ask Adam and Lydia about the timeline for this.

4) Enable Items and Properties that exist on Wikidata to be referenced from
other Wikibase instances (repo or client) that call Wikidata's web API, and may
 "augment" or "override" the descriptions of Items and Properties defined on
Wikidata. There seems to be a lot of demand for this, but the details of the
semantics are unclear, especially with respect to SPARQL queries. More
discussion is needed.

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-29 Thread Daniel Kinzler
On 29.11.18 at 10:40, Yuri Astrakhan wrote:
> If at some point you decide to add some new area of data, e.g. biological, you could
> add new prefixes for that too, but that would also be a "separate" project.

The Q, P, L, M, etc. are used to identify the *type* of entity. They are not for
keeping projects separate. That was never their purpose. Wikibase does use
prefixes for that, but they are placed *before* the letter that indicates the type.

> The prefix can be omitted for local entities, so Q12345
> is an item on the local repo (or the default repo of a wikibase client).
>
> I think that was a big mistake -- the "(or the default repo of a wikibase
> client)"  -- because wd implies Wikidata, not Wikibase, so it dilutes the
> meaning of "wd:". See my other email on how I fixed it.

I'm confused - yes, wd: should ALWAYS imply Wikidata. Your wikibase instance
would have its own prefix (that can be omitted for local use), e.g. "osm:".

For the record, I'm just voicing my opinion here, and telling you what the
original intention was. I'm no longer working on Wikidata or Wikibase, and I
can't make any decisions on any of this.

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-29 Thread Daniel Kinzler
On 29.11.18 at 01:00, Lydia Pintscher wrote:
> On Thu, Nov 29, 2018 at 9:46 AM Andra Waagmeester  wrote:
>> I fully agree. I rather see the scarse development resources being focused 
>> on fixing this, than the p/q business, as you nicely call it. Tbh, I really 
>> don't see an issue with multiple p's and q's over different Wikibases. That 
>> is where prefixes are for, to distinguish between different resources. 
>> Examples of identical identifier (literal) schemes between multiple  
>> resources are abundant. (e.g. PubMed and NCBI gene) It really is a matter of 
>> getting used to, or am I missing something?
> 
> Are we talking about https://phabricator.wikimedia.org/T194180? I'm
> happy to push that into one of the next sprints if so.

This doesn't fix the hard-coded prefix in the RDF output generated by Wikibase.


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-29 Thread Daniel Kinzler
On 29.11.18 at 08:21, Imre Samu wrote:
> - What is the real meaning of Q/P prefix  ->  Wikidata or Wikibase?  

The intention was:

P and Q indicate the *type* of the entity ("P" = "Property", "Q" = "Item" for
arcane reasons, "L" = "Lexeme", "F" = "Form", "S" = "Sense", "M" = "MediaInfo"). As you
can tell, we'd quickly run out of letters and cause confusion if this became
configurable.

Using prefixes to indicate where the entity comes from is indeed useful and is
already part of the model. The prefix for Wikidata is "wd:", so "wd:Q12345" is
an item from Wikidata. The prefix can be omitted for local entities, so Q12345
is an item on the local repo (or the default repo of a wikibase client).

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-29 Thread Daniel Kinzler
On 28.11.18 at 23:53, Olaf Simons wrote:
> I will receive answers in the form of 
> 
> wd:q25 
> 
> but they do not link to wd, wikidata, but into our database 
> https://database.factgrid.de/entity/Q25. 

Right, that prefix should not be "wd" for your own query service. I'm afraid
that's currently hard coded in the RdfVocabulary class. That should indeed be
fixed.


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-28 Thread Daniel Kinzler
On 28.11.18 at 10:15, James Heald wrote:
> It should also be made possible for the local wikibase to use local prefixes
> other than 'P' and 'Q' for its own local properties and items, otherwise it
> makes things needlessly confusing -- but currently I think this is not possible.

I think the opposite is the case: ending up with a zoo of prefixes, with items
being called A73834 and F0924095 and Q98985 and W094509, would be very
confusing. The current approach is to use the same convention that RDF and XML
use: add a kind of namespace identifier in front of "foreign" identifiers. So
you would have Q437643 for "local" items, xy:Q8743 for items from xy,
foo:Q873287 for items from foo, etc. This is how foreign IDs are currently
implemented in Wikibase.


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Daniel Kinzler
Hi Pine, sorry for the misleading wording. Let me clarify below.

On 19.10.18 at 9:51 PM, Pine W wrote:
> Hi Markus, I seem to be missing something. Daniel said, "And I think the best
> way to achieve this is to start using the ontology as an ontology on wikimedia
> projects, and thus expose the fact that the ontology is broken. This gives
> incentive to fix it, and examples as to what things should be possible using
> that ontology (namely, some level of basic inference)." I think that I
> understand the basic idea behind structured data on Commons. I also think that I
> understand your statement above. What I'm not understanding is how Daniel's
> proposal to "start using the ontology as an ontology on wikimedia projects, and
> thus expose the fact that the ontology is broken." isn't a proposal to add poor
> quality information from Wikidata onto Wikipedia and, in the process, give
> Wikipedians more problems to fix. Can you or Daniel explain this?

What I meant in concrete terms was: let's start using wikidata items for tagging
on commons, even though search results based on such tags will currently not
yield very good results, due to the messy state of the ontology, and hope people
fix the ontology to get better search results. If people use "poodle" to tag an
image and it's not found when searching for "dog", this may lead to people
investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for
wikipedia articles based on an imperfect  ontology, and push them on everyone".
I mean, using the ontology to generate navigation boxes for some kinds of
articles may be a nice idea, and could indeed have the same effect - that people
notice problems in the ontology, and fix them. But that would be something the
local wiki communities decide to do, not something that comes from Wikidata or
the Structured Data project.

The point I was trying to make is: the Wiki communities are rather good at
creating structures that serve their purpose, but they do so pragmatically,
along the behavior of the existing tools. So, rather than trying to work around
the quirks of the ontology in software, the software should use very simple
rules (such as following the subclass relation), and let people adapt the data
to this behavior, if and when they find it useful to do so. This approach, over
time, provides better results in my opinion.
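
As a concrete example of such a simple rule, "follow the subclass relation" is
just a property path in SPARQL. A small Python sketch against the public query
service (assuming the usual IDs: Q144 = dog, P31 = instance of, P279 = subclass of):

import json
import urllib.parse
import urllib.request

# Everything that is an instance of dog (Q144) or of any subclass of it.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q144 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
    {"query": QUERY, "format": "json"})
req = urllib.request.Request(url, headers={"User-Agent": "subclass-demo/0.1"})
with urllib.request.urlopen(req) as resp:
    for row in json.load(resp)["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])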

Also, keep in mind that I was referring to an imperfect *improvement* of search,
the alternative being to only return things tagged with "dog" when searching for
"dog". I was not suggesting to degrade the user experience in order to incentivize
editors. I'm rather suggesting the opposite: let's NOT give people a reason to tag
images that show poodles with "poodle" and "dog" and "mammal" and "animal" and
"pet" and...

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-17 Thread Daniel Kinzler
> "...the burden of proof has to be placed on authority, and it should be
> dismantled if that burden cannot be met..."
> 
> -Thad
> +ThadGuidry <https://plus.google.com/+ThadGuidry>
> 
> 
> On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA wrote:
> 
> Hi,
> 
> The Wikidata's ontology is a mess, and I do not see how it could be
> otherwise. While the creation of new properties is controlled, any fool
> can decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer
> a human or is part of family. Maybe I'm a fool too? I wanted to remove
> the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an
> instance of "ship type" because it produces weird circular inferences in
> my application; but maybe that makes sense to someone else.
> 
> There will never be a universal ontology on which everyone agrees. I
> wonder (sorry to think aloud) if Wikidata should not rather facilitate
> the use of external classifications. Many external ids are knowledge
> organization systems (ontologies, thesauri, classifications ...) I dream
> of a simple query that could search, in Wikidata, "all elements of the
> same class as 'poodle' according to the classification of imagenet
> <http://imagenet.stanford.edu/synset?wnid=n02113335>.
> 
> _______
> Wikidata mailing list
> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] lexeme fulltext search display

2018-06-18 Thread Daniel Kinzler
On 18.06.2018 at 19:25, Stas Malyshev wrote:
> 1. What the link will be pointing to? I haven't found the code to
> generate the link to specific Form.

You can use an EntityTitleLookup to get the Title object for an EntityId. In
case of a Form, it will point to the appropriate section. You can use the
LinkRenderer service to make a link. Or you use an EntityIdHtmlLinkFormatter,
which should do the right thing. You can get one from a
OutputFormatValueFormatterFactory.

-- daniel

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] lexeme fulltext search display

2018-06-18 Thread Daniel Kinzler
Hi Stas!

Your proposal is pretty much what I envision.

On 14.06.2018 at 19:39, Stas Malyshev wrote:
> I plan to display Lemma match like this:
> 
> title (LN)
> Synthetic description
> 
> e.g.
> 
> color/colour (L123)
> English noun
> 
> Meaning, the first line with link would be standard lexeme link
> generated by Lexeme code (which also deals with multiple lemmas) and the
> description line is generated description of the Lexeme - just like in
> completion search.

Sounds perfect to me.

> The problem here, however, is since the link is
> generated by the Lexeme code, which has no idea about search, we can not
> properly highlight it. This can be solved with some trickery, probably,
> e.g. to locate search matches inside generated string and highlight
> them, but first I'd like to ensure this is the way it should be looking.

Do we really need the highlight? It does not seem critical to me for this use
case. Just "nice to have".

> More tricky is displaying the Form (representation) match. I could
> display here the same as above, but I feel this might be confusing.
> Another option is to display Form data, e.g. for "colors":
> 
> color/colour (L123)
> colors: plural for color (L123): English noun

I'd rather have this:

 colors/colours (L123-F2)
 plural of color (L123): English noun

Note that in place of "plural", you may have something like "3rd person,
singular, past, conjunctive", derived from multiple Q-ids.

> The description line features matched Form's representation and
> synthetic description for this form. Right now the matched part is not
> highlighted - because it will otherwise always be highlighted, as it is
> taken from the match itself, so I am not sure whether it should be or not.

Again, I don't think any highlighting is needed.

But, as you know, it's all up to Lydia to decide :)

-- daniel

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Solve legal uncertainty of Wikidata

2018-05-18 Thread Daniel Kinzler
On 18.05.2018 at 21:37, Amirouche Boubekki wrote:
> What wikidata doesn't track the license of each piece of information?!

Facts don't *have* licenses. They have sources, and we track those. Which may
have licenses, depending on jurisdiction, interpretation, form, content, etc.
But the fact itself doesn't, it's not copyrightable.


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Fastest way (API or whatever) to verify a QID

2018-05-15 Thread Daniel Kinzler
You can do this via the API, e.g.:

https://www.wikidata.org/w/api.php?action=query&format=json&titles=Q1|Qx|Q1003|Q66&redirects=1

Note that this uses QIDs directly as page titles. This works on wikidata, but may
not work on all wikibase instances. It also does not work for PIDs: for these,
you have to prefix the Property namespace, as in Property:P31.

A more wikibase way would be to use the wbgetentities API, as in
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42|Q64&props=

However, this API fails when you provide a non-existing ID, without providing
any information about other IDs. So you can quickly check if all the IDs you
have are ok, but you may need several calls to get a list of all the bad IDs.

That's rather annoying for your use case. Feel free to file a ticket on
phabricator.wikimedia.org. Use the wikidata tag. Thanks!
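
For illustration, both approaches in Python (standard library only; the
parameters used are the normal action=query and wbgetentities ones):

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def api_get(params):
    """Small helper for GET requests against the MediaWiki API."""
    url = API + "?" + urllib.parse.urlencode({**params, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "qid-check-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Variant 1: treat QIDs as page titles. Missing pages are flagged in the
# result, so a single call identifies all the bad IDs.
data = api_get({"action": "query", "titles": "Q1|Q1003|Q9999999999"})
for page in data["query"]["pages"].values():
    print(page["title"], "missing" if "missing" in page else "exists")

# Variant 2: wbgetentities. Entity-aware, but as noted above an ID it rejects
# makes the whole request fail, so use it for lists known to be well-formed.
data = api_get({"action": "wbgetentities", "ids": "Q1|Q1003", "props": "info"})
print(sorted(data["entities"].keys()))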

-- daniel

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Search on Wikibase/Wikidata sans CirrusSearch?

2017-12-30 Thread Daniel Kinzler
Yes, it's supposed to work, see FingerprintSearchTextGenerator and
EntityContent::getTextForSearchIndex

On 30.12.2017 at 06:47, Stas Malyshev wrote:
> Hi!
> 
> I wonder if anybody have run/is running Wikibase without CirrusSearch
> installed and whether the fulltext search is supposed to work in that
> configuration? The suggester/prefix search, aka wbsearchentities, works
> ok, but I can't make fulltext aka Special:Search find anything on my VM
> (which very well may be a consequence of me messing up, or some bug, or
> both :)
> So, I wonder - is it *supposed* to be working? Is anybody using it this
> way and does anybody care for such a use case?
> 
> Thanks,
> 


___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] RDF: All vs Truthy

2017-12-03 Thread Daniel Kinzler
On 03.12.2017 at 14:49, Imre Samu wrote:
>> All = contains not only the Truthy ones, but also the ones with qualifiers
> 
> imho:  Sometimes Qualifiers is very important for multiple values  (   like
> "Start time","End time","point in time", ... )
> for example:   Russia https://www.wikidata.org/wiki/Q159  :  Russia - 
> P38:"currency"
> has 2 "statements" both with qualifiers:
> 
> * Russian ruble -  ( start time: 1992 )
> * Soviet ruble  - (end time: September 1993 )
> 
> My Question:
> in this case - what is the "Truthy=simple" result for   Russia-P38:"currency" 
> ?

You will simply get two truthy results: Russian ruble and Soviet ruble. Both
are Russian currencies. If you want to know when, why, where, etc., you have to
check the qualified "full" statements.

That's why it's called "truthy": the answer is kind of true, depending on 
context.
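
In SPARQL terms (a sketch; P580/P582 are the usual start/end time qualifier
properties): the truthy form uses the wdt: predicate, while the full form goes
through the statement node, which is where the qualifiers live:

import json
import urllib.parse
import urllib.request

# Truthy: both currencies come back, with no hint of when they were valid.
TRUTHY = "SELECT ?currency WHERE { wd:Q159 wdt:P38 ?currency }"

# Full statements: reach the qualifiers via the statement node.
FULL = """
SELECT ?currency ?start ?end WHERE {
  wd:Q159 p:P38 ?stmt .
  ?stmt ps:P38 ?currency .
  OPTIONAL { ?stmt pq:P580 ?start . }
  OPTIONAL { ?stmt pq:P582 ?end . }
}
"""

def run(query):
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "truthy-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

print(run(TRUTHY))  # two bare values
print(run(FULL))    # the same two values, plus their start/end qualifiers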

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] How to get direct link to image

2017-10-30 Thread Daniel Kinzler
On 30.10.2017 at 19:10, Laura Morales wrote:
>> You can also use the Wikimedia Commons API made by Magnus:
> https://tools.wmflabs.org/magnus-toolserver/commonsapi.php
>> It will also gives you metadata about the image (so you'll be able to cite 
>> the author of the image when you reuse it).
> 
> Is the same metadata also available in the Turtle/HDT dump?

Sadly not. We don't have proper structured meta-data yet. That's what the
Structured Data on Commons project is about:
<https://commons.wikimedia.org/wiki/Commons:Structured_data>


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata prefix search is now Elastic

2017-10-26 Thread Daniel Kinzler
On 26.10.2017 at 11:36, Marco Fossati wrote:
> Thanks a lot Stas for this present.
> Could you please share any pointers on how to integrate it into other tools?

Just keep using wbsearchentities. It now uses Cirrus as a backend, instead of
SQL. That should provide better performance, and better ranking.
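
For example, a call to the prefix search looks like this (Python; these are
the standard wbsearchentities parameters):

import json
import urllib.parse
import urllib.request

params = {
    "action": "wbsearchentities",
    "search": "Berl",      # the prefix typed so far
    "language": "en",
    "type": "item",
    "format": "json",
}
url = "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)
req = urllib.request.Request(url, headers={"User-Agent": "prefix-search-demo/0.1"})
with urllib.request.urlopen(req) as resp:
    for hit in json.load(resp)["search"]:
        print(hit["id"], hit.get("label", ""))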

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Navigation to Wikipedia links on Wikidata

2017-09-05 Thread Daniel Kinzler
If your browser window is wide enough, Sitelinks to Wikipedia should already be
close to the top of the page, on the right-hand side.

But in any case, you can always add #sitelinks-wikipedia to the URL, like in
<https://www.wikidata.org/wiki/Q1#sitelinks-wikipedia>. That will make the
browser jump right to the wikipedia section.

On 05.09.2017 at 16:47, Tito Dutta wrote:
> Hello,
> If I am on a Wikidata item page (QX), what's the easiest way to navigate to
> the Wikipedia links other than manual scrolling? Sometimes (actually a lot of
> times) I need to check Wikipedia articles (not only English) before I add
> description part. Is there any user script or something that puts Wikipedia
> links above statement or any other suggestion?
> 
> Thanks
> Tito Dutta
> Note: If I don't reply to your email in 2 days, please feel free to remind me
> over email or phone call.
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Does a rollback also roll back revision history?

2017-07-31 Thread Daniel Kinzler
On 31.07.2017 at 17:01, Eric Scott wrote:
> * Is is indeed the case that rollbacks also roll back the revision history?

No. All edits are visible in the page history, including rollback, revert,
restore, undo, etc. The only kind of edit that is not recorded is a "null edit"
- an edit that changes nothing compared to the previous version (so it's not
actually an edit). This is sometimes used to rebuild cached derived data.

> * Is there some other place we could look that records such rollbacks?

No. The page history is authoritative. It reflects all changes to the page
content. If you could find a way to trigger this kind of behavior, that would be
a HUGE bug. Let us know.

Note that for wikitext content, this doesn't mean that it contains all changes
to the visible rendering: when a transcluded template is changed, this changes
the rendering, but is not visible in the page's history (but it is instead
visible in the template's history). However, no transclusion mechanism exists
for Wikidata entities.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] lib.reviews: Review anything with a Wikidata entry

2017-07-26 Thread Daniel Kinzler
Thanks for sharing, Erik!

Combining search and query capabilities would indeed be useful for quite a few
things. We'll probably be working on making this easier soon.

-- daniel

On 26.07.2017 at 07:26, Erik Moeller wrote:
> A small update on this: based on some off-list feedback, I replaced
> the way I exclude disambiguation pages and the like from the
> autocomplete list. The autocomplete widget now performs two queries: a
> MediaWiki API (wbsearchentities) query, and a follow-up WDQS SPARQL
> query to exclude disambiguation pages, Wikinews articles, and other
> content that folks are most likely not interested in reviewing.
> 
> I didn't find a good example for this in the examples directory, so I
> figured folks might find the query I'm using useful. Before I add it
> to the examples, please let me know if you see obvious ways in which
> it can be improved.
> 
> Here's an example query:
> 
> # For a list of items, exclude the ones that have "instance of" set to
> # one from a given set of excluded classes
> SELECT DISTINCT ?item WHERE {
>  ?item ?property ?value
> 
>   # Excluded classes: disambiguation pages, Wikinews articles, etc.
>   MINUS { ?item wdt:P31 wd:Q4167410 }
>   MINUS { ?item wdt:P31 wd:Q17633526 }
>   MINUS { ?item wdt:P31 wd:Q11266439 }
>   MINUS { ?item wdt:P31 wd:Q4167836 }
>   MINUS { ?item wdt:P31 wd:Q14204246 }
> 
>   # Set of items to check against the above exclusion list
>   # wd:Q355362 is a disambiguation page and will therefore not be in
>   # the result set
>   VALUES ?item { wd:Q23548 wd:Q355362 wd:Q1824521 wd:Q309751 wd:Q6952373 }
> }
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wiki PageID

2017-04-24 Thread Daniel Kinzler
Hello Gintautas!

On 21.04.2017 at 17:58, Gintautas Sulskus wrote:
> I have a couple of questions regarding the Wiki Page ID. Does it always stay
> unique for the page, where the page itself is just a placeholder for any kind of
> information that might change over time?

That is indeed the idea. Content changes, the page ID stays the same. If you
need to identify a specific state of the page, use the revision ID (aka 
permalink).

Note however that page IDs are considered "internal" identifiers. They are
stable, but they are not the canonical way to access or identify a page. Use the
title for that - or, in the context of Wikidata, use the entity ID.

> Consider the following cases:
> 1. The first time someone creates page "Moon" it is assigned ID=1. If at some
> point the page is renamed to "The_Moon", the ID=1 remains intact. Is this 
> correct?

Yes, page IDs survive renaming/moving the page.

> 2. What if we have page "Moon" with ID=1. Someone creates a second-page
> "The_Moon" with ID=2. Is it possible that page "Moon" is transformed into a
> redirect? Then, "Moon" would be redirecting to page "The_Moon"?

Yes, pages can become redirects.

> 3. Is it possible for page "Moon" to become a category "Category:Moon" with 
> the
> same ID=1?

Yes, pages can be moved into the category namespace.
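
A small illustration (Python, standard MediaWiki API): look a page up by title
to get its stable page ID, then look it up again by that ID.

import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def page_info(**params):
    """Fetch basic page info (pageid, title) from the MediaWiki API."""
    query = {"action": "query", "prop": "info", "format": "json", **params}
    url = API + "?" + urllib.parse.urlencode(query)
    req = urllib.request.Request(url, headers={"User-Agent": "pageid-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        return next(iter(json.load(resp)["query"]["pages"].values()))

by_title = page_info(titles="Q1")
print(by_title["pageid"], by_title["title"])

# The page ID keeps identifying the same page even if it is later renamed
# or turned into a redirect; the title does not.
by_id = page_info(pageids=str(by_title["pageid"]))
print(by_id["title"])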

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Wikibase and PostgreSQL

2017-04-10 Thread Daniel Kinzler
Hi Denis!

Sorry for the late response.

The information is in the installation requirements, see
<https://www.mediawiki.org/wiki/Extension:Wikibase_Repository#Requirements>.

Where did you expect to find it? Perhaps we can add it in some more places to
avoid confusion and frustration. In the README file, maybe?

-- daniel

On 06.03.2017 at 09:05, Denis Rykov wrote:
> Hello!
> 
> It looks like Wikibase extension is not compatible with PostgreSQL backend.
> There are many MySQL specific code in sql scripts (e.g. auto_increment, 
> varbinary).
> How about to add this information to Wikibase docs?
> //


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 18:12, Denny Vrandečić wrote:
> So assume we enter a new Lexeme in Examplarian (which has a Q-Item), but
> Examplarian has no language code for whatever reason. What language code would
> they enter in the MultilingualTextValue?

My plan is: it will be "mis+Q7654321" internally, which will be exposed in HTML
and RDF as "mis".

We will want to distinguish "a known language not on this list (mis)" from "an
unknown language (und)" and "translingual" (Wiktionary uses "mul" for
translingual, but that's not technically correct).

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 19:24, Denny Vrandečić wrote:
> Daniel, I agree, but isn't that what Multilingual Text requires? A language 
> code?

Yes. Well, internally, it just has to be *some* unique code. But for
interoperability, we want it to be a standard code. So I propose to internally
use something like "de+Q1980305", and expose that as "de" externally. This
allows us to distinguish however many variants of German we want internally, and
tag them all as "de" in HTML and RDF, so standard tools can use the language
information.
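
A tiny sketch of that split (Python; purely illustrative, the internal format
shown is just the proposal from this thread):

def external_language_code(internal):
    """Map an internal code like 'de+Q1980305' (standard code plus an item
    identifying the exact variant) to the plain standard code for HTML/RDF."""
    return internal.split("+", 1)[0]

def variant_item(internal):
    """Return the item ID identifying the variant, if one is attached."""
    parts = internal.split("+", 1)
    return parts[1] if len(parts) > 1 else None

print(external_language_code("de+Q1980305"))  # -> de
print(variant_item("mis+Q7654321"))           # -> Q7654321
print(variant_item("en"))                     # -> None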

> I assume most of it is hidden behind mini-wizards like "Create a new lexeme",
> which actually make sure the multitext language and the language property are
> consistently set. In that case I can see this work.

Yes, that is exactly the plan for the NewLexeme page.

We'll still have to come up with a nifty UI for "add a lemma, select a language,
and optionally an item identifying a variant of that language".

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 18:56, Gerard Meijssen wrote:
> Hoi,
> The standard for the identification of a language should suffice.

I know no standard that would be sufficient for our use case.

For instance, we not only need identifiers for German, Swiss and Austrian
German. We also need identifiers for German German before and after the spelling
reform of 1901, and before and after the spelling reform of 1996. We will also
need identifiers for the "language" of mathematical notation. And for various
variants of ancient languages: not just Sumerian, but Sumerian from different
regions and periods.

The only system I know that gives us that flexibility is Wikidata. For
interoperability, we should provide a standard language code (aka subtag). But a
language code alone is not going to be sufficient to distinguish the different
variants we will need.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
Tobias' comment made me realize that I did not clarify one very important
distinction: there are two kinds of places where a "language" is needed in the
Lexeme data model
<https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model>:

1) the "lexeme language". This can be any Item, language code or no. This is
what Tobias would have to use in his query.

2) the language codes used in the MultilingualTextValues (lemma, representation,
and gloss). This is where my "hybrid" approach comes in: use a standard language
code augmented by an item ID to identify the variant.

To make it easy to create new Lexemes, the lexeme language can serve as a
default for lemma, representation, and gloss - but only if it has a language
code. If it does not have one, the user will have to specify one for use in
MultilingualTextValues.


On 06.04.2017 at 19:59, Tobias Schönberg wrote:
> An example using the second suggestion:
> 
> If I would like to query all L-items that contain a combination of letters and
> limit those results by getting the Q-items of the language and limit those, to
> those that have Latin influences.
> 
> In my imagination this would work better using the second suggestion. Also the
> flexibility of "what is a language" and "what is a dialect" would seem easier 
> if
> we can attach statements to the UserLanguageCode or the Q-item of the 
> language.
> 
> -Tobias


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Disputed territories in WDQS

2017-04-09 Thread Daniel Kinzler
Hi Andrea!

As Nicolas pointed out, the map view of WDQS is based on OpenStreetMap. So the
territory would have to be marked as disputed there.

However, perhaps you can turn this into a positive example for Wikidata's
flexibility and NPOV after all: I have added some statements to
<https://www.wikidata.org/wiki/Q5671580> to show how a territorial dispute can
be modeled on Wikidata.

I was lazy and didn't add any sources, though - I didn't know what to make of
"Donovan 2003" given in Wikipedia, as it doesn't give the title of a
publication. But I suppose sources for these things should be easy to find.

HTH
daniel

On 09.04.2017 at 14:54, Andra Waagmeester wrote:
> I am currently in Suriname, where I gave a talk on open 
> data/wikipedia/wikidata.
> Next week there will be a handson session, where I hope to get as much
> contribution from this country as possible.
> 
> When I demonstrated the WDQS, the audience took offense in the way Suriname is
> depicted on the map view used in the WDQS. There is a territorial dispute with
> the neighboring country Guyana, called the Tigri
> area(https://en.wikipedia.org/wiki/Tigri_Area). In the WDQS this area is
> currently being drawn as being part of Guyana. The maps drawn in the WIkipedia
> article shows how the issue is dealt with here when drawing maps. i.e. The 
> area
> is explicitly drawn as being a territorial dispute, which is more factual. 
> 
> Any idea's on how to get a similar mapview on the WDQS? Thanks to Wikipedia
> Zero, where people can have free access to Wikidata (even in remote area's),
> there is quite some potential to get people involved in adding local data.
> Having the current mapview is counter productive. 
> 
> Cheers,
> 
> Andra
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] [Wikidata] Significant change: new data type for geoshapes

2017-03-29 Thread Daniel Kinzler
On 29.03.2017 at 15:19, Luca Martinelli wrote:
>> One thing to note: We currently do not export statements that use this
>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>> Service. The reason is that we are still waiting for geoshapes to get stable
>> URIs. This is handled in this ticket.

This ticket: <https://phabricator.wikimedia.org/T159517>. And more generally
<https://phabricator.wikimedia.org/T161527>.

The technically inclined of you may be interested in joining the relevant RFC
discussion on IRC tonight at 21:00 UTC (2pm PDT, 23:00 CEST) #wikimedia-office.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Significant change: new data type for geoshapes

2017-03-29 Thread Daniel Kinzler
On 29.03.2017 at 15:19, Luca Martinelli wrote:
>> One thing to note: We currently do not export statements that use this
>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>> Service. The reason is that we are still waiting for geoshapes to get stable
>> URIs. This is handled in this ticket.

This ticket: <https://phabricator.wikimedia.org/T159517>. And more generally
<https://phabricator.wikimedia.org/T161527>.

The technically inclined of you may be interested in joining the relevant RFC
discussion on IRC tonight at 21:00 UTC (2pm PDT, 23:00 CEST) #wikimedia-office.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] wikibase:directClaim predicate?

2017-03-27 Thread Daniel Kinzler
On 27.03.2017 at 23:48, Kingsley Idehen wrote:
> I think we can just agree to disagree for now, since nothing you've
> stated is fundamentally contrary to my view of RDF --  as a Language for
> describing anything (including statements)  :)

Yes, that's what RDF is. My point is: just because something can be described in
RDF doesn't mean it *is* RDF.

As you said, RDF can describe anything. If anything that can be described with
RDF *is* RDF, then everything is RDF. Then the term would be meaningless.

The Wikibase model "is" an RDF model just as much as it "is" modal logic system,
or any other sufficiently powerful formal language.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Does Wikidata use a property store or a RDF triplestore?

2017-03-24 Thread Daniel Kinzler
The primary data storage is document oriented, and very dumb. It's JSON blobs
stored as wiki page content, using MediaWiki's standard content blob storage
mechanism.

We have a live export to a triple store, and an open SPARQL endpoint.

These links may be helpful:

https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
https://www.wikidata.org/wiki/Wikidata:Data_access

If you want to play with the data, try http://query.wikidata.org/
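
For example (Python, standard library only): fetch the raw JSON document for an
item, which is exactly the "dumb blob" the RDF export and the query service are
derived from.

import json
import urllib.request

url = "https://www.wikidata.org/wiki/Special:EntityData/Q64.json"
req = urllib.request.Request(url, headers={"User-Agent": "entitydata-demo/0.1"})
with urllib.request.urlopen(req) as resp:
    entity = json.load(resp)["entities"]["Q64"]

print(entity["labels"]["en"]["value"])  # "Berlin"
print(list(entity["claims"])[:5])       # a few of the property IDs used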

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] What kind of bot "wiktionary in wikidata" needs?

2017-03-24 Thread Daniel Kinzler
On 22.03.2017 at 10:10, Amirouche wrote:
> My understanding is that wiktionary (and wikipedia) CC-BY-SA license is
> incompatible with wikidata CC0 license.

That is true, for any copyrighted information on Wiktionary. That will mainly be
definitions, and maybe example sentences. Facts, such as word type or
morphology, are not copyrightable.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] wikibase:directClaim predicate?

2017-03-19 Thread Daniel Kinzler
On 19.03.2017 at 18:21, Bob DuCharme wrote:
> I do have to ask: if the mapping used on wikidata.org has diverged from what is
> described there, is a more up-to-date description of the mapping available
> anywhere?

The current mapping is the one described at
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] wikibase:directClaim predicate?

2017-03-18 Thread Daniel Kinzler
On 18.03.2017 at 22:48, Bob DuCharme wrote:
> New question: when I see that https://www.wikidata.org/wiki/Special:EntityData
> says "This page provides a linked data interface to entity values", can you 
> tell
> me what "entity" means in the context of Wikidata? If I was going to refer to
> something that can be identified with a URI and described by triples in which 
> it
> is the subject, I would just use the term "resource" as described at
> https://www.w3.org/TR/rdf11-concepts/#resources-and-statements (and 
> remembering
> what "RDF" stands for!) so I'm guessing that "entity" means something a little
> more specific than that here.

The Wikidata (or technically, Wikibase) data model is not defined in terms of
RDF. Have a look at the primer
<https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer> and the spec
<https://www.mediawiki.org/wiki/Wikibase/DataModel>.

Entities are the top-level elements of Wikidata. There are currently two kinds:
Items (things or concepts in the world) and Properties (attributes for
describing Items and other entities).

Wikibase Entities are certainly Resources in the RDF sense, but so are some of
the more fine grained components of the Wikibase model, such as Statements and
References. You can find the OWL file for the RDF binding of Wikibase at
<http://wikiba.se/ontology>.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] wikibase:directClaim predicate?

2017-03-18 Thread Daniel Kinzler
On 18.03.2017 at 21:27, Bob DuCharme wrote:
> Thanks Daniel!
> 
> How do I find a full statement representation? For example, what would the full
> statement representation be for a triple like
> {wd:Q64 wdt:P1376 wd:Q183}?

The full representation of the statement in this case is:

wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C a wikibase:Statement,
wikibase:BestRank ;
wikibase:rank wikibase:PreferredRank ;
ps:P1376 wd:Q183 ;
prov:wasDerivedFrom wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 .

wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 a wikibase:Reference ;
pr:P248 wd:Q451546 ;
pr:P958 "Artikel 2 (1)" .

This RDF representation can be found at
<https://www.wikidata.org/wiki/Special:EntityData/Q64.ttl>. Content negotiation
will take you there from the canonical URI,
<https://www.wikidata.org/entity/Q64>.

In addition to the actual value, the RDF above also gives the rank, and a source
reference (namely, the re-unification treaty).

This statement doesn't currently have a qualifier - it should have at least one,
stating since when Berlin has been the capital of Germany. That qualifier would be
represented as:

wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C pq:P580
  "1990-10-03T00:00:00Z"^^xsd:dateTime ;


The Statement ID, Q64$43CCD3D6-F52E-4742-B0E3-BCA671B69D2C, can be found in the
HTML source of the page, encoded as a CSS class. These IDs are not exposed
nicely anywhere. But usually, one would look at the RDF representation right
away, or at least go from HTML to *all* the RDF.

HTH
-- daniel


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 27.02.2017 at 18:18, James Heald wrote:
> From what Daniel is saying, it seems this may not be possible, because the
> template expansion would then depend on the user's preferred language(s), which
> would not be compatible with the template caching.
> 
> Is that right?   Or is there a way round this?

We are currently aiming for a compromise: we render the page with the user's
interface language as the target language, and apply fallback accordingly. We do
not take into account secondary user languages, as defined e.g. by the Babel or
Translate extensions.

This means a user with the UI language set to French will see French if
available, but will not see Spanish, even if they somehow declared that they
also speak Spanish.

This way, we split the parser cache once per UI language - a factor of 300, but
not the exponential explosion we would get if we split on every possible
permutation of languages (does anyone want to compute 300 factorial?).


-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 27.02.2017 at 17:01, James Hare wrote:
> One option is to allow users to define their own ranked preferences for language
> beyond just first place. (I personally would enjoy having French as a fallback
> to English.)

That would badly fragment the parser cache. I don't think it's viable.

> This has the downside of only really working for people with
> accounts, which I suspect might be a minority of overall traffic.

Currently, we only support English for anon visitors (yes, this is very sad; the
reason is, again, caching - varnish, this time).

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Label gaps on Wikidata

2017-02-27 Thread Daniel Kinzler
On 19.02.2017 at 17:00, Romaine Wiki wrote:
> Hi all,
> 
> If you look in the recent changes, most items have labels in English and those
> are shown in the recent changes and elsewhere (so we know what the item is about
> without opening first).

Wikidata actually tries to show you the labels in your preferred interface
language. And if your user language is not available, it uses a fallback
mechanism to show the next-best language, which may even include automated
transcriptions. When all else fails, it will show the English label. If that
doesn't exist, it shows the ID.
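
Schematically, the lookup works like this (a Python sketch, not the actual
Wikibase fallback code):

def best_label(labels, fallback_chain, entity_id):
    """Return the first label available along the user's fallback chain,
    then English, then the bare entity ID."""
    for code in fallback_chain + ["en"]:
        if code in labels:
            return labels[code]
    return entity_id

labels = {"de": "Berlin", "ru": "Берлин"}
print(best_label(labels, ["gsw", "de"], "Q64"))  # -> "Berlin", via the de fallback
print(best_label({}, ["gsw", "de"], "Q64"))      # -> "Q64"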

> But not all items have labels, and these items without
> English label are often items with only a label in Chinese, Arabic, Cyrillic
> script, Hebrew, etc. This forms a significant gap.

The fallback mechanism works OK, but is not great for English-speaking users who
see a lot of items that have no English label. For English, we just don't know
what to fall back to. Just anything? Or try European languages first? What
should the rule be? If we can decide on a good rule, it should actually be
pretty simple to add such a fallback for English.

> Is there a way to easily make a transcription from one language to another?

We have such rules for some languages/variants, e.g. between the Cyrillic and
the Roman representations of Kazakh or Uzbek. But transliteration rules can be
complex, and covering every pair of the 300 languages we support would
mean we'd need about 45,000 rule sets...

> Or alternatively if there is a database that has such transcriptions?

Not yet. One of the goals of Wikidata is to be that database.

-- 
Daniel Kinzler
Principal Platform Engineer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Full Text Search in Query Service

2017-02-17 Thread Daniel Kinzler
Am 17.02.2017 um 22:02 schrieb James Heald:
> Quick question on this Stas:
> 
> * Why do the suggestions that come up when typing in the search box seem so 
> much
> more on-point (ie better at presenting the most likely option first) than the
> ones that come up in the results list?

The reason is that the "search box" on wikidata.org is fake: it is not the
search box you see on Wikipedia, and it does not use the search infrastructure that
Special:Search uses (Cirrus). It uses a custom API module (wbsearchentities)
which relies on a custom database table (wb_terms). We need this because Cirrus
did not have support for structured data or multilingual fields. That is changing
now, and we want to use Cirrus for everything. But until then, Wikidata is using
two completely different search mechanisms, both of which work well for some
things, and really badly for others.
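
For what it's worth, a minimal sketch of calling that module directly
(parameters as documented for wbsearchentities):

    import requests

    # Prefix search against the custom wbsearchentities module.
    resp = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbsearchentities",
        "search": "silver",     # what the user typed into the box
        "language": "en",       # language used to match labels/aliases
        "type": "item",
        "format": "json",
    })
    for match in resp.json().get("search", []):
        print(match["id"], match.get("label"), "-", match.get("description", ""))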


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata ontology

2017-01-09 Thread Daniel Kinzler
Am 09.01.2017 um 04:36 schrieb Markus Kroetzsch:
> Only the "current king of Iberia" is a single person, but Wikidata is about 
> all
> of history, so there are many such kings. The office of "King of Iberia" is
> still singular (it is a singular class) and it can have its own properties 
> etc.
> I would therefore say (without having checked the page):
> 
> King of Iberiainstance of  office
> King of Iberiasubclass of  king

To be semantically strict, you would need to have two separate items, one for
the office, and one for the class. Because the individual kings have not been
instances of the office - they have been holders of the office. And they have
been instances of the class, but not holders of the class.

On Wikidata, we often conflate these things for the sake of simplicity. But when you
try to write queries, this does not make things simpler, it makes them harder.

Anything that is a subclass of X, and at the same time an instance of Y, where Y is
not "class", is problematic. I think this is the root of the confusion Gerard
speaks of.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata ontology

2017-01-05 Thread Daniel Kinzler
Am 04.01.2017 um 11:00 schrieb Léa Lacroix:
> Hello,
> 
> You can find it here: http://wikiba.se/ontology-1.0.owl
> 
> If you have questions regarding the ontology, feel free to ask.


Please note that this is the *wikibase* ontology, which defines the meta-model
for the information on Wikidata. It models statements, sitelinks, source
references, etc.

This ontology does not model "real world" concepts or properties like location
or color or children, etc. Modeling on this level is done on Wikidata itself,
there is no fixed RDF or OWL schema or ontology.

The best you can get in terms of "downloading the Wikidata ontology" would be to
download all properties and all the items representing classes. We currently
don't have a separate dump for these. Also, do not expect this to be a concise
or consistent model that can be used for reasoning. You are bound to find
contradictions and loose ends.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Redirects in dumps

2016-12-13 Thread Daniel Kinzler
Am 12.12.2016 um 20:53 schrieb Praveen Balaji:
> When using JSON dumps, how can I tell a redirected entity from the JSON dumps

If you look at the ID of the entity you get when you ask for
<https://www.wikidata.org/wiki/Special:EntityData/Q6703218.json>, you will
notice that it does not have the ID you requested. This way, you know that you
have been redirected.
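
In code, such a check could look roughly like this (sketch only):

    import requests

    def resolve_entity(requested_id):
        # Fetch the entity and see which ID actually came back.
        url = "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % requested_id
        data = requests.get(url).json()
        actual_id = next(iter(data["entities"]))
        return actual_id, actual_id != requested_id

    # resolve_entity("Q6703218") returns the target ID and True,
    # telling you that the requested ID was a redirect.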

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-28 Thread Daniel Kinzler
Am 28.11.2016 um 17:34 schrieb gnosygnu:
>> The datatype is implicit, it can be derived from the property ID. You can 
>> find
>> it by looking at the Property page's JSON.
>> ...
> 
> Thanks for all the info. I see my error. I didn't realize that
> mainsnak.datatype was inferred. I assumed it would have to be embedded
> directly in the XML's JSON  (partly because it is embedded directly in
> the JSON's dump JSON)
> 
> The rest of your points make sense. Thanks again for taking the time to 
> clarify.

If you have problems accessing the datatype from Lua or elsewhere, let me know.
There may be issues with the import process.

It's always cool to see that people use our data and our software!


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-28 Thread Daniel Kinzler
Am 28.11.2016 um 16:31 schrieb gnosygnu:
>> If you are also using the same software (Wikibase on MediaWiki), the XML 
>> dumps
>> should Just Work (tm). The idea of the XML dumps is that the "text" blobs are
>> opaque to 3rd parties, but will continue to work with future versions of
>> MediaWiki & friends (with a compatible configuration - which is rather 
>> tricky).
> 
> Not sure I follow. Even from a Wikibase on MediaWiki perspective, the
> XML dumps are still incomplete (since they're missing
> mainsnak.datatype).

The datatype is implicit: it can be derived from the property ID. You can find
it by looking at the Property page's JSON.
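
In practice, "looking at the Property page's JSON" can be as simple as this
(illustrative sketch):

    import requests

    def property_datatype(property_id):
        # The datatype lives at the top level of the property's entity JSON.
        url = "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % property_id
        data = requests.get(url).json()
        return data["entities"][property_id]["datatype"]

    # property_datatype("P41") should yield "commonsMedia", the value the
    # canonical JSON dumps embed directly in each mainsnak for convenience.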

The XML dumps are complete by definition, since they contain a raw copy of the
primary data blob. All other data is derived from this. However, since they are
"raw", they are not easy to process by consumers, and we make no guarantees
regarding the raw data format.

We include the data type in the statements of the canonical JSON dumps for
convenience. We are planning to add more things to the JSON output for
convenience. That does not make the XML dumps incomplete.

Your use case is special since you want canonical JSON *and* wikitext. I'm afraid
you will have to process both kinds of dumps.

> One line of the file specifically checks for datatype: "if datatype
> and datatype == 'commonsMedia' then". This line always evaluates to
> false, even though you are looking at an entity (Q38: Italy) and
> property (P41: flag image) which does have a datatype for
> "commonsMedia" (since the XML dump does not have "mainsnak.datatype").

That is incorrect. datatype will always be set in Lua, even if it is not present
in the XML. Remember that it is not present in the primary blob on Wikidata
either. Wikibase will look it up internally, from the wb_property_info table,
and make that information available to Lua.

When loading the XML file, a lot of secondary information is extracted into
database tables for this kind of use, e.g. all the labels and descriptions go
into the wb_terms table, property types go into wb_property_info, links to other
items go to page_links, etc.

Actually, you may have to run refreshLinks.php or rebuildall.php after doing the
XML import; I'm not sure any more which is needed when. But the point is: the
XML dump contains all information needed to reconstruct the content. This is
true for wikitext as well as for Wikibase JSON data. All derived information is
extracted upon import, and is made available via the respective APIs, including
Lua, just like on Wikidata.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-27 Thread Daniel Kinzler
Am 27.11.2016 um 01:15 schrieb gnosygnu:
> This is useful, but unfortunately it won't suffice. Wikidata also has
> pages which are wikitext (for example,
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Names). These
> wikitext pages are in the XML dumps, but aren't in the stub dumps nor
> the JSON dumps. I actually do use these Wikidata wikitext entries to
> try to reproduce Wikidata in its entirety. 

If you are also using the same software (Wikibase on MediaWiki), the XML dumps
should Just Work (tm). The idea of the XML dumps is that the "text" blobs are
opaque to 3rd parties, but will continue to work with future versions of
MediaWiki & friends (with a compatible configuration - which is rather tricky).


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Entity tagging and fact extraction (from a scholarly publisher perspective)

2016-11-27 Thread Daniel Kinzler
Am 18.11.2016 um 22:12 schrieb Ruben Verborgh:
> In case you consider scenarios where clients perform federation,
> you might be interested to see that lightweight interfaces
> can outperform full SPARQL interfaces:
> http://linkeddatafragments.org/publications/jws2016.pdf#page=26

We are indeed planning to experiment with LDF, see
https://phabricator.wikimedia.org/T136358


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Can mainsnak.datatype be included in the pages-articles.xml dump?

2016-11-26 Thread Daniel Kinzler
Hi gnosygnu!

The JSON in the XML dumps is the raw contents of the storage backend. It can't
be changed retroactively, and re-encoding everything on the fly would be too
expensive. Also, the JSON embedded in the XML files is not officially supported
as a stable interface of Wikibase. The JSON format in the XML files can change
without notice, and you may encounter different representations even within the
same dump.

I recommend using the JSON dumps; they contain our data in canonical form. To
avoid downloading redundant information, you can use one of the
wikidatawiki-20161120-stub-* dumps instead of the full page dumps. These don't
contain the actual page content, just meta-data.
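
If it helps, the canonical JSON dumps can be streamed one entity at a time
(that is how those dump files are laid out); a sketch, assuming a
bzip2-compressed dump:

    import bz2, json

    def iter_entities(path):
        # The dump is one big JSON array with one entity per line,
        # so we can avoid loading the whole file at once.
        with bz2.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip().rstrip(",")
                if not line or line in ("[", "]"):
                    continue
                yield json.loads(line)

    # for entity in iter_entities("wikidata-20161114-all.json.bz2"):
    #     print(entity["id"])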

Caveat: there is currently no dump that contains the JSON of old revisions of
entities in canonical form. You can only get them individually from
Special:EntityData, e.g.
<https://www.wikidata.org/wiki/Special:EntityData/Q23.json?oldid=30279>

HTH
-- daniel

Am 26.11.2016 um 02:13 schrieb gnosygnu:
> Hi everyone. I have a question about the Wikidata xml dump, but I'm
> posting this question here, because it looks more related to Wikidata.
> 
> In short, it seems that the "pages-articles.xml" does not include the
> datatype property for snaks. For example, the xml dump does not list a
> datatype for Q38 (Italy) and P41 (flag image). In contrast, the json
> dump does list a datatype of "commonsMedia".
> 
> Can this datatype property be included in future xml dumps? The
> alternative would be to download two large and redundant dumps (xml
> and json) in order to reconstruct a Wikidata instance.
> 
> More information is provided below the break. Let me know if you need
> anything else.
> 
> Thanks.
> 
> 
> 
> Here's an excerpt from the xml data dump for Q38 (Italy) and P41 (flag
> image). Notice that there is no "datatype" property
>   // 
> https://dumps.wikimedia.org/wikidatawiki/20161120/wikidatawiki-20161120-pages-articles.xml.bz2
>   "mainsnak": {
> "snaktype": "value",
> "property": "P41",
> "hash": "a3bd1e026c51f5e0bdf30b2323a7a1fb913c9863",
> "datavalue": {
>   "value": "Flag of Italy.svg",
>   "type": "string"
> }
>   },
> 
> Meanwhile, the API and the JSON dump lists a datatype property of
> "commonsMedia":
>   // https://www.wikidata.org/w/api.php?action=wbgetentities=q38
>   // 
> https://dumps.wikimedia.org/wikidatawiki/entities/20161114/wikidata-20161114-all.json.bz2
>   "P41": [{
> "mainsnak": {
>   "snaktype": "value",
>   "property": "P41",
>   "datavalue": {
> "value": "Flag of Italy.svg",
> "type": "string"
>   },
>   "datatype": "commonsMedia"
> },
> 
> As far as I can tell, the Turtle (ttl) dump does not list a datatype
> property either, but this may be because I don't understand its
> format.
>   wd:Q38 p:P41 wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D .
>   wds:q38-574446A6-FD05-47AE-86E3-AA745993B65D a wikibase:Statement,
>   wikibase:BestRank ;
> wikibase:rank wikibase:NormalRank ;
> ps:P41 
> <http://commons.wikimedia.org/wiki/Special:FilePath/Flag%20of%20Italy.svg>
> ;
> pq:P580 "1946-06-19T00:00:00Z"^^xsd:dateTime ;
> pqv:P580 wdv:204e90b1bce9f96d6d4ff632a8da0ecc .
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Two questions about Lexeme Modeling

2016-11-25 Thread Daniel Kinzler
Am 25.11.2016 um 12:16 schrieb David Cuenca Tudela:
>> If we want to avoid this complexity, we could just go by prefix. So if the
>> languages is "de", variants like "de-CH" or "de-DE_old" would be considered 
>> ok.
>> Ordering these alphabetically would put the "main" code (with no suffix) 
>> first.
>> May be ok for a start.
> 
> I find this issue potentially controversial, and I think that the community at
> large should be involved in this matter to avoid future dissatisfaction and to
> promote involvement in the decision-making.

We should absolutely discuss this with Wiktionarians. My suggestion was intended
as a baseline implementation. Details about the restrictions on which variants
are allowed on a Lexeme, or in what order they are shown, can be changed later
without breaking anything.

> In my opinion it would be more appropriate to use standardized language codes,
> and then specify the dialect with an item, as it provides greater flexibility.
> However, as mentioned before I would prefer if this topic in particular would 
> be
> discussed with wiktionarians.

Using Items to represent dialects is going to be tricky. We need ISO language
codes for use in HTML and RDF. We can somehow map between Items and ISO codes,
but that's going to be messy, especially when that mapping changes.

So it seems like we need to further discuss how to represent a Lexeme's language
and each lemma's variant. My current thinking is to represent the language as an
Item reference, and the variant as an ISO code. But you are suggesting the
opposite.

I can see why one would want items for dialects, but I currently have no good
idea for making this work with the existing technology. Further investigation is
needed.

I have filed a Phabricator task for investigating this. I suggest taking the
discussion about how to represent languages/variants/dialects/etc. there:

https://phabricator.wikimedia.org/T151626

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Two questions about Lexeme Modeling

2016-11-25 Thread Daniel Kinzler
Thank you Denny for having an open mind! And sorry for being a nuisance ;)

I think it's very important to have controversial but constructive discussions
about these things. Data models are very hard to change even slightly once
people have started to create and use the data. We need to try hard to get it as
right as possible off the bat.

Some remarks inline below.

Am 25.11.2016 um 03:32 schrieb Denny Vrandečić:
> There is one thing that worries me about the multi-lemma approach, and that 
> are
> mentions of a discussion about ordering. If possible, I would suggest not to
> have ordering in every single Lexeme or even Form, but rather to use the
> following solution:
> 
> If I understand it correctly, we won't let every Lexeme have every arbitrary
> language anyway, right? Instead we will, for each language that has variants
> have somewhere in the configurations an explicit list of these variants, i.e.
> say, for English it will be US, British, etc., for Portuguese Brazilian and
> Portuguese, etc.

That approach is similar to what we are now doing for sorting Statement groups
on Items. There is a global ordering of properties defined on a wiki page. So
the community can still fight over it, but only in one place :) We can re-order
based on user preference using a Gadget.

For the multi-variant lemmas, we need to declare the Lexeme's language
separately, in addition to the language code associated with each lemma variant.
It seems like the language will probably be represented as a reference to a Wikidata
Item (that is, a Q-Id). That Item can be associated with an (ordered) list of
matching language codes, via Statements on the Item, or via configuration (or,
like we do for unit conversion, configuration generated from Statements on 
Items).

If we want to avoid this complexity, we could just go by prefix. So if the
language is "de", variants like "de-CH" or "de-DE_old" would be considered ok.
Ordering these alphabetically would put the "main" code (with no suffix) first.
May be ok for a start.
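
As a trivial sketch of that prefix rule (illustration only, nothing is
implemented yet):

    def allowed_variants(lexeme_language, codes):
        # Accept the bare code plus any suffixed variants of it.
        matching = [c for c in codes
                    if c == lexeme_language
                    or c.startswith(lexeme_language + "-")
                    or c.startswith(lexeme_language + "_")]
        # Alphabetical order puts the bare code before "de-CH", "de-DE_old", etc.
        return sorted(matching)

    # allowed_variants("de", ["en", "de-CH", "de", "de-DE_old"])
    # -> ["de", "de-CH", "de-DE_old"]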

I'm not sure yet on what level we want to enforce the restriction on language
codes. We can do it just before saving new data (the "validation" step), or we
could treat it as a community enforced soft constraint. I'm tending towards the
former, though.

> Given that, we can in that very same place also define their ordering and 
> their
> fallbacks.

Well, all lemmas would fall back on each other, the question is just which ones
should be preferred. Simple heuristic: prefer the shortest language code. Or go
by what MediaWiki does for the UI (which is what we do for Item labels).

> The upside is that it seems that this very same solution could also be used 
> for
> languages with different scripts, like Serbian, Kazakh, and Uzbek (although it
> would not cover the problems with Chinese, but that wasn't solved previously
> either - so the situation is strictly better). (It doesn't really solve all
> problems - there is a reason why ISO treats language variants and scripts
> independently - but it improves on the vast majority of the problematic 
> cases).

Yes, it's not the only decision we have to make in this regard, but the most
fundamental one, I think.

One consequence of this is that Forms should probably also allow multiple
representations/spellings. This is for consistency with the lemma, for code
re-use, and for compatibility with Lemon.

> So, given that we drop any local ordering in the UI and API, I think that
> staying close to Lemon and choosing a TermList seems currently like the most
> promising approach to me, and I changed my mind. 

Knowing that you won't do that without a good reason, I thank you for the
compliment :)

> My previous reservations still
> hold, and it will lead to some more complexity in the implementation not only 
> of
> Wikidata but also of tools built on top of it,

The complexity of handling a multi-variant lemma is higher than that of a single
string, but any Wikibase client already needs to have the relevant code anyway, to
handle item labels. So I expect little overhead. We'll want the lemma to be
represented in a more compact way in the UI than we currently use for labels,
though.


Thank you all for your help!


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages

2016-11-24 Thread Daniel Kinzler
Am 23.11.2016 um 21:33 schrieb Andrew Hall:
> Hi,
> 
> I’m a PhD student/researcher at the University of Minnesota who (along with 
> Max
> Klein and another grad student/researcher) has been interested in 
> understanding
> the extent to which Wikidata is used in (English, for now) Wikipedia.
> 
> There seems to be no easy way to determine Wikidata usage in Wikipedia pages 
> so
> I’ll describe two approaches we’ve considered as our best attempts at solving
> this problem. I’ll also describe shortcomings of each approach.

There are two pretty easy ways, which you may not have found because they were
added only a couple of months ago:

You can look at the "page information" (action=info, linked from the sidebar),
e.g.
<https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info>.
Near the bottom you can find "Wikidata entities used in this page".

The same information is available via an API module,
<https://en.wikipedia.org/w/api.php?action=query&prop=wbentityusage&titles=South_Pole_Telescope>.
See
<https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage>
for documentation.


These URLs will list all direct and indirect usages, and also indicate what part
or aspect of the entity was used.

HTH

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Two questions about Lexeme Modeling

2016-11-22 Thread Daniel Kinzler
Am 12.11.2016 um 00:08 schrieb Denny Vrandečić:
> I am not a friend of multi-variant lemmas. I would prefer to either have
> separate Lexemes or alternative Forms. 

We have created a decision matrix to help with discussing the pros and cons of
the different approaches. Please have a look and comment:

https://docs.google.com/spreadsheets/d/1PtGkt6E8EadCoNvZLClwUNhCxC-cjTy5TY8seFVGZMY/edit?ts=5834219d#gid=0

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Two questions about Lexeme Modeling

2016-11-21 Thread Daniel Kinzler
y is creaking and not working well, and then think about
> these issues.

Slow iteration is nice as long as you don't produce artifacts you need to stay
compatible with. I have become extremely wary of lock-in - Wikitext is the worst
lock-in I have ever seen. Some aspects of how we implemented the Wikibase model
for Wikidata have also proven to be really hard to iterate on. Iterating the
model itself is even harder, since it is bound to break all clients in a
fundamental way. We got very annoyed comments just for making two fields in
the Wikibase model optional.

Switching from single-lemma to multi-lemma would be a major breaking change,
with lots of energy burned on backwards compatibility. The opposite switch would
be much simpler (because it adds guarantees, instead of removing them).

> But until then I would prefer to keep the system as dumb and
> simple as possible.

I would prefer to keep the user-generated *data* as straightforward as
possible. That's more important to me than a simple meta-model. The complexity
of the instance data determines the maintenance burden.


Am 20.11.2016 um 21:06 schrieb Philipp Cimiano:
> Please look at the final spec of the lemon model:
>
>
https://www.w3.org/community/ontolex/wiki/Final_Model_Specification#Syntactic_Frames
>
> In particular, check example: synsem/example7

Ah, thank you! I think we could model this in a similar way, by referencing an
Item that represents a (type of) frame from the Sense. Whether this should be a
special field or just a Statement I'm still undecided on.

Is it correct that in the Lemon model, it's not *required* to define a syntactic
frame for a sense? Is there something like a default frame?

> 2) Such spelling variants are modelled in lemon as two different
> representations
> of the same lexical entry.
[...]
> In our understanding these are not two different forms as you mention, but two
> different spellings of the same form.

Indeed, sorry for being imprecise. And yes, if we have a multi-variant lemma, we
should also have multi-variant Forms. Our lemma corresponds to the canonical
form in Lemon, if I understand correctly.

> The preference for showing e.g. the American or English variant should be
> stated by the application that uses the lexicon.

I agree. I think Denny is concerned with putting that burden on the application.
Proper language fallback isn't trivial, and the application may be a lightweight
JS library... But I think for the naive case, it's fine to simply show
all representations.


Thank you all for your input!

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Linking RDF resources for external IDs

2016-11-14 Thread Daniel Kinzler
By the way, I'm also re-considering my original approach:

Simply replace the plain value with the resolved URI when we can. This would
*not* cause the same property to be used with literals and non-literals, since
the predicate name is derived from the property ID, and a property either
provides a URI mapping, or it doesn't.

Problems would arise during transition, making this a breaking change:

1) when introducing this feature, existing queries that compare a newly
URI-ified property to a string literal will fail.

2) when a URI mapping is added, we'd either need to immediately update all
statements that use that property, or the triple store would have some old
triples where the relevant predicates point to a literal, and some new triples
where they point to a resource.

This would avoid duplicating more predicates, and keep the model straightforward.
But it would cause a bumpy transition.

Please let me know which approach you prefer. Have a look at the files attached
to my original message.

Thanks,
Daniel

Am 09.11.2016 um 17:46 schrieb Daniel Kinzler:
> Hi Stas, Markus, Denny!
> 
> For a long time now, we have been wanting to generate proper resource 
> references
> (URIs) for external identifier values, see
> <https://phabricator.wikimedia.org/T121274>.
> 
> Implementing this is complicated by the fact that "expanded" identifiers may
> occur in four different places in the data model (direct, statement, 
> qualifier,
> reference), and that we can't simply replace the old string value, we need to
> provide an additional value.
> 
> I have attached three files with snippets of three different RDF mappings:
> - Q111.ttl - the status quo, with normalized predicates declared but not used.
> - Q111.rc.ttl - modeling resource predicates separately from normalized 
> values.
> - Q111.norm.ttl - modeling resource predicates as normalized values.
> 
> The "rc" variant means more overhead, the "norm" variant may have semantic
> difficulties. Please look at the two options for the new mapping and let me 
> know
> which you like best. You can use a plain old diff between the files for a 
> first
> impression.
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-tech] Two questions about Lexeme Modeling

2016-11-11 Thread Daniel Kinzler
Hi all!

There are two questions about modelling lexemes that are bothering me. One is an
old question, and one I only came across recently.

1) The question that came up for me recently is how we model the grammatical
context for senses. For instance, "to ask" can mean requesting information, or
requesting action, depending on whether we use "ask somebody about" or "ask
somebody to". Similarly, "to shit" has entirely different meanings when used
reflexively ("I shit myself").

There is no good place for this in our current model. The information could be
placed in a statement on the word Sense, but that would be kind of non-obvious,
and would not (at least not easily) allow for a concise rendering, in the way we
see it in most dictionaries ("to ask sbdy to do sthg"). The alternative would be
to treat each usage with a different grammatical context as a separate Lexeme (a
verb phrase Lexeme), so "to shit oneself" would be a separate lemma. That could
lead to a fragmentation of the content in a way that is quite unexpected to
people used to traditional dictionaries.

We could also add this information as a special field in the Sense entity, but I
don't even know what that field should contain, exactly.

Got a better idea?


2) The older question is how we handle different renderings (spellings, scripts)
of the same lexeme. In English we have "color" vs "colour", in German we have
"stop" vs "stopp" and "Maße" vs "Masse". In Serbian, we have a Roman and
Cyrillic rendering for every word. We can treat these as separate Lexemes, but
that would mean duplicating all information about them. We could have a single
Lemma, and represent the others as alternative Forms, or using statements on the
Lexeme. But that raises the question of which spelling or script should be the
"main" one, to be used as the lemma.

I would prefer to have multi-variant lemmas. They would work like the
multi-lingual labels we have now on items, but restricted to the variants of a
single language. For display, we would apply a similar language fallback
mechanism we now apply when showing labels.

2b) If we treat lemmas as multi-variant, should Forms also be multi-variant, or
should they be per-variant? Should the gloss of a Sense be multi-variant? I
currently tend towards "yes" for all of the above.


What do you think?


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] Why term for lemma?

2016-11-11 Thread Daniel Kinzler
Am 11.11.2016 um 14:38 schrieb Thiemo Mättig:
> Tpt asked:
> 
>> why having both the Term and the MonolingualText data structures? Is it just 
>> for historical reasons (labels have been introduced before statements and so 
>> before all the DataValue system) or is there an architectural reason behind?
> 
> That's not the only reason.

Besides the code perspective that Thiemo just explained, there is also the
conceptual perspective: Terms are editorial information attached to an entity
for search and display. DataValues such as MonolingualText represent a value
within a Statement, citing an external authority. This leads to slight
differences in behavior - for instance, the set of languages available for Terms
is subtly different from the set of languages available for MonolingualText.

Anyway, the fact that the two are totally separate has historical reasons. One
viable approach for code sharing would be to have MonolingualText contain a Term
object. But that would introduce more coupling between our components. I don't
think the little bit of code that could be shared is worth the effort.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-tech] Linking RDF resources for external IDs

2016-11-09 Thread Daniel Kinzler
Hi Stas, Markus, Denny!

For a long time now, we have been wanting to generate proper resource references
(URIs) for external identifier values, see
<https://phabricator.wikimedia.org/T121274>.

Implementing this is complicated by the fact that "expanded" identifiers may
occur in four different places in the data model (direct, statement, qualifier,
reference), and that we can't simply replace the old string value, we need to
provide an additional value.

I have attached three files with snippets of three different RDF mappings:
- Q111.ttl - the status quo, with normalized predicates declared but not used.
- Q111.rc.ttl - modeling resource predicates separately from normalized values.
- Q111.norm.ttl - modeling resource predicates as normalized values.

The "rc" variant means more overhead, the "norm" variant may have semantic
difficulties. Please look at the two options for the new mapping and let me know
which you like best. You can use a plain old diff between the files for a first
impression.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology-beta#> .
@prefix wdata: <http://localhost/daniel/wikidata/index.php/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix wdv: <http://www.wikidata.org/value/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix wdtn: <http://www.wikidata.org/prop/direct-normalized/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/> .
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix prv: <http://www.wikidata.org/prop/reference/value/> .
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/> .
@prefix wdno: <http://www.wikidata.org/prop/novalue/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

wd:Q111 a wikibase:Item ;
	rdfs:label "silver"@en ;
	skos:prefLabel "silver"@en ;
	schema:name "silver"@en ;
	wdt:P20 "asdfasdf" ;
	wdtn:P20 <http://musicbrainz.org/asdfasdf/place> .

wd:Q111 p:P20 wds:Q111-5459c580-4b6f-c306-184f-b7fa132b32d8 .

wds:Q111-5459c580-4b6f-c306-184f-b7fa132b32d8 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P20 "asdfasdf" ;
	psn:P20 <http://musicbrainz.org/asdfasdf/place> ;
	pq:P30 "qwertyqwerty" ;
	pqn:P30 <http://vocab.getty.edu/aat/qwertyqwerty> ;
	prov:wasDerivedFrom wdref:7335a5598064cd8716cc9e31d164f2803e376b99 .

wdref:7335a5598064cd8716cc9e31d164f2803e376b99 a wikibase:Reference ;
	pr:P40 "zxcvbnzxcvbn" ;
	prn:P40 <https://www.sbfi.admin.ch/ontology/occupation/zxcvbnzxcvbn> .
	
wd:P20 a wikibase:Property ;
	wikibase:propertyType <http://wikiba.se/ontology-beta#ExternalId> ;
	wikibase:directClaim wdt:P20 ;
	wikibase:directClaimNormalized wdtn:P20 ;
	wikibase:claim p:P20 ;
	wikibase:statementProperty ps:P20 ;
	wikibase:statementValue psv:P20 ;
	wikibase:statementValueNormalized psn:P20 ;
	wikibase:qualifier pq:P20 ;
	wikibase:qualifierValue pqv:P20 ;
	wikibase:qualifierValueNormalized pqn:P20 ;
	wikibase:reference pr:P20 ;
	wikibase:referenceValue prv:P20 ;
	wikibase:referenceValueNormalized prn:P20 ;
	wikibase:novalue wdno:P20 .

p:P20 a owl:ObjectProperty .

psv:P20 a owl:ObjectProperty .

pqv:P20 a owl:ObjectProperty .

prv:P20 a owl:ObjectProperty .

psn:P20 a owl:ObjectProperty .

pqn:P20 a owl:ObjectProperty .

prn:P20 a owl:ObjectProperty .

wdt:P20 a owl:DatatypeProperty .

ps:P20 a owl:DatatypeProperty .

pq:P20 a owl:DatatypeProperty .

pr:P20 a owl:DatatypeProperty .

wdtn:P20 a owl:ObjectProperty .

wdno:P20 a owl:Class ;
	owl:complementOf _:genid2 .

_:genid2 a owl:Restriction ;
	owl:onProperty wdt:P20 ;
	owl:someValuesFrom owl:Thing .

wd:P20 rdfs:label "MusicBrainz place ID"@en .

[Wikidata] BREAKING CHANGE: Quantity Bounds Become Optional

2016-11-04 Thread Daniel Kinzler
Hi all!

This is an announcement for a breaking change to the Wikidata API, JSON and RDF
binding, to go live on 2016-11-15. It affects all clients that process quantity
values.


As Lydia explained in the mail she just sent to the Wikidata list, we have been
working on improving our handling of quantity values. In particular, we are
making upper- and lower bounds optional: When the uncertainty of a quantity
measurement is not explicitly known, we no longer require the bounds to somehow
be specified anyway, but allow them to be omitted.

This means that the upperBound and lowerBound fields of quantity values become
optional in all API input and output, as well as the JSON dumps and the RDF 
mapping.

Clients that import quantities should now omit the bounds if they do not have
explicit information on the uncertainty of a quantity value.

Clients that process quantity values must be prepared to process such values
without any upper and lower bound set.


That is, instead of this

"datavalue":{
  "value":{
"amount":"+700",
"unit":"1",
"upperBound":"+710",
"lowerBound":"+690"
  },
  "type":"quantity"
},


clients may now also encounter this:

"datavalue":{
  "value":{
"amount":"+700",
"unit":"1"
  },
  "type":"quantity"
},


The intended semantics is that the uncertainty is unspecified if no bounds are
present in the XML, JSON or RDF representation. If they are given, the
interpretation is as before.
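
A defensive way for a client to read such values after the change (illustrative
Python, not part of the API itself):

    def read_quantity(datavalue):
        # Amount and unit are always present; the bounds may now be missing.
        value = datavalue["value"]
        amount = value["amount"]            # e.g. "+700"
        unit = value["unit"]                # "1" means "no unit"
        lower = value.get("lowerBound")     # may be absent
        upper = value.get("upperBound")     # may be absent
        if lower is None and upper is None:
            return amount, unit, None       # uncertainty unspecified
        return amount, unit, (lower, upper)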


For more information, see the JSON model documentation [1]. Note that quantity
bounds have been marked as optional in the documentation since August. The RDF
mapping spec [2] has been adjusted accordingly.


This change is scheduled for deployment on November 15.

Please let us know if you have any comments or objections.

-- daniel


[1] https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON
[2] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Quantity


Relevant tickets:
* <https://phabricator.wikimedia.org/T115269>

Relevant patches:
* <https://gerrit.wikimedia.org/r/#/c/302248>
*
<https://github.com/DataValues/Number/commit/2e126eee1c0067c6c0f35b4fae0388ff11725307>

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Why term for lemma?

2016-11-02 Thread Daniel Kinzler
Am 02.11.2016 um 21:53 schrieb Denny Vrandečić:
> Hi,
> 
> I am not questioning or criticizing, just curious - why was it decided to
> implement lemmas as terms? I guess it is for code reuse purposes, but just
> wanted to ask.

Yes, indeed. We have code for rendering, serializing, indexing, and searching
Terms. We do not have any infrastructure for plain strings. We could also handle
it as a monolingual-text StringValue, but that offers less re-use, in particular
no search, and no batch lookup for rendering.

Also, conceptually, the lemma is rather similar to a label. And it's always *in*
a language. The only question is whether we only have one, or multiple (for
variants/scripts). But one will do for now.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata] Stable Interface Policy: Database Schema as a stable API

2016-10-28 Thread Daniel Kinzler
Hi all!

I plan to add the wikibase (SQL) database schema as a stable interface.

Typically, a database schema is considered internal, but since we have tools on
labs that may rely on the current schema, breaking changes to the schema should
be announced as such. To address this, I plan to add the following paragraph to
the Stable Public APIs section:

The database schema as exposed on Wikimedia Labs is considered a stable
interface. Changes to the available tables and fields are subject to the
above notification policy.

In addition, I plan to add the following paragraph to the Extensibility section:

In a tabular data representation, such as a relational database schema, the
addition of fields is not considered a breaking change. Any change to the
interpretation of a field, as well as the removal of fields, are considered
breaking. Changes to existing unique indexes or primary keys are breaking
changes; changes to other indexes as well as the addition of new unique
indexes are not breaking changes.

If you have any thoughts or objections, please let me know at
<https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Database_Schema_as_a_stable_API>

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Acquiring general knowledge from Wikidata

2016-10-25 Thread Daniel Kinzler
Am 25.10.2016 um 17:27 schrieb Federico Leva (Nemo):
> As far as I know, an axiom by definition can't be false. What definition are 
> you
> using? Maybe some jargon specific to this research field?

An axiom is always true in the context of the formal model it helps define. But
if that model corresponds to something in the real world, the axiom may well be
found to be "false" when applied there.

Say you have an axiom that says "all humans are born with two legs"; this is
then (by definition) true in your model, but may not be an accurate modelling of
the real world, since, very rarely, humans are born with more or fewer than two
legs.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Greater than 400 char limit for Wikidata string data types

2016-10-08 Thread Daniel Kinzler
That was discussed and declined a while ago, see
<https://phabricator.wikimedia.org/T126862>. Though I think the proposed
realization was presentational rather than functional. I'll have to re-read the
discussion, though.

Am 08.10.2016 um 12:07 schrieb Thomas Douillard:
> Probably a silly question but ... did you all consider creating a datatype for
> molecue representation ? This seem to be a very similar usecase than 
> mathematica
> formula. Essentially we're not dealing with a raw string but a representation 
> of
> molecule formulas, with its own encoding ...
> 
> Changing the limit seem to be a poor workaround to a dedicated datatype - 
> nobody
> seems to have found a relevant usecase and it seem to me that we're 
> essentially
> abusing strings for storing blobs ...
> 
> 2016-10-08 11:33 GMT+02:00 Egon Willighagen <egon.willigha...@gmail.com
> <mailto:egon.willigha...@gmail.com>>:
> 
> 
> 
> On Sat, Oct 8, 2016 at 11:28 AM, Lydia Pintscher
> <lydia.pintsc...@wikimedia.de <mailto:lydia.pintsc...@wikimedia.de>> 
> wrote:
> 
> On Sat, Oct 8, 2016 at 11:23 AM, Egon Willighagen
> <egon.willigha...@gmail.com <mailto:egon.willigha...@gmail.com>> 
> wrote:
> > Ah, those numbers are for 
> https://www.wikidata.org/wiki/Property:P234
> <https://www.wikidata.org/wiki/Property:P234> ...
> 
> External identifier then. Cool. And for string like in
> https://www.wikidata.org/wiki/Property:P233
> <https://www.wikidata.org/wiki/Property:P233>? Sebastian's initial 
> email 
> 
> says 1500 to 2000. Is this still a good number after this discussion?
> 
> 
> Yes, that would cover more than 99.9% of all InChIs in PubChem. (See
> Sebastian's reply earlier in this thread.)
> 
> Egon
> 
> -- 
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw 
> <http://se.linkedin.com/in/egonw>
> Blog: http://chem-bla-ics.blogspot.com/ 
> <http://chem-bla-ics.blogspot.com/>
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> <http://www.citeulike.org/user/egonw/tag/papers>
> ORCID: -0001-7542-0286
> ImpactStory: https://impactstory.org/u/egonwillighagen
> <https://impactstory.org/u/egonwillighagen>
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> <https://lists.wikimedia.org/mailman/listinfo/wikidata>
> 
> 
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Elevation

2016-09-28 Thread Daniel Kinzler
Am 28.09.2016 um 14:13 schrieb Markus Bärlocher:
> Since this is about fundamental modelling questions - who can help here?

"Die Community"...

> I need a system for modelling geographic elevations in WD.
> 
> A geographic elevation statement consists of:
> 1. a number (127.53)
> 2. a unit (metres, feet)
> 3. a height reference level (NN, NHN, LAT, MSL, MHWS, ...)
> 
> If any of the three is missing, the statement is useless.

As I said, the reference level can be given as a qualifier. It would make sense
to redefine the property "Elevation over sea level" accordingly, or to replace
it. I can't think of another solution. Unless you mean clearance height ("Lichte
Höhe") - in that case you can use P2793. But you still need a property for
"Reference level". I don't think that one exists yet.

> In addition, it would be useful to also state:
> 4. the precision
> 
> Do I understand you correctly?
> You suggest writing the precision after the number,
> and merging both into a single string?
> That is, putting 1., 2. and 4. into one field?
> 
> Example: 123.53±0.005m

Yes, exactly like that. Or almost - at the moment, the unit still has to be
selected separately when entering the value.

> Then you would first have to pick each number apart
> in order to display it in a table and sort it numerically?

No, it is not a text field. Value, precision, and unit are stored separately;
that is what we have "data types" for. You can find the details here:
<https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#quantity> and here
<https://www.mediawiki.org/wiki/Wikibase/DataModel#Quantities>.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Elevation

2016-09-28 Thread Daniel Kinzler
Am 27.09.2016 um 23:14 schrieb Info WorldUniversity:
> Hi Daniel, Markus and Wikidatans, 
> 
> Thanks for your interesting "modeling elevation with Wikidata" conversation. 
> 
> Daniel, in a related vein and conceptually, how would you model elevation 
> change
> over time (e.g. in a Google Street View/Maps/Earth with TIME SLIDER,
> conceptually, for example) with Wikidata, building on the example you've 
> already
> shared? 

You would use the "point in time" qualifier. We use this a lot with population
data, see for instance <https://www.wikidata.org/wiki/Q64#P1082>.

> Would there be a wikidata Q-item for all 46 sub levels, for example?

That's a question of desirable modelling granularity. I would suppose that for
Troy, we would have one item per sub-level, since it's such a famous site. But
we would probably not have every sub-level of every archaeological excavation.
This is always a question of balance, and always a matter of debate.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Elevation

2016-09-27 Thread Daniel Kinzler
So you want to e.g. give the height of a bridge above the "mean high water
spring" level of the river it crosses?

You wouldn't use a unit for that, but a qualifier. The unit would be meter (or
feet or whatever).

The "elevation" property we have (P2044) is defined to refer to NN, so it's no
good for your purpose. To model what you want nicely, you would need a more
general "elevation" property, and a "reference level" property to use as a
qualifier. Then you could express something like "elevation: 28.3m;
reference-level: Q6803625".

I'm sure there are other options, but I see no good option that would be
possible with the properties I know.

Anyway, this is really a modelling question, and it can't really be solved with
units.

Am 27.09.2016 um 20:26 schrieb Markus Bärlocher:
> Hello Daniel,
> 
> no, I am not looking for a WP article about MHWS
> (I only linked it as an explanation),
> 
> but for a unit
> to describe MHWS as the reference datum for geographic heights.
> 
> MHWS is used to define bridge clearance heights above water,
> as well as for the geographic height of navigational lights.
> 
> Best regards,
> Markus
> 
> 
> Am 27.09.2016 um 19:28 schrieb Daniel Kinzler:
>> Am 27.09.2016 um 19:10 schrieb Markus Bärlocher:
>>> I look for this:
>>> "Elevation in metres above 'mean high water spring' level."
>>>
>>> Which means the geographic hight above MHWS:
>>> https://en.wikipedia.org/wiki/Mean_high_water_spring
>>
>> By clicking on "Wikidata Item" in the sidebar of that page, I get to
>> https://www.wikidata.org/wiki/Q6803625 ("highest level that spring tides 
>> reach
>> on average over a period of time")
>>
>> Is that what you need?
>>
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Let's move forward with support for Wiktionary

2016-09-19 Thread Daniel Kinzler
Am 16.09.2016 um 20:46 schrieb Thad Guidry:
> Daniel,
> 
> I wasn't trying to help solve the issues - I'll be quite now :)
> 
> ​I was helping to expose one of your test cases :)​

Ha, sorry for sounding harsh, and thanks for pointing me to "product"! It's a
good test case indeed.

> 'product' is a lexeme - a headword - a basic unit of meaning that has a 'set 
> of
> forms' and those have 'a set of definitions'

In the current model, a Lexeme has forms and senses. Forms don't have senses
directly; the meanings should apply to all forms. This means lexemes have to be
split with higher granularity:

* product (English noun) would be one lexeme, with "products" being the plural
form, and "product's" the genitive, and "products'" the plural genitive. Sense
include the ones you mentioned.
* (to) produce (English verb) would be another lexeme, with forms like
"produces", "produced", "producing", etc, and senses meaning "to create", "to
show", "to make available", etc
* production (English noun) would be another lexeme, with other forms and 
senses.
* produce (English noun) would be another
* producer (English noun) would be another
* produced (English adjective) another
etc...

These lexemes can be linked using some kind of "derived from" statements.

> ​But a thought just occured to me...
> A. In order to model this perhaps would be to have those headwords stored in
> Wikidata.  Those headwords ideally would not actually be a Q or a P ... but 
> what
> about instead ... L​  ?  Wrapping the graph structure itself ?  Pros / Cons ?

That's the plan, yes: have lexemes (L...) on Wikidata, which wrap the structure
of forms and senses, and have statements for the lexeme, as well as for each form
and each sense.

We don't currently plan a "super-structure" for wrapping derived/related lexemes
(product, produce, production, etc). They would just be inter-linked by 
statements.

> B.  or do we go with Daniel's suggestion of linking out to headwords and not
> actually storing them in Wikidata ?  Pros / Cons ?

The link I suggest is between items (Q...) and lexemes (L...), both on Wikidata.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Let's move forward with support for Wiktionary

2016-09-16 Thread Daniel Kinzler
Am 16.09.2016 um 20:11 schrieb Thad Guidry:
> Denny,
> 
> I would suggest to use https://en.wiktionary.org/wiki/product as that strawman
> proposal.  Because it has 2 levels of Senses.
>   3. Anything that is produced (contains 6 sub-senses)

Modelling sub-senses is a completely different can of worms. The proposed model
doesn't allow this directly (we try to avoid recursive structures), but it can
be done using statements.

Your example doesn't really say anything about how lexemes could be connected to
items as labels/aliases, which is, I believe, what Gerard and Denny were
discussing.


My usage of "Sense" and "Form" follows
<https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013-08>
which in turn follows the LEMON model <http://lemon-model.net/>.

Synsets are not directly modeled, but it's possible to construct them via
statements.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Greater than 400 char limit for Wikidata string data types

2016-09-16 Thread Daniel Kinzler
Am 16.09.2016 um 19:38 schrieb Denny Vrandečić:
> Markus' description of the decision for the limit corresponds with mine. I 
> also
> think that this decision can be revisited. I would still advise caution,
> due
> to technical issues, but I am sure that the development team will make a
> well-informed decision on this. It would be sad if valid usecases could not be
> supported due to that.

I agree, but re-considering this will have to wait until we have a better
solution for storing terms. The current mechanism, the wb_terms table, is a
massive performance bottleneck, and stuffing more data in there makes me very
uncomfortable.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Let's move forward with support for Wiktionary

2016-09-16 Thread Daniel Kinzler
Am 16.09.2016 um 19:41 schrieb Denny Vrandečić:
> Yes, there should be some connection between items and lexemes, but I am still
> hazy about details on how exactly this should look like. If someone could
> actually make a strawman proposal, that would be great.
> 
> I think the connection should live in the statement space, and not be on the
> level of labels, but that is just a hunch. I'd be happy to see proposals 
> incoming.

My thinking is this:

On some Sense of a Lexeme, there is a Statement saying that this Sense refers to
a given concept (Item). If the property for stating this is well-known, we can
track the Sense-to-Item relationship in the database. We can then automatically
show the lexeme's lemma as a (pseudo-)alias on the Item, and perhaps also use it
(and maybe all forms of the lexeme!) for indexing the item for search.  So:

  from ( Lexeme - Sense - Statement -> Item )
  we can derive ( Item -> Lexeme - Forms )
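
As a rough Python sketch of that derivation (the data layout and the "refers to"
property ID are placeholders, not the real serialization or a real property):

REFERS_TO = "P0000"  # placeholder for the well-known "refers to concept" property

lexemes = [
    {
        "lemma": "product",
        "forms": ["product", "products", "product's", "products'"],
        "senses": [
            # Q0000 is a placeholder item ID, not a real item.
            {"gloss": "anything that is produced", "statements": {REFERS_TO: "Q0000"}},
        ],
    },
]

def items_to_lexemes(lexemes):
    """Derive ( Item -> Lexeme - Forms ) from ( Lexeme - Sense - Statement -> Item )."""
    index = {}
    for lexeme in lexemes:
        for sense in lexeme["senses"]:
            item = sense["statements"].get(REFERS_TO)
            if item:
                index.setdefault(item, []).append((lexeme["lemma"], lexeme["forms"]))
    return index

# The item's entry could then be shown as a pseudo-alias and used for search indexing.
print(items_to_lexemes(lexemes))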

In the beginning of Wikidata, I was very reluctant about the software knowing
about "magic" properties. Now I feel better about this, since wikidata
properties are established as a permanent vocabulary that can be used by any
software, including our own.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Let's move forward with support for Wiktionary

2016-09-13 Thread Daniel Kinzler
Am 13.09.2016 um 15:37 schrieb Gerard Meijssen:
> Hoi,
> You assume that it is not good to have lexicological information in our 
> existing
> items. With Wiktionary support you bring such information on board. It would 
> be
> really awkward when for every concept there has to be an item in two 
> databases.

It will be two namespaces in the same project.

But we will not duplicate items. The proposed structure is not concept-centered
like OmegaWiki. It will be centered on lexemes, like Wiktionary, but with a
higher level of granularity (a lexeme corresponds to one "morphological" section
on a Wiktionary page).

> Why is there this problem with lexicological information and how will the
> current data be linked to the future "Wiktionary-data" information if there 
> are
> to be two databases?

Because "bumblebee"  "noun" conflicts with "bumblebee"
 "insect". They can't both be true for the same thing, because
nouns are not insects. One is true for the word, the other is true for the
concept. So they need to be treated separately.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Let's move forward with support for Wiktionary

2016-09-13 Thread Daniel Kinzler
Am 13.09.2016 um 17:16 schrieb Gerard Meijssen:
> Hoi,
> The database design for OmegaWiki had a distinction between the concept and 
> all
> the derivatives for them.

Wikidata will have Lexemes and their Forms and Senses.

> So bumblebee is more complex than just "instance of" noun. It is an English
> noun. "Hommel" is connected as a Dutch noun for the same concept and "hommels"
> is the Dutch plural...

Wikidata would have a Lexeme for "bumblebee" (English noun) and one for "Hommel"
(Dutch noun). Both would have a sense that would describe them as a flying
insect (and perhaps other word senses, such as Q1626135, a crater on the moon).
The senses that refer to the flying insect would be considered translations of
each other, and both senses would refer to the same concept.

So "bumblebee" (insect) is a translation of "Hommel" (insect), and both refer to
the genus Bombus (Q25407). "Hommel" (crater) would share the morphology of
"Hommel" (insect), as it has the same forms (I assume), but it won't share the
translations.

Having lexeme-specific word-senses avoids the loss of connotation and nuance
that you get when you force words of different languages on a shared meaning.
The effect of referring to the same concept can still be achieved via the
reference to a concept (item).

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Proposed update to the stable interfaces policy

2016-09-13 Thread Daniel Kinzler
Tomorrow I plan to apply the following update to the Stable Interface Policy:

https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section

Please comment there if you have any objections.

Thanks!

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata-tech] Proposed update to the stable interfaces policy

2016-09-13 Thread Daniel Kinzler
Tomorrow I plan to apply the following update to the Stable Interface Policy:

https://www.wikidata.org/wiki/Wikidata_talk:Stable_Interface_Policy#Proposed_change_to_to_the_.22Extensibility.22_section

Please comment there if you have any objections.

Thanks!

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata] Announcing the Wikidata Stable Interface Policy

2016-08-23 Thread Daniel Kinzler
Hello all!

After a brief period for final comments (thanks everyone for your input!), the
Stable Interface Policy is now official. You can read it here:

<https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

This policy is intended to give authors of software that accesses Wikidata a
guide to what interfaces and formats they can rely on, and which things can
change without warning.

The policy is a statement of intent given by us, the Wikidata development team,
regarding the software running on the site. It does not apply to any content
maintained by the Wikidata community.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata-tech] Announcing the Wikidata Stable Interface Policy

2016-08-23 Thread Daniel Kinzler
Hello all!

After a brief period for final comments (thanks everyone for your input!), the
Stable Interface Policy is now official. You can read it here:

<https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

This policy is intended to give authors of software that accesses Wikidata a
guide to what interfaces and formats they can rely on, and which things can
change without warning.

The policy is a statement of intent given by us, the Wikidata development team,
regarding the software running on the site. It does not apply to any content
maintained by the Wikidata community.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-16 Thread Daniel Kinzler
Am 11.08.2016 um 23:12 schrieb Peter F. Patel-Schneider:
> Until suitable versioning is part of the Wikidata JSON dump format and
> contract, however, I don't think that consumers of the dumps should just
> ignore new fields.

Full versioning is still in the future, but I'm happy that we are in the process
of finalizing a policy on stable interfaces, including a contract regarding
adding fields:
<https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>.
Please comment on the talk page.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Policy on Interface Stability: final feedback wanted

2016-08-16 Thread Daniel Kinzler
Hello all,

Repeated discussions about what constitutes a breaking change have prompted us,
the Wikidata development team, to draft a policy on interface stability. The
policy is intended to clearly define what kind of change will be announced when
and where.

A draft of the policy can be found at

 <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

Please comment on the talk page.

Note that this policy is not about the content of the Wikidata site; it's a
commitment by the development team regarding the behavior of the software
running on wikidata.org. It is intended as a reference for bot authors, data
consumers, and other users of our APIs.

We plan to announce this as the development team's official policy on Monday,
August 22.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata-tech] Policy on Interface Stability: final feedback wanted

2016-08-16 Thread Daniel Kinzler
Hello all,

Repeated discussions about what constitutes a breaking change have prompted us,
the Wikidata development team, to draft a policy on interface stability. The
policy is intended to clearly define what kind of change will be announced when
and where.

A draft of the policy can be found at

 <https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>

Please comment on the talk page.

Note that this policy is not about the content of the Wikidata site; it's a
commitment by the development team regarding the behavior of the software
running on wikidata.org. It is intended as a reference for bot authors, data
consumers, and other users of our APIs.

We plan to announce this as the development team's official policy on Monday,
August 22.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] Render sparql queries using the Histropedia timeline engine

2016-08-11 Thread Daniel Kinzler
Hi Navino!

Thank you for your awesome work!

Since this has caused some confusion again recently, I want to caution you about
a major gotcha regarding dates in RDF and JSON: they use different conventions
to represent years BCE. I just updated our JSON spec to reflect that reality,
see <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#time>.

There is a lot of confusion about this issue throughout the linked data web,
since the convention changed between XSD 1.0 (which uses -0044 to represent 44
BCE, and -0001 to represent 1 BCE) and XSD 1.1 (which uses -0043 to represent 44
BCE, and +0000 to represent 1 BCE). Our JSON uses the traditional numbering (1
BCE is -0001), while RDF uses the astronomical numbering (1 BCE is +0000).

Yay, fun.
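
For illustration, a small Python helper for converting between the two
conventions described above (a sketch written from this description, not
canonical conversion code):

def traditional_to_astronomical(year):
    """JSON-style (traditional) year to RDF-style (astronomical) numbering.

    Traditional: -0044 = 44 BCE, -0001 = 1 BCE, no year 0.
    Astronomical: -0043 = 44 BCE, +0000 = 1 BCE.
    """
    if year == 0:
        raise ValueError("traditional numbering has no year 0")
    return year + 1 if year < 0 else year


def astronomical_to_traditional(year):
    """RDF-style (astronomical) numbering back to JSON-style (traditional)."""
    return year - 1 if year <= 0 else year


assert traditional_to_astronomical(-44) == -43  # 44 BCE
assert traditional_to_astronomical(-1) == 0     # 1 BCE
assert astronomical_to_traditional(0) == -1     # 1 BCE
assert astronomical_to_traditional(-43) == -44  # 44 BCE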

Am 10.08.2016 um 21:49 schrieb Navino Evans:
> Hi all,
> 
>  
> 
> At long last, we’re delighted to announce you can now render sparql queries
> using the Histropedia timeline engine \o/
> 
> 
> Histropedia WikidataQuery Viewer
> <http://histropedia.com/showcase/wikidata-viewer.html>


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-05 Thread Daniel Kinzler
Am 05.08.2016 um 17:34 schrieb Peter F. Patel-Schneider:
> So some additions are breaking changes then.   What is a system that consumes
> this information supposed to do?  If the system doesn't monitor announcements
> then it has to assume that any new field can be a breaking change and thus
> should not accept data that has any new fields.

The only way to avoid breakage is to monitor announcements. The format is not
final, so changes can happen (not just additions, but also removals), and then
things will break for clients that are unaware. We tend to be careful and
conservative, and announce any breaking changes in advance, but do not guarantee
full backwards compatibility forever.

The only alternative is a fully versioned interface, which we don't currently
have for JSON, though it has been proposed, see
<https://phabricator.wikimedia.org/T92961>.

> I assume that you are referring to the common practice of adding extra fields
> in HTTP and email transport and header structures under the assumption that
> these extra fields will just be passed on to downstream systems and then
> silently ignored when content is displayed.

Indeed.

> I view these as special cases
> where there is at least an implicit contract that no additional field will
> change the meaning of the existing fields and data.

In the name of the Robustness Principle, I would consider this the normal case,
not the exception.

> When such contracts are
> in place systems can indeed expect to see additional fields, and are permitted
> to ignore these extra fields.

Does this count?
<https://mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html>

> Because XML specifically states that the order of attributes is not
> significant.  Therefore changes to the order of XML attributes is not changing
> the encoding.

That's why I'm proposing to formalize the same kind of contract for us, see
<https://phabricator.wikimedia.org/T142084>.

> Here is where I disagree.  As there is no contract that new fields in the
> Wikidata JSON dumps are not breaking, clients need to treat all new fields as
> potentially breaking and thus should not accept data with unknown fields.

While you are correct that there is no formal contract yet, the topic had been
explicitly discussed before, in particular with Markus.

> I say this for any data, except where there is a contract that such additional
> fields are not meaning-changing.

Quote me on it:

For wikibase serializations, additional fields are not meaning-changing. Changes
to the format or interpretation of fields will be announced as a breaking
change.

>> Clients need to be prepared to encounter entity types and data types they 
>> don't
>> know. But they should also allow additional fields in any JSON object. We
>> guarantee that extra fields do not impact the interpretation of fields they 
>> know
>> about - unless we have announced and documented a breaking change.
> 
> Is this the contract that is going to be put forward?  At some time in the not
> too distant future I hope that my company will be using Wikidata information
> in its products.  This contract is likely to be problematic for development
> groups, who want some notion of how long they have to prepare for changes that
> can silently break their products.

This is indeed the gist of what I want to establish as a stability policy.
Please comment on <https://phabricator.wikimedia.org/T142084>.

I'm not sure how this could be made less problematic. Even with a fully
versioned JSON interface, available data types etc are a matter of
configuration. All we can do is announce such changes, and advise consumers that
they can safely ignore unknown things.

You raise a valid point about due notice. What do you think would be a good
notice period? Two weeks? A month?


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-05 Thread Daniel Kinzler
Am 05.08.2016 um 15:02 schrieb Peter F. Patel-Schneider:
> I side firmly with Markus here.
> 
> Consumers of data generally cannot tell whether the addition of a new field to
> a data encoding is a breaking change or not.

Without additional information, they cannot know, though for "mix and match"
formats like JSON and XML, it's common practice to assume that ignoring
additions is harmless.

In any case, we had communicated before that we do not consider the addition of
a field a breaking change. It only becomes a breaking change when it impacts the
interpretation of other fields, in which case we would announce it well in
advance.

> Given this, code that consumes
> encoded data should at least produce warnings when it encounters encodings
> that it is not expecting and preferably should refuse to produce output in
> such circumstances. 

Depends on the circumstances. For a web browser, for example, this would be very
annoying behavior. Nearly all websites would be unusable. Similarly, most email
would become unreadable if mail clients were that strict.

> Producers of data thus should signal in advance any
> changes to the encoding, even if they know that the changes can be safely 
> ignored.

I disagree on "any". For example, do you want announcements about changes to the
order of attributes in XML tags? Why? In case someone uses a regex to process
the XML? Should you not be able to rely on your clients conforming to the XML
spec, which says that the order of attributes is undefined?

In the case at hand (adding a field), it would have been good to communicate it
in advance. But since it wasn't tagged as "breaking", it slipped through. We are
sorry for that. Clients should still not choke on an addition like this.

> I would view software that consumes Wikidata information and silently ignores
> fields that it is not expecting as deficient and would counsel against using
> such software.

Is this just for Wikidata, or does that extend to other kinds of data too? Why,
or why not?

By definition, any extensible format or protocol (HTTP, SMTP, HTML, XML, XMPP,
IRC, etc) can contain parts (headers, elements, attributes) that the client does
not know about, and should ignore. Of course, the spec will tell clients where
to expect and allow extra bits. That's why I'm planning to put up a document
saying clearly what kinds of changes clients should be prepared to see in
Wikidata output:

Clients need to be prepared to encounter entity types and data types they don't
know. But they should also allow additional fields in any JSON object. We
guarantee that extra fields do not impact the interpretation of fields they know
about - unless we have announced and documented a breaking change.
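
To illustrate, a small Python sketch of a tolerant consumer along those lines:
it reads only the fields it knows, skips statements with unknown data types, and
does not fail on extra keys. (An example of the robustness principle, not
official client code.)

import json

KNOWN_VALUE_TYPES = {"string", "time", "quantity", "monolingualtext",
                     "globecoordinate", "wikibase-entityid"}

def extract_known_parts(entity_json):
    """Read only what we understand; silently ignore unknown fields and types."""
    entity = json.loads(entity_json)
    labels = {lang: label.get("value")
              for lang, label in entity.get("labels", {}).items()}
    values = []
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            datavalue = statement.get("mainsnak", {}).get("datavalue", {})
            if datavalue.get("type") not in KNOWN_VALUE_TYPES:
                continue  # unknown data type: skip it instead of failing
            values.append((prop, datavalue.get("value")))
    return labels, values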

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Breaking change in JSON serialization?

2016-08-05 Thread Daniel Kinzler
Hi Markus!

You are asking us to better communicate changes to our serialization, even if
it's not a breaking change according to the spec. I agree we should do that. We
are trying to improve our processes to achieve this.

Can we ask you in return to try to make your software more robust, by not making
unwarranted assumptions about the serialization format?


With regards to communicating more - it's very hard to tell which changes might
break something for someone. For instance, some software might rely on the order
of fields in a JSON object, even though JSON says this is unspecified, just like
you rely on no fields being added, even though there is no guarantee about this.
Similarly, some software might rely on non-ASCII characters being represented as
Unicode escape sequences, and will break if we use the more compact UTF-8. Or
they may break on whitespace changes. Who knows. We cannot possibly know what
kind of change will break some third-party software.

I don't think announcing any and all changes is feasible. So I think an official
policy about what we announce can be useful. Something like "This is what we
consider a breaking change, and we will definitely announce it. And these are
some kinds of changes we will also communicate ahead of time. And these are some
things that can happen unannounced."

You are right that policies don't change the behavior of software. But perhaps
they can change the behavior of programmers, by telling them what they can (and
can't) safely rely on.


It boils down to this: we can try to be more verbose, but if you make
assumptions beyond the spec, things will break sooner or later. Writing robust
software requires more time and thought initially, but it saves a lot of
headaches later.

-- daniel

Am 04.08.2016 um 21:49 schrieb Markus Kroetzsch:
> Daniel,
> 
> You present arguments on issues that I would never even bring up. I think we
> fully agree on many things here. Main points of misunderstanding:
> 
> * I was not talking about the WMDE definition of "breaking change". I just 
> meant
> "a change that breaks things". You can define this term for yourself as you 
> like
> and I won't argue with this.
> 
> * I would never say that it is "right" that things break in this case. It's
> annoying. However, it is the standard behaviour of widely used JSON parsing
> libraries. We won't discuss it away.
> 
> * I am not arguing that the change as such is bad. I just need to know about 
> it
> to fix things before they break.
> 
> * I am fully aware of many places where my software should be improved, but I
> cannot fix all of them just to be prepared if a change should eventually 
> happen
> (if it ever happens). I need to know about the next thing that breaks so I can
> prioritize this.
> 
> * The best way to fix this problem is to annotate all Jackson classes with the
> respective switch individually. The global approach you linked to requires 
> that
> all users of the classes implement the fix, which is not working in a library.
> 
> * When I asked for announcements, I did not mean an information of the type 
> "we
> plan to add more optional bits soonish". This ancient wiki page of yours that
> mentions that some kind of change should happen at some point is even more
> vague. It is more helpful to learn about changes when you know how they will
> look and when they will happen. My assumption is that this is a "low cost"
> improvement that is not too much to ask for.
> 
> * I did not follow what you want to make an "official policy" for. Software
> won't behave any differently just because there is a policy saying that it 
> should.
> 
> Markus
> 
> 
> On 04.08.2016 16:48, Daniel Kinzler wrote:
>> Hi Markus!
>>
>> I would like to elaborate a little on what Lydia said.
>>
>> Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch:
>>> It seems that some changes have been made to the JSON serialization 
>>> recently:
>>>
>>> https://github.com/Wikidata/Wikidata-Toolkit/issues/237
>>
>> This specific change has been announced in our JSON spec for as long as the
>> document exists.
>> <https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> 
>> sais:
>>
>>> WARNING: wikibase-entityid may in the future change to be represented as a
>>> single string literal, or may even be dropped in favor of using the string
>>> value type to reference entities.
>>>
>>> NOTE: There is currently no reliable mechanism for clients to generate a
>>> prefixed ID or a URL from the information in the data value.
>>
>> That was the problem: With the current format, all clients needed a hard 
>> coded

Re: [Wikidata] Breaking change in JSON serialization?

2016-08-04 Thread Daniel Kinzler
Hi Markus!

I would like to elaborate a little on what Lydia said.

Am 04.08.2016 um 09:27 schrieb Markus Kroetzsch:
> It seems that some changes have been made to the JSON serialization recently:
>
> https://github.com/Wikidata/Wikidata-Toolkit/issues/237

This specific change has been announced in our JSON spec for as long as the
document exists.
<https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON#wikibase-entityid> sais:

> WARNING: wikibase-entityid may in the future change to be represented as a
> single string literal, or may even be dropped in favor of using the string
> value type to reference entities.
>
> NOTE: There is currently no reliable mechanism for clients to generate a
> prefixed ID or a URL from the information in the data value.

That was the problem: With the current format, all clients needed a hard-coded
mapping of entity types to prefixes in order to construct ID strings from the
JSON serialization of ID values. That means no entity types can be added without
breaking clients. This has now been fixed.
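
For illustration, a Python sketch of how a client can read such ID values
robustly - preferring the newly added prefixed ID field (here assumed to be
called "id"), and only falling back to the old entity-type/numeric-id fields for
types it knows. A sketch of the idea, not official client code:

# Prefix map that old clients had to hard-code; it cannot cover new entity types.
KNOWN_PREFIXES = {"item": "Q", "property": "P"}

def entity_id_from_datavalue(value):
    """Build an entity ID string from a wikibase-entityid datavalue."""
    if "id" in value:
        return value["id"]  # new field: works for any entity type
    prefix = KNOWN_PREFIXES.get(value["entity-type"])
    if prefix is None:
        raise ValueError("unknown entity type: " + value["entity-type"])
    return prefix + str(value["numeric-id"])

assert entity_id_from_datavalue({"entity-type": "item", "numeric-id": 42}) == "Q42"
assert entity_id_from_datavalue({"entity-type": "item", "numeric-id": 42, "id": "Q42"}) == "Q42"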


Of course, it would have been good to announce this in advance. However, it is
not a breaking change, and we do not plan to treat additions as breaking 
changes.

Adding something to a public interface is not a breaking change. Adding a method
to an API isn't, adding an element to XML isn't, and adding a key to JSON isn't
- unless there is a spec that explicitly states otherwise.

These are "mix and match" formats, in which anything that isn't forbidden is
allowed. It's the responsibility of the client to accommodate such changes. This
is simple best practice - an HTTP client shouldn't choke on header fields it
doesn't know, etc. See <https://en.wikipedia.org/wiki/Robustness_principle>.


If you use a library that is touchy about extra data by default, configure it
to be more accommodating; see for instance
<https://stackoverflow.com/questions/14343477/how-do-you-globally-set-jackson-to-ignore-unknown-properties-within-spring>.

> Could somebody from the dev team please comment on this? Is this going to be 
> in
> the dumps as well or just in the API?

Yes, we use the same basic serialization for the API and the dumps. For the
future, note that some parts (such as sitelink URLs) are optional, and we plan
to add more optional bits (such as normalized quantities) soonish.

> Are further changes coming up?

Yes. The next one in the pipeline is Quantities without upperBound and
lowerBound, see <https://phabricator.wikimedia.org/T115270>. That IS a breaking
change, and the implementation is thus blocked on announcing it, see
<https://gerrit.wikimedia.org/r/#/c/302248/>.

Furthermore, we will probably remove the entity-type and numeric-id fields from
the serialization of EntityIdValues eventually. But there is no concrete plan
for that at the moment.

When we remove the old fields for ItemId and PropertyId, that IS a breaking
change, and will be announced as such.

> Are we ever
> going to get email notifications of API changes implemented by the team rather
> than having to fix the damage after they happened?

We aspire to communicate early, and we are sorry we did not announce this change
ahead of time.

However, this is not a breaking change by the common understanding of the term,
and will not be treated as such. We have argued about that on this list before,
see
<https://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00902.html>.
I have made it clear back then what we consider a breaking change and what not,
and I have advised you that being accommodating in what your client code accepts
will avoid headaches in the future.

To make this even more clear, we will enact and document something similar to my
email from February as official policy soon. Watch for an announcement on this 
list.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An attribute for "famous person"

2016-08-02 Thread Daniel Kinzler
Am 02.08.2016 um 20:19 schrieb Markus Kroetzsch:
> Oh, there is a little misunderstanding here. I have not suggested to create a
> property "number of sitelinks in this document". What I propose instead is to
> create a property "number of sitelinks for the document associated with this
> entity". The domain of this suggested property is entity. The advantage of 
> this
> proposal over the thing that you understood is that it makes queries much
> simpler, since you usually want to sort items by this value, not documents. 
> One
> could also have a property for number of sitelinks per document, but I don't
> think it has such a clear use case.

"number of sitelinks for the document associated with this entity" strikes me as
semantically odd, which was the point of my earlier mail. I'd much rather have
"number of sitelinks in this document". You are right that the primary use would
be to "rank" items, and that it would be more conveniant to have the count
assocdiated directly with the item (the entity), but I fear it will lead to a
blurring of the line between information about the entity, and information about
the document. That is already a common point of confusion, and I'd rather keep
that separation very clear. I also don't think that one level of indirection
would be orribly complicated.

To me it's just natural to include the sitelink info on the same level as we
provide a timestamp or revision id: for the document.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An attribute for "famous person"

2016-08-02 Thread Daniel Kinzler
Am 02.08.2016 um 18:41 schrieb Andrew Gray:
> I'd agree with both interpretations - the majority of people in Wikidata are
> Using the existence of Wikipedia articles as a threshold, as suggested, seems 
> a
> pretty good test - it's flawed, of course, but it's easy to check for and 
> works
> as a first approximation of "probably is actually famous".

If we want to have the number of sitelinks in RDF, let's please make sure that
this number is associated with the item *document* URI, not with the concept
URI. After all, the person doesn't have links, the item document does.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Grammatical display of units

2016-08-01 Thread Daniel Kinzler
Am 28.07.2016 um 12:26 schrieb Lydia Pintscher:
> The discussion about how to do this is happening in
> https://phabricator.wikimedia.org/T86528 The basic problem is that we
> do use items for the units. I think this is the right thing to do but
> it does make this particular part a bit tricky.

Well, I think we could sidestep the grammar issue by using unit symbols. We
would have to get them from statements, and they would have to be multilingual
values (or multiple monolingual values), but that is still much less
complicated than trying to apply plural rules.

An alternative is to use MediaWiki i18n messages instead of entity labels. E.g.
if the unit is Q11573, we could check if MediaWiki:wikibase-unit-Q11573 exists,
and if it does, use it. We'd get internationalization including support for
plurals for free.

We could actually combine all of these approaches: first check for a system
message, then check for a symbol statement, then use the label, and if all else
fails, use the ID.
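
A compact Python sketch of that combined lookup order; the three lookup
functions are placeholders for whatever the wiki actually provides, not real
MediaWiki or Wikibase APIs:

def unit_display_text(unit_id, lang, get_system_message, get_symbol_statement, get_label):
    """Pick a display text for a unit item, trying several sources in order.

    Each getter is a placeholder callable returning None when nothing is found.
    """
    # 1. A local system message (e.g. "wikibase-unit-Q11573") wins, because it
    #    can use MediaWiki's plural and grammar support.
    text = get_system_message("wikibase-unit-" + unit_id, lang)
    if text:
        return text
    # 2. Otherwise use a unit symbol taken from a statement on the unit item.
    text = get_symbol_statement(unit_id, lang)
    if text:
        return text
    # 3. Otherwise fall back to the item label.
    text = get_label(unit_id, lang)
    if text:
        return text
    # 4. If all else fails, show the bare ID.
    return unit_id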

I'll comment on the ticket.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Controversy around Wikimania talks

2016-07-31 Thread Daniel Kinzler
Am 31.07.2016 um 17:04 schrieb Gerard Meijssen:
> Hoi,
> I am not to judge what conferences will be deemed relevant for an item in
> Wikidata. When a conference is relevant, it is the talks and particularly the
> registrations of the talks, the papers and the presentations that make the
> conference relevant after the fact.

So you think that for every relevant conference, all talks and speakers should
automatically be considered relevant? Does the same argument apply to all
courses and teachers at all relevant universities and schools?

I'm trying to understand your point. To me it's a question of granularity. We
can't manage arbitrarily fine-grained information, so we have to stop at some
point. Where do you think that point should be for Wikimania, for other
(relevant) conferences, for universities, for schools?

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Controversy around Wikimania talks

2016-07-31 Thread Daniel Kinzler
Am 31.07.2016 um 16:28 schrieb Gerard Meijssen:
> Hoi,
> Really? It is a source for the talks that were given. It contains the papers
> that were the basis for granting a spot on the program. 

To clarify - would the same apply for any talk at any conference? Or do you
think Wikimania should be especially relevant to Wikidata, because it's a
Wikimedia thing?

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] URL strategy

2016-06-18 Thread Daniel Kinzler
Am 13.06.2016 um 12:12 schrieb Richard Light:
> returns a list of person URLs.  So I'm happy.  However, I am still intrigued 
> as
> to the logic behind the redirection of the statement URL to the URL for the
> person about whom the statement is being made.

The reason is a practical one: the statement data is part of the data about that
person. It's stored and addressed as part of that person's information. We
currently do not have an API that would return only the statement data itself,
so if you dereference the statement URI, you get all the data we have on the
subject, which includes the statement.

This is formally acceptable: dereferencing the statement URI should give you the
RDF representation of that statement (and possibly more - which is the case
here). The statement URI does not resolve to the subject or the object, but to
the Statement itself, which is an RDF resource in its own right.
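
As a concrete illustration of "the statement is addressed as part of the
subject's data", a Python sketch that fetches the subject's entity document and
picks out one statement by its ID. It assumes the statement GUID starts with the
subject's entity ID, which matches current practice but is not guaranteed by any
spec:

import json
import urllib.request

def fetch_statement(statement_guid):
    """Return the statement with the given GUID, e.g. "Q42$128EE1A0-...".

    Assumption: the subject's entity ID is the part before the "$".
    """
    entity_id = statement_guid.split("$", 1)[0]
    url = "https://www.wikidata.org/wiki/Special:EntityData/" + entity_id + ".json"
    with urllib.request.urlopen(url) as response:
        entity = json.load(response)["entities"][entity_id]
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            if statement.get("id") == statement_guid:
                return statement
    return None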

Perhaps the confusion arises from the fact that the SPARQL endpoint offers two
views on Statements: the "direct" or "naive" mapping (using the wdt prefix), in
which a Statement is modeled as a single triple and does not have a URI of its
own, and the "full" or "deep" mapping, where the statement is a resource in its
own right, and we use several triples to describe its type, value, rank,
qualifiers, references, etc.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] language fallbacks on Wikipedia and co

2016-06-16 Thread Daniel Kinzler
Am 15.06.2016 um 23:53 schrieb Gerard Meijssen:
> Hoi,
> Will it work using the #babel templates?

No, because that would be inconsistent with the fallback that is applied when
using Lua or {{#property}} in wikitext. The fallback is based on the fallback
chains that MediaWiki defines for the interface languages.

In wikitext, we cannot use the Babel templates, because that would break
caching. The rendering can depend on a few user-specific settings, but caching a
rendered version of every page for every possible combination of Babel templates
is not feasible.
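
For illustration, a small Python sketch of such a label lookup with a fixed
fallback chain; the chains here are examples, the real ones come from MediaWiki's
language configuration:

# Example chains only; MediaWiki defines the real ones per interface language.
FALLBACK_CHAINS = {
    "de-at": ["de-at", "de", "en"],
    "nds": ["nds", "de", "en"],
    "en": ["en"],
}

def label_with_fallback(labels, language):
    """Return (label, language) for the first language in the chain that has a label."""
    for code in FALLBACK_CHAINS.get(language, [language, "en"]):
        if code in labels:
            return labels[code], code
    return None, None

# An item with only German and English labels, viewed with interface language de-at:
print(label_with_fallback({"de": "Hummel", "en": "bumblebee"}, "de-at"))
# -> ('Hummel', 'de')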

We could in theory use a different fallback mechanism on Special:AboutTopic, but
that would be quite confusing - why does it look different in articles? Also,
when talking to others about the output of Special:AboutTopic, this might get
confusing: if someone complains that e.g. some label they see there is wrong,
and you go to the page but what you see is different, it becomes hard to discuss
the issue. There would be no way to link to the page as you see it. Everyone
would potentially see different output.

-- daniel


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Fwd: Using sparql to query for labels with matching regex

2016-04-19 Thread Daniel Kinzler
Hi Mike!

I'm no SPARQL expert, but regular expressions in queries are often not optimized
using indexes. So *all* labels would need to be checked against the regular
expression, which of course times out.

But there are other options. Perhaps
instead of FILTER regex(?label, "^apparel")
try FILTER (STRSTARTS(?label,"apparel"))

See <https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-strstarts>

Another option would be Blazegraph's full text index:

WHERE {
  ?label bds:search "apparel*" .
  # ... further triple patterns using ?label go here ...
}

This would match any label that contains a word that starts with "apparel".

See <https://wiki.blazegraph.com/wiki/index.php/FullTextSearch>

HTH

Am 29.03.2016 um 22:47 schrieb mike white:
> 
> Hi all
> 
> I am trying to query the wiki data for entities with labels that matches a
> regex. I am new in the sparql world. So could you please help me with it. Here
> is what I have for now.
> 
> https://gist.github.com/anonymous/2810eb5747e51a9ae746183a43f20771
> 
> But I don't think it is the right way. Any help will be much appreciate. 
> Thanks
> 
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wordnet mappings

2016-04-12 Thread Daniel Kinzler
Am 12.04.2016 um 08:42 schrieb Stas Malyshev:
> Hi!
> 
>> Is there a property for WordnetId?

More mappings are always good. The case of WordNet is a bit tricky though, since
WordNet is about words, not concepts. Wikidata items can perhaps be mapped to
SynSets, but we still have to be careful not to get confused about the 
semantics.


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] MathML is dead, long live MathML

2016-04-07 Thread Daniel Kinzler
Am 07.04.2016 um 20:00 schrieb Moritz Schubotz:
> Hi Daniel,
> 
> Ok. Let's discuss!

Great! But let's keep the discussion in one place. I made a mess by
cross-posting this to two lists, now it's three, it seems. Can we agree on
 as the venue of discussion? At least for the
discussion of MathML in the context of Wikimedia, that would be the best place,
I think.

-- daniel


___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-tech] MathML is dead, long live MathML

2016-04-07 Thread Daniel Kinzler
Peter Krautzberger, maintainer of MathJax, apparently thinks that MathML has
failed as a web standard (even though it succeeded as an XML standard), and
should be removed from HTML5. Here's the link:

https://www.peterkrautzberger.org/0186/

It's quite a rant. Here's a quick TL;DR:

> It doesn’t matter whether or not MathML is a good XML language. Personally, I
> think it’s quite alright. It’s also clearly a success in the XML publishing
> world, serving an important role in standards such as JATS and BITS.
> 
> The problem is: MathML has failed on the web.

> Not a single browser vendor has stated an intent to work on the code, not a
> single browser developer has been seen on the MathWG. After 18 years, not a
> single browser vendor is willing to dedicate even a small percentage of a
> developer to MathML.

> Math layout can and should be done in CSS and SVG. Let’s improve them
> incrementally to make it simpler.
> 
> It’s possible to generate HTML+CSS or SVG that renders any MathML content –
> on the server, mind you, no client-side JS required (but of course possible).

> Since layout is practically solved (or at least achievable), we really need
> to solve the semantics. Presentation MathML is not sufficient, Content MathML
> is just not relevant.
> 
> We need to look where the web handles semantics today – that’s ARIA and HTML
> but also microdata, rdfa etc.

I think both the rendering and the semantics are well worth thinking about.
Perhaps Wikimedia should reach out to Peter Krautzberger and discuss some ideas
of how math (and physics, and chemistry) content should be handled by Wikipedia,
Wikidata, and friends. This seems like a crossroads, and we should have a hand in
where things are going from here.

-- daniel (not a MathML expert at all)

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech

