Re: [Wikidata-l] Reasonator ignores "of" qualifier

2014-06-14 Thread Jane Darnell
Derric,
Good luck with your edit-a-thon today [1] and it would be awesome if you
could introduce Reasonator and Wikidata while discussing Wikipedia. I think
it is very interesting that you just discovered Magnus through Reasonator
instead of the other way around - discovering more of Magnus through his
Wikipedia tools. So few people venture outside of the text-oriented
Wikipedia projects, it's always refreshing to see someone in a
public-facing capacity getting involved in Wikidata.

[1] https://en.wikipedia.org/wiki/User:Zellfaze/Edit-a-thon_Planning

Jane


On Fri, Jun 13, 2014 at 5:27 PM, Derric Atzrott 
datzr...@alizeepathology.com wrote:

 FYI, I maintain Reasonator, as a click on Other/About Reasonator would
 have revealed...

 Oh, so it does. I have no idea how I missed that.  I promise I did try to
 look around.

 Absolutely. Do I take it that you volunteer? The code is at:
 
 https://bitbucket.org/magnusmanske/reasonator
 I can add you to the codebase there, and/or make you co-maintainer on the
 tool.

 I'll volunteer. At the very least, I'll take a look and see if I can find
 what needs to be modified, and then possibly submit a patch.  This seems
 like one of those things that ought to be a pretty simple fix.  No need to
 make me co-maintainer; I'm not looking for that kind of responsibility.

 Because I sure don't have the time at the moment to fiddle with some edge
 cases in a secondary component of one of my 50+ tools.

 Sorry if I came off rude or demanding.  That was not my intention.  With
 that many tools, I'm sure there are more important bugs.  I just wanted to
 add this to the list.  I'll take a look at the source and see if I can find
 where the issue is and how to fix it.

 Thank you,
 Derric Atzrott


 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Reasonator ignores "of" qualifier

2014-06-14 Thread Thomas Douillard
Hi Derric, I think it's pointless to ask yourself whether some concept
deserves an item; that question does not make much sense. There is much more
value in regularity in how we express the same kind of data: it makes it much
simpler to develop tools, and to help newbies learn how to do things, if we
decide on one way of expressing that somebody is the mayor of somewhere and
stick to it.




2014-06-13 16:57 GMT+02:00 Derric Atzrott datzr...@alizeepathology.com:

  It does crop up in many places... What you see is not the Reasonator
  per se; it is the script used to generate a text. Compare it with the
  results in most other languages: you will not see this text.

 That's fair.  In that case the script to generate the text should be
 modified.

  Arguably Reasonator expects a different pattern. You make him mayor,
  while Reasonator expects him to have office held: mayor of Frederick.
  Compare Ronald Reagan, where you find qualifiers for start and end date:
 
  http://tools.wmflabs.org/reasonator/?q=9960

 For a while I was actually using Mayor of Frederick as the position held.
 You can find that item as Q17167581, but I nominated it for deletion once I
 discovered the "of" qualifier.  The position itself is not really noteworthy
 enough to merit its own item (unlike President of the United States or Mayor
 of New York City), and since we have the "of" qualifier I figured, why not
 use it?

 Where does one submit bugs for Reasonator, or is this list a good spot to
 do so?

 Thank you,
 Derric Atzrott


 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata RDF exports

2014-06-14 Thread Markus Krötzsch

Eric,

Two general remarks first:

(1) Protege is for small and medium ontologies, but not really for such 
large datasets. To get SPARQL support for the whole data, you could 
install Virtuoso. It also comes with a simple Web query UI. Virtuoso 
does not do much reasoning, but you can use SPARQL 1.1 transitive 
closure in queries (using * after properties), so you can find all 
subclasses there too. (You could also try this in Protege ...)
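The difference between a plain rdfs:subClassOf pattern and the SPARQL 1.1
zero-or-more path (`rdfs:subClassOf*`) can be sketched in a few lines of
Python. This is a toy graph with made-up edges, not the real Wikidata data;
it only illustrates what the starred path computes:

```python
# Toy subclass graph (hypothetical edges): child -> parent,
# as in "child rdfs:subClassOf parent".
edges = {
    "muon neutrino": "neutrino",
    "tau neutrino": "neutrino",
    "electron neutrino": "neutrino",
    "neutrino": "lepton",
    "electron": "lepton",
}

def direct_subclasses(cls):
    """What a plain `?s rdfs:subClassOf <cls>` pattern returns."""
    return {child for child, parent in edges.items() if parent == cls}

def all_subclasses(cls):
    """What `?s rdfs:subClassOf* <cls>` returns: the transitive closure,
    including cls itself (zero-or-more semantics)."""
    result, frontier = {cls}, {cls}
    while frontier:
        frontier = {c for c in edges if edges[c] in frontier} - result
        result |= frontier
    return result

print(direct_subclasses("lepton"))  # only neutrino and electron
print(all_subclasses("lepton"))     # also the three neutrino flavours
```

Without a reasoner or the `*` path, only the first set comes back, which is
exactly the behaviour described in the question below.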


(2) If you want to explore the class hierarchy, you can also try our new 
class browser:


http://tools.wmflabs.org/wikidata-exports/miga/?classes

It has the whole class hierarchy, but without the leaves (= instances 
of classes, plus subclasses that have no subclasses or instances of their 
own). For example, it tells you that lepton has 5 direct subclasses, but 
shows only one:


http://tools.wmflabs.org/wikidata-exports/miga/?classes#_item=3338

On the other hand, it includes relationships of classes and properties 
that are not part of the RDF (we extract this from the data by 
considering co-occurrence). Example:


Classes that have no superclasses but at least 10 instances, and which 
are often used with the property 'sex or gender':


http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Direct%20superclasses=__null/Number%20of%20direct%20instances=10%20-%202/Related%20properties=sex%20or%20gender

I already added superclasses for some of those in Wikidata now -- data 
in the browser is updated with some delay based on dump files.



More answers below:

On 14/06/14 05:52, emw wrote:

Markus,

Thank you very much for this.  Translating Wikidata into the language of
the Semantic Web is important.  Being able to explore the Wikidata
taxonomy [1] by doing SPARQL queries in Protege [2] (even primitive
queries) is really neat, e.g.

SELECT ?subject
WHERE
{
?subject rdfs:subClassOf <http://www.wikidata.org/entity/Q82586> .
}

This is more of an issue of my ignorance of Protege, but I notice that
the above query returns only the direct subclasses of Q82586.  The full
set of subclasses for Q82586 (lepton) is visible at
http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q82586&rp=279&lang=en
-- a few of the 2nd-level subclasses (muon neutrino, tau neutrino,
electron neutrino) are shown there but not returned by that SPARQL
query.  It seems rdfs:subClassOf isn't being treated as a transitive
property in Protege.  Any ideas?


You need a reasoner to compute this properly. For a plain class 
hierarchy as in our case, ELK should be a good choice [1]. You can 
install the ELK Protege plugin and use it to classify the ontology [2]. 
Protege will then show the computed class hierarchy in the browser; I am 
not sure what happens to the SPARQL queries (it's quite possible that 
they don't use the reasoner).


[1] https://code.google.com/p/elk-reasoner/
[2] https://code.google.com/p/elk-reasoner/wiki/ElkProtege



Do you know when the taxonomy data in OWL will have labels available?


We had not thought of this as a use case. A challenge is that the label 
data is quite big because of the many languages. Should we maybe create 
an English label file for the classes? Descriptions too or just labels?




Also, regarding the complete dumps, would it be possible to export a
smaller subset of the faithful data?  The files under Complete Data
Dumps in
http://tools.wmflabs.org/wikidata-exports/rdf/exports/20140526/ look too
big to load into Protege on most personal computers, and would likely
require adjusting JVM settings on higher-end computers to load.  If it's
feasible to somehow prune those files -- and maybe even combine them
into one file that could be easily loaded into Protege -- that would be
especially nice.


What kind of pruning do you have in mind? You can of course take a 
subset of the data, but then some of the data will be missing.


A general remark on mixing and matching RDF files. We use N-Triples 
format (a line-based subset of N3), where every line in the ontology is 
self-contained (no multi-line 
constructs, no header, no namespaces). Therefore, any subset of the 
lines of any of our files is still a valid file. So if you want to have 
only a slice of the data (maybe to experiment with), then you could 
simply do something like:


gunzip -c wikidata-statements.nt.gz | head -1 > partial-data.nt

head simply selects the first line here. You could also use 
grep to select specific triples instead, such as:


zgrep "http://www.w3.org/2000/01/rdf-schema#label" wikidata-terms.nt.gz | grep '@en .' > en-labels.nt


This selects all English labels. I am using zgrep here for a change; you 
can also use gunzip as above. Similar methods can also be used to count 
things in the ontology (use grep -c to count lines = triples).
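As a concrete illustration of the counting idea, here is a self-contained
toy version: a hand-made three-line sample file (made-up labels, not the
real wikidata-terms.nt.gz dump) counted with the same grep -c approach:

```shell
# Build a tiny N-Triples sample and compress it, mimicking the dumps.
printf '%s\n' \
  '<http://www.wikidata.org/entity/Q1> <http://www.w3.org/2000/01/rdf-schema#label> "universe"@en .' \
  '<http://www.wikidata.org/entity/Q1> <http://www.w3.org/2000/01/rdf-schema#label> "univers"@fr .' \
  '<http://www.wikidata.org/entity/Q2> <http://www.w3.org/2000/01/rdf-schema#label> "Earth"@en .' \
  > sample-terms.nt
gzip -f sample-terms.nt

# One triple per line in N-Triples, so counting lines counts triples:
zgrep -c 'rdf-schema#label' sample-terms.nt.gz                 # 3 label triples
zgrep 'rdf-schema#label' sample-terms.nt.gz | grep -c '@en .'  # 2 English ones
```

The same two commands run unchanged against the real dump files, just with
much larger counts.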


Finally, you can combine multiple files into one by simply concatenating 
them in any order:


cat partial-data-1.nt >> mydata.nt
cat partial-data-2.nt >> mydata.nt
...

Maybe you can experiment a bit and let us know if there is any export 
that would be particularly 

[Wikidata-l] RFC about part of

2014-06-14 Thread David Cuenca
This RFC has been pending after several discussions: how should we clarify
the property "part of"?
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Refining_%22part_of%22

Cheers,
Micru
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata RDF exports

2014-06-14 Thread emw
Markus,

Thanks for the thorough reply!

you can use SPARQL 1.1 transitive closure in queries (using * after
 properties), so you can find all subclasses there too. (You could also
 try this in Protege ...)


I had a feeling I was missing something basic.  (I'm also new to SPARQL.)
Using * after the property got me what I was looking for by default in
Protege.  That is,

SELECT ?subject
WHERE
{
   ?subject rdfs:subClassOf* <http://www.wikidata.org/entity/Q82586> .
}

-- with an asterisk after rdfs:subClassOf -- got me the transitive closure
and returned all subclasses of Q82586 / lepton.

Should we maybe create an English label file for the classes? Descriptions
 too or just labels?


A file with English labels and descriptions for classes would be great and,
I think, address this use case.  Per your note, I suppose one would simply
concatenate that English terms file and wikidata-taxonomy.nt into a new .nt
file, then import that into Protege to explore the class hierarchy.
(Having every line in the ontology be self-contained in N3 is very
convenient!)
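That concatenation idea can be sketched in plain Python with a naive
line-level reading of N-Triples. The IDs and labels below are made up for
illustration (a real import would use an RDF library rather than string
splitting):

```python
# Two N-Triples fragments, as they might come from a taxonomy file and an
# English-labels file (hypothetical example data, not the real exports).
taxonomy_nt = (
    '<http://example.org/Q2> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/Q1> .\n'
)
labels_nt = (
    '<http://example.org/Q1> <http://www.w3.org/2000/01/rdf-schema#label> "lepton"@en .\n'
    '<http://example.org/Q2> <http://www.w3.org/2000/01/rdf-schema#label> "neutrino"@en .\n'
)

# Because every line is a self-contained triple, plain concatenation
# (the file-level equivalent of `cat a.nt b.nt > combined.nt`) is valid:
combined = taxonomy_nt + labels_nt

labels, edges = {}, []
for line in combined.splitlines():
    subj, pred, obj = line.split(" ", 2)
    if pred.endswith("#label>"):
        labels[subj] = obj.split('"')[1]        # naive literal extraction
    elif pred.endswith("#subClassOf>"):
        edges.append((subj, obj.rstrip(" .")))  # drop trailing " ."

for child, parent in edges:
    print(f"{labels[child]} is a subclass of {labels[parent]}")
```

The join works in either concatenation order, which is the point of the
line-based format.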

Regarding the pruned subset, I think the command-line approach in your
examples is enough for me to get started making my own.

I won't have time to experiment with these things for a few weeks, but I
will return to this then and let you know any interesting findings.

Cheers,
Eric


On Sat, Jun 14, 2014 at 4:41 AM, Markus Krötzsch 
mar...@semantic-mediawiki.org wrote:

 Eric,

 Two general remarks first:

 (1) Protege is for small and medium ontologies, but not really for such
 large datasets. To get SPARQL support for the whole data, you could
 install Virtuoso. It also comes with a simple Web query UI. Virtuoso does
 not do much reasoning, but you can use SPARQL 1.1 transitive closure in
 queries (using * after properties), so you can find all subclasses
 there too. (You could also try this in Protege ...)

 (2) If you want to explore the class hierarchy, you can also try our new
 class browser:

 http://tools.wmflabs.org/wikidata-exports/miga/?classes

 It has the whole class hierarchy, but without the leaves (= instances of
 classes, plus subclasses that have no subclasses or instances of their own).
 For example,
 it tells you that lepton has 5 direct subclasses, but shows only one:

 http://tools.wmflabs.org/wikidata-exports/miga/?classes#_item=3338

 On the other hand, it includes relationships of classes and properties
 that are not part of the RDF (we extract this from the data by considering
 co-occurrence). Example:

 Classes that have no superclasses but at least 10 instances, and which
 are often used with the property 'sex or gender':

 http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Direct%20superclasses=__null/Number%20of%20direct%20instances=10%20-%202/Related%20properties=sex%20or%20gender

 I already added superclasses for some of those in Wikidata now -- data in
 the browser is updated with some delay based on dump files.


 More answers below:


 On 14/06/14 05:52, emw wrote:

 Markus,

 Thank you very much for this.  Translating Wikidata into the language of
 the Semantic Web is important.  Being able to explore the Wikidata
 taxonomy [1] by doing SPARQL queries in Protege [2] (even primitive
 queries) is really neat, e.g.

 SELECT ?subject
 WHERE
 {
 ?subject rdfs:subClassOf <http://www.wikidata.org/entity/Q82586> .
 }

 This is more of an issue of my ignorance of Protege, but I notice that
 the above query returns only the direct subclasses of Q82586.  The full
 set of subclasses for Q82586 (lepton) is visible at
 http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q82586&rp=279&lang=en
 -- a few of the 2nd-level subclasses (muon neutrino, tau neutrino,
 electron neutrino) are shown there but not returned by that SPARQL
 query.  It seems rdfs:subClassOf isn't being treated as a transitive
 property in Protege.  Any ideas?


 You need a reasoner to compute this properly. For a plain class hierarchy
 as in our case, ELK should be a good choice [1]. You can install the ELK
 Protege plugin and use it to classify the ontology [2]. Protege will then
 show the computed class hierarchy in the browser; I am not sure what
 happens to the SPARQL queries (it's quite possible that they don't use the
 reasoner).

 [1] https://code.google.com/p/elk-reasoner/
 [2] https://code.google.com/p/elk-reasoner/wiki/ElkProtege



 Do you know when the taxonomy data in OWL will have labels available?


 We had not thought of this as a use case. A challenge is that the label
 data is quite big because of the many languages. Should we maybe create an
 English label file for the classes? Descriptions too or just labels?



 Also, regarding the complete dumps, would it be possible to export a
 smaller subset of the faithful data?  The files under Complete Data
 Dumps in
 http://tools.wmflabs.org/wikidata-exports/rdf/exports/20140526/ look too
 big to load into Protege on most personal computers, and would likely
 require adjusting JVM settings on 

[Wikidata-l] Weekly Summary #113

2014-06-14 Thread John Lewis
Here is the latest summary of what has been happening around Wikidata! As
always, feedback is appreciated.

Discussions

   - Wikidata participation in Wiki Loves Pride
   
https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikidata_participation_in_Wiki_Loves_Pride
   - Discussion: Delete as a new user group
   https://www.wikidata.org/wiki/Wikidata:Project_chat#Delete_as_own_usergroup
   - RfC: Refining part of
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Refining_%22part_of%22
   - Open RfAs: Calak
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator/Calak
   , Andre Engels
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator/Andre_Engels
   - Closed RfAs: Taketa
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator/Taketa
(successful), Jianhui67
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator/Jianhui67
(successful), 555
   
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator/555
(successful)

Other Noteworthy Stuff

   - The preview gadget
   https://www.wikidata.org/wiki/MediaWiki:Gadget-Preview.js has been
   rewritten and now comes with a completely new user interface. You can
   enable it here
   https://www.wikidata.org/wiki/Special:Preferences#mw-prefsection-gadgets
   .
   - Wikiquote received access to the data on Wikidata (aka phase 2) on
   Tuesday.
   - Try out the new Wikidata Property Browser
   https://lists.wikimedia.org/pipermail/wikidata-l/2014-June/004013.html.
   - The Merge gadget
   https://www.wikidata.org/wiki/MediaWiki:Gadget-Merge.js has been
   updated. It is now faster and only one edit per merge is performed.

Did you know?

   - Newest properties: original spelling
   https://www.wikidata.org/wiki/Property:P1353, ranking
   https://www.wikidata.org/wiki/Property:P1352, number of points/goals
   scored https://www.wikidata.org/wiki/Property:P1351, number of matches
   played https://www.wikidata.org/wiki/Property:P1350, ploidy
   https://www.wikidata.org/wiki/Property:P1349, AlgaeBase URL
   https://www.wikidata.org/wiki/Property:P1348, military casualty
   classification https://www.wikidata.org/wiki/Property:P1347, winner
   https://www.wikidata.org/wiki/Property:P1346, number of victims
   https://www.wikidata.org/wiki/Property:P1345, participated in
   https://www.wikidata.org/wiki/Property:P1344, described by source
   https://www.wikidata.org/wiki/Property:P1343, number of seats
   https://www.wikidata.org/wiki/Property:P1342
   - Newest task forces https://www.wikidata.org/wiki/Wikidata:Task_forces
   : Sports task forces
   https://www.wikidata.org/wiki/Wikidata:Sports_task_forces

Development

   - Continued with the implementation of redirects. There are a lot of
   corner cases to work out...
   - Created new data value handlers for quantities and geo coordinates,
   bringing queries closer
   - Moved browser tests to a new repository at
   https://github.com/wmde/WikidataBrowserTests to make the main Wikibase
   repo smaller and cleaner
   - Fixed a bunch of issues with the monobook skin
   - Made some more jQuery 1.9 compatibility fixes
   - More cleanup for the coming switch to WikibaseDataModel 1.0.
   - Icinga Dispatch Lag monitoring scripts, including IRC notifier bot,
   have been tested and are ready for Ops implementation. This should give us
   quicker notifications in case the notifications to Wikipedia and co about
   changes on Wikidata are slow again.
   - Enabled data access for Wikiquote

See current sprint items
https://bugzilla.wikimedia.org/buglist.cgi?list_id=218716&resolution=---&resolution=LATER&resolution=DUPLICATE&emailtype1=substring&emailassigned_to1=1&query_format=advanced&bug_status=ASSIGNED&email1=wikidata
for
what we’re working on next.

You can view the commits currently in review here
https://gerrit.wikimedia.org/r/#/q/(+project:mediawiki/extensions/Wikibase+OR+project:mediawiki/extensions/Diff+OR+project:mediawiki/extensions/DataValues+OR+project:mediawiki/extensions/WikibaseSolr+OR+project:mediawiki/extensions/Ask+OR+project:mediawiki/extensions/WikibaseQuery+OR+project:mediawiki/extensions/WikibaseDatabase+OR+project:mediawiki/extensions/WikibaseQueryEngine+OR+project:mediawiki/extensions/WikibaseDataModel+)+status:open,n,z
and
the ones that have been merged here
https://gerrit.wikimedia.org/r/#/q/(+project:mediawiki/extensions/Wikibase+OR+project:mediawiki/extensions/Diff+OR+project:mediawiki/extensions/DataValues+OR+project:mediawiki/extensions/WikibaseSolr+OR+project:mediawiki/extensions/Ask+OR+project:mediawiki/extensions/WikibaseQuery+OR+project:mediawiki/extensions/WikibaseDatabase+OR+project:mediawiki/extensions/WikibaseQueryEngine+OR+project:mediawiki/extensions/WikibaseDataModel+)+status:merged,n,z
.

You can see all open bugs related to Wikidata here

[Wikidata-l] New template for Wikipedia articles about Wikidata properties

2014-06-14 Thread Andy Mabbett
I have created {{Wikidata property}} [1] (example on [2]), for
suitable en.Wikipedia articles about subjects for which we have a
property in Wikidata. Please help to improve and apply it (can anyone
generate a list of relevant articles?), and to migrate it to
other-language Wikipedias.

[1] https://en.wikipedia.org/wiki/Template:Wikidata_property

[2] https://en.wikipedia.org/wiki/ORCID

-- 
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l