Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Pine W
I would appreciate clarification of what is proposed with regard to exposing 
the problematic Wikidata ontology on Wikipedia. If the idea involves inserting 
poor-quality information onto English Wikipedia in order to spur us to fix 
problems with Wikidata, then I am likely to oppose it. English Wikipedia is not 
an endless resource for free labor, and we have too few skilled and good-faith 
volunteers to handle our already enormous scope of work.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread James Heald

On 18/10/2018 22:33, Markus Kroetzsch wrote:


> And, on another note, there is also a huge misunderstanding exposed in 
> the discussion on the search-related tracker item [1]: Cparle there 
> speaks about "traversing the subclass hierarchy" but is actually looking 
> at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant 
> to users who care about clarinets. But surely that's the wrong 
> direction! You have to look for *sub*classes to find special cases of 
> what you are looking for. Looking downwards will often lead to much 
> saner ontologies than turning your head towards the dizzy heights 
> of upper ontology. Yes, the few of us looking for instances of "logical 
> consequence" will still get clarinets, but those who look for instances 
> of clarinet will merely see instances of alto clarinet, piccolo 
> clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to 
> suggest meaningful "upper concepts" to Commons editors, one could simply 
> enable the use of lower concepts in search. It does not work in all 
> cases yet, but it does in many.


Not really.

Cparle wants to make sure that people searching for "clarinet" also get 
shown images of "piccolo clarinet" etc.


To make this possible, where an image has been tagged "basset horn" he 
is therefore looking to add "clarinet" as an additional keyword, so that 
if somebody types "clarinet" into the search box, one of the images 
retrieved by ElasticSearch will be the basset horn one.


I imagine there are pluses and minuses both ways, whether you try to 
make sure one search returns more hits, or try to run multiple searches 
each returning fewer hits.


Your suggestion of the latter approach may not involve so much 
pre-investigation of the top of the tree, which may be terms that people 
are less likely to search for; but on the other hand, the actual 
searching may be less efficient than a single indexed search.
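
The trade-off between the two approaches can be sketched in a few lines of
Python. This is only a toy illustration: the hierarchy, the image names and
all function names are invented here, and nothing in it reflects how Commons
search or ElasticSearch is actually configured.

```python
from collections import defaultdict

# Toy "subclass of" edges (child -> parents); invented for illustration,
# not taken from Wikidata.
SUBCLASS_OF = {
    "basset horn": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "alto clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
    "woodwind instrument": ["musical instrument"],
}

# Toy image tags, one tag per image (hypothetical file names).
IMAGE_TAGS = {
    "img1.jpg": "basset horn",
    "img2.jpg": "clarinet",
    "img3.jpg": "musical instrument",
}


def ancestors(term):
    """All superclasses of term (transitive), excluding term itself."""
    seen, stack = set(), list(SUBCLASS_OF.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def descendants(term):
    """All subclasses of term (transitive), excluding term itself."""
    children = defaultdict(list)
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = set(), list(children[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def build_expanded_index():
    """Index-time expansion: file each image under its tag and all superclasses."""
    index = defaultdict(set)
    for image, tag in IMAGE_TAGS.items():
        for keyword in {tag} | ancestors(tag):
            index[keyword].add(image)
    return index


def search_expanded_index(index, query):
    """A single lookup against the pre-expanded index."""
    return set(index.get(query, set()))


def search_with_query_expansion(query):
    """Query-time expansion: match the query term or any of its subclasses."""
    wanted = {query} | descendants(query)
    return {image for image, tag in IMAGE_TAGS.items() if tag in wanted}
```

On this toy data both strategies return the same images for "clarinet". The
difference is where the hierarchy is walked: upwards once per image when
indexing, or downwards once per query when searching.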




> There are still problems (such as the biological taxonomy being modelled 
> as a hierarchy of names rather than animal classes, placing dog far away 
> from mammal), but it is still always much easier to come up with a sane 
> organisation for the *sub*classes of a concrete class.


For what it's worth, there's currently quite a lively discussion on 
Project Chat about issues with the current modelling of biological 
taxonomies,

https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centric_vs_name_centric

People on this thread might like to comment on some of the less 
fortunate elements of current practice, and the appropriateness of some 
of the thoughts that have been suggested.


But the taxo project has become such a walled garden, answerable only to 
itself, that people with comments may need to be quite forceful to get 
their message through, if we are to deal, e.g., with some of the 
difficulties Cparle describes in the ticket at

 https://phabricator.wikimedia.org/T199119

  -- James.



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Markus Kroetzsch

+1 to Daniel

And, on another note, there is also a huge misunderstanding exposed in 
the discussion on the search-related tracker item [1]: Cparle there 
speaks about "traversing the subclass hierarchy" but is actually looking 
at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant 
to users who care about clarinets. But surely that's the wrong 
direction! You have to look for *sub*classes to find special cases of 
what you are looking for. Looking downwards will often lead to much 
saner ontologies than turning your head towards the dizzy heights 
of upper ontology. Yes, the few of us looking for instances of "logical 
consequence" will still get clarinets, but those who look for instances 
of clarinet will merely see instances of alto clarinet, piccolo 
clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to 
suggest meaningful "upper concepts" to Commons editors, one could simply 
enable the use of lower concepts in search. It does not work in all 
cases yet, but it does in many.


There are still problems (such as the biological taxonomy being modelled 
as a hierarchy of names rather than animal classes, placing dog far away 
from mammal), but it is still always much easier to come up with a sane 
organisation for the *sub*classes of a concrete class.


FYI, I recently gave a talk about ontological modelling in Wikidata that 
discussed some of the current issues: 
https://iccl.inf.tu-dresden.de/web/Misc3058/en (the audience there were 
ontology design pattern researchers).


Cheers,

Markus

[1] https://phabricator.wikimedia.org/T199119
[2] http://tinyurl.com/y7tvkuzk

On 17/10/2018 16:04, Daniel Kinzler wrote:

My (very belated) thoughts on this issue:

Wiki content grows in a messy way, and it stays messy until the messiness causes
problems. Once it causes problems, people are motivated to clean it up.

I propose to implement hierarchical search based on very simple, predictable
rules, e.g. by having a configurable list of transitive relationships that get
evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and
subspecies at first.
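
A minimal sketch of what such a rule set could look like, assuming a plain
mapping from relation to maximum depth. The relation labels, the depth
values, the sample statements and the function name `expand` are all
hypothetical and are not part of any existing MediaWiki or Wikibase
configuration.

```python
from collections import deque

# Hypothetical configuration: relation label -> maximum number of hops.
TRANSITIVE_RELATIONS = {
    "subclass of": 3,
    "located in": 2,
    "subspecies of": 1,
}

# Toy statements (subject, relation, object); invented for illustration.
STATEMENTS = [
    ("piccolo clarinet", "subclass of", "clarinet"),
    ("clarinet", "subclass of", "woodwind instrument"),
    ("woodwind instrument", "subclass of", "musical instrument"),
    ("musical instrument", "subclass of", "tool"),
    ("Dresden", "located in", "Saxony"),
    ("Saxony", "located in", "Germany"),
    ("Germany", "located in", "Europe"),
]


def expand(term, relation, statements=STATEMENTS, config=TRANSITIVE_RELATIONS):
    """Return {item: hops} for every item that reaches `term` through
    `relation` within the configured depth (breadth-first search that
    follows the statements from object back to subject)."""
    max_depth = config.get(relation, 0)  # unconfigured relations are not followed
    found = {}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not go further down
        for subj, rel, obj in statements:
            if rel == relation and obj == current and subj != term and subj not in found:
                found[subj] = depth + 1
                queue.append((subj, depth + 1))
    return found
```

With these toy values, a search for "Europe" via "located in" would pick up
Germany and Saxony but stop before Dresden. That is exactly the kind of
predictable but imperfect result the proposal accepts.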

Doing this will NOT produce good results. You would have to implement a lot of
special cases and heuristics to work around dirty data. I say: let it produce
bad results, tell people why the results are bad, and what they can do about it!

The Wikimedia community is AMAZING at making good use of whatever capabilities
the software offers, and at adapting content to make the software produce the results they
want. By providing limited but clearly defined software support for hierarchical
search, we allow the community to optimize the content to work with that search.
Keeping the rules simple means that other consumers can then follow the same
rules, and the content will work for them as well.

-- daniel

On 29.09.2018 at 19:25, Gerard Meijssen wrote:

Hoi,
There is also the age-old conundrum where some want to enforce their rules for
the good of all because (argument of the day follows).

First of all, Wikidata is very much a child of Wikipedia. Wikipedia has its own
structures, and people have endeavoured to build those same structures in
Wikidata, never mind that it is a very different medium and never mind that there
are 280+ Wikipedias that might consider things to be different. The start of
Wikidata was also an auspicious occasion where it was thought to be OK to adopt
an external German authority. That proved to be a disaster and there are still
residues of this awful decision. It did not take long to show the shortcomings of
this scheme, and it was replaced by something more sensible.

However, we got something really Wiki and it was all too wild. It did not take
long for me to ask for someone to explain the current structures, and nobody
volunteered. So I did what I do best: I largely ignored the results of the
classes and subclasses. It does not work for me. It works against me, so my
current strategy is to ignore this nonsense and concentrate on including data.
The reason is simple: once data is included, it is easy to slice and dice it
and structure it as we see fit at a later date.

So when our priority becomes to make our data reusable and more open, we should
agree on it. So far we have not because we choose to fight each other. Some have
ideas, some have invested too much in what we have at this time. When we are to
make our data reusable, we should agree on what exactly it is we aim to achieve.
Is it to support Commons, or is it to support some external standard that is
academically sound? I would always favour what is practical and easily measured.

I would support Commons first. It has the benefit that it will bring our
communities together in a clear objective. It has the benefit that changes in
the operations of Wikidata support the whole of the Wikimedia universe and
consequently financial, technical and operational needs and investments are
easily understood. It also means that all the bureaucracy that has materialised
will show

Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Daniel Kinzler
On 18.10.2018 at 19:05, Peter F. Patel-Schneider wrote:
> On 10/17/18 7:04 AM, Daniel Kinzler wrote:
>> My (very belated) thoughts on this issue:
>>
> [...]
>> I say: let it produce bad results, tell people why the results are bad, and
>> what they can do about it!
> [...]
>>
>> -- daniel
> My view is that there is a big problem with this for industrial use of 
> Wikidata.
> 
[...]
> What is the biggest problem I see in Wikidata?  It is the poor organization of
> the Wikidata ontology.  To fix the ontology, beyond doing point fixes, is
> going to require some commitment from the Wikidata community.

I agree. And I think the best way to achieve this is to start using the ontology
as an ontology on Wikimedia projects, and thus expose the fact that the ontology
is broken. This gives incentive to fix it, and examples as to what things should
be possible using that ontology (namely, some level of basic inference).

-- 
Daniel Kinzler
Principal Software Engineer, MediaWiki Platform
Wikimedia Foundation



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Peter F. Patel-Schneider
On 10/17/18 7:04 AM, Daniel Kinzler wrote:
> My (very belated) thoughts on this issue:
> 
> [...]
> I say: let it produce bad results, tell people why the results are bad, and
> what they can do about it!
> [...]
> 
> -- daniel
My view is that there is a big problem with this for industrial use of Wikidata.

I would very much like to use Wikidata more in my company.  However, I view it
as my duty in my company to point out problems with the use of any technology.
So whenever I talk about Wikidata I also have to talk about the problems I
see in the Wikidata ontology and how they will affect use of Wikidata in my
company.

If Wikidata is going to have significant use in my company there needs to be
at least some indication that the problems in Wikidata are being addressed.  I
don't see that happening at the moment.


What is the biggest problem I see in Wikidata?  It is the poor organization of
the Wikidata ontology.  To fix the ontology, beyond doing point fixes, is
going to require some commitment from the Wikidata community.


Peter F. Patel-Schneider
Nuance Communications



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-18 Thread Luca Martinelli
On Wed, 17 Oct 2018, 16:04 Daniel Kinzler wrote:
> I say: let it produce bad results, tell people why the results are bad, and 
> what they can do about it!

TL;DR: let's produce bad results, and let's analyse those results to
find the best practical solution we can come up with.

I totally agree with Daniel here. It is definitely a red flag that we
should tackle head-on, but we need data first. We need to know
*where* the ontology fails, *why* it fails, and *how* we can fix it.

Now is probably the best time to talk about this, not just because
we have a potentially big application such as Structured Data, but also
because we have focused on other not-so-easy problems such as dealing with
isolated sitelinks/projects and trying to establish relations between
items, and between items and other databases.

What we need to do IMHO is to find whatever best practical solution we
have at hand, in order to primarily use it on Wikimedia projects. My
only fear is that such discussions may end up in a swamp because of
"that one user" who doesn't want to apply that particular solution
(not accusing anyone in particular, I've been that user too in some
discussions). Anyway, if we start from data, we can come up with some
solution.

L.



Re: [Wikidata] Help us teaching ORES how to better detect vandalism

2018-10-18 Thread Ettore RIZZA
Hello Léa,

This is an extremely useful tool. Just a detail to improve its usability:
could you remove the confirmation popup when skipping a modification? Many
edits are made in languages that not everyone speaks, and we would like to
move on to the next one faster.

Cheers,

Ettore Rizza

On Thu, 18 Oct 2018 at 10:10, Léa Lacroix  wrote:

> Hello all,
> Just a reminder, we still need your help to complete this campaign! A few
> minutes of your time can really help ORES to be smarter in detecting
> vandalism. Thanks a lot :)
>
> Cheers, Léa
>
> On Wed, 18 Jul 2018 at 14:28, Léa Lacroix 
> wrote:
>
>> Hello all,
>>
>> As you may know, ORES is a tool that analyzes edits to detect vandalism,
>> providing a score per edit. You can see the result on Recent Changes, and
>> you can also let us know when you find something wrong.
>>
>> But did you know that you can also directly help ORES to improve? We just
>> launched a new labeling campaign: after authorizing your
>> account with OAuth, you will see some real edits, and you will be asked if
>> you find them damaging or not, good faith or bad faith. Completing a set
>> will take you around 10 minutes.
>>
>> The last time we ran this campaign was in 2015. Since then, the way of
>> editing Wikidata has changed, and so have some vandalism patterns (for
>> example, there is more vandalism on companies). So, if you're familiar with
>> the Wikidata rules and you would be willing to give a bit of time to help
>> fight vandalism, please participate :)
>>
>> If you encounter any problem or have questions about the tool, feel free
>> to contact Ladsgroup.
>>
>> Cheers,
>> --
>> Léa Lacroix
>> Project Manager Community Communication for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>


[Wikidata] Senses are now part of Lexicographical Data

2018-10-18 Thread Léa Lacroix
Hello all,

As previously announced, the next big piece of Lexicographical Data on
Wikidata is now deployed: Senses.

Senses will allow you to describe, for each Lexeme, the different meanings
of the word, by using multilingual glosses: very short phrases giving an
idea of the meaning. In addition, each of these Senses can have statements
to indicate synonyms, antonyms, refers-to-concept and more. By connecting
Senses to other Senses and to Items, you will be able to describe precisely
the meaning of words with structured and linked data. But the most
important thing Senses will be able to do is collect translations
of words between languages.
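
As a rough illustration of the structure described above, here is a minimal
Python sketch. It is not the actual Wikibase data model: the class names,
the placeholder IDs and the plain-text relation labels are assumptions made
for this example, and real statements use Wikidata properties instead of
free-text labels.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    sense_id: str                                    # placeholder ID, not a real one
    glosses: dict = field(default_factory=dict)      # language code -> short gloss
    statements: list = field(default_factory=list)   # (relation label, target ID) pairs


@dataclass
class Lexeme:
    lexeme_id: str                                   # placeholder ID, not a real one
    lemma: str
    language: str
    senses: list = field(default_factory=list)


def targets(sense, relation):
    """IDs that a Sense points to through the given relation label."""
    return [target for rel, target in sense.statements if rel == relation]


# One English Lexeme with two Senses; every ID below is made up.
bank_en = Lexeme(
    "L-EN-1", "bank", "en",
    senses=[
        Sense(
            "L-EN-1-S1",
            {"en": "financial institution", "fr": "établissement financier"},
            [("translation", "L-DE-1-S1"), ("refers to concept", "Q-BANK")],
        ),
        Sense(
            "L-EN-1-S2",
            {"en": "land alongside a river"},
            [("translation", "L-DE-2-S1")],
        ),
    ],
)
```

Because translations hang off individual Senses and not off the word as a
whole, the two meanings of "bank" can point to two different words in
another language.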

Thanks to Senses, you will be able to organize and connect the existing
Lexemes better, and to provide a very important layer of information. With
Senses support, we now have all the basic technical building blocks to
allow structured, machine-readable lexicographical data that can be
reused within Wikimedia projects and by other stakeholders.

Feel free to try editing Senses. You can use the sandbox to run some tests.
Let us know if you have questions or find bugs.

Note: there are still issues with sorting the IDs of Senses and Forms and
with sorting the glosses; these will be solved later this week. Thanks for
your understanding.

Cheers,
-- 
Léa Lacroix
Project Manager Community Communication for Wikidata



Re: [Wikidata] Help us teaching ORES how to better detect vandalism

2018-10-18 Thread Léa Lacroix
Hello all,
Just a reminder, we still need your help to complete this campaign! A few
minutes of your time can really help ORES to be smarter in detecting
vandalism. Thanks a lot :)

Cheers, Léa



-- 
Léa Lacroix
Project Manager Community Communication for Wikidata
