Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Pine W
On Sat, Oct 20, 2018 at 4:41 PM Daniel Kinzler wrote:

> Hi Pine, sorry for the misleading wording. Let me clarify below.
>
> On 19.10.18 at 9:51 PM, Pine W wrote:
> > Hi Markus, I seem to be missing something. Daniel said, "And I think the
> > best way to achieve this is to start using the ontology as an ontology on
> > wikimedia projects, and thus expose the fact that the ontology is broken.
> > This gives incentive to fix it, and examples as to what things should be
> > possible using that ontology (namely, some level of basic inference)." I
> > think that I understand the basic idea behind structured data on Commons.
> > I also think that I understand your statement above. What I'm not
> > understanding is how Daniel's proposal to "start using the ontology as an
> > ontology on wikimedia projects, and thus expose the fact that the ontology
> > is broken." isn't a proposal to add poor quality information from Wikidata
> > onto Wikipedia and, in the process, give Wikipedians more problems to fix.
> > Can you or Daniel explain this?
>
> What I meant in concrete terms was: let's start using wikidata items for
> tagging on commons, even though search results based on such tags will
> currently not yield very good results, due to the messy state of the
> ontology, and hope people fix the ontology to get better search results. If
> people use "poodle" to tag an image and it's not found when searching for
> "dog", this may lead to people investigating why that is, and coming up with
> ontology improvements to fix it.
>
> What I DON'T mean is "let's automatically generate navigation boxes for
> wikipedia articles based on an imperfect ontology, and push them on
> everyone". I mean, using the ontology to generate navigation boxes for some
> kinds of articles may be a nice idea, and could indeed have the same effect -
> that people notice problems in the ontology, and fix them. But that would be
> something the local wiki communities decide to do, not something that comes
> from Wikidata or the Structured Data project.
>
> The point I was trying to make is: the Wiki communities are rather good at
> creating structures that serve their purpose, but they do so pragmatically,
> along the behavior of the existing tools. So, rather than trying to work
> around the quirks of the ontology in software, the software should use very
> simple rules (such as following the subclass relation), and let people adapt
> the data to this behavior, if and when they find it useful to do so. This
> approach, over time, provides better results in my opinion.
>
> Also, keep in mind that I was referring to an imperfect *improvement* of
> search, the alternative being to only return things tagged with "dog" when
> searching for "dog". I was not suggesting to degrade user experience in order
> to incentivize editors. I'm rather suggesting the opposite: let's NOT give
> people a reason to tag images that show poodles with "poodle" and "dog" and
> "mammal" and "animal" and "pet" and...
>
> --
> Daniel Kinzler
> Principal Software Engineer, Core Platform
> Wikimedia Foundation
>

Hi Daniel,

Thanks for the explanation. I think that I now better understand what
you're proposing. This explanation of the proposal sounds reasonable to me
in a way that my earlier understanding of the proposal did not.

By the way, I don't know what your normal work schedule is, but I usually
don't expect staff to respond to non-urgent emails over the weekend,
although I appreciate it. :) Waiting until Monday is usually fine.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Peter F. Patel-Schneider
On 10/20/18 11:57 AM, Ettore RIZZA wrote:

> Hi,
>
> From Peter F. Patel-Schneider:
>> I see no reason that this [adding subclass relationships sanctioned by
>> corresponding Wikipedia pages] should not be done for other groups of
>> living organisms where subclass relationships are missing.
>
> It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the
> kilometers of discussions I'm reading about the taxon-centric aspect of
> Wikidata, when I'm not a biologist. So, there is no problem if we add
> that Cetacea is a subclass of aquatic mammals, as indicated by its
> Wikipedia page?
> 
> Cheers,
> 
> Ettore

How can there be any effective counter to adding these relationships?  Many
Wikidata items correspond to Wikipedia pages.   If the true information about
the Wikidata item in the corresponding pages cannot be added to the Wikidata
items, then the correspondence is not correct and should be removed.

peter

PS:  Of course, determining truth may be contentious in some cases, but these
will be a small minority.



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Ettore RIZZA
Hi,

> I see no reason that this should not be done for other groups of living
> organisms where subclass relationships are missing.


It seems very simple to me. Maybe too simple. Perhaps I am intimidated by
the kilometers of discussions I'm reading about the taxon-centric aspect of
Wikidata, when I'm not a biologist. So, there is no problem if we add that
Cetacea is a subclass of aquatic mammals, as indicated by its Wikipedia
page?

Cheers,

Ettore

On Sat, 20 Oct 2018 at 19:20, Peter F. Patel-Schneider <pfpschnei...@gmail.com> wrote:

> On 10/20/18 6:29 AM, Ettore RIZZA wrote:
> > For most people, ants are insects, not instances of taxon.
>
> Sure, but Wikidata doesn't have ants being instances of taxon.  Instead,
> Formicidae (aka ant) is an instance of taxon, which seems right to me.
>
> Here are some extracts from Wikidata as of a few minutes ago, also showing
> the English Wikipedia page for the Wikidata item.
>
> https://www.wikidata.org/wiki/Q7386 Formicidae  ant
> https://en.wikipedia.org/wiki/Ant
> instance of taxon
> no subclass of statement
>
> https://www.wikidata.org/wiki/Q1390 insect
> https://en.wikipedia.org/wiki/Insect
> subclass of animal
> instance of taxon
>
> What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by
> the "Ants are eusocial insects" phrase at the start of
> https://en.wikipedia.org/wiki/Ant.  I added that statement and put as source
> English Wikipedia.  (By the way, how can I source a statement to a particular
> Wikipedia page?)
>
>
> I see no reason that this should not be done for other groups of living
> organisms where subclass relationships are missing.
>
> peter
>


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Peter F. Patel-Schneider
On 10/20/18 6:29 AM, Ettore RIZZA wrote:
> For most people, ants are insects, not instances of taxon.

Sure, but Wikidata doesn't have ants being instances of taxon.  Instead,
Formicidae (aka ant) is an instance of taxon, which seems right to me.

Here are some extracts from Wikidata as of a few minutes ago, also showing
the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 Formicidae  ant
https://en.wikipedia.org/wiki/Ant
instance of taxon
no subclass of statement

https://www.wikidata.org/wiki/Q1390 insect
https://en.wikipedia.org/wiki/Insect
subclass of animal
instance of taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by
the "Ants are eusocial insects" phrase at the start of
https://en.wikipedia.org/wiki/Ant.  I added that statement and put as source
English Wikipedia.  (By the way, how can I source a statement to a particular
Wikipedia page?)


I see no reason that this should not be done for other groups of living
organisms where subclass relationships are missing.
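
To make the check above concrete, here is a minimal Python sketch. It works on a toy, hand-written snapshot of the two items quoted above (as they stood before the statement was added); the dictionary layout and the function names are assumptions made for illustration, not Wikidata's data model or any existing tool.

```python
# Toy snapshot of the statements quoted above (before the fix was added).
# The dictionary layout is illustrative only, not Wikidata's data model.
ITEMS = {
    "Q7386": {"label": "Formicidae", "instance_of": ["taxon"], "subclass_of": []},
    "Q1390": {"label": "insect", "instance_of": ["taxon"], "subclass_of": ["animal"]},
}


def taxa_missing_subclass(items):
    """Return IDs of items that are instances of taxon but have no subclass-of statement."""
    return sorted(
        qid
        for qid, item in items.items()
        if "taxon" in item["instance_of"] and not item["subclass_of"]
    )


def add_subclass(items, child, parent, source):
    """Record a subclass-of statement and keep its source as a plain note."""
    items[child]["subclass_of"].append(parent)
    items[child].setdefault("sources", []).append(source)


missing_before = taxa_missing_subclass(ITEMS)
add_subclass(ITEMS, "Q7386", "Q1390", "https://en.wikipedia.org/wiki/Ant")
missing_after = taxa_missing_subclass(ITEMS)
```

In this toy data only Q7386 is reported before the statement is added, and nothing is reported afterwards.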

peter



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Daniel Kinzler
Hi Pine, sorry for the misleading wording. Let me clarify below.

On 19.10.18 at 9:51 PM, Pine W wrote:
> Hi Markus, I seem to be missing something. Daniel said, "And I think the best
> way to achieve this is to start using the ontology as an ontology on wikimedia
> projects, and thus expose the fact that the ontology is broken. This gives
> incentive to fix it, and examples as to what things should be possible using
> that ontology (namely, some level of basic inference)." I think that I
> understand the basic idea behind structured data on Commons. I also think
> that I understand your statement above. What I'm not understanding is how
> Daniel's proposal to "start using the ontology as an ontology on wikimedia
> projects, and thus expose the fact that the ontology is broken." isn't a
> proposal to add poor quality information from Wikidata onto Wikipedia and, in
> the process, give Wikipedians more problems to fix. Can you or Daniel explain
> this?

What I meant in concrete terms was: let's start using wikidata items for tagging
on commons, even though search results based on such tags will currently not
yield very good results, due to the messy state of the ontology, and hope people
fix the ontology to get better search results. If people use "poodle" to tag an
image and it's not found when searching for "dog", this may lead to people
investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for
wikipedia articles based on an imperfect ontology, and push them on everyone".
I mean, using the ontology to generate navigation boxes for some kinds of
articles may be a nice idea, and could indeed have the same effect - that people
notice problems in the ontology, and fix them. But that would be something the
local wiki communities decide to do, not something that comes from Wikidata or
the Structured Data project.

The point I was trying to make is: the Wiki communities are rather good at
creating structures that serve their purpose, but they do so pragmatically,
along the behavior of the existing tools. So, rather than trying to work around
the quirks of the ontology in software, the software should use very simple
rules (such as following the subclass relation), and let people adapt the data
to this behavior, if and when they find it useful to do so. This approach, over
time, provides better results in my opinion.

Also, keep in mind that I was referring to an imperfect *improvement* of search,
the alternative being to only return things tagged with "dog" when searching for
"dog". I was not suggesting to degrade user experience in order to incentivize
editors. I'm rather suggesting the opposite: let's NOT give people a reason to
tag images that show poodles with "poodle" and "dog" and "mammal" and "animal"
and "pet" and...
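
A minimal sketch of the "very simple rule" described above, in Python: each image carries only its most specific tag, and the search follows subclass links downward from the query term. The subclass edges, file names, and function names are made up for illustration; this is not the Commons search implementation.

```python
from collections import deque

# Hypothetical subclass-of edges (child -> parents); not real Wikidata data.
SUBCLASS_OF = {
    "poodle": ["dog"],
    "German Shepherd": ["dog"],
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
}

# Hypothetical image tags: one specific tag per image.
IMAGE_TAGS = {
    "poodle.jpg": {"poodle"},
    "shepherd.jpg": {"German Shepherd"},
    "mutt.jpg": {"dog"},
    "cat.jpg": {"cat"},
}


def subclasses_of(term, subclass_of):
    """Return the term plus everything that reaches it via subclass-of links."""
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # also guards against cycles in messy data
                seen.add(child)
                queue.append(child)
    return seen


def search(term, image_tags, subclass_of):
    """Return images tagged with the term or any of its (transitive) subclasses."""
    wanted = subclasses_of(term, subclass_of)
    return sorted(name for name, tags in image_tags.items() if tags & wanted)
```

With these toy edges, `search("dog", IMAGE_TAGS, SUBCLASS_OF)` returns all three dog images, while passing an empty edge table falls back to the exact-tag baseline (only `mutt.jpg`). A missing link, such as "cat" having no path to "animal" here, just means fewer hits, never worse than the baseline.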

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation



Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Ettore RIZZA
Hello,

It is interesting to note that what Cparle wants are "is a" relationships
based on common sense. For most people, ants are insects, not instances of
taxon. A clarinet is a woodwind instrument, and woodwind instruments are
musical instruments, not an instance of "first order metaclass".

One of the best sources of "common sense" hypernymy is probably the first
sentence of a Wikipedia page. Whether in English, French, Italian, a woman
is always "a female *human* being."

For "poodle", this would look like (following the links in the English
version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan* order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the
class of animals *Mammalia*...

- Mammals are the *vertebrates* within the class Mammalia...


From my point of view, this classification looks much better than the
current relationships in Wikidata's ontology.

The automatic extraction of hypernymic relationships from English texts
(especially Wikipedia) has been studied for a long time and gives good
results, even with simple methods based on hand-crafted rules. In the case
of Wikipedia, the hypernym often has a page itself (and therefore a link to
Wikidata), which could simplify the NLP extraction and the mapping with
Wikidata items.
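
As a rough illustration of such a hand-crafted rule, here is a Python sketch that takes the first linked term after "is"/"are" in a first sentence. The sentences in the usage note are simplified, wikitext-like examples written for this sketch (not copied from Wikipedia markup), and the function name is hypothetical.

```python
import re

# Copula that usually introduces the hypernym in a definitional first sentence.
COPULA = re.compile(r"\b(?:is|are)\b")
# A wiki-style link: [[target]] or [[target|displayed text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def first_linked_hypernym(sentence):
    """Return the target of the first [[link]] after 'is'/'are', or None."""
    copula = COPULA.search(sentence)
    if copula is None:
        return None
    link = LINK.search(sentence, copula.end())
    return link.group(1).strip() if link else None
```

For example, `first_linked_hypernym("The poodle is a group of formal [[dog breed]]s")` gives `"dog breed"`, and `first_linked_hypernym("[[Canis]] is a genus of the [[Canidae]]")` gives `"Canidae"`, because the link before the copula is skipped. The rule is deliberately naive: it will pick the wrong term whenever the first link after the copula is a modifier rather than the hypernym.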

Of course, the extracted relationships will not always be "subclass of" or
"instance of". But if someone proposed a new property called "Wikipedia
Hypernyms" (and its inverse property "Wikipedia Hyponyms"), I would use
it more willingly and with more confidence than the current system. This
would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an
"Ancient Iranian *religion* founded by Zoroaster", one would expect the
class "religion" to appear much earlier in the hierarchy of superclasses of
this item. If there was this property "Wikipedia Hypernyms", we could
mention it in the same page - since Wikipedia describes Zoroastrianism as
"one of the world's oldest *religions* that remains active." And a SPARQL
query looking for 'all items that have "religion" as "Wikipedia hypernyms"
property' would be much much faster.
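
To show the two query shapes I have in mind, here is a small Python sketch that only builds the SPARQL strings (it does not contact any endpoint). P31 ("instance of") and P279 ("subclass of") are existing properties; `P_HYPERNYM` is a placeholder for the proposed property, which does not exist, and the QID used for "religion" is an assumption that should be checked before use.

```python
RELIGION = "Q9174"  # assumed QID for "religion"; verify before use
HYPOTHETICAL_PROPERTY = "P_HYPERNYM"  # placeholder; no such property exists


def query_via_class_hierarchy(class_qid):
    """Today's shape: instance of, then any number of subclass-of hops."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


def query_via_hypernym_property(class_qid, prop=HYPOTHETICAL_PROPERTY):
    """Proposed shape: a single direct statement, no path traversal."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:{prop} wd:{class_qid} .\n"
        "}"
    )


current_query = query_via_class_hierarchy(RELIGION)
proposed_query = query_via_hypernym_property(RELIGION)
```

The first query needs a transitive path over the class hierarchy; the second is a single triple pattern, which is the reason I would expect it to be cheaper to evaluate.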

Note: sorry if this reflection is naive or if it has already been
discussed/tested.

Cheers,

Ettore

On Thu, 27 Sep 2018 at 23:35, James Heald  wrote:

> This recent announcement by the Structured Data team perhaps ought to be
> quite a heads-up for us:
>
>
> https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage
>
> Essentially the team has given up on the hope of using Wikidata
> hierarchies to suggest generalised "depicts" values to store for images
> on Commons, to match against terms in incoming search requests.
>
> i.e.  if an image is of a German Shepherd dog, and identified as such,
> the team has given up on trying to infer in general from Wikidata that
> 'dog' is also a search term that such an image should score positively
> with.
>
> Apparently the Wikidata hierarchies were simply too complicated, too
> unpredictable, and too arbitrary and inconsistent in their design across
> different subject areas to be readily assimilated (before one even
> starts on the density of bugs and glitches that then undermine them).
>
> Instead, if that image ought to be considered in a search for 'dog', it
> looks as though an explicit 'depicts:dog' statement may be going to be
> needed to be specifically present, in addition to 'depicts:German
> Shepherd'.
>
> Some of the background behind this assessment can be read in
> https://phabricator.wikimedia.org/T199119
> in particular the first substantive comment on that ticket, by Cparle on
> 10 July, giving his quick initial read of some of the issues using
> Wikidata would face.
>
> SDC was considered a flagship end-application for Wikidata.  If the data
> in Wikidata is not usable enough to supply the dogfood that project was
> expected to be going to be relying on, that should be a serious wake-up
> call, a red flag we should not ignore.
>
> If the way data is organised across different subjects is currently too
> inconsistent and confusing to be usable by our own SDC project, are
> there actions we can take to address that?  Are there design principles
> to be chosen that then need to be applied consistently?  Is this
> something the community can do, or is some more active direction going
> to need to be applied?
>
> Wikidata's 'ontology' has grown haphazardly, with little oversight, like
> an untended bank of weeds.  Is some more active gardening now required?
>
>-- James.

Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Markus Kroetzsch

Hi Pine,

As I understood Daniel, he did not talk about inserting low quality 
content into any project, Wikipedia or other. What I believe he meant 
with "using the ontology" is to use it for improving search/discovery 
services that help editors to find something (i.e., technical 
infrastructure, not editorial content). Doing so could lead to an 
additional amount of mostly useful results, but it will not yet be 
enough to get all results that a user would intuitively expect. Maybe 
his wording made this sound a bit too dramatic -- I think he just wanted 
to emphasize the point that any actual use will immediately provide 
motivation and guidance for Wikidata editors to improve things that are 
currently imperfect.


I agree with him in that I think we need to identify ways of moving 
gradually forward, offering the small benefits we can already provide 
while creating an environment that allows the community to improve 
things step by step. If we ask for perfection before even starting, we 
will get into a deadlock where we bind editor resources in redundant 
tagging tasks instead of empowering the community to improve the 
situation in a sustainable way.


Cheers,

Markus


On 20/10/2018 06:51, Pine W wrote:



On Fri, Oct 19, 2018 at 9:47 AM Markus Kroetzsch <markus.kroetz...@tu-dresden.de> wrote:


On 19/10/2018 07:09, Pine W wrote:
 > I would appreciate clarification what is proposed with regard to
 > exposing problematic Wikidata ontology on Wikipedia. If the idea
 > involves inserting poor-quality information onto English Wikipedia in
 > order to spur us to fix problems with Wikidata, then I am likely to
 > oppose it. English Wikipedia is not an endless resource for free labor,
 > and we have too few skilled and good-faith volunteers to handle our
 > already enormous scope of work.

You are right, and thankfully this is not what is proposed. The proposal
was to offer people who search for Commons media the (maybe optional)
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there were an application relying on it. But even
with some wrong links, the results a searcher would get would still
include mostly useful hits. Also, at least half of the currently
observed problems with this approach would lead to fewer results (e.g.,
dogs would be hard to include automatically in a search for all
mammals), but in such cases the proposed extension would simply do what
the baseline approach (ignoring the links) would do anyway, so service
would not get any worse. Also, the manual workarounds suggested by some
(adding "mammal" to all pictures of some "dog") would be compatible with
this, so one could do both to improve search experience on both ends.

Best regards,

Markus


Hi Markus, I seem to be missing something. Daniel said, "And I think the 
best way to achieve this is to start using the ontology as an ontology 
on wikimedia projects, and thus expose the fact that the ontology is 
broken. This gives incentive to fix it, and examples as to what things 
should be possible using that ontology (namely, some level of basic 
inference)." I think that I understand the basic idea behind structured 
data on Commons. I also think that I understand your statement above. 
What I'm not understanding is how Daniel's proposal to "start using the 
ontology as an ontology on wikimedia projects, and thus expose the fact 
that the ontology is broken." isn't a proposal to add poor quality 
information from Wikidata onto Wikipedia and, in the process, give 
Wikipedians more problems to fix. Can you or Daniel explain this?


Separately, someone wrote to me off list to make the point that 
Wikipedians who are active in non-English Wikipedias also wouldn't 
appreciate having their workloads increased by having a large quantity of
poor-quality information added to their edition of Wikipedia. I think 
that one of the person's concerns is that my statement could have been 
interpreted as implying something like "it's okay to insert poor-quality 
information on non-English Wikipedias because their standards are 
lower". I apologize if I gave the impression that I would approve of a 
non-English language edition of Wikipedia being on the receiving end of 
an unwelcome large addition of information that requires significant 
effort to clean up. Hopefully my response here will address the concerns 
that I heard off list, and if not then I welcome additional feedback.


Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Thomas Douillard
There is already something to handle this kind of "mutex" on Wikidata:
"disjoint union of"; see for example its usage on
https://www.wikidata.org/wiki/Q180323 . The statements are used on the talk
page by templates that use them to generate queries to find instances that
violate the mutex: https://www.wikidata.org/wiki/Talk:Q180323 (for example,
this query, which unsurprisingly does not find anything, because I don't
expect to find a lot of vertebra instances on Wikidata :) )

On Sat, 20 Oct 2018 at 12:09, Thad Guidry wrote:

> Hi All,
>
> Just to address what Markus was hinting at with inference rules. Both
> positive and negative rules could be stored.  Back in the Freebase days, we
> had those, and they were called "mutexes".  We used them for "type
> incompatible" hints to users and stored those "type incompatible" mutex
> rules in the knowledge graph. (Freebase being a Type based system along with
> having Properties under each Type)
>
> Such as:  ORGANIZATION != SPORT
>
> You actually have all those type incompatible mutexes in the Freebase dumps
> handed to you, so you could start there.  The biggest one was called "Big
> Momma Mutex".
> Here is an archived email thread to give further context:
> https://freebase.markmail.org/thread/z5o7nlnb62n5t22o
>
> Anyways, the point is that those rules worked well for us in Freebase and
> I can see rules also working wonders in various ways in Wikidata as well.
> Maybe it's just a mutex at each class? Where multiple statements could
> hold rules?
>
> Thad
> +ThadGuidry 


Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

2018-10-20 Thread Thad Guidry
Hi All,

Just to address what Markus was hinting at with inference rules. Both
positive and negative rules could be stored.  Back in the Freebase days, we
had those, and they were called "mutexes".  We used them for "type
incompatible" hints to users and stored those "type incompatible" mutex rules
in the knowledge graph. (Freebase being a Type based system along with having
Properties under each Type)

Such as:  ORGANIZATION != SPORT

You actually have all those type incompatible mutexes in the Freebase dumps
handed to you, so you could start there.  The biggest one was called "Big
Momma Mutex".
Here is an archived email thread to give further context:
https://freebase.markmail.org/thread/z5o7nlnb62n5t22o

Anyways, the point is that those rules worked well for us in Freebase and I
can see rules also working wonders in various ways in Wikidata as well.
Maybe it's just a mutex at each class? Where multiple statements could hold
rules?
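
A minimal Python sketch of how such a "type incompatible" check could work, assuming the rules are stored as unordered pairs of type names. The rule table and function name are illustrative only; this is not how Freebase stored or evaluated its mutexes.

```python
# Hypothetical "type incompatible" rules, stored as unordered pairs.
MUTEX_RULES = {
    frozenset({"ORGANIZATION", "SPORT"}),
}


def mutex_violations(types, rules):
    """Return the sorted list of incompatible type pairs present in `types`."""
    types = set(types)
    # A rule is violated when both of its types are asserted on the same entity.
    return sorted(tuple(sorted(rule)) for rule in rules if rule <= types)
```

For instance, `mutex_violations({"ORGANIZATION", "SPORT", "TOPIC"}, MUTEX_RULES)` reports the pair `("ORGANIZATION", "SPORT")`, while an entity typed only as `ORGANIZATION` reports nothing. Storing pairs as frozensets keeps the rule symmetric, so ORGANIZATION != SPORT and SPORT != ORGANIZATION are the same rule.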

Thad
+ThadGuidry 