[I worry that we're discussing operational details, which deserve a wider
discussion, rather than technology/feasibility, for which this list is
better suited. Perhaps moving this on-wiki would be best?]

On 9 May 2013 09:28, Brad Jorsch <bjor...@wikimedia.org> wrote:

> On Wed, May 8, 2013 at 10:47 PM, James Forrester
> <jforres...@wikimedia.org> wrote:
> > * Pages are implicitly in the parent categories of their explicit
> > categories
> > * -> Pages in <Politicians from the Netherlands> are in <People from the
> > Netherlands by profession> (its first parent) and <People from the
> > Netherlands> (its first parent's parent) and <Politicians> (its second
> > parent) and <People> (its second parent's parent) and …
> > * -> Yes, this poses issues given the sometimes cyclic nature of
> > categories' hierarchies, but this is relatively trivial to code around
>
> Category cycles are the least of it. The fact that the existing
> category hierarchy isn't based on any sensible-for-inference ontology
> is a bigger problem.
>
> Let's consider what would happen to one of my favorite examples on enwiki:
> * The article for Romania is in <Black Sea countries>. Ok.
> * And that category is in <Black Sea>, so Romania is in that too.
> Which is a little strange, but not too bad.
> * And <Black Sea> is in <Seas of Russia> and <Landforms of Ukraine>.
> Huh? Romania doesn't belong in either of those, despite that being
> equivalent to your example where pages in <Politicians from the
> Netherlands> also end up in <People> via <Politicians>.
>
> And it gets worse the further up you go. You would have Romania in
> <Liquids> a few more levels up.
>
> For this to work, each wiki would have to redo its category hierarchy
> as a real ontology based on is-a relationships, rather than the
> current is-somehow-related-to. Or we would have to introduce some
> magic word or something to tell MediaWiki that <Politicians> is-a
> <People> is a valid inference while <Black Sea countries> is-a <Black
> Sea> isn't.
>
> In other words, code-wise adding "tags" to an article is the same as
> categories with inference and querying. But trying to use the existing
> category setup as it exists on something like enwiki as "tags" for
> inference (or querying, to a lesser extent) seems like GIGO.
>
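A minimal Python sketch of the inference under discussion, assuming a hypothetical model in which every parent link carries an `is_a` flag (a stand-in for the "magic word" suggested above). The category names come from the examples in this thread, but the flags and the back-link from <Seas of Russia> to <Black Sea> are made up; the back-link exists only to show that a visited set is enough to make cycles terminate. Following every link reproduces the Romania problem, while following only is-a links does not.

```python
from collections import deque

# Hypothetical parent links: category -> list of (parent, is_a).
# is_a=True marks a link that is safe for inference; is_a=False marks a
# looser "is somehow related to" link. The flags are illustrative only.
PARENTS = {
    "Politicians from the Netherlands": [
        ("People from the Netherlands by profession", True),
        ("Politicians", True),
    ],
    "People from the Netherlands by profession": [
        ("People from the Netherlands", True),
    ],
    "Politicians": [("People", True)],
    "Black Sea countries": [("Black Sea", False)],
    "Black Sea": [("Seas of Russia", False), ("Landforms of Ukraine", False)],
    # Made-up back-link, present only to create a cycle.
    "Seas of Russia": [("Black Sea", False)],
}


def implicit_categories(explicit, is_a_only=True):
    """Return the explicit categories plus every ancestor reachable from them.

    Breadth-first search; the `seen` set means each category is expanded at
    most once, so cycles in PARENTS terminate. When is_a_only is True, links
    not flagged as is-a are skipped.
    """
    seen = set(explicit)
    queue = deque(explicit)
    while queue:
        category = queue.popleft()
        for parent, is_a in PARENTS.get(category, []):
            if is_a_only and not is_a:
                continue
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(implicit_categories(["Black Sea countries"], is_a_only=False)))
    print(sorted(implicit_categories(["Black Sea countries"])))
    print(sorted(implicit_categories(["Politicians from the Netherlands"])))
```

With `is_a_only=False`, a page in <Black Sea countries> also lands in <Black Sea>, <Seas of Russia> and <Landforms of Ukraine>; with the default it stays in <Black Sea countries> only, while <Politicians from the Netherlands> still reaches <People>. The traversal itself is cheap; the real work is deciding which links deserve the flag.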

Quite; that is why my proposal has the categories created from scratch on
Wikidata, as a synthesis of the needs of the editing community. :-)

Implicitly, these would have clear semantics governing when their use is
correct, overseen by something analogous to how Wikidata's community are
managing the roll-out of statements on the system. In terms of tools to
prevent this becoming an issue, Wikidata's nature means we could easily
make sure that the domain of a category is limited (e.g. "Fluids" maps to
"substances", not "instances of substances").
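One way such a domain restriction could look, as a Python sketch under stated assumptions: the `domain` field on a category, the `kind` field on an item and `DomainError` are all hypothetical, not existing Wikidata properties or APIs. An item is accepted into a category only if its kind matches the category's declared domain, so a body of water such as the Black Sea cannot end up in <Fluids>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    domain: str  # hypothetical: the kind of item this category may contain


@dataclass(frozen=True)
class Item:
    label: str
    kind: str  # hypothetical: e.g. "substance" or "body of water"


class DomainError(ValueError):
    """Raised when an item's kind is outside a category's declared domain."""


def add_to_category(item, category, members):
    """Record item in category, refusing items outside the category's domain."""
    if item.kind != category.domain:
        raise DomainError(
            f"{item.label!r} is a {item.kind}; "
            f"{category.name!r} only accepts items of kind {category.domain!r}"
        )
    members.setdefault(category.name, set()).add(item.label)


if __name__ == "__main__":
    fluids = Category("Fluids", domain="substance")
    members = {}
    add_to_category(Item("Water", "substance"), fluids, members)
    try:
        add_to_category(Item("Black Sea", "body of water"), fluids, members)
    except DomainError as err:
        print("rejected:", err)
    print(members)
```

The check is deliberately a plain equality on a single kind; a real deployment would presumably need subclass-aware matching, which this sketch does not attempt.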



> > * Readers can search, querying across categories regardless of whether
> > they're implicit or explicit
> > * -> A search for the intersection of <People from the Netherlands> with
> > <Politicians> will effectively return results for <Politicians from the
> > Netherlands> (and the user doesn't need to know or care that this is an
> > extant or non-extant category)
>
> A person who is originally from the Netherlands but moved to Germany
> and became a politician there would be in <People from the
> Netherlands> and <Politicians>, but maybe should not be in
> <Politicians from the Netherlands> depending on how exactly you define
> that category.
>
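A Python sketch of the intersection search, assuming each page is matched against its explicit categories plus every category reachable through (hypothetical, is-a only) parent links. The page names and <Politicians from Germany> are invented for illustration. It also reproduces the caveat above: the Dutch-born German politician matches the intersection of <People from the Netherlands> and <Politicians> without being in <Politicians from the Netherlands>.

```python
from collections import deque

# Hypothetical is-a parent links, following the example in this thread.
PARENTS = {
    "Politicians from the Netherlands": [
        "People from the Netherlands by profession",
        "Politicians",
    ],
    "People from the Netherlands by profession": ["People from the Netherlands"],
    "Politicians": ["People"],
    "Politicians from Germany": ["Politicians"],  # hypothetical category
}

# Hypothetical pages mapped to their explicit categories.
PAGES = {
    "Dutch politician": ["Politicians from the Netherlands"],
    "Dutch-born German politician": [
        "People from the Netherlands",
        "Politicians from Germany",
    ],
    "Dutch painter": ["People from the Netherlands"],
}


def closure(categories):
    """Explicit categories plus all ancestors; the seen set stops cycles."""
    seen, queue = set(categories), deque(categories)
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def search(*wanted):
    """Pages whose explicit-plus-implicit categories include every wanted one."""
    return sorted(
        page for page, cats in PAGES.items() if set(wanted) <= closure(cats)
    )


if __name__ == "__main__":
    print(search("People from the Netherlands", "Politicians"))
    print(search("Politicians from the Netherlands"))
```

The user never needs to know whether <Politicians from the Netherlands> exists; the subset test treats implicit and explicit memberships alike, which is exactly why the category semantics have to be precise.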

Indeed; I deliberately chose to use <Politicians from the Netherlands>
rather than <Politicians of the Netherlands> or <Politicians in the
Netherlands> which are distinct categories with entirely different
semantics, but you're right that semantics would need to be clear.

J.
-- 
James D. Forrester
Product Manager, VisualEditor
Wikimedia Foundation, Inc.

jforres...@wikimedia.org | @jdforrester
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
