There's another road to an ontology of labels, one that starts from the
kinds of roles that labels play in systems.

One need is that a system wants to mention something, draw something,
or otherwise refer to something, and it needs to know what to call it.
Another need is that you have a phrase and you want to find things
with a matching label.  Then there's the more general problem that the
user has something in his head and the system has to work out which
thing he means.

In terms of acceptance, you want the system to accept a wide range of
the names people would use for something (I think that is in scope for
Wikidata).  To make the most of that, though, you need a good estimator
of the probability that a particular surface form, used in a particular
context, refers to this or that thing, and that is probably out of scope.

You want to accept labels you wouldn't want to generate.  A tendency
to generate ethnic, racial and other kinds of slurs is a showstopper
for any public commercial application.  A.I.s are like people: some
of them are more prone to potty mouth than others, and you can't count
on good behavior unless you train your animals.  Thus, offensive labels
should be tagged, so a system can accept them without ever generating
them.

Similar choices appear in different contexts.  I live in New York and
if you look at legal documents they always say "New York State" or
"New York City" but if you drive onto the Thruway from Pennsylvania
you will see "Welcome to New York" and then a distance sign that says
New York is 490 miles away.  Sometimes you want the Latin name of an
organism and sometimes you want the common name.  You might want to
speak of pharmaceuticals always using the generic name (Omeprazole)
rather than a brand (Prilosec).  Sometimes you want to use
abbreviations (RDF) and other times you want to spell things out
(Resource Description Framework).  If you want to make something
visually tight you need to control label length:

http://carpictures.cc/cars/photo/

A superhuman system would certainly contain statistical models,  but a
lot of the knowledge needed to do the above could be encoded as
properties of the labels.
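
To make that concrete, here is a minimal sketch in Python of labels
carrying properties.  Everything in it is an assumption made up for
illustration: the Label class, the "kind" vocabulary, and the matches()
and choose() helpers are hypothetical names, not Wikidata properties or
any existing API.  It only shows the split between labels you accept
and labels you generate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    text: str
    # Hypothetical vocabulary: common, scientific, generic, brand,
    # abbreviation, full.
    kind: str = "common"
    # Offensive labels are accepted on input but never generated.
    offensive: bool = False


def matches(labels: List[Label], phrase: str) -> List[Label]:
    """Acceptance: any label, offensive or not, may match user input."""
    wanted = phrase.casefold()
    return [l for l in labels if l.text.casefold() == wanted]


def choose(labels: List[Label],
           kind: Optional[str] = None,
           max_len: Optional[int] = None) -> Optional[Label]:
    """Generation: drop offensive labels, then filter by kind and length."""
    candidates = [l for l in labels if not l.offensive]
    if kind is not None:
        candidates = [l for l in candidates if l.kind == kind]
    if max_len is not None:
        candidates = [l for l in candidates if len(l.text) <= max_len]
    # Prefer the shortest surviving label; None if nothing fits.
    return min(candidates, key=lambda l: len(l.text), default=None)


rdf = [
    Label("RDF", kind="abbreviation"),
    Label("Resource Description Framework", kind="full"),
]
drug = [
    Label("Prilosec", kind="brand"),
    Label("Omeprazole", kind="generic"),
]

print(choose(rdf, max_len=10).text)      # RDF
print(choose(rdf, kind="full").text)     # Resource Description Framework
print(choose(drug, kind="generic").text) # Omeprazole
```

A real system would still want a statistical model to rank candidates
in context; the flags only rule things in or out.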

On Fri, Jun 6, 2014 at 1:57 PM, Gerard Meijssen
<[email protected]> wrote:
> Hoi,
> In a different conversation it was put like this: "Wikipedia is what it is
> and Wikidata is what it is". This was in the context of assumptions.
> Thanks,
>       GerardM
>
>
> On 6 June 2014 16:59, Daniel Kinzler <[email protected]> wrote:
>>
>> On 06.06.2014 at 15:44, Gerard Meijssen wrote:
>> > Hoi,
>> > That is exactly the point. Once you assume that they are the same you
>> > ignore the extent to which they are not. Many, many items have articles
>> > pointing to items resulting in labels that are not exactly the same
>> > subject.
>>
>> And these are mistakes that should be fixed. So?
>>
>>
>> --
>> Daniel Kinzler
>> Senior Software Developer
>>
>> Wikimedia Deutschland
>>
>> Gesellschaft zur Förderung Freien Wissens e.V.
>>
>> _______________________________________________
>> Wikidata-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l



-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   [email protected]
