There's another road to an ontology of labels, which is connected with the kinds of roles that labels play in systems.
One need is that a system wants to mention something, draw something, or otherwise refer to something, and it needs to know what to call it. Another need is that you have a phrase and you want to find things with a matching label. Then there's the more general problem that the user has something in his head and you want to pin down what it is.

In terms of acceptance of labels, you want the system to accept a wide range of possible names people would use for something (I think that is in Wikidata scope), but to make the most of that you need a good estimator of the probability that a particular surface form used in a particular context refers to this or that, and that is probably out of scope.

You want to accept labels you wouldn't want to generate. A tendency to generate ethnic, racial and other kinds of slurs is a showstopper for any public commercial application. A.I.s are like people; some of them are more prone to potty mouth than others, and you can't count on good behavior unless you train your animals. Thus, offensive labels should be tagged.

Similar choices appear in different contexts. I live in New York, and if you look at legal documents they always say "New York State" or "New York City", but if you drive onto the Thruway from Pennsylvania you will see "Welcome to New York" and then a distance sign that says New York is 490 miles away. Sometimes you want the Latin name of an organism and sometimes you want the common name. You might want to speak of pharmaceuticals always using the generic name (Omeprazole) rather than a brand (Prilosec). Sometimes you want to use abbreviations (RDF) and other times you want to spell things out (Resource Description Framework). If you want to make something visually tight you need to control label length:

http://carpictures.cc/cars/photo/

A superhuman system would certainly contain statistical models, but a lot of the knowledge needed to do the above could be encoded as properties of the labels.
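To make the "properties of the labels" idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the `Label` class, the `kind` values ("generic", "brand", "abbreviation", "full") and the `offensive` flag are hypothetical names, not anything Wikidata defines. The point is just that acceptance and generation can share one set of labels while applying different rules to it.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Label:
    """A surface form plus hypothetical properties describing it."""
    text: str
    kind: str = "common"      # assumed vocabulary, e.g. "generic", "brand"
    offensive: bool = False   # tagged so it can be accepted but never emitted


def matches(labels: Sequence[Label], phrase: str) -> List[Label]:
    """Acceptance: every label may match, including offensive ones."""
    needle = phrase.casefold()
    return [label for label in labels if label.text.casefold() == needle]


def choose(labels: Sequence[Label],
           prefer: Sequence[str] = (),
           max_length: Optional[int] = None) -> Optional[Label]:
    """Generation: skip offensive labels, respect a length limit,
    then rank by preferred kind and fall back to the shortest text."""
    candidates = [
        label for label in labels
        if not label.offensive
        and (max_length is None or len(label.text) <= max_length)
    ]
    if not candidates:
        return None
    preferred = list(prefer)

    def rank(label: Label):
        position = (preferred.index(label.kind)
                    if label.kind in preferred else len(preferred))
        return (position, len(label.text))

    return min(candidates, key=rank)


if __name__ == "__main__":
    drug = [Label("Omeprazole", "generic"), Label("Prilosec", "brand")]
    rdf = [Label("RDF", "abbreviation"),
           Label("Resource Description Framework", "full")]
    print(choose(drug, prefer=["generic"]))
    print(choose(rdf, prefer=["full"]))
    print(choose(rdf, prefer=["full"], max_length=10))
    print(matches(drug, "prilosec"))
```

This does nothing about the harder problem of estimating which thing a surface form refers to in context; it only covers the choices that can be made from properties attached to the labels themselves.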
On Fri, Jun 6, 2014 at 1:57 PM, Gerard Meijssen <[email protected]> wrote:
> Hoi,
> In a different conversation it was put like this: "Wikipedia is what it is
> and Wikidata is what it is". This was in the context of assumptions.
> Thanks,
> GerardM
>
> On 6 June 2014 16:59, Daniel Kinzler <[email protected]> wrote:
>>
>> On 06.06.2014 15:44, Gerard Meijssen wrote:
>> > Hoi,
>> > That is exactly the point. Once you assume that they are the same you
>> > ignore the extent to which they are not. Many, many items have articles
>> > pointing to items resulting in labels that are not exactly the same
>> > subject.
>>
>> And these are mistakes that should be fixed. So?
>>
>> --
>> Daniel Kinzler
>> Senior Software Developer
>>
>> Wikimedia Deutschland
>> Gesellschaft zur Förderung Freien Wissens e.V.

--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    [email protected]

_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
