On 01/07/14 22:33, David Cuenca wrote:
Markus, could your algorithm work together with human direction? Like, if we entered which properties are common for a class, and then a user creates an instance of that class, would the algorithm be able to sort those properties based on how often they appear in the database?
My algorithm is all about *detecting* "which properties are common for a class". If you want this to be entered by humans instead, that's fine too, but then you don't need an algorithm.

Sorting a list of properties by how often they appear in the database is easy to do. My algorithm does not do this, though, because the most frequently used property is usually not the most interesting one. For instance, instances of many classes have Freebase IDs, but you don't want this to be the first suggestion you get. I want the things that are "special" for the instances of a class as compared to the rest of the data, not the things that are most common overall.
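To make the difference concrete, here is a minimal Python sketch that ranks the properties of one class in two ways: by raw frequency within the class, and by how over-represented each property is in the class compared to the whole dataset. The ratio used here (often called "lift") is only an assumed stand-in for "special"; it is not the scoring actually used by the algorithm discussed in this thread, which is not spelled out here. The function name `rank_properties`, the `ITEMS` data, and the class labels are all invented for illustration.

```python
from collections import Counter

# Invented toy data: each item is (set of classes, set of properties used).
ITEMS = [
    ({"building"}, {"Freebase ID", "architect", "coordinate location"}),
    ({"building"}, {"Freebase ID", "architect"}),
    ({"building"}, {"Freebase ID", "coordinate location"}),
    ({"human"}, {"Freebase ID", "date of birth"}),
    ({"human"}, {"Freebase ID", "date of birth"}),
    ({"human"}, {"Freebase ID"}),
]


def rank_properties(items, target_class):
    """Return (by_frequency, by_lift) rankings for one class.

    by_frequency: properties sorted by how many class instances use them.
    by_lift: properties sorted by (share of class instances using the
             property) / (share of all items using it). This is an
             assumed "specialness" measure, not the thread's algorithm.
    Ties are broken alphabetically so the output is deterministic.
    """
    total = len(items)
    overall = Counter()   # property -> number of items using it
    in_class = Counter()  # property -> number of class instances using it
    class_size = 0
    for classes, props in items:
        overall.update(props)
        if target_class in classes:
            class_size += 1
            in_class.update(props)
    if class_size == 0:
        return [], []

    def lift(prop):
        return (in_class[prop] / class_size) / (overall[prop] / total)

    by_frequency = sorted(in_class, key=lambda p: (-in_class[p], p))
    by_lift = sorted(in_class, key=lambda p: (-lift(p), p))
    return by_frequency, by_lift


if __name__ == "__main__":
    by_frequency, by_lift = rank_properties(ITEMS, "building")
    print("by frequency:", by_frequency)
    print("by lift:     ", by_lift)
```

On this toy data the frequency ranking puts "Freebase ID" first, because every item has one, while the lift ranking pushes it to the end and brings "architect" and "coordinate location" to the front.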
Cheers, Markus
Thanks,
Micru

On Tue, Jul 1, 2014 at 10:23 PM, Markus Krötzsch <[email protected]> wrote:

On 01/07/14 22:14, Markus Krötzsch wrote:
...

(2) "Grade I listed building"
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818

Related properties: English Heritage list number, masts, Minor Planet Center observatory code, home port, coordinate location, OS grid reference, mother house, architect, manager/director, Emporis ID, MusicBrainz place ID, country, architectural style, visitors per year, Commons category, Structurae ID (structure), officially opened by, floors above ground, inspired by, religious order, number of platforms, street, owned by, diocese

These are computed fully automatically from the data, with no manual filtering or user input.

But don't get me wrong -- great work! Brilliant to have such a thing integrated into the UI. In any case, my algorithm for computing the related properties is certainly very different from theirs; I am sure it also has its glitches.

P.S. One weakness of my algorithm you can already see: it has trouble estimating the relevance of very rare properties, such as "Minor Planet Center observatory code" above. A single wrong annotation may then lead to wrong suggestions. Also, it seems from my list under (2) that some Grade I listed buildings are ships. This seems to be an error that is amplified by the fact that property "masts" is used only 11 times in the dataset I evaluated (last week's data). I guess the new property suggester rather errs on the other side, being tricked into suggesting very frequent properties even in places that don't need them.
-- Markus

_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Etiamsi omnes, ego non
