Oh, a real-life example for "short automatic descriptions" (same code as the
API) vs. manual ones: Searching for "Peter" on Wikidata, with autodesc
gadget:
https://twitter.com/MagnusManske/status/564782161845551104


On Mon Feb 09 2015 at 13:09:27 Magnus Manske <[email protected]>
wrote:

> On Mon Feb 09 2015 at 13:00:35 Daniel Kinzler <[email protected]>
> wrote:
>
>>
>> Since wb_terms has one row per term, and a field for the term type, it
>> would be simple enough to inject "auto-descriptions". The only issue is
>> that wb_terms is already pretty huge, and adding automatic descriptions
>> in *all* languages would likely bloat it a lot more. Language variants
>> could be omitted, but still - that's a lot of data...
>>
> It would be a quick'n'dirty solution. But it highlights an issue: We'd
> have the same problem with manual descriptions, if they were to arrive in
> large numbers.
>
> There's always Yet Another Table. Maybe a description would be generated
> on-the-fly only if a Wikidata page is visited in a language, and removed
> after ~1 month of "non-viewing"? That should keep the table short enough,
> but would require extra effort for API calls and dumps, provided those
> should show descriptions for /all/ languages.
>
> Then again there's the Labs hadoop cluster, used for Analytics IIRC. That
> sounds like a way to process and store vast amounts of small,
> self-contained datasets (description strings). Would tie the solution to
> Wikimedia, though, and require a lot of engineering effort to get started.
>
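The generate-on-view / expire-after-non-viewing scheme discussed above could be sketched roughly as below. This is only an illustration of the idea, not Wikibase code; the class, the generator callback, and the one-month constant are all hypothetical.

```python
import time

# ~1 month of "non-viewing" before a cached description is purged
# (assumption: the exact window from the discussion above).
PURGE_AFTER = 30 * 24 * 3600  # seconds


class DescriptionCache:
    """Hypothetical on-the-fly description store, keyed by (item, language)."""

    def __init__(self):
        # (item_id, lang) -> (description, last_viewed_timestamp)
        self._store = {}

    def get(self, item_id, lang, generate, now=None):
        """Return a description, generating it only on first view in a
        language; every view refreshes the last-viewed timestamp."""
        now = time.time() if now is None else now
        key = (item_id, lang)
        if key in self._store:
            desc, _ = self._store[key]
        else:
            desc = generate(item_id, lang)  # e.g. an autodesc-style generator
        self._store[key] = (desc, now)
        return desc

    def purge(self, now=None):
        """Drop entries not viewed for PURGE_AFTER; returns how many."""
        now = time.time() if now is None else now
        stale = [k for k, (_, seen) in self._store.items()
                 if now - seen > PURGE_AFTER]
        for k in stale:
            del self._store[k]
        return len(stale)
```

In a real deployment the store would be a database table (or Yet Another Table) with a last-viewed column, and `purge` a periodic maintenance job; API calls and dumps wanting descriptions in *all* languages would still have to fall back to generating them on demand.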
_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l