Hoi,
It is pointless to include automated descriptions if they are then saved
in a fixed form. The whole point of automated descriptions is that they
change as new statements are made. This is one reason why they are superior
to manual descriptions. The other is that when a label is added in one
language, it immediately affects all items whose descriptions reference the
associated item.

If the argument is that external users need the best descriptions
available at any given time, it is best to keep the automated descriptions
separate. We have enough experience of the disruption caused by failing
dumps. Given that there is a need for descriptions for offline use, it
makes sense to consider caching such a file, invalidating the content that
has changed, and regenerating it in a batch process. When a description is
needed, it can always be generated on the spot; the same mechanism can be
used interactively as well.
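To make the idea concrete, here is a minimal sketch of such a scheme: cache generated descriptions, mark entries stale when an item changes, and refresh them either on demand or in a batch pass before a dump. All names here (DescriptionCache, generate_description) are hypothetical illustrations, not part of any Wikidata API.

```python
def generate_description(item_id, statements):
    """Stand-in for a real automated-description generator."""
    return f"{item_id}: " + ", ".join(statements)

class DescriptionCache:
    """Hypothetical cache with invalidation and batch regeneration."""

    def __init__(self):
        self._cache = {}     # item_id -> cached description text
        self._dirty = set()  # item_ids whose cached text is stale

    def invalidate(self, item_id):
        """Mark an item stale, e.g. after a new statement or label."""
        self._dirty.add(item_id)

    def get(self, item_id, statements):
        """Return a description, regenerating it on demand if stale."""
        if item_id in self._dirty or item_id not in self._cache:
            self._cache[item_id] = generate_description(item_id, statements)
            self._dirty.discard(item_id)
        return self._cache[item_id]

    def regenerate_batch(self, data):
        """Batch pass: refresh every stale entry, e.g. before a dump."""
        for item_id in list(self._dirty):
            self._cache[item_id] = generate_description(item_id, data[item_id])
            self._dirty.discard(item_id)
```

The same cache serves both uses mentioned above: the batch pass keeps an offline dump file fresh, while get() covers the interactive case by regenerating a single description the moment it is requested.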
Thanks,
      GerardM

On 9 February 2015 at 13:21, Markus Kroetzsch <
[email protected]> wrote:

> Hi Magnus, hi Daniel,
>
> I don't think file size should be our primary concern here. What may seem
> big today will be negligible in a few years. Having all data in one place
> is just easier to work with. I am happy to wait for another 30min for a
> download if it saves me from implementing another Web service connector in
> my own code. Compute time is cheap, disk space is cheap, human labour is
> expensive.
>
> Maybe the whole size discussion is a bit of a red herring here anyway. If
> we are worried about file size, there would maybe be better ways of
> reducing it. We can split the contents into several smaller dump files, not
> just for descriptions. We are already doing this when creating RDF dumps,
> and it would be easy for us to do the same for JSON. We could do this
> immediately if someone needs it (just let me know and we will set it up for
> you). However, if we want to provide smaller files, a more effective method
> would be to split by language rather than by term type: all labels in all
> languages would still be much bigger than labels+descriptions+aliases in
> English only, and many applications will not need labels in 300 languages.
>
> Anyway, as I said, I do not mind whether the auto-descriptions are stored
> like normal descriptions or whether they are added to the dump files "last
> minute" when generating them. I just need the descriptions in the dumps.
>
> Cheers,
>
> Markus
>
> On 09.02.2015 12:28, Daniel Kinzler wrote:
>
>> On 09.02.2015 at 12:25, Magnus Manske wrote:
>>
>>> But wouldn't it be better to keep the dump as it is, for those who
>>> don't want triple size (just inventing a number here), and have one
>>> separate, or even per-language, dump with just the automated
>>> descriptions, for those who want that?
>>>
>>
>> Possibly. Depends on how much more data this would actually be. Which
>> also depends on whether we would omit descriptions in languages that
>> can easily be covered by language fallback (e.g. no separate
>> descriptions in de-ch and de-at).
>>
>>
>>
>
> --
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486
> http://korrekt.org/
>
> _______________________________________________
> Wikidata-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
