hoo added a comment.
> While I understand your point, I fear that isolating some data from the main dump is only a temporary solution to its size growth. Sooner or later, it will weigh 1 TB (even compressed), and we'll have to deal with this (as producer or as consumer).

We indeed might need to take further steps due to the size of the dumps in the future, but nevertheless not all consumers will be interested in Lexemes (or even Items), thus it makes sense to distribute them separately. Also, as "merging" the individual dumps is fairly trivial for a consumer, I don't think this will be very obstructive.

> Will the lexemes dumps contain the P namespace, or will the consumer have to additionally download the other complete dump to get the data about properties?

It will not contain properties, thus the other JSON dump will still be needed. In the future we might also add a properties-only JSON dump.

> Will there be one dump per namespace (one for P, one for Q, one for L)?

In the long run, we will probably have one dump per entity type, yes.

TASK DETAIL
https://phabricator.wikimedia.org/T220883

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: Aklapper, hoo, Lydia_Pintscher, Envlh, alaa_wmde, Nandana, Mringgaard, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Jonas, Wikidata-bugs, aude, Svick, Darkdadaah, Mbch331, jeremyb
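As an aside on the "merging is fairly trivial" point above: a minimal sketch of what a consumer-side merge could look like, assuming each per-entity-type dump is a JSON array of entity objects (as the Wikidata JSON dumps are). The function name and file paths here are hypothetical, not part of any actual tooling.

```python
import json

def merge_dumps(paths):
    """Concatenate the entity arrays from several JSON dump files
    (e.g. an items-only dump and a lexemes-only dump) into one list."""
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Each file is expected to hold a single JSON array of entities.
            merged.extend(json.load(f))
    return merged
```

For the real (multi-gigabyte, compressed) dumps a streaming parser would be needed instead of `json.load`, but the principle is the same: the combined dump is just the concatenation of the individual entity arrays.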
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
