hoo added a comment.
> While I understand your point, I fear that isolating some data from the main dump is only a temporary solution to its size growth. Sooner or later, it will weigh 1 TB (even compressed), and we'll have to deal with this (as producer or as consumer).

We indeed might need to take further steps due to the size of the dumps in the future, but nevertheless not all consumers will be interested in Lexemes (or even Items), thus it makes sense to distribute them separately. Also, as "merging" the individual dumps is fairly trivial for a consumer, I don't think this will be very obstructive.

> Will the lexemes dumps contain the P namespace, or will the consumer have to additionally download the other complete dump to get the data about properties?

It will not contain properties, thus the other JSON dump will still be needed. In the future we might also add a properties-only JSON dump.

> Will there be one dump per namespace (one for P, one for Q, one for L)?

In the long run, we will probably have one dump per entity type, yes.

TASK DETAIL
https://phabricator.wikimedia.org/T220883

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: Aklapper, hoo, Lydia_Pintscher, Envlh, alaa_wmde, Nandana, Mringgaard, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Jonas, Wikidata-bugs, aude, Svick, Darkdadaah, Mbch331, jeremyb
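As an aside on the "merging is fairly trivial" point above: a minimal sketch of what a consumer-side merge could look like, assuming each per-entity-type dump is a JSON array of entity objects (as the Wikidata JSON dumps are). The function name and file paths here are hypothetical, not part of any actual tooling.

```python
import json

def merge_dumps(paths):
    """Concatenate the entity arrays from several JSON dump files
    (e.g. an items-only dump and a lexemes-only dump) into one list."""
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Each file is expected to hold a single JSON array of entities.
            merged.extend(json.load(f))
    return merged
```

For the real (multi-gigabyte, compressed) dumps a streaming parser would be needed instead of `json.load`, but the principle is the same: the combined dump is just the concatenation of the individual entity arrays.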
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
