Neunhoef added a comment.

@Smalyshev: Please also note: in my tests today the 4 indexes for 3M documents 
needed 672 MB of data, which works out to a reasonable 226 bytes per document. 
That should be about 3GB (optimistically!) for your 16M documents (forgetting 
about edges!). If you really have 2000 indexes then you would have to plan for 
1.5 TB of index data, which is probably totally impossible. Therefore it 
will only be possible with sparse indexes, and your attributes really have to 
be sparse (for each attribute only a low percentage of documents has a value 
set). Then you will have considerably lower memory usage for the indexes. 
However, this will obviously also reduce the insertion time. So your 30s per 
index (for 16M) and thus the 16h in total are unrealistic in this scenario. I 
would imagine that the insertion time will be linear in the amount of memory 
the indexes use, and thus under control.
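
As a rough sketch of the arithmetic above: the snippet below only rescales the 
measured 226 bytes per document (for 4 indexes) to other document and index 
counts. The function name and the fill_ratio parameter are made up for 
illustration, and the assumption that a sparse index costs memory in 
proportion to the fraction of documents that have the attribute set is mine, 
not something I have measured.

```python
# Back-of-the-envelope index memory estimate from the figures quoted above.
BYTES_PER_DOC_MEASURED = 226   # measured for 4 indexes over 3M documents
MEASURED_INDEXES = 4


def index_bytes(num_docs, num_indexes, fill_ratio=1.0):
    """Estimate total index memory in bytes.

    fill_ratio is the (assumed) fraction of documents that have a value
    for an indexed attribute; 1.0 models non-sparse indexes.
    """
    per_doc_per_index = BYTES_PER_DOC_MEASURED / MEASURED_INDEXES
    return num_docs * num_indexes * per_doc_per_index * fill_ratio


if __name__ == "__main__":
    docs = 16_000_000
    for label, n_idx, fill in [
        ("4 indexes, dense", 4, 1.0),
        ("2000 indexes, dense", 2000, 1.0),
        ("2000 indexes, 1% filled (assumed)", 2000, 0.01),
    ]:
        gb = index_bytes(docs, n_idx, fill) / 1e9
        print(f"{label}: {gb:,.1f} GB")
```

The dense cases land in the same range as the rounded numbers above (a few GB 
for 4 indexes, more than a TB for 2000), and the 1% line is only there to show 
how strongly the total depends on the sparsity assumption.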

So in short: Either your sparsity assumptions hold and you use sparse indexes, 
or you are doomed anyway, simply because of the memory requirements. With 
sparsity, the startup time should not be such an issue.


TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev, Neunhoef
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel


