Neunhoef added a comment.

@Smalyshev: Please also note that in my tests today the 4 indexes for 3M documents needed 672 MB of data, which works out to a reasonable 224 bytes or so per document. That would be about 3 GB (optimistically!) for your 16M documents, ignoring edges. If you really have 2000 indexes, you would have to budget for about 1.5 TB of index data, which is almost certainly infeasible. Therefore this will only work with sparse indexes, and your attributes really have to be sparse (for each attribute, only a low percentage of documents has a value set). Then the indexes will use considerably less memory. This will also reduce the insertion time, so your figure of 30 s per index (for 16M documents), and thus 16 h in total, is overly pessimistic in this scenario. I would expect the insertion time to scale linearly with the amount of memory the indexes use, and thus to stay under control.
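The estimate above can be reproduced as a small back-of-envelope calculation. This is a sketch under stated assumptions, not a benchmark: it uses decimal units (672 MB / 3M documents = 224 bytes per document for all 4 indexes), assumes memory grows linearly with both document count and index count, models a sparse index as storing only the fraction of documents that have the attribute set, and takes the conjecture that build time is linear in index memory at face value. The 1% fill fraction and the function names (`dense_index_bytes`, `sparse_index_bytes`, `build_seconds`) are hypothetical, chosen for illustration.

```python
# Back-of-envelope index memory and build-time estimate (a sketch, not a benchmark).
MB = 10**6
GB = 10**9
TB = 10**12

# Measurement from the comment: 4 indexes over 3M documents used 672 MB.
measured_bytes = 672 * MB
measured_docs = 3_000_000
measured_indexes = 4

bytes_per_doc = measured_bytes / measured_docs             # all 4 indexes together
bytes_per_doc_per_index = bytes_per_doc / measured_indexes


def dense_index_bytes(num_docs, num_indexes):
    """Memory if every index holds an entry for every document (edges ignored)."""
    return num_docs * num_indexes * bytes_per_doc_per_index


def sparse_index_bytes(num_docs, num_indexes, fill_fraction):
    """Hypothetical model: a sparse index stores only documents that have the
    attribute set, so memory scales with the fraction of documents that do."""
    return dense_index_bytes(num_docs, num_indexes) * fill_fraction


def build_seconds(seconds_per_index, num_indexes, fill_fraction=1.0):
    """Assumes build time is linear in index memory, as conjectured above."""
    return seconds_per_index * num_indexes * fill_fraction


if __name__ == "__main__":
    docs = 16_000_000
    fill = 0.01  # hypothetical: each attribute set on 1% of documents
    print(f"{bytes_per_doc:.0f} bytes/doc for {measured_indexes} indexes")
    print(f"4 indexes, dense:     {dense_index_bytes(docs, 4) / GB:.1f} GB")
    print(f"2000 indexes, dense:  {dense_index_bytes(docs, 2000) / TB:.2f} TB")
    print(f"2000 indexes, sparse: {sparse_index_bytes(docs, 2000, fill) / GB:.1f} GB")
    print(f"dense build:  {build_seconds(30, 2000) / 3600:.1f} h")
    print(f"sparse build: {build_seconds(30, 2000, fill) / 60:.1f} min")
```

With the measured 224 bytes per document, the dense case comes out at roughly 3.6 GB for 4 indexes and 1.8 TB for 2000, the same order of magnitude as the rounder 3 GB and 1.5 TB figures quoted above, and 30 s × 2000 indexes gives about 16.7 h. Under the hypothetical 1% fill, both memory and build time shrink by a factor of 100.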
So in short: either your sparsity assumptions hold and you use sparse indexes, or you are doomed anyway, simply because of the memory requirements. With sparse indexes, startup time should not be much of an issue.

TASK DETAIL https://phabricator.wikimedia.org/T88549

Wikidata-bugs mailing list https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
