Neunhoef added a comment.

Disclaimer: Sorry, I forgot to introduce myself. My name is Max and I also work for ArangoDB.

Analysis

The 3M documents need around 11 GB of main memory. With less RAM than that you will see a lot of swapping: inserting into an index essentially performs random accesses to the data files, because the indexed attribute values are not copied into the index but remain in the data files. This explains why you needed over an hour on an 8 GB machine (which almost certainly does not have 8 GB free!). Given enough RAM, this effect does not occur and building the indexes is considerably faster, well below the 10 minutes given as the upper limit.

Interestingly, the second experiment suggests that it is essentially the skiplist index that takes the time. This is not surprising, since a single insert into a skiplist of length N has expected complexity O(log(N)), so building the index over N documents costs O(N log(N)) in total. My guess is that the "sitelist.enwiki.badges" attribute is considerably sparser in this dataset, so the skiplist will quite often insert at the first position (inserting "null").

Once we have sparse indexes (we will try to give you a first version to experiment with soon), inserting a document that lacks a given attribute into the corresponding index should be considerably faster.

TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev, Neunhoef
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel
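
To illustrate the difference a sparse index could make, here is a minimal Python sketch. It is not ArangoDB's implementation: the class `SortedIndex`, the helper `get_path`, and the badge values are hypothetical, and a sorted list maintained with `bisect` stands in for a real skiplist (so the per-insert cost here is not O(log(N))). The only point is that a non-sparse index stores a "null" entry at the front for every document lacking the attribute, while a sparse one skips such documents entirely.

```python
import bisect


def get_path(doc, path):
    """Resolve a dotted attribute path in nested dicts; None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class SortedIndex:
    """Hypothetical sorted index; a plain sorted list stands in for a skiplist."""

    def __init__(self, path, sparse=False):
        self.path = path
        self.sparse = sparse
        self.entries = []  # sorted list of (key, doc_id)

    def insert(self, doc_id, doc):
        value = get_path(doc, self.path)
        if value is None:
            if self.sparse:
                return False  # sparse: documents without the attribute are skipped
            key = (0,)  # non-sparse: "null" sorts before every real value
        else:
            key = (1, value)
        bisect.insort(self.entries, (key, doc_id))
        return True


# Made-up documents; only two of the four carry the indexed attribute.
docs = {
    1: {"sitelist": {"enwiki": {"badges": "featured"}}},
    2: {"sitelist": {"enwiki": {}}},
    3: {"sitelist": {}},
    4: {"sitelist": {"enwiki": {"badges": "good"}}},
}

dense = SortedIndex("sitelist.enwiki.badges")
sparse = SortedIndex("sitelist.enwiki.badges", sparse=True)
for doc_id, doc in docs.items():
    dense.insert(doc_id, doc)
    sparse.insert(doc_id, doc)

print(len(dense.entries), len(sparse.entries))  # prints "4 2"
```

In the non-sparse case the two documents without the attribute end up at the front of the index under the "null" key, which corresponds to the "insert at the first position" behaviour guessed at above.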
