Neunhoef added a comment.

Disclaimer: Sorry, I forgot to introduce myself. My name is Max, and I also work
for ArangoDB.

Analysis

The 3M documents need around 11 GB of main memory. If you have less, you will
see a lot of swapping: inserting into the indexes essentially does random
accesses to the data files, because the indexed attribute values are not copied
into the index but remain in the data files. This explains why it took over an
hour on an 8 GB machine (which almost certainly does not have 8 GB free!).
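The numbers above can be checked with a quick back-of-the-envelope calculation. This is a minimal Python sketch, assuming decimal gigabytes; the 6 GB of free RAM is a hypothetical figure for an 8 GB machine, not a measurement.

```python
# Back-of-the-envelope estimate of how much of the collection stays resident.
# The 3M documents / 11 GB figures come from the analysis above; the amount
# of free RAM passed in below is a hypothetical input.

GB = 10**9  # decimal gigabytes; the original figures are approximate anyway


def bytes_per_document(total_gb: float, num_docs: int) -> float:
    """Average size of one document in the data files."""
    return total_gb * GB / num_docs


def resident_fraction(total_gb: float, free_ram_gb: float) -> float:
    """Fraction of the data files that can stay in RAM (capped at 1.0)."""
    return min(1.0, free_ram_gb / total_gb)


if __name__ == "__main__":
    per_doc = bytes_per_document(11, 3_000_000)
    print(f"~{per_doc / 1000:.1f} KB per document")  # ~3.7 KB per document
    # Hypothetical: an 8 GB machine with roughly 6 GB actually free.
    frac = resident_fraction(11, 6)
    print(f"~{frac:.0%} of the data resident")  # ~55% of the data resident
```

If index insertion touches documents roughly uniformly at random, then roughly the non-resident share of those reads has to be served from disk, which is what drives the swapping.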

Given enough RAM, this effect does not occur and building the indexes is
considerably faster, well below the 10 minutes given as the upper limit.
Interestingly, the second experiment suggests that it is essentially the
skiplist index that takes the time. This is not surprising: inserting into a
skiplist of length N takes expected O(log(N)) comparisons, and each comparison
has to read the indexed attribute from the data files.
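To make this concrete, here is a minimal Python sketch of a skiplist index that stores only document ids and reads the attribute value on every comparison, mirroring the point that the indexed data stay in the data files. `DataFile`, `SkiplistIndex` and the random attribute values are hypothetical stand-ins, not ArangoDB code; counting the reads shows the per-insert cost growing with log(N).

```python
import random
from typing import Any, Dict, List, Optional

MAX_LEVEL = 20  # ample for the sizes used below
P = 0.5         # probability of promoting a node one level up


class DataFile:
    """Hypothetical stand-in for the data files. The index stores only
    document ids, so every key comparison reads the value from here."""

    def __init__(self) -> None:
        self.docs: Dict[int, Any] = {}
        self.reads = 0  # counts attribute reads triggered by the index

    def key(self, doc_id: int) -> Any:
        self.reads += 1
        return self.docs[doc_id]


class Node:
    def __init__(self, doc_id: Optional[int], level: int) -> None:
        self.doc_id = doc_id
        self.next: List[Optional["Node"]] = [None] * level


class SkiplistIndex:
    def __init__(self, data: DataFile, seed: int = 0) -> None:
        self.data = data
        self.head = Node(None, MAX_LEVEL)  # sentinel, holds no document
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        lvl = 1
        while lvl < MAX_LEVEL and self.rng.random() < P:
            lvl += 1
        return lvl

    def insert(self, doc_id: int) -> None:
        key = self.data.key(doc_id)
        update: List[Node] = [self.head] * MAX_LEVEL
        node = self.head
        # Walk down from the top level; each comparison dereferences a stored
        # document id, i.e. one (potentially random) read from the data files.
        for i in range(self.level - 1, -1, -1):
            nxt = node.next[i]
            while nxt is not None and self.data.key(nxt.doc_id) < key:
                node = nxt
                nxt = node.next[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(doc_id, lvl)
        for i in range(lvl):
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

    def doc_ids(self) -> List[int]:
        """Document ids in ascending key order (bottom level)."""
        out: List[int] = []
        node = self.head.next[0]
        while node is not None:
            out.append(node.doc_id)
            node = node.next[0]
        return out


if __name__ == "__main__":
    for n in (1_000, 8_000, 64_000):
        data = DataFile()
        rng = random.Random(42)
        for doc_id in range(n):
            data.docs[doc_id] = rng.random()  # hypothetical attribute values
        index = SkiplistIndex(data)
        for doc_id in range(n):
            index.insert(doc_id)
        # Reads per insert grow by a roughly constant amount for each 8x
        # increase in n, i.e. logarithmically.
        print(n, round(data.reads / n, 1))
```

When the data files do not fit in RAM, each of those reads can turn into a page fault, which is why the skiplist suffers most from the swapping described above.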

My guess is that the "sitelist.enwiki.badges" attribute is considerably sparser
in this dataset, so the skiplist will quite often insert at the first position
(inserting "null"). Once we have sparse indexes (we hope to show you a first
version to experiment with soon), inserting a document that lacks the indexed
attribute into the corresponding index should be considerably faster.
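The idea behind a sparse index can be sketched as follows. This is a hypothetical Python illustration, not ArangoDB's implementation: it assumes a sparse index simply skips documents whose indexed attribute is missing or null, while a non-sparse index stores them under a null key that sorts before all other values. A sorted Python list stands in for the skiplist, and the attribute values in the example are made up.

```python
import bisect
from typing import Any, Dict, List, Tuple


def lookup_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted attribute path; return None if any part is missing."""
    cur: Any = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


class SortedAttributeIndex:
    """Hypothetical sorted index over one attribute path. Indexed values are
    assumed to be mutually comparable (e.g. all lists of strings)."""

    def __init__(self, path: str, sparse: bool) -> None:
        self.path = path
        self.sparse = sparse
        self.entries: List[Tuple[Tuple[int, Any], int]] = []

    def insert(self, doc_id: int, doc: Dict[str, Any]) -> bool:
        value = lookup_path(doc, self.path)
        if value is None:
            if self.sparse:
                return False   # sparse: the document is not indexed at all
            sort_key = (0, 0)  # non-sparse: null sorts before everything else
        else:
            sort_key = (1, value)
        bisect.insort(self.entries, (sort_key, doc_id))
        return True


if __name__ == "__main__":
    docs = [
        {"sitelist": {"enwiki": {"badges": ["b1"]}}},
        {"sitelist": {"dewiki": {}}},  # no enwiki entry at all
        {"label": "no sitelist"},
    ]
    dense = SortedAttributeIndex("sitelist.enwiki.badges", sparse=False)
    sparse = SortedAttributeIndex("sitelist.enwiki.badges", sparse=True)
    for i, d in enumerate(docs):
        dense.insert(i, d)
        sparse.insert(i, d)
    print(len(dense.entries), len(sparse.entries))  # 3 1
```

With a mostly absent attribute, the non-sparse index spends nearly all of its work placing null entries, while the sparse index does no index work for those documents.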


TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev, Neunhoef
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs