Manybubbles added a comment. In https://phabricator.wikimedia.org/T88549#1020635, @Neunhoef wrote:
> Disclaimer: Sorry, I forgot to introduce myself: My name is Max and I also
> work for ArangoDB.

Thanks!

> The 3M documents need around 11 GB of main memory. If you have less, then you
> see a lot of swapping, because the insert operation in the indexes will
> essentially do random accesses to the data files, since the indexed attribute
> data are not copied into the index but remain in the data files. This
> explains why you needed over an hour on an 8GB machine (which almost
> certainly does not have 8GB free!).

OK. Maybe it's just a function of not using a super nice machine for testing. We really do want the system to scale down to work with less RAM and cheap, big spinning disks so folks can run it on their laptops. That'll encourage experimentation.

Assuming we're OK with just planning for large server deployments: Does the memory requirement scale linearly with the size of the data? How does that play with sharding and replication? How large are the largest ArangoDB clusters?

Do you think sparse indexes are going to give us enough performance to run thousands of these indexes with startup times similar to what we see now? Is that something we can hack around by using fewer indexes and searching them in more interesting ways?

Say we just make one index for all the badges, and for every document with a badge we index three entries: one for its wiki, one for the badge name, and one for the two joined as wiki_badge. That way I can query all entries with "enwiki" badges. Or all entries with "featured" badges. Or all entries with "enwiki_featured" badges. We might be able to play similar clever tricks with the attributes.
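To make the single-index idea concrete, here is a minimal sketch in plain Python. It is not ArangoDB code: the dict stands in for whatever index the database would provide, and the names `badge_keys`, `build_badge_index`, and the sample document ids are made up for illustration.

```python
from collections import defaultdict


def badge_keys(wiki, badge):
    """Return the three index entries for one badge: wiki, badge, wiki_badge."""
    return [wiki, badge, f"{wiki}_{badge}"]


def build_badge_index(documents):
    """Build one inverted index (key -> set of document ids) for all badges.

    `documents` maps a document id to a list of (wiki, badge) pairs.
    Documents without badges add no entries, so the index stays sparse.
    """
    index = defaultdict(set)
    for doc_id, badges in documents.items():
        for wiki, badge in badges:
            for key in badge_keys(wiki, badge):
                index[key].add(doc_id)
    return index


# Hypothetical sample data, only to show the three kinds of lookup.
docs = {
    "Q1": [("enwiki", "featured")],
    "Q2": [("dewiki", "featured")],
    "Q3": [("enwiki", "good")],
    "Q4": [],  # no badges, so it never appears in the index
}
index = build_badge_index(docs)

enwiki_docs = index["enwiki"]                    # any badge on enwiki
featured_docs = index["featured"]                # "featured" on any wiki
enwiki_featured_docs = index["enwiki_featured"]  # exactly that combination
```

One thing the sketch glosses over: wiki names and badge names share a single key space, so a real version would probably need a prefix or separator that cannot appear in either name.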
