Manybubbles added a comment.

In https://phabricator.wikimedia.org/T88549#1020635, @Neunhoef wrote:

> Disclaimer: Sorry, I forgot to introduce myself: My name is Max and I also 
> work for ArangoDB.


Thanks!

> The 3M documents need around 11 GB of main memory. If you have less, then you 
> see a lot of swapping, because the insert operation in the indexes will 
> essentially do random accesses to the data files, since the indexed attribute 
> data are not copied into the index but remain in the data files.  This 
> explains why you needed over an hour on an 8GB machine (which almost 
> certainly does not have 8GB free!).


OK.  Maybe it's just a function of not using a super nice machine for testing.  
We really do want the system to scale down to work with less RAM and cheap, big 
spinning disks so folks can run it on their laptops.  That'll encourage 
experimentation.

Assuming we're OK with just planning for large server deployments: Does the 
memory requirement scale linearly with the size of the data?  How does that 
play with sharding and replication?  How large are the largest ArangoDB 
clusters?
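
To make the first question concrete, here is a back-of-the-envelope sketch in 
Python.  It assumes the scaling really is linear, which is exactly what I'm 
asking about, and takes the one data point quoted above (3M documents in about 
11 GB).  The function names are made up for illustration.

```python
# Single data point from the quoted comment: 3M documents need ~11 GB.
REF_DOCS = 3_000_000
REF_GB = 11.0


def estimate_memory_gb(n_docs):
    """Memory estimate in GB, ASSUMING linear scaling (unconfirmed)."""
    return REF_GB * n_docs / REF_DOCS


def docs_fitting_in(ram_gb):
    """Number of documents that would fit in ram_gb under the same assumption."""
    return int(REF_DOCS * ram_gb / REF_GB)


if __name__ == "__main__":
    print(estimate_memory_gb(6_000_000))  # twice the documents, twice the RAM
    print(docs_fitting_in(8))             # what an 8 GB machine could hold at best
```

If the real curve is not linear (index overhead, sharding, replication), this 
estimate is off, and I'd like to know in which direction.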

Do you think sparse indexes are going to give us enough performance to run 
thousands of these indexes with startup times similar to what we see now?  Is 
that something we can hack around by using fewer indexes and searching them in 
more interesting ways?  Say we just make one index for all the badges, and for 
every document with a badge we index an entry for its wiki, one for the badge 
name, and one for the two joined as wiki_badge.  That way I can query all 
entries with "enwiki" badges.  Or all entries with "featured" badges.  Or all 
entries with "enwiki_featured" badges.  We might be able to play similar clever 
tricks with the attributes.
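
Roughly what I have in mind for the single badge index, as a plain Python 
sketch.  It is not ArangoDB code: the in-memory dict stands in for the index, 
and the document ids and function names are hypothetical.

```python
from collections import defaultdict


def badge_index_terms(wiki, badge):
    """Three terms for one badge: the wiki, the badge name, and both joined."""
    return [wiki, badge, f"{wiki}_{badge}"]


def build_badge_index(documents):
    """Map each term to the set of document ids that carry it.

    `documents` maps a document id to a list of (wiki, badge) pairs.
    """
    index = defaultdict(set)
    for doc_id, badges in documents.items():
        for wiki, badge in badges:
            for term in badge_index_terms(wiki, badge):
                index[term].add(doc_id)
    return index


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    docs = {
        "Q1": [("enwiki", "featured")],
        "Q2": [("dewiki", "featured")],
        "Q3": [("enwiki", "good")],
    }
    index = build_badge_index(docs)
    print(sorted(index["enwiki"]))           # documents with any enwiki badge
    print(sorted(index["featured"]))         # documents featured on any wiki
    print(sorted(index["enwiki_featured"]))  # documents featured on enwiki
```

One catch: wiki names and badge names share a single namespace here, so a wiki 
whose name matched a badge name would collide.  A prefix on each term would 
avoid that.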


TASK DETAIL
  https://phabricator.wikimedia.org/T88549


To: Smalyshev, Manybubbles
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel


