Fceller added a comment.

Maybe we can take a step back and ignore the ArangoDB specifics for the moment. I also organise NoSQL conferences and consult on NoSQL in general.
Still, I must admit that I'm not familiar with the internal data model of Wikipedia. I've checked with George Washington (Q23) that he has a lot of properties associated with him. However, I fail to see how the traversals you mentioned are defined. For example, "Give me the list of countries sorted by population". What does the data model look like? Is "population" an attribute of "country"? Are all countries connected to a "special" node via an edge? Or are countries identified by a special property? If there is a special "world" node containing edges to all countries, then there is no need for indexes. If there is a "world" node connecting everything and you need to filter the edges, indexes might help.

In general, graph models are very useful if paths of different lengths occur in your query. For example, find a descendant with a given property. On the other hand, if your path always has a fixed length, it will be much faster to use some sort of index. Graph queries are fast if you have a natural start node. If you have to find nodes with a given property, it is much better to use a document database (see your example "find a city with a female mayor").

Sometimes it is possible to combine both approaches. For example, find cities with female mayors and then do a traversal from these cities. That is what I call "multi-model": being able to switch between models within a single query. It is different from multi-personality approaches, where you have a database engine that can be used as a document store or as a graph store, but not as both.

Having said that, I currently would not know which solution to recommend to you. I'm sure I do not completely understand your data model and where graphs are useful and where they are a hindrance. The same is true for the hardware. On the one hand, you want cheap hardware and spinning disks, preferably even on a single node; on the other hand, your dataset might require a cluster setup.
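To make the combined approach mentioned above (look up start nodes by a property, then traverse from them) a bit more concrete, here is a minimal in-memory sketch in Python. It does not use ArangoDB or any real database API; the document keys, the `mayor_gender` field, and the city/region/country edges are all made-up assumptions for illustration, not the actual Wikidata model.

```python
from collections import defaultdict, deque

# Hypothetical toy data: keys, fields and edges are illustrative only.
documents = {
    "city/A": {"type": "city", "mayor_gender": "female"},
    "city/B": {"type": "city", "mayor_gender": "male"},
    "city/C": {"type": "city", "mayor_gender": "female"},
    "region/X": {"type": "region"},
    "region/Y": {"type": "region"},
    "country/Z": {"type": "country"},
}
edges = [
    ("city/A", "region/X"),
    ("city/B", "region/X"),
    ("city/C", "region/Y"),
    ("region/X", "country/Z"),
    ("region/Y", "country/Z"),
]


def build_index(docs, field):
    """Document-store side: map each value of `field` to the keys having it."""
    index = defaultdict(set)
    for key, doc in docs.items():
        if field in doc:
            index[doc[field]].add(key)
    return index


def build_adjacency(edge_list):
    """Graph side: map each source node to its outgoing neighbours."""
    adjacency = defaultdict(list)
    for source, target in edge_list:
        adjacency[source].append(target)
    return adjacency


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal; returns {node: depth}, excluding the start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    del depth[start]
    return depth


def traverse_from_matches(docs, edge_list, field, value, max_depth):
    """Step 1: index lookup by property. Step 2: traversal from each match."""
    index = build_index(docs, field)
    adjacency = build_adjacency(edge_list)
    return {
        start: traverse(adjacency, start, max_depth)
        for start in sorted(index.get(value, ()))
    }


result = traverse_from_matches(documents, edges, "mayor_gender", "female", 2)
```

The index lookup supplies the start nodes that a pure graph query would lack, and the traversal then handles the variable-length part. In a multi-model engine both steps would be expressed in one query rather than stitched together in application code as done here.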
Some of these requirements could be fulfilled by ArangoDB; for others we would need to improve things (like finishing the sparse indexes). On the other hand, you might be better off with something like Elasticsearch (if the graph searches are mostly fixed paths) or a combination.

TASK DETAIL
https://phabricator.wikimedia.org/T88549

To: Smalyshev, Fceller
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel
