Smalyshev added a comment.

@Neunhoef In the current data model, each edge carries a primary value, a boolean flag, and a small set of secondary values (usually well under 10, in most cases 1-3 or none), each of which needs to be indexed. It can also keep a set of auxiliary values for each of those, which are not indexed but may be used when filtering the results in complex lookups. The edges can, of course, be converted to nodes linked by property-less (or almost property-less) edges. The current model is used because Titan's storage model stores nodes and edges together, so looking up an edge property is much cheaper than traversing to a different node.
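To make the shape of an edge concrete, here is a rough in-memory sketch in Python. It is only an illustration, not how Titan stores anything: the names `Edge`, `Secondary`, and `EdgeStore` are made up for this example, and it assumes the auxiliary values hang off each secondary value rather than off the edge as a whole.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Secondary:
    """One indexed secondary value with its non-indexed auxiliary values."""
    key: str
    value: str
    auxiliary: dict = field(default_factory=dict)  # assumption: per secondary value


@dataclass
class Edge:
    """An edge carrying a primary value, a boolean flag, and secondary values."""
    source: str
    target: str
    primary: str
    flag: bool
    secondaries: list = field(default_factory=list)


class EdgeStore:
    """Hypothetical store that indexes edges by (key, value) of each secondary."""

    def __init__(self):
        self.edges = []
        self.index = defaultdict(list)  # (key, value) -> list of edges

    def add(self, edge):
        self.edges.append(edge)
        for s in edge.secondaries:
            self.index[(s.key, s.value)].append(edge)

    def lookup(self, key, value, aux_filter=None):
        """Indexed lookup, optionally post-filtered on auxiliary values."""
        hits = self.index.get((key, value), [])
        if aux_filter is None:
            return list(hits)
        return [
            e for e in hits
            if any(
                s.key == key and s.value == value and aux_filter(s.auxiliary)
                for s in e.secondaries
            )
        ]
```

The index answers the lookup on the secondary value directly, while the auxiliary values are only examined afterwards on the edges that matched, which mirrors the "not indexed but used when filtering" distinction.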
The query load would probably be a mix of targeted lookups (Is X a human? Is X alive or dead? Who is the current president of country X?), wider traversals (give me the list of all ape species; give me the list of countries sorted by population), and even wider lists (give me the list of people born before 1800 that have no date of death; give me the list of all female British writers), etc. So the intent is to make indexed lookups very fast, but there may be a need to navigate a significant number of nodes, and unfortunately there is no "natural" way of sharding the data as far as I can see.

TASK DETAIL
https://phabricator.wikimedia.org/T88549

To: Smalyshev
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel

Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
