Smalyshev added a comment.

@Neunhoef In the current data model, each edge carries a primary value, a
boolean flag, and a small set of secondary values (usually well under 10, in
most cases 1-3 or none), each of which needs to be indexed. It can also keep a
set of auxiliary values for each of those, which are not indexed but may be
used when filtering the results in complex lookups. The edges can, of course,
be converted to nodes linked by property-less (or almost property-less) edges;
the current model is used because the Titan storage model stores nodes and
edges together, so looking up an edge property is much cheaper than traversing
to a different node.
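
A rough sketch of that edge shape, only to make the description concrete. All
names here (Edge, Secondary, EdgeStore, find) are hypothetical and are not
Titan API; it also assumes that the auxiliary values hang off each secondary
value and that all values are plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Secondary:
    key: str
    value: str                                # indexed
    aux: dict = field(default_factory=dict)   # not indexed, used for filtering only


@dataclass
class Edge:
    source: str
    label: str
    value: str                                # primary value
    flag: bool = False                        # the boolean flag
    secondaries: list = field(default_factory=list)


class EdgeStore:
    """In-memory stand-in for the indexes; hypothetical, not a real backend."""

    def __init__(self):
        self.by_primary = defaultdict(list)    # (label, value) -> edges
        self.by_secondary = defaultdict(list)  # (label, key, value) -> edges

    def add(self, edge):
        self.by_primary[(edge.label, edge.value)].append(edge)
        # Index each distinct secondary (key, value) pair once per edge.
        for key, value in {(s.key, s.value) for s in edge.secondaries}:
            self.by_secondary[(edge.label, key, value)].append(edge)

    def find(self, label, key, value, aux_filter=None):
        # The indexed lookup narrows the candidates first; the auxiliary
        # values are only inspected afterwards, on the hits.
        hits = self.by_secondary.get((label, key, value), [])
        if aux_filter is None:
            return list(hits)
        return [
            e for e in hits
            if any(s.key == key and s.value == value and aux_filter(s.aux)
                   for s in e.secondaries)
        ]
```

The point of the split is that only the primary and secondary values pay the
indexing cost, while the auxiliary values are checked after the index has
already cut down the candidate set.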

The query load would probably include targeted lookups (Is X a human? Is X
alive or dead? Who is the current president of country X?), wider traversals
(Give me the list of all ape species. Give me the list of countries sorted by
population.), and even wider lists (Give me the list of people born before
1800 that have no date of death. Give me the list of all female British
writers.), etc. So the intent is to make indexed lookups very fast, but there
may be a need to navigate a significant number of nodes, and unfortunately
there is no "natural" way of sharding the data as far as I can see.
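
To illustrate the difference between the first two kinds of query, here is a
toy in-memory sketch. TinyGraph and the labels "instance_of" and "subclass_of"
are made up for the example and stand in for whatever the real backend and
properties would be.

```python
from collections import defaultdict, deque


class TinyGraph:
    """Toy adjacency index; hypothetical, for illustration only."""

    def __init__(self):
        self.out = defaultdict(set)  # (source, label) -> targets
        self.inc = defaultdict(set)  # (label, target) -> sources

    def add(self, source, label, target):
        self.out[(source, label)].add(target)
        self.inc[(label, target)].add(source)

    def has(self, source, label, target):
        # Targeted lookup ("is X a human?"): a single index probe.
        return target in self.out.get((source, label), ())

    def descendants(self, root, label):
        # Wider traversal ("all ape species"): breadth-first walk over
        # incoming edges, which may touch many nodes.
        seen = set()
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in self.inc.get((label, node), ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen
```

The targeted lookup touches one index entry, while the traversal follows
edges wherever they lead, which is why no obvious partitioning of the nodes
keeps such a walk on a single shard.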


TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel


