Smalyshev added a comment.

@JanZerebecki We can query against any data, but I assume some queries would be more frequent than others. For example, if we have "country population" stored as an array of qualified numbers, then the query "what are the most populous countries" would require scanning each such array for each country. Regular Titan indexes most probably would not support such scans, because they cannot make sense of the complex data structure that the population property is now - with multiple values, qualifiers, preferences, etc. I am not sure Elastic can handle such structures efficiently either, since it would require non-trivial logic to extract indexable values. That's why I am thinking about having a single value - probably in addition to the full data - which can be indexed and used for such queries.
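To make the idea concrete, here is a minimal sketch in Python of how such a single value could be derived and kept next to the full data. Everything in it is an assumption for illustration only: the `Statement` class, the `point_in_time` qualifier key, and the selection rule (preferred rank first, then the most recent value) are hypothetical and are not taken from Titan, Elastic or the Wikibase code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical rank ordering: lower number wins.
RANK_ORDER = {"preferred": 0, "normal": 1, "deprecated": 2}


@dataclass
class Statement:
    """One qualified value of a property (illustrative model only)."""
    value: float
    rank: str = "normal"
    qualifiers: Dict[str, int] = field(default_factory=dict)


def best_value(statements: List[Statement]) -> Optional[float]:
    """Pick the single value to index, or None if no usable value exists.

    Assumed rule: skip deprecated statements, prefer the best rank,
    and break ties by the latest 'point_in_time' qualifier.
    """
    usable = [s for s in statements if s.rank != "deprecated"]
    if not usable:
        return None
    chosen = min(
        usable,
        key=lambda s: (RANK_ORDER[s.rank], -s.qualifiers.get("point_in_time", 0)),
    )
    return chosen.value


def most_populous(countries: Dict[str, List[Statement]]) -> List[Tuple[str, float]]:
    """Answer 'most populous countries' from the flat values only."""
    flat = {name: best_value(stmts) for name, stmts in countries.items()}
    ranked = [(name, v) for name, v in flat.items() if v is not None]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


countries = {
    "A": [
        Statement(100, "normal", {"point_in_time": 2000}),
        Statement(120, "preferred", {"point_in_time": 2010}),
    ],
    "B": [
        Statement(300, "normal", {"point_in_time": 1990}),
        Statement(90, "normal", {"point_in_time": 2012}),
    ],
    "C": [Statement(500, "deprecated")],
}

print(most_populous(countries))
```

In a real store the flat value would be computed once at write time and put into an ordinary index, so the sort above would become an index scan. The full statement list stays available for queries that need qualifiers or history.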
Of course, you are correct in noting that not every property would have such a single value - some of them, like "educated at", would not have any dedicated value and as such will always be handled using the full set of data. I think we can still handle such cases (query examples for verification are most welcome). However, I think this should not preclude us from having an optimization for the common case when we do know how to handle it more efficiently. It would work for some cases and be useless for others - so discretion should still be used when writing the actual queries - but at least we would have an option for the fast way.

TASK DETAIL
https://phabricator.wikimedia.org/T76373