Smalyshev added a comment.

@JanZerebecki We can query against any data, but I assume some queries would be 
more frequent than others. For example, if we store "country population" as an 
array of qualified numbers, then the query "what are the most populous 
countries" would require scanning each such array for each country. Regular 
Titan indexes most probably would not support such scans, because they cannot 
make sense of the complex data structure that the population property now is - 
with multiple values, qualifiers, preferences, etc. I am not sure Elastic can 
handle such structures efficiently either, since it would require non-trivial 
logic to extract indexable values. That's why I am thinking about having a 
single value - probably in addition to the full data - which can be indexed 
and used for such queries.
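
A minimal Python sketch of the "single value in addition to the full data" 
idea. Everything here is an assumption made for illustration: the Statement 
class, the rank names, the "point_in_time" qualifier and the selection rule 
(best rank first, then most recent) are hypothetical and are not the Wikibase 
data model or a Titan/Elastic API.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One qualified value of a property (hypothetical, simplified model)."""
    value: float
    rank: str = "normal"  # assumed ranks: "preferred", "normal", "deprecated"
    qualifiers: dict = field(default_factory=dict)  # e.g. {"point_in_time": 2012}


# Lower number wins; deprecated statements are never used as the summary value.
RANK_ORDER = {"preferred": 0, "normal": 1}


def summary_value(statements):
    """Pick one indexable value: best rank first, then the most recent date."""
    candidates = [s for s in statements if s.rank in RANK_ORDER]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda s: (RANK_ORDER[s.rank], -s.qualifiers.get("point_in_time", 0)),
    )
    return best.value


# Full data: each country keeps its complete list of qualified values.
countries = {
    "A": [
        Statement(100.0, qualifiers={"point_in_time": 2000}),
        Statement(120.0, "preferred", {"point_in_time": 2010}),
    ],
    "B": [
        Statement(300.0, qualifiers={"point_in_time": 2005}),
        Statement(310.0, qualifiers={"point_in_time": 2012}),
    ],
    "C": [Statement(50.0, "deprecated")],
}

# Computed once at write time and stored next to the full data, so a plain
# index over a scalar is enough for ordering queries.
population_index = {name: summary_value(stmts) for name, stmts in countries.items()}


def most_populous(index, limit):
    """Answer "most populous countries" from the scalar index alone."""
    ranked = sorted(((v, k) for k, v in index.items() if v is not None), reverse=True)
    return [k for _, k in ranked[:limit]]
```

With the sample data, most_populous(population_index, 2) orders the countries 
by the stored scalar without looking at the per-country arrays.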

Of course, you are correct in noting that not every property would have such a 
unique value - some of them, like "educated at", would not have any dedicated 
value and as such would always be handled using the full set of data. I think 
we can still handle such cases (query examples for verification are most 
welcome). However, I think this should not preclude us from optimizing the 
common case when we do know how to handle it more efficiently. It would work 
for some cases and be useless for others - so discretion should still be used 
when writing the actual queries - but at least we would have the option of a 
fast path.
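
A rough Python sketch of how a query could pick between the two paths. The 
function name, the item ids and the shape of the data are hypothetical. Note 
that the fast path only sees the single stored value, while the fallback 
matches any value of the property, which is why the query author still has to 
choose deliberately.

```python
def find_items(items, prop, predicate, summary_index):
    """Return ids of items whose `prop` satisfies `predicate`.

    Uses the single-value index when the property has one; otherwise
    falls back to scanning the full list of values for every item.
    """
    if prop in summary_index:
        # Fast path: one scalar per item, no per-item array scan.
        return sorted(i for i, v in summary_index[prop].items() if predicate(v))
    # Slow path: the property has no dedicated value, so use the full data.
    return sorted(
        i for i, props in items.items()
        if any(predicate(v) for v in props.get(prop, []))
    )


# Hypothetical full data: every property keeps all of its values.
items = {
    "country_1": {"population": [100, 120]},
    "country_2": {"population": [300, 310]},
    "person_1": {"educated_at": ["univ_1", "univ_2"]},
    "person_2": {"educated_at": ["univ_2"]},
}

# Only "population" has a dedicated single value; "educated_at" does not.
summary_index = {"population": {"country_1": 120, "country_2": 310}}

large = find_items(items, "population", lambda v: v > 200, summary_index)
alumni = find_items(items, "educated_at", lambda v: v == "univ_2", summary_index)
```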

TASK DETAIL
  https://phabricator.wikimedia.org/T76373

To: Smalyshev
Cc: Smalyshev, Manybubbles, GWicke, JanZerebecki, aude, Lydia_Pintscher, 
jkroll, Wikidata-bugs, daniel



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
