Lucas_Werkmeister_WMDE added a comment.

  I looked a bit more into this, and it turns out that SearchEntities doesn’t 
support continuation all that well – basically, it asks the underlying search 
backend for //offset + limit + 1// results, then returns the //[offset, 
offset+limit)// slice of that. Clearly, this isn’t very efficient for larger 
and larger offsets, which is why the API won’t //return// offsets higher than 
the standard API limit (50) for continuation. However, it won’t stop you from 
specifying larger offsets yourself, potentially asking the search backend for 
arbitrarily large numbers of results.
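
  As a rough illustration of the behavior described above (not the actual Wikibase code, which is PHP), here is a minimal Python sketch. The names `search_page`, `fake_backend` and `API_CONTINUE_CAP` are made up for this example, and the exact comparison used for the cap is an assumption.

```python
from typing import Callable, List, Optional, Tuple

API_CONTINUE_CAP = 50  # standard API limit mentioned above


def search_page(
    backend: Callable[[int], List[str]],
    offset: int,
    limit: int,
) -> Tuple[List[str], Optional[int]]:
    """Over-fetch from the backend, then slice out [offset, offset + limit)."""
    # Ask for one extra result so we can tell whether more results exist.
    fetched = backend(offset + limit + 1)
    page = fetched[offset:offset + limit]
    next_offset = offset + limit
    has_more = len(fetched) > next_offset
    # Only advertise a continuation offset while it stays within the cap
    # (assumed condition; the real check may differ in detail).
    if has_more and next_offset <= API_CONTINUE_CAP:
        return page, next_offset
    return page, None


def fake_backend(n: int) -> List[str]:
    """Hypothetical backend: returns the first n of 100 ranked hits."""
    return [f"Q{i}" for i in range(1, 101)][:n]
```

  Note that the backend is always asked for `offset + limit + 1` results, even though only `limit` of them end up in the response; a caller-supplied offset is not checked against the cap in this sketch.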
  
  Fortunately, this isn’t actually a denial-of-service vulnerability, because 
both search backends cap the limit: ElasticSearch to 10,000 (reference 
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size>),
 `wb_terms` to 2500 (source 
<https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/c8fbefa4fd9ce869ffd228ac1fe236c54fdb78bf/lib/includes/Interactors/TermSearchOptions.php#14>).
 (Depending on the backend, you might get some more results than that if you search 
in a non-English language and don’t specify the `strictlanguage` parameter, in 
which case the search may be retried in fallback languages.) Also, in the 
`wb_terms`-backed search, that limit doesn’t directly correspond to the SQL 
`LIMIT` anyway, because `getTopMatchingTerms()` always searches for 2500 terms 
in the database and then sorts and limits them in PHP. ElasticSearch, 
meanwhile, supports a separate `offset` parameter, and we really should be 
using it instead of just adding our offset to the limit we pass into the search 
backend.
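
  To make the difference concrete, here is a hedged Python sketch of the two request shapes. In the Elasticsearch request body the offset is expressed as `from` and the limit as `size`; the helper names and the `label` field in the usage example are hypothetical, and this is not how the Wikibase code actually builds its queries.

```python
from typing import Any, Dict

ES_MAX_RESULT_WINDOW = 10_000  # Elasticsearch cap cited above


def overfetch_body(query: Dict[str, Any], offset: int, limit: int) -> Dict[str, Any]:
    """Approach described above: fold the offset into the size."""
    return {"query": query, "from": 0, "size": offset + limit + 1}


def paged_body(query: Dict[str, Any], offset: int, limit: int) -> Dict[str, Any]:
    """Alternative: let the backend skip the first `offset` hits itself."""
    # The cap applies to from + size, so check it before sending anything.
    if offset + limit + 1 > ES_MAX_RESULT_WINDOW:
        raise ValueError("offset + limit exceeds the backend result window")
    # Still request one extra hit to detect whether more results exist.
    return {"query": query, "from": offset, "size": limit + 1}
```

  For example, `paged_body({"match": {"label": "e"}}, 100, 50)` asks for 51 hits starting at position 100, whereas `overfetch_body` with the same arguments asks for 151 hits starting at position 0.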
  
  We don’t //have// to fix this as part of this task, but if we don’t, then 
continuation isn’t really appealing: we can duplicate what `EntitySearch` does 
right now, and artificially abort the continuation at the standard API limit 
(50), in which case it’s no better than just specifying `limit=max` (in fact, 
if you’re a bot, that would get you //more// results, as your `max` is higher); 
or we can always offer continuation, and requests will start taking longer and 
longer (in production, 
`action=wbsearchentities&search=e&language=en&continue=9999` takes between 10 
and 15 seconds to return the final result).
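
  A back-of-the-envelope sketch of why always offering continuation gets expensive under the current over-fetching approach: each page asks the backend for `offset + limit + 1` results, so the total requested grows roughly quadratically with the number of pages. The function name is made up, and the sketch ignores the backend caps mentioned above.

```python
def results_requested(total: int, page_size: int) -> int:
    """Sum of backend limits when paging through `total` results
    with the offset + limit + 1 over-fetch on every request."""
    return sum(offset + page_size + 1 for offset in range(0, total, page_size))
```

  With a page size of 50, walking through 100 results asks the backend for 51 + 101 = 152 results in total, and the per-request cost keeps growing with every further page.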

TASK DETAIL
  https://phabricator.wikimedia.org/T229460
