Lucas_Werkmeister_WMDE added a comment.
I looked a bit more into this, and it turns out that SearchEntities doesn’t support continuation all that well – basically, it asks the underlying search backend for //offset + limit + 1// results, then returns the //[offset, offset+limit)// slice of that. Clearly, this gets less and less efficient as the offset grows, which is why the API won’t //return// offsets higher than the standard API limit (50) for continuation. However, it won’t stop you from specifying larger limits yourself, potentially asking the search backend for arbitrarily large numbers of results. Fortunately, this isn’t actually a denial-of-service vulnerability, because both search backends cap the limit: ElasticSearch to 10,000 (reference <https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size>), `wb_terms` to 2500 (source <https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/c8fbefa4fd9ce869ffd228ac1fe236c54fdb78bf/lib/includes/Interactors/TermSearchOptions.php#14>). (Depending on the backend, you might get somewhat more results than that if you search in a non-English language and don’t specify the `strictlanguage` parameter, in which case the search may be retried in fallback languages.) Also, in the `wb_terms`-backed search, that limit doesn’t directly correspond to the SQL `LIMIT` anyway, because `getTopMatchingTerms()` always searches for 2500 terms in the database and then sorts and limits them in PHP. ElasticSearch, meanwhile, supports a separate `offset` parameter, and we really should be using it instead of just adding our offset to the limit we pass into the search backend.
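To make the difference concrete, here is a minimal Python sketch of the two strategies. It is an illustration only, not the Wikibase code: `fake_backend`, `search_by_slicing` and `search_with_native_offset` are hypothetical names, and the 200-item in-memory corpus stands in for a real search backend. The point is only how many results the backend has to produce for the same page.

```python
from typing import List, Tuple

# Hypothetical stand-in for a search backend: 200 fake entity IDs.
CORPUS: List[str] = [f"Q{i}" for i in range(1, 201)]


def fake_backend(size: int, start: int = 0) -> List[str]:
    """Return up to `size` hits, beginning at position `start` (assumed API)."""
    return CORPUS[start:start + size]


def search_by_slicing(offset: int, limit: int) -> Tuple[List[str], bool, int]:
    """Current behaviour as described above: fetch offset + limit + 1, then slice."""
    fetched = fake_backend(offset + limit + 1)
    page = fetched[offset:offset + limit]
    # The one extra result only tells us whether another page exists.
    has_more = len(fetched) > offset + limit
    return page, has_more, len(fetched)


def search_with_native_offset(offset: int, limit: int) -> Tuple[List[str], bool, int]:
    """Alternative: pass the offset to the backend and fetch only limit + 1."""
    fetched = fake_backend(limit + 1, start=offset)
    page = fetched[:limit]
    has_more = len(fetched) > limit
    return page, has_more, len(fetched)


if __name__ == "__main__":
    for strategy in (search_by_slicing, search_with_native_offset):
        page, has_more, fetched_count = strategy(offset=100, limit=10)
        print(strategy.__name__, page[0], page[-1], has_more, fetched_count)
```

Both functions return the same page and the same continuation flag, but the slicing variant makes the backend produce a number of results that grows with the offset, while the native-offset variant always asks for `limit + 1`.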
We don’t //have// to fix this as part of this task, but if we don’t, then continuation isn’t really appealing. We can duplicate what `EntitySearch` does right now and artificially abort the continuation at the standard API limit (50), in which case it’s no better than just specifying `limit=max` (in fact, if you’re a bot, that would get you //more// results, as your `max` is higher). Or we can always offer continuation, and requests will take longer and longer as the offset grows (in production, `action=wbsearchentities&search=e&language=en&continue=9999` takes between 10 and 15 seconds to return the final result).

TASK DETAIL https://phabricator.wikimedia.org/T229460
