[MediaWiki-commits] [Gerrit] mediawiki...CirrusSearch[master]: Port update_general_api.feature to nodejs
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/393694 )

Change subject: Port update_general_api.feature to nodejs
......................................................................

Port update_general_api.feature to nodejs

* Dropped chunkification of batch jobs, as it may cause weird errors if a
  redirect and its target page are assigned to different chunks.
* Adapted waitForOperation to support multiple operations. This is hackish,
  but I could not find a better solution. If mwbot had a way to hook up
  individual job completion, the waiting could be more straightforward.
* Reworked the cirrusdoc api query to source page ids from the archive and
  page tables, whichever is more recent. This allows figuring out when
  deletes have made it into elasticsearch.
* Rewrote all the 'within ...' clauses as a direct query/check after
  waiting for the previous operation to go through.
* The only operation we can't directly wait for is the template update,
  which for unrelated reasons is broken on my MWV, so it is commented out.
* The archive search is only exposed via the browser, so its test uses the
  browser. As Special:Undelete requires special rights, this meant rigging
  up a login method for the browser.
* Adjusted waitForOperation to take a revision id, and made the step
  helpers' editPage method pass the new revision id into waitForOperation.
  Without this, an edit to a page that already exists is not waited for
  and fails.
* Adding a revision id for edits broke the frozen tests. Also added a way
  to skip waiting for the operation if we know it won't go through
  (because writes are frozen).
* Implemented the step helper movePage(). Waiting for the move to make it
  into cirrus required adding an additional check to the cirrusdoc query
  that the requested page matches the elastic page. This probably still
  has issues if a redirect points to the moved page, but we don't test
  that.
* Moved all transformations into steps. This required normalizing all
  parameters that should support this to (.+), removing uses of (.*), as
  cucumber-js doesn't have a generic transformation step, only one on
  individual capture patterns.
* Moved some page creation (relevancy_api.feature) back to hooks. I
  believe it helps to wait for incoming links to be properly computed.

Change-Id: I99c0ef1e3453fedea5f3afbe29e5e8f9dd73d7e4
---
M includes/Api/QueryCirrusDoc.php
M tests/integration/config/wdio.conf.js
M tests/integration/features/frozen_index_api.feature
M tests/integration/features/more_like_api.feature
M tests/integration/features/relevancy_api.feature
M tests/integration/features/step_definitions/page_step_helpers.js
M tests/integration/features/step_definitions/page_steps.js
M tests/integration/features/step_definitions/search_steps.js
M tests/integration/features/support/hooks.js
M tests/integration/features/support/pages/page.js
A tests/integration/features/support/pages/special_undelete.js
M tests/integration/features/support/world.js
A tests/integration/features/update_general_api.feature
13 files changed, 515 insertions(+), 175 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified
  DCausse: Looks good to me, but someone else must approve

diff --git a/includes/Api/QueryCirrusDoc.php b/includes/Api/QueryCirrusDoc.php
index 27e2216..76ecd3e 100644
--- a/includes/Api/QueryCirrusDoc.php
+++ b/includes/Api/QueryCirrusDoc.php
@@ -3,7 +3,8 @@
 namespace CirrusSearch\Api;
 
 use CirrusSearch\Searcher;
-use CirrusSearch\Updater;
+use CirrusSearch\SearchConfig;
+use PageArchive;
 use Title;
 
 /**
@@ -11,7 +12,13 @@
  *
  * This was primarily written for the integration tests, but may be useful
  * elsewhere. This is functionally similar to web action=cirrusdump but
- * available and discoverable over the API.
+ * available and discoverable over the API. Compared to cirrusdump this
+ * also takes pain to try and ensure if there is a related elastic document,
+ * even if its not in-sync with the sql database, we return it. Similarly
+ * if a document in elasticsearch should, but does not, match the requested
+ * page (perhaps a redirect has been created but not indexed yet) it will
+ * not be returned. In this way this tries to faithfully return the document
+ * in elasticsearch that represents the requested page.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -31,51 +38,141 @@
 class QueryCirrusDoc extends \ApiQueryBase {
 	use ApiTrait;
 
+	/** @var SearchConfig */
+	private $config;
+
+	/** @var Searcher */
+	private $searcher;
+
 	public function __construct( \ApiQuery $query, $moduleName ) {
 		parent::__construct( $query, $moduleName, 'cd' );
 	}
 
 	public function execute() {
 		$conn = $this->getCirrusConnection();
-		$config = $this->getSearchConfig();
-		$updater
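The revision-id wait described in the commit message can be sketched roughly as follows. This is a hypothetical illustration, not the actual CirrusSearch step helper: the names fetchDoc, doc.version, and the option flags are assumptions. The idea is that an edit returns a new revision id, and the wait polls a cirrusdoc-style lookup until the indexed document reports at least that revision, skipping the wait entirely when writes are frozen.

```javascript
// Sketch only; fetchDoc stands in for a cirrusdoc API lookup and
// doc.version for the revision id stored in the elastic document.
async function waitForOperation( fetchDoc, title, revisionId, options = {} ) {
	const { frozen = false, retries = 10, delayMs = 100 } = options;
	if ( frozen ) {
		// Writes are frozen: the operation will never reach elasticsearch,
		// so waiting for it would only time out.
		return null;
	}
	for ( let attempt = 0; attempt < retries; attempt++ ) {
		const doc = await fetchDoc( title );
		// Without comparing revision ids, an edit to a page that is already
		// indexed would look complete immediately, even though the new
		// revision has not reached elasticsearch yet.
		if ( doc && ( revisionId === undefined || doc.version >= revisionId ) ) {
			return doc;
		}
		await new Promise( ( resolve ) => setTimeout( resolve, delayMs ) );
	}
	throw new Error( `Timed out waiting for ${title} to reach revision ${revisionId}` );
}
```

An editPage helper would then pass the revision id from the edit response into this wait, which is the behavior the commit message says was missing for edits to existing pages.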
[MediaWiki-commits] [Gerrit] mediawiki...CirrusSearch[master]: Port update_general_api.feature to nodejs
EBernhardson has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/393694 )

Change subject: Port update_general_api.feature to nodejs
......................................................................

Port update_general_api.feature to nodejs

* Reworked the cirrusdoc api query to source page ids from the archive and
  page tables, whichever is more recent. This allows figuring out when
  deletes have made it into elasticsearch.
* Rewrote all the 'within ...' clauses as a direct query/check after
  waiting for the previous operation to go through.
* The only operation we can't directly wait for is the template update,
  which for unrelated reasons is broken on my MWV, so it is commented out.
* The archive search is only exposed via the browser, so its test uses the
  browser. As Special:Undelete requires special rights, this meant rigging
  up a login method for the browser.
* Changed the baseurl from dev.wiki to cirrustest.wiki. This probably
  needs to be handled more generically, though, to support the browser
  with multiple wikis.
* Adjusted waitForOperation to take a revision id, and made the step
  helpers' editPage method pass the new revision id into waitForOperation.
  Without this, an edit to a page that already exists is not waited for
  and fails.
* Implemented the step helper movePage(). Waiting for the move to make it
  into cirrus required adding an additional check to the cirrusdoc query
  that the requested page matches the elastic page. This probably still
  has issues if a redirect points to the moved page, but we don't test
  that.
* Support the %{epoch} transformation in steps. This required normalizing
  all parameters that should support this to (.+), removing uses of (.*),
  as cucumber-js doesn't have a generic transformation step, only one on
  individual capture patterns.

Change-Id: I99c0ef1e3453fedea5f3afbe29e5e8f9dd73d7e4
---
M includes/Api/QueryCirrusDoc.php
M tests/integration/config/wdio.conf.js
M tests/integration/features/step_definitions/page_step_helpers.js
M tests/integration/features/step_definitions/page_steps.js
M tests/integration/features/support/hooks.js
M tests/integration/features/support/pages/page.js
A tests/integration/features/support/pages/special_undelete.js
M tests/integration/features/support/world.js
A tests/integration/features/update_general_api.feature
9 files changed, 373 insertions(+), 66 deletions(-)

git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/94/393694/1

diff --git a/includes/Api/QueryCirrusDoc.php b/includes/Api/QueryCirrusDoc.php
index 27e2216..e03ee1c 100644
--- a/includes/Api/QueryCirrusDoc.php
+++ b/includes/Api/QueryCirrusDoc.php
@@ -3,7 +3,7 @@
 namespace CirrusSearch\Api;
 
 use CirrusSearch\Searcher;
-use CirrusSearch\Updater;
+use PageArchive;
 use Title;
 
 /**
@@ -31,51 +31,139 @@
 class QueryCirrusDoc extends \ApiQueryBase {
 	use ApiTrait;
 
+	private $config;
+	private $searcher;
+
 	public function __construct( \ApiQuery $query, $moduleName ) {
 		parent::__construct( $query, $moduleName, 'cd' );
 	}
 
 	public function execute() {
 		$conn = $this->getCirrusConnection();
-		$config = $this->getSearchConfig();
-		$updater = new Updater( $conn, $config );
-		$searcher = new Searcher( $conn, 0, 0, $config, [], $this->getUser() );
-		$result = [];
+		$this->config = $this->getSearchConfig();
+		$this->searcher = new Searcher( $conn, 0, 0, $this->config, [], $this->getUser() );
 		foreach ( $this->getPageSet()->getGoodTitles() as $origPageId => $title ) {
-			list( $page, $redirects ) = $updater->traceRedirects( $title );
-
-			$result = [];
-			if ( $page ) {
-				$docId = $config->makeId( $page->getId() );
-				// could be optimized by implementing multi-get but not
-				// expecting much usage except debugging/tests.
-				$esSources = $searcher->get( [ $docId ], true );
-				if ( $esSources->isOK() ) {
-					foreach ( $esSources->getValue() as $i => $esSource ) {
-						// If we have followed redirects only report the
-						// article dump if the redirect has been indexed. If it
-						// hasn't been indexed this document does not represent
-						// the original title.
-						if ( count( $redirects ) &&
-							!$this->hasRedirect( $esSource->getData(), $title )
-						) {
-							continue;
-
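The %{epoch} handling described in both commit messages, where every (.+) capture is run through a per-parameter transform because cucumber-js has no generic transformation step, could look roughly like this. This is a hedged sketch: expandTokens and the example step regex are illustrations, not the actual helpers in page_steps.js.

```javascript
// Sketch only: expand tokens such as %{epoch} inside a captured step
// parameter, so feature files can refer to uniquely named pages per run.
function expandTokens( value, epoch ) {
	return value.replace( /%\{epoch\}/g, String( epoch ) );
}

// A step definition would then apply it to each capture individually,
// e.g. (hypothetical step, modeled on the commit description):
// Given( /^a page named (.+) exists$/, function ( title ) {
//     return this.stepHelpers.editPage( expandTokens( title, this.epoch ) );
// } );
```

Because the transform runs per capture rather than globally, every parameter that should support substitution had to be captured with (.+) instead of (.*), matching the normalization the commit message describes.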