[MediaWiki-commits] [Gerrit] mediawiki...CirrusSearch[master]: Port update_general_api.feature to nodejs

2017-12-08 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/393694 )

Change subject: Port update_general_api.feature to nodejs
..


Port update_general_api.feature to nodejs

* Dropped chunkification of batch jobs, as it can cause spurious errors
 if a redirect and its target page are assigned to different chunks.
* Adapted waitForOperation to support multiple operations; see the
 polling sketch after this list. This is hackish, but I could not find
 a better solution. If mwbot had a way to hook into individual job
 completion, the waiting could be more straightforward.
* Reworked the cirrusdoc api query to source page ids from the archive
 and page tables, whichever is more recent. This makes it possible to
 tell when deletes have made it into elasticsearch.
* Rewrote all the 'within ...' clauses to query and check directly
 after waiting for the previous operation to go through.
* The only operation we can't directly wait for is the template
 update, which is broken on my MWV for unrelated reasons, so it is
 commented out.
* The archive search is only exposed via the browser, so its test
 uses the browser. As Special:Undelete requires special rights, this
 meant rigging up a login method for the browser; see the login sketch
 after this list.
* Adjusted waitForOperation to take a revision id, and made the step
 helpers' editPage method pass the new revision id into
 waitForOperation. Without this, an edit to a page that already exists
 is not waited for and fails.
* Adding a revision id for edits broke the frozen tests, so also added
 a way to skip waiting for the operation when we know it won't go
 through (because writes are frozen).
* Implemented the step helpers' movePage() method; see the sketch
 after this list. Waiting for the move to make it into cirrus required
 adding an additional check to the cirrusdoc query that the requested
 page matches the elastic page. This probably still has issues if a
 redirect points to the moved page, but we don't test that.
* Moved all transformations into steps; see the transformation sketch
 after this list. This required normalizing all parameters that should
 support transformation to (.+), removing uses of (.*), as cucumber-js
 has no generic transformation step, only transforms on individual
 capture patterns.
* Moved some page creation (relevancy_api.feature) back into hooks; I
 believe this helps ensure incoming links are properly computed before
 the tests run.

Change-Id: I99c0ef1e3453fedea5f3afbe29e5e8f9dd73d7e4
---
M includes/Api/QueryCirrusDoc.php
M tests/integration/config/wdio.conf.js
M tests/integration/features/frozen_index_api.feature
M tests/integration/features/more_like_api.feature
M tests/integration/features/relevancy_api.feature
M tests/integration/features/step_definitions/page_step_helpers.js
M tests/integration/features/step_definitions/page_steps.js
M tests/integration/features/step_definitions/search_steps.js
M tests/integration/features/support/hooks.js
M tests/integration/features/support/pages/page.js
A tests/integration/features/support/pages/special_undelete.js
M tests/integration/features/support/world.js
A tests/integration/features/update_general_api.feature
13 files changed, 515 insertions(+), 175 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified
  DCausse: Looks good to me, but someone else must approve



diff --git a/includes/Api/QueryCirrusDoc.php b/includes/Api/QueryCirrusDoc.php
index 27e2216..76ecd3e 100644
--- a/includes/Api/QueryCirrusDoc.php
+++ b/includes/Api/QueryCirrusDoc.php
@@ -3,7 +3,8 @@
 namespace CirrusSearch\Api;
 
 use CirrusSearch\Searcher;
-use CirrusSearch\Updater;
+use CirrusSearch\SearchConfig;
+use PageArchive;
 use Title;
 
 /**
@@ -11,7 +12,13 @@
  *
  * This was primarily written for the integration tests, but may be useful
  * elsewhere. This is functionally similar to web action=cirrusdump but
- * available and discoverable over the API.
+ * available and discoverable over the API. Compared to cirrusdump this
+ * also takes pains to ensure that if there is a related elastic document,
+ * even if it's not in sync with the sql database, we return it. Similarly,
+ * if a document in elasticsearch should, but does not, match the requested
+ * page (perhaps a redirect has been created but not indexed yet) it will
+ * not be returned. In this way it tries to faithfully return the document
+ * in elasticsearch that represents the requested page.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -31,51 +38,141 @@
 class QueryCirrusDoc extends \ApiQueryBase {
use ApiTrait;
 
+   /** @var SearchConfig */
+   private $config;
+   /** @var Searcher */
+   private $searcher;
+
public function __construct( \ApiQuery $query, $moduleName ) {
parent::__construct( $query, $moduleName, 'cd' );
}
 
public function execute() {
$conn = $this->getCirrusConnection();
-   $config = $this->getSearchConfig();
-   $updater 

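With the archive-aware lookup, a test can tell that a delete has
reached elasticsearch by polling until cirrusdoc stops returning a
document for the title. A sketch under the same assumptions as the
helpers above:

    // The page id is resolved from the archive table even after the
    // delete, so a lingering elastic document is still discoverable.
    async function waitForDelete( client, title ) {
        for ( let tries = 0; tries < 120; tries++ ) {
            const response = await client.request( {
                action: 'query',
                prop: 'cirrusdoc',
                titles: title,
                format: 'json',
                formatversion: 2
            } );
            const page = response.query.pages[ 0 ];
            if ( !page.cirrusdoc || page.cirrusdoc.length === 0 ) {
                return; // the document is gone from the index
            }
            await new Promise( ( resolve ) => setTimeout( resolve, 500 ) );
        }
        throw new Error( 'Delete of ' + title + ' never made it into elasticsearch' );
    }
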
[MediaWiki-commits] [Gerrit] mediawiki...CirrusSearch[master]: Port update_general_api.feature to nodejs

2017-11-27 Thread EBernhardson (Code Review)
EBernhardson has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/393694 )

Change subject: Port update_general_api.feature to nodejs
..

Port update_general_api.feature to nodejs

* Reworked the cirrusdoc api query to source page ids from the archive
 and page tables, whichever is more recent. This makes it possible to
 tell when deletes have made it into elasticsearch.
* Rewrote all the 'within ...' clauses to query and check directly
 after waiting for the previous operation to go through.
* The only operation we can't directly wait for is the template
 update, which is broken on my MWV for unrelated reasons, so it is
 commented out.
* The archive search is only exposed via the browser, so its test
 uses the browser. As Special:Undelete requires special rights, this
 meant rigging up a login method for the browser.
* Changed the baseurl from dev.wiki to cirrustest.wiki. This probably
 needs to be handled more generically, though, to support browser
 tests against multiple wikis.
* Adjusted waitForOperation to take a revision id, and made the step
 helpers' editPage method pass the new revision id into
 waitForOperation. Without this, an edit to a page that already exists
 is not waited for and fails.
* Implemented the step helpers' movePage() method. Waiting for the
 move to make it into cirrus required adding an additional check to
 the cirrusdoc query that the requested page matches the elastic page.
 This probably still has issues if a redirect points to the moved
 page, but we don't test that.
* Support the %{epoch} transformation in steps; see the sketch after
 this list. This required normalizing all parameters that should
 support it to (.+), removing uses of (.*), as cucumber-js has no
 generic transformation step, only transforms on individual capture
 patterns.
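
A sketch of the %{epoch} expansion itself; keeping one value per test
run, rather than per call, is an assumption about the intent (titles
created in one step must match titles queried in later steps):

    // One epoch per run so all steps in a scenario agree on titles.
    const epoch = Date.now();
    function expandEpoch( text ) {
        return text.replace( /%{epoch}/g, String( epoch ) );
    }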

Change-Id: I99c0ef1e3453fedea5f3afbe29e5e8f9dd73d7e4
---
M includes/Api/QueryCirrusDoc.php
M tests/integration/config/wdio.conf.js
M tests/integration/features/step_definitions/page_step_helpers.js
M tests/integration/features/step_definitions/page_steps.js
M tests/integration/features/support/hooks.js
M tests/integration/features/support/pages/page.js
A tests/integration/features/support/pages/special_undelete.js
M tests/integration/features/support/world.js
A tests/integration/features/update_general_api.feature
9 files changed, 373 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/94/393694/1

diff --git a/includes/Api/QueryCirrusDoc.php b/includes/Api/QueryCirrusDoc.php
index 27e2216..e03ee1c 100644
--- a/includes/Api/QueryCirrusDoc.php
+++ b/includes/Api/QueryCirrusDoc.php
@@ -3,7 +3,7 @@
 namespace CirrusSearch\Api;
 
 use CirrusSearch\Searcher;
-use CirrusSearch\Updater;
+use PageArchive;
 use Title;
 
 /**
@@ -31,51 +31,139 @@
 class QueryCirrusDoc extends \ApiQueryBase {
use ApiTrait;
 
+   private $config;
+   private $searcher;
+
public function __construct( \ApiQuery $query, $moduleName ) {
parent::__construct( $query, $moduleName, 'cd' );
}
 
 	public function execute() {
 		$conn = $this->getCirrusConnection();
-		$config = $this->getSearchConfig();
-		$updater = new Updater( $conn, $config );
-		$searcher = new Searcher( $conn, 0, 0, $config, [], $this->getUser() );
-		$result = [];
+		$this->config = $this->getSearchConfig();
+		$this->searcher = new Searcher( $conn, 0, 0, $this->config, [], $this->getUser() );
 		foreach ( $this->getPageSet()->getGoodTitles() as $origPageId => $title ) {
-			list( $page, $redirects ) = $updater->traceRedirects( $title );
-
-			$result = [];
-			if ( $page ) {
-				$docId = $config->makeId( $page->getId() );
-				// could be optimized by implementing multi-get but not
-				// expecting much usage except debugging/tests.
-				$esSources = $searcher->get( [ $docId ], true );
-				if ( $esSources->isOK() ) {
-					foreach ( $esSources->getValue() as $i => $esSource ) {
-						// If we have followed redirects only report the
-						// article dump if the redirect has been indexed. If it
-						// hasn't been indexed this document does not represent
-						// the original title.
-						if ( count( $redirects ) &&
-							!$this->hasRedirect( $esSource->getData(), $title )
-						) {
-							continue;
-