[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-03-07 Thread gerritbot
gerritbot added a comment.
Change 413899 merged by jenkins-bot:
[operations/mediawiki-config@master] Add configuration for CirrusSearch to instantly index new Wikidata items

https://gerrit.wikimedia.org/r/413899

TASK DETAIL
https://phabricator.wikimedia.org/T183053

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev, gerritbot
Cc: gerritbot, Stashbot, debt, jhsoby, Lydia_Pintscher, EBernhardson, dcausse, Aklapper, Smalyshev, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, LawExplorer, Avner, Lewizho99, Maathavan, Gehel, FloNight, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-03-01 Thread gerritbot
gerritbot added a comment.
Change 413492 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow some wikis to instantly index newly created articles

https://gerrit.wikimedia.org/r/413492


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-02-23 Thread gerritbot
gerritbot added a comment.
Change 413899 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Add configuration for CirrusSearch to instantly index new Wikidata items

https://gerrit.wikimedia.org/r/413899


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-02-22 Thread gerritbot
gerritbot added a comment.
Change 413492 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/CirrusSearch@master] Allow some wikis to instantly index newly created articles

https://gerrit.wikimedia.org/r/413492


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-01-02 Thread gerritbot
gerritbot added a comment.
Change 399466 merged by jenkins-bot:
[operations/mediawiki-config@master] Lower ElasticSearch index refresh interval for Wikidata to 5s

https://gerrit.wikimedia.org/r/399466


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-01-02 Thread EBernhardson
EBernhardson added a comment.
Sending the basic info semi-synchronously (from DeferredUpdates, which will run in the same process as the edit but after closing the connection to the user) should be ok. Actually generating a "basic" set instead of the full thing might be more difficult than necessary though; I would be tempted to add a call to Updater::updateFromTitle(...) and let it do the full thing. Since article creates should be relatively rare (compared to the total edit rate), I don't think the extra computation expense outweighs the maintenance cost of keeping an extra bit of code to generate partial updates, including getting the labels from wikidata, without also calculating the rest of it.

I'm not sure exactly what hook that should be attached to though. It looks like we can perhaps hook EditPage::attemptSave:after when $status->value == EditPage::AS_SUCCESS_NEW_ARTICLE, although we might need to play with it to see if it behaves as expected.
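
To make the "semi-synchronous" idea concrete, here is a toy Python model of a deferred update: the callback runs in the same process as the edit, but only after the response has gone out. It is only a sketch; WebRequest, defer, finish, save_page and the item ID Q12345 are hypothetical names for illustration and are not the MediaWiki or CirrusSearch API.

```python
class WebRequest:
    """Toy stand-in for a web request that supports deferred callbacks."""

    def __init__(self):
        self.events = []      # ordered record of what happened
        self._deferred = []   # callbacks to run after the response is sent

    def defer(self, callback):
        self._deferred.append(callback)

    def finish(self):
        # The user gets the response first; deferred work runs afterwards,
        # still inside the same process.
        self.events.append("response sent")
        for callback in self._deferred:
            callback()
        self._deferred.clear()


def save_page(request, title, is_new_page):
    request.events.append(f"saved {title}")
    if is_new_page:
        # New pages: index from the request itself, after the response.
        request.defer(lambda: request.events.append(f"indexed {title} in-process"))
    else:
        # Ordinary edits: keep going through the job queue.
        request.events.append(f"queued index job for {title}")


request = WebRequest()
save_page(request, "Q12345", is_new_page=True)
request.finish()
print(request.events)
```

Only the page-creation path pays the extra in-process cost, which matches the argument that creations are rare compared with the total edit rate.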


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-01-02 Thread Smalyshev
Smalyshev added a comment.
We could maybe just send basic info to ES when saving a new article, synchronously (not sure if it's a good idea, just putting it out there) and then let the jobs update it with full data. On the minus side, we'll get one extra document write which is then immediately overwritten. On the plus side, at least the Qid and initial label are available near-instantly.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-01-02 Thread EBernhardson
EBernhardson added a comment.
We may still need to look into the special case of newly created pages being indexed from the web request, rather than being punted into the job queue. cirrusSearchLinksUpdatePrioritized, which performs the actual generation of a document and the write to elasticsearch, looks to have a p99 that regularly varies from 30 to 60 seconds. This is on top of however long it takes for the refreshLinksPrioritized job, which is another 20s - 2 minutes. For some fraction of requests, even when the queue is healthy, there will be a couple of minutes between the edit being performed and the two necessary jobs making it through the job queue and being turned into a write in elasticsearch.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2018-01-02 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-01-02T18:37:19Z] T183053 update index.refresh_interval for wikidatawiki_{content,general} on eqiad to 5s
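
For readers unfamiliar with the setting, a live change like this can be made through Elasticsearch's index settings endpoint (PUT /<index>/_settings). The Python sketch below only builds such a request and does not send it; the base URL http://localhost:9200 and the helper name are assumptions, and this is not necessarily how the change logged above was applied.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a request updating index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host; the index names are the ones mentioned in the log entry.
requests_to_send = [
    build_refresh_interval_request("http://localhost:9200", name, "5s")
    for name in ("wikidatawiki_content", "wikidatawiki_general")
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would actually send it.
```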


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread Smalyshev
Smalyshev added a comment.
Looks to me that 5s is working fine. I'll add a config patch.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread gerritbot
gerritbot added a comment.
Change 399466 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/mediawiki-config@master] Lower refresh interval for Wikidata to 5s

https://gerrit.wikimedia.org/r/399466


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread EBernhardson
EBernhardson added a comment.
Took some measurements of refresh rate averaged over 5 minutes pre and post-deployment. Overall it's perhaps a 15% increase in refresh/minute across the cluster. Disk IO graphs don't show anything particularly interesting. There will certainly be more merge volume as well, but elasticsearch should be able to bundle up the merges enough that these tiny merges are irrelevant compared to the major merges that happen on many-GB segments.


refresh interval    cluster refresh/min    index refresh/min
30s                 1280                   90
5s                  1514                   263



[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-12-20T20:31:14Z] T183053 update elasticsearch settings for wikidatawiki_content on codfw to use: index.refresh_interval=5s


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread dcausse
dcausse added a comment.
Same for me, I'd be in favor of trying to increase the refresh rate on wikidata_content.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-19 Thread Smalyshev
Smalyshev added a comment.
I agree that we should try to lower the refresh interval to 5s and see whether it works.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-19 Thread EBernhardson
EBernhardson added a comment.
It seems there are a couple options here, my thoughts:

Reduce the default refresh interval

By default we use a 30 second refresh interval for all wikis. This means that 30 seconds' worth of updates get bundled together into a single update. Updates are not searchable until they have been refreshed. While it may not be particularly important on an individual wiki level, across 9k shards in the cluster this saves us considerable IO. We could potentially reduce the refresh interval only for wikidata to 5 seconds, or maybe even 1 second (the elasticsearch default). Trying to estimate the effect this has on the cluster is difficult, but my gut feeling is that a 5 second refresh for only the 21 wikidatawiki_content shards would probably go unnoticed.

It looks like our data collection around refresh is busted; graphite has intermittent data for some reason. Our indexing rate is fairly constant through the day though, so I pulled some numbers directly for 2 minutes' worth of activity (at ~23:20 UTC) and saw 19 refreshes/second (1157/minute) across the full eqiad cluster. The worst case for moving wikidatawiki_content, with its 21 shards, from 30s to 5s would be going from the current 0.7/sec (42/minute) to 4.2/sec (252/minute), or an 18% increase in the cluster-wide rate. Likely not every shard is flushed on every opportunity.
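
The worst-case arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate: it assumes every shard refreshes at every opportunity, and the function and variable names are made up for illustration.

```python
def refreshes_per_minute(shards, interval_seconds):
    """Worst case: every shard refreshes once per interval."""
    return shards * 60 / interval_seconds


SHARDS = 21                 # wikidatawiki_content shards
CLUSTER_PER_MINUTE = 1157   # measured across the full eqiad cluster

current = refreshes_per_minute(SHARDS, 30)     # at the 30s default
proposed = refreshes_per_minute(SHARDS, 5)     # at the proposed 5s
one_second = refreshes_per_minute(SHARDS, 1)   # at the elasticsearch default

# Added refreshes relative to the measured cluster-wide rate.
increase = (proposed - current) / CLUSTER_PER_MINUTE

print(current, proposed, one_second)
print(f"{increase:.1%}")
```

The same function shows why a 1s interval is a much bigger step: its worst case for this one index is on the order of the whole cluster's current refresh rate.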

Force refreshes from the cirrus codebase

Rather than increasing the default refresh rate, we could explicitly issue refreshes in the limited cases where we know it's important. Conceptually (I may have simply not thought about it enough) de-bouncing these to keep from issuing 100 refreshes in the same second seems non-trivial. We could certainly throttle the actions, but ensuring a refresh happens after the throttle time runs out might not be so easy.
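
A minimal Python sketch of the throttling problem described above, with time passed in explicitly so the behaviour is easy to follow. The class and method names are hypothetical and this is not CirrusSearch code. Note that something still has to call tick() once the window has elapsed, and it is not obvious what would do that in a request-per-process setup, which is presumably the hard part.

```python
class CoalescingRefresher:
    """Collapse bursts of refresh requests into at most one per interval."""

    def __init__(self, min_interval, do_refresh):
        self.min_interval = min_interval
        self.do_refresh = do_refresh
        self.last_refresh = None
        self.pending = False

    def request(self, now):
        # Fire immediately if the window is open; otherwise remember that
        # a refresh is owed once the window reopens.
        if self.last_refresh is None or now - self.last_refresh >= self.min_interval:
            self._fire(now)
        else:
            self.pending = True

    def tick(self, now):
        # Must be called by some scheduler after the throttle time runs out.
        if self.pending and now - self.last_refresh >= self.min_interval:
            self._fire(now)

    def _fire(self, now):
        self.last_refresh = now
        self.pending = False
        self.do_refresh()


calls = []
refresher = CoalescingRefresher(1.0, lambda: calls.append("refresh"))
for i in range(100):
    refresher.request(i * 0.001)   # 100 requests within 0.1 seconds
burst_refreshes = len(calls)       # only the first one fired
refresher.tick(1.0)                # trailing refresh covers the other 99
print(burst_refreshes, len(calls))
```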

Best option?

In general I think I'm in favor of the less complicated, and likely more robust, solution: adjusting the refresh interval for the wikidatawiki_content index down to 5s. Based on the current rates I think this will be reasonable. We can test on the codfw cluster first, which receives all the same updates as eqiad. If 5s isn't fast enough I would have to rethink the forced refreshes, as the worst case of a 1s refresh would double the refresh rate across the cluster. That might also be acceptable but is a little harder to guesstimate.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-19 Thread Smalyshev
Smalyshev added a comment.
So item creation rate is about 85k per day, or very close to one per second. Bots seem to dominate that though, so for real users it will be lower. Also, some of those are probably tools like QuickStatements for which it also could be fine to have the regular delay - maybe only force sync for those that come from browser pages?

Inspecting RC page, I see about 2-3 new items every second now, from non-bot accounts (it varies, but that's a common case), some of them have script tags but most don't have any special tags or markers.


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-19 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.
The first graph on https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel?refresh=30m=1 shows the number of new items created over time.
For the particular problem indeed bots could be taken out. They make up the biggest part of new page creations on Wikidata (see https://stats.wikimedia.org/v2/#/wikidata.org/contributing/new-pages if you split by editor type).


[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-18 Thread dcausse
dcausse added a comment.
If a large majority of such use cases involve searching for the entity id (QXXX) of the newly created item, we can perform an additional db match to compensate for the lag of the search index.
It's what we do for normal wikis: a db match is run in addition to the query sent to the search index.
If users search for the label or aliases of the newly created item, then this solution is pointless.
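
As a rough illustration of the db-match idea, the Python sketch below merges an exact entity-ID lookup into the search results. Both backends are stubbed with in-memory data, and every name here (search_with_db_match, the stub functions, Q999) is hypothetical rather than taken from CirrusSearch or Wikibase.

```python
import re

# Assumed shape of an item ID: "Q" followed by a positive integer.
ENTITY_ID = re.compile(r"^[Qq][1-9][0-9]*$")


def search_with_db_match(query, search_index, db_lookup):
    """Run the index query, then add an exact DB match for ID-like queries."""
    results = list(search_index(query))
    candidate = query.strip()
    if ENTITY_ID.match(candidate):
        exact = db_lookup(candidate.upper())
        if exact is not None and exact not in results:
            results.insert(0, exact)
    return results


# Stubs: Q999 was just created, so it is in the DB but not yet in the index.
database = {"Q999"}
indexed = {"Q42": "some label"}


def stub_index(query):
    return [qid for qid, label in indexed.items() if query in (qid, label)]


def stub_db(qid):
    return qid if qid in database else None


by_id = search_with_db_match("q999", stub_index, stub_db)
by_label = search_with_db_match("brand new label", stub_index, stub_db)
print(by_id, by_label)
```

The second query shows the limitation noted above: a search by label or alias never reaches the DB lookup, so the new item stays invisible until the index catches up.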