[Wikidata-bugs] [Maniphest] T239931: Reduce the impact of the sanitizer on wikidata

2021-01-04 Thread EBernhardson
EBernhardson added a comment. With the holidays over and everyone back, I think we can turn this on? TASK DETAIL https://phabricator.wikimedia.org/T239931 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Gehel

[Wikidata-bugs] [Maniphest] T268864: WikibaseCirrusSearch uses Elastica's Match class

2020-11-30 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T268864 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, Reedy, Wilmanbeno, CBogen, Akuckartz, DannyS712, Nan

[Wikidata-bugs] [Maniphest] T268865: WikibaseLexemeCirrusSearch uses Elastica's Match class

2020-11-30 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T268865 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Reedy, Wilmanbeno, CBogen, Akuckartz, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] T266495: Create Debian Package for Flink

2020-10-26 Thread EBernhardson
EBernhardson added a comment. I don't know if it's relevant at all, but analytics is in the process of switching to apache bigtop (from cloudera hadoop). That includes flink debs; would we want to use those? TASK DETAIL https://phabricator.wikimedia.org/T266495 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T239931: Reduce the impact of the sanitizer on wikidata

2020-10-26 Thread EBernhardson
EBernhardson added a comment. Saneitizer was turned back on last week, everything there is working well and wikidata can be reenabled any time. TASK DETAIL https://phabricator.wikimedia.org/T239931 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T265452: Add a configurable restart strategy to the streaming updater

2020-10-19 Thread EBernhardson
EBernhardson set the point value for this task to "3". TASK DETAIL https://phabricator.wikimedia.org/T265452 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, dcausse, CBogen, Akuckartz, darthmon_wmde, Nandana, N

[Wikidata-bugs] [Maniphest] T264659: Update BAG & BRT SPARQL endpoint in the whitelist

2020-10-19 Thread EBernhardson
EBernhardson set the point value for this task to "1". TASK DETAIL https://phabricator.wikimedia.org/T264659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Gehel, Denengelse, RhinosF1, Aklapper, Multichill, CBogen,

[Wikidata-bugs] [Maniphest] T265452: Add a configurable restart strategy to the streaming updater

2020-10-19 Thread EBernhardson
EBernhardson triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T265452 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, dcausse, CBogen, Akuckartz, darthmon_wmde, Nandana, Namenlos

[Wikidata-bugs] [Maniphest] T265452: Add a configurable restart strategy to the streaming updater

2020-10-19 Thread EBernhardson
EBernhardson moved this task from All WDQS-related tasks to Current work on the Wikidata-Query-Service board. EBernhardson added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T265452 WORKBOARD https://phabricator.wikimedia.org/project/board/891

[Wikidata-bugs] [Maniphest] T239931: Reduce the impact of the sanitizer on wikidata

2020-10-13 Thread EBernhardson
EBernhardson added a comment. All seems reasonable to me. Note that the saneitizer is globally turned off as of a week ago due to a separate incident. Expecting to turn that back on this week. TASK DETAIL https://phabricator.wikimedia.org/T239931 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T264566: Redirected entity still present in search results after 6 months

2020-10-05 Thread EBernhardson
EBernhardson added subscribers: dcausse, EBernhardson. EBernhardson added a comment. The process that fixes these was turned off at wikidata's request. Mostly that means if somehow a delete is missed it will never be fixed. @dcausse any idea if we can turn it back on? TASK DETAIL

[Wikidata-bugs] [Maniphest] T263125: Check for errors on wdqs1009 disks

2020-09-28 Thread EBernhardson
EBernhardson added a comment. A historical unix program, shipped with many distros, for writing patterns to a disk and reading them back to verify correctness: https://wiki.archlinux.org/index.php/Badblocks TASK DETAIL https://phabricator.wikimedia.org/T263125 EMAIL PREFERENCES
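
The write-then-verify idea behind badblocks can be sketched in a few lines. This is only an illustration run against a temporary file, not a replacement for the real tool: the function name, block size, and block count are assumptions, and reads served from the page cache do not actually exercise the disk. The four byte patterns are the ones the badblocks man page lists for its write-mode test.

```python
import os
import tempfile

# Byte patterns listed in the badblocks man page for the write-mode test.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def find_bad_blocks(path, block_size=4096, block_count=8):
    """Write each pattern to every block, read it back, and return the
    sorted indices of blocks whose contents did not match.

    Hypothetical helper for illustration; it overwrites the target file.
    """
    bad = set()
    with open(path, "r+b") as f:
        for pattern in PATTERNS:
            expected = bytes([pattern]) * block_size
            # Write pass: fill every block with the current pattern.
            f.seek(0)
            for _ in range(block_count):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())
            # Read pass: compare every block against the pattern.
            f.seek(0)
            for index in range(block_count):
                if f.read(block_size) != expected:
                    bad.add(index)
    return sorted(bad)


def demo():
    """Run the check against a scratch file and clean up afterwards."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096 * 8)
        path = tmp.name
    try:
        return find_bad_blocks(path)
    finally:
        os.remove(path)
```

On a healthy scratch file `demo()` reports no mismatching blocks. The write-mode test is destructive, which is why this sketch only ever touches a temporary file.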

[Wikidata-bugs] [Maniphest] T260276: cirrussearch-backend-error: Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults

2020-09-24 Thread EBernhardson
EBernhardson merged a task: T255658: PHP Warning: {type:error,message:cirrussearch-backend-error,params:[]} [Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults in /srv/mediawiki/php-1.35.0-wmf.37/extensions/WikibaseCirrusSearch/src/EntitySearchElastic.php at line

[Wikidata-bugs] [Maniphest] T255658: PHP Warning: {"type":"error", "message":"cirrussearch-backend-error", "params":[]} [Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults

2020-09-24 Thread EBernhardson
EBernhardson closed this task as a duplicate of T260276: cirrussearch-backend-error: Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults. TASK DETAIL https://phabricator.wikimedia.org/T255658 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T253169: Clean up PSR12.Properties.ConstantVisibility.NotFound phpcs exclusion in WMF Prod

2020-09-21 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T253169 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T257938: License type filter for media search

2020-09-11 Thread EBernhardson
EBernhardson added a subscriber: dcausse. EBernhardson added a comment. I might suggest integrating license filtering a little closer to the backend, such that searching by license is something a user could plausibly type into the search box. The expansion to verbose terms would then happen

[Wikidata-bugs] [Maniphest] T260276: cirrussearch-backend-error: Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults

2020-08-20 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T260276 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T260276: cirrussearch-backend-error: Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults

2020-08-20 Thread EBernhardson
EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T260276 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, Aklapper, thcipriani, Wilmanbeno, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz

[Wikidata-bugs] [Maniphest] T260276: cirrussearch-backend-error: Called from Wikibase\Search\Elastic\EntitySearchElastic::getRankedSearchResults

2020-08-20 Thread EBernhardson
EBernhardson added a comment. This bit of code includes the comment `// FIXME: this is a hack, we need to return Status upstream instead`. The value that is being logged here was intended for end users; it provides an i18n message to tell the user what went wrong. In this case

[Wikidata-bugs] [Maniphest] T259637: Gather information about the volume of queries on WCQS

2020-08-06 Thread EBernhardson
EBernhardson added a comment. It's hard to get this into superset because it doesn't live in the production network; the only access it has to get data to superset would be through our public endpoints. Off the top of my head, the only thing I can think of is submitting events to the public

[Wikidata-bugs] [Maniphest] T258715: Score fails in 1.31 due to WikibaseCirrusSearch

2020-07-23 Thread EBernhardson
EBernhardson added a comment. WikibaseCirrusSearch didn't exist when 1.31 was released, so it shouldn't be installed there. WBCS was first deployed to WMF in Feb 2019; 1.31 was released in April 2018. TASK DETAIL https://phabricator.wikimedia.org/T258715 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T240328: Slow indexing of Lexemes for wbsearchentities

2020-07-21 Thread EBernhardson
EBernhardson added a comment. > The same doesn't happen with items - they can be used immediately Items used to have the same problem, looking back through the code history it looks like we added an 'instant index new' option to CirrusSearch, but that still wasn't suffici

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T251488: [epic] Create minimal SPARQL Endpoint for Commons on WMCS

2020-07-08 Thread EBernhardson
EBernhardson added a subscriber: Zbyszko. EBernhardson added a comment. @Zbyszko @dcausse I stood up the new significantly larger instance (120G mem, 16 cores) for sdoc today as `wcqs-beta-01.wikidata-query.eqiad.wmflabs`. As far as I can tell blazegraph came up correctly (on :

[Wikidata-bugs] [Maniphest] [Commented On] T257336: Request increased quota for wikidata-query Cloud VPS project

2020-07-08 Thread EBernhardson
EBernhardson added a comment. @Andrew for the moment we only need the one, so no rush on the others. I'm pretty sure there are testing plans for the others, but they aren't at the top of the stack currently. TASK DETAIL https://phabricator.wikimedia.org/T257336 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Edited] T257336: Request increased quota for Cloud VPS project

2020-07-07 Thread EBernhardson
EBernhardson updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T257336 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, RKemper, dcausse, EBernhardson, CBogen, Nintendofan885, Akuckartz

[Wikidata-bugs] [Maniphest] [Created] T257336: Request increased quota for Cloud VPS project

2020-07-07 Thread EBernhardson
EBernhardson created this task. EBernhardson added projects: Cloud-VPS (Quota-requests), Discovery, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Project Name: wikidata-query Type of quota

[Wikidata-bugs] [Maniphest] [Commented On] T254232: New clusters for SPARQL Endpoint for Commons

2020-07-07 Thread EBernhardson
EBernhardson added a comment. Talked with @dcausse about this on IRC today; based on the data we are now seeing[1], the estimates in here are too low. We are thinking that a 16G Ganeti instance will not be sufficient for the growth we are seeing. I'm fairly suspicious of our "mi

[Wikidata-bugs] [Maniphest] [Commented On] T240328: Slow indexing for wbsearchentities

2020-07-06 Thread EBernhardson
EBernhardson added a comment. I think we've talked once or twice before about a time-to-update metric that can identify these issues. We have a document hinting process that informs the DataSender how to ship things; we could add an additional hint with timestamp on documents created

[Wikidata-bugs] [Maniphest] [Commented On] T221631: Dedicated servers on WMCS to test WDQS scalability strategy

2020-05-18 Thread EBernhardson
EBernhardson added a comment. Thanks! TASK DETAIL https://phabricator.wikimedia.org/T221631 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Andrew, EBernhardson Cc: EBernhardson, JHedden, Iamamz3, GreenReaper, RazShuty, Krenair, Smalyshev

[Wikidata-bugs] [Maniphest] [Changed Project Column] T237089: Create CQS puppet configs by applying query_service module

2020-05-14 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T237089 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T237089: Create CQS puppet configs by applying query_service module

2020-05-08 Thread EBernhardson
EBernhardson added a comment. Instructions for booting a new instance. Currently this requires pointing the instance at a puppetmaster in the wikidata-query project. 1. Start new instance in horizon. Use debian-stretch and m1.large 2. Set the puppetmaster <ht

[Wikidata-bugs] [Maniphest] [Commented On] T237089: Create CQS puppet configs by applying query_service module

2020-05-06 Thread EBernhardson
EBernhardson added a comment. Matt's initial work has gotten us most of the way there. I reviewed what's available now and booted a test instance to see if it can fully set up a new instance from scratch (hint: no). Background info - The current sdcquery instance applies

[Wikidata-bugs] [Maniphest] [Commented On] T251489: Validate that we have enough resources on WMCS for a SPARQL Endpoint for Commons

2020-05-05 Thread EBernhardson
EBernhardson added a comment. Quota for wikidata-query wmcs project is currently 50GB ram / 17 cores. It currently has a commons query service instance running with 4 cores and 24GB of ram. The next instance size is 36GB ram and 8 cores, we are 2 GB shy of enough quota to boot

[Wikidata-bugs] [Maniphest] [Commented On] T221631: Dedicated servers on WMCS to test WDQS scalability strategy

2020-05-05 Thread EBernhardson
EBernhardson added a comment. Priorities have changed again, commons query service is back at the top of the stack. How are we in terms of a timeline making these available? Have we finished up the ceph testing? TASK DETAIL https://phabricator.wikimedia.org/T221631 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Changed Project Column] T246524: Current time displayed as modification date

2020-04-16 Thread EBernhardson
EBernhardson moved this task from To Be Deployed to Done on the Discovery-Search (Current work) board. EBernhardson added a comment. wmf.28 went to wikidata today, example query in description looks correct now. TASK DETAIL https://phabricator.wikimedia.org/T246524 WORKBOARD https

[Wikidata-bugs] [Maniphest] [Merged] T238686: Deepcat search returns incomplete results

2020-03-25 Thread EBernhardson
EBernhardson closed this task as a duplicate of T246568: Deepcategory returns only very few results. TASK DETAIL https://phabricator.wikimedia.org/T238686 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: SD0001

[Wikidata-bugs] [Maniphest] [Claimed] T246524: Current time displayed as modification date

2020-03-17 Thread EBernhardson
EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T246524 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, Lydia_Pintscher, Aklapper, Bugreporter, Alter-paule, Beast1978, CBogen, Un1tY

[Wikidata-bugs] [Maniphest] [Declined] T247242: index monolingual text statements

2020-03-12 Thread EBernhardson
EBernhardson closed this task as "Declined". EBernhardson added a comment. haswbstatement is an exact match filter, it's poorly suited to any kind of text content. I'm declining this in favor of T240334 <https://phabricator.wikimedia.org/T240334> which is about making

[Wikidata-bugs] [Maniphest] [Commented On] T243693: Querying the URL datatype with haswbstatement

2020-03-05 Thread EBernhardson
EBernhardson added a comment. Forced a reindex on Q842858, it is now returned by the example query. As mentioned it will be two to three months before this has the urls indexed for all wikidata items. TASK DETAIL https://phabricator.wikimedia.org/T243693 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T243693: Querying the URL datatype with haswbstatement

2020-03-05 Thread EBernhardson
EBernhardson added a comment. For actually rolling this out, it will be between 2 and 3 months after the config change is shipped for the wikidata index to be fully populated with urls TASK DETAIL https://phabricator.wikimedia.org/T243693 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T243693: Querying the URL datatype with haswbstatement

2020-03-05 Thread EBernhardson
EBernhardson added a comment. Wikidata property types need to be explicitly whitelisted. I poked through our data and the field that currently holds this data has ~241M unique values. Adding the URLs should only increase it by 1-2M, so probably fine. Or not entirely fine

[Wikidata-bugs] [Maniphest] [Changed Project Column] T243693: Querying the URL datatype with haswbstatement

2020-03-05 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T243693 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Triaged] T244485: Include WikibaseLexemeCirrusSearch into CirrusSearch tests

2020-03-03 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T244485 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Addshore, Totolinototo3, darthmon_wmde, Redabr4, Zanziii, Sa

[Wikidata-bugs] [Maniphest] [Changed Project Column] T246524: Current time displayed as modification date

2020-03-03 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T246524 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Triaged] T246524: Current time displayed as modification date

2020-03-03 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. EBernhardson added a comment. Not sure what changed, but looking at the search query the timestamp isn't being requested. This likely results in a null being passed to some time handling function that interprets null as "now
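
The failure mode suspected here, a missing value silently treated as "now", is easy to reproduce in isolation. The sketch below is a Python illustration only: the function names are hypothetical and it is not the code path in CirrusSearch or Wikibase. The first function shows the hazardous default and the second shows one way to keep a missing timestamp visible.

```python
from datetime import datetime, timezone


def format_modified(timestamp=None):
    """Hazardous pattern: a missing timestamp falls back to the current
    time, so the caller cannot tell the field was never requested."""
    when = timestamp if timestamp is not None else datetime.now(timezone.utc)
    return when.strftime("%Y-%m-%d")


def format_modified_strict(timestamp):
    """Safer variant: a missing timestamp stays visibly missing."""
    if timestamp is None:
        return "unknown"
    return timestamp.strftime("%Y-%m-%d")
```

With the first variant every result without a stored timestamp shows today's date, which matches the symptom in the task title.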

[Wikidata-bugs] [Maniphest] [Updated] T239931: Reduce the impact of the sanitizer on wikidata

2019-12-10 Thread EBernhardson
EBernhardson added a comment. In T239931#5726538 <https://phabricator.wikimedia.org/T239931#5726538>, @Ladsgroup wrote: > Hey, Thanks for the comment. I'm planning to turn it back on ASAP, right now, we are at middle of the migration and it puts too much pressure on

[Wikidata-bugs] [Maniphest] [Retitled] T239004: Regex search return incorrect number of results when highlighter throws exception

2019-12-09 Thread EBernhardson
EBernhardson renamed this task from "Search return incorrect number of results" to "Regex search return incorrect number of results when highlighter throws exception". TASK DETAIL https://phabricator.wikimedia.org/T239004 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Triaged] T239004: Regex search return incorrect number of results when highlighter throws exception

2019-12-09 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239004 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nand

[Wikidata-bugs] [Maniphest] [Triaged] T238686: Deepcat search returns incomplete results

2019-12-09 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T238686 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Lucas_Werkmeister_WMDE, EBernhardson, halfeatenscone, Aklapper, dar

[Wikidata-bugs] [Maniphest] [Updated] T239165: PHP Warning: count(): Parameter must be an array or an object that implements Countable in WikibaseMediaInfo

2019-12-09 Thread EBernhardson
EBernhardson removed a project: Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T239165 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Cparle, Aklapper, brennen, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, Ramsey

[Wikidata-bugs] [Maniphest] [Triaged] T239284: Show number of senses in lexeme search results

2019-12-09 Thread EBernhardson
EBernhardson triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239284 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Bugreporter, darthmon_wmde, DannyS712, Nandana, Mringgaard,

[Wikidata-bugs] [Maniphest] [Commented On] T239931: Reduce the impact of the sanitizer on wikidata

2019-12-09 Thread EBernhardson
EBernhardson added a comment. To be clear, we do not have the ability to reindex wikidata from scratch due to size, so turning off the sanity checker means no new fields can be added to wikidata. TASK DETAIL https://phabricator.wikimedia.org/T239931 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T239950: Consider rank in haswbstatement search

2019-12-09 Thread EBernhardson
EBernhardson added a comment. Sorry, but I simply don't understand what this is. What is a preferred rank? Normal? Deprecated? Sorry, but I just don't know what this is. TASK DETAIL https://phabricator.wikimedia.org/T239950 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T222321: Make /entity/ alias work for Commons

2019-12-03 Thread EBernhardson
EBernhardson added a comment. In summary, it seems we need to merge the patch[1] for the /entity/ endpoint, and this should be resolved? [1] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/526757/ TASK DETAIL https://phabricator.wikimedia.org/T222321 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T239002: Regex search in labels, descriptions and statements

2019-11-25 Thread EBernhardson
EBernhardson added a comment. labels and descriptions are covered with the existing `insource://` functionality, statements might be considered but wikidata already has indices that are almost too big to manage, and use more fields than elasticsearch supports. It's not likely we

[Wikidata-bugs] [Maniphest] [Triaged] T239003: New keyword for exact match of label/description

2019-11-25 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T239003 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Triaged] T239002: Regex search in labels, descriptions and statements

2019-11-25 Thread EBernhardson
EBernhardson triaged this task as "Low" priority. TASK DETAIL https://phabricator.wikimedia.org/T239002 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Commented On] T239004: Search return incorrect number of results

2019-11-25 Thread EBernhardson
EBernhardson added a comment. Looks like it's hitting this: https://gerrit.wikimedia.org/r/c/search/highlighter/+/435282/5/experimental-highlighter-lucene/src/main/java/org/wikimedia/highlighter/experimental/lucene/hit/AutomatonHitEnum.java#189 TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T239004: Search return incorrect number of results

2019-11-25 Thread EBernhardson
EBernhardson added a comment. Hmm, the request with only 222 items is interesting, will look at that TASK DETAIL https://phabricator.wikimedia.org/T239004 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, Aklapper

[Wikidata-bugs] [Maniphest] [Commented On] T239004: Search return incorrect number of results

2019-11-25 Thread EBernhardson
EBernhardson added a comment. While it doesn't say timeout, the request spins for some time and eventually the backend gives up and reports a failure. As I said, the error messages shown to users could be improved, but the request would still fail. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T239004: Search return incorrect number of results

2019-11-25 Thread EBernhardson
EBernhardson added a comment. The request basically asks to run a regex against 76M titles and fails with timeouts. While the error messages could be improved, this is such a niche thing that I don't think it's particularly important. This happens to fail the timeout in a different way than

[Wikidata-bugs] [Maniphest] [Triaged] T238498: index date statements

2019-11-21 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T238498 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Triaged] T238362: Blazegraph write performance tuning

2019-11-21 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T238362 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Tarrow, dcausse, Igorkim78, Gehel, Aklapper, darthmon_wmde,

[Wikidata-bugs] [Maniphest] [Changed Project Column] T238362: Blazegraph write performance tuning

2019-11-21 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T238362 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T238686: Deepcat search returns incomplete results

2019-11-19 Thread EBernhardson
EBernhardson added a comment. The SPARQL query endpoint that provides the categories to search against doesn't appear to be returning all expected sub-categories: ebernhardson@mwmaint1002:~$ curl -s -XPOST http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql

[Wikidata-bugs] [Maniphest] [Triaged] T236992: Order Wikidata search result by number of statements/labels/sitelinks/identifiers

2019-10-31 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T236992 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Commented On] T235942: "somevalue" SDOC statements not visible in search index

2019-10-30 Thread EBernhardson
EBernhardson added a comment. In T235942#5609408 <https://phabricator.wikimedia.org/T235942#5609408>, @Multichill wrote: > In T235942#5604854 <https://phabricator.wikimedia.org/T235942#5604854>, @EBernhardson wrote: > >> It looks like the curren

[Wikidata-bugs] [Maniphest] [Commented On] T194144: Find a solution for SpecialEntitiesWithoutPage (EntitiesWithoutTermFinder)

2019-10-29 Thread EBernhardson
EBernhardson added a comment. Elastic already has filters for things such as "pages with labels in language x" and these can be negated. I'm not entirely sure, but I think the incoming links count is at least related to how often the item is used. Due to the way wikidata is

[Wikidata-bugs] [Maniphest] [Updated] T235942: "somevalue" SDOC statements not visible in search index

2019-10-24 Thread EBernhardson
EBernhardson added a comment. It looks like the current values for commonswiki are: searchIndexProperties: P180 <https://phabricator.wikimedia.org/P180> (depicts) searchIndexTypes: string, external-id, wikibase-item, wikibase-property, wikibase-lexeme, wikibase-form, wikibase

[Wikidata-bugs] [Maniphest] [Commented On] T235942: "somevalue" SDOC statements not visible in search index

2019-10-24 Thread EBernhardson
EBernhardson added a comment. This is an intentional limitation. There is a configuration sent to the builder: - searchIndexProperties: List of property IDs to index - searchIndexTypes: List of property types to index. Property of this type will be indexed regardless of $propertyIds
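
The rule described above (index a statement when its property ID is listed, or when its property type is listed regardless of the ID) can be written out as a small predicate. This is a sketch in Python with hypothetical names and illustrative values, not the Wikibase builder code itself.

```python
def should_index(property_id, property_type, index_properties, index_types):
    """Return True when a statement qualifies for the search index.

    index_properties plays the role of searchIndexProperties and
    index_types the role of searchIndexTypes; both are assumed to be
    plain collections of strings for this illustration.
    """
    return property_id in index_properties or property_type in index_types


# Illustrative configuration values, not the production settings.
example_properties = {"P180"}
example_types = {"string", "external-id"}
```

A statement on a listed property is indexed whatever its type, and a statement of a listed type is indexed whatever its property; everything else is skipped.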

[Wikidata-bugs] [Maniphest] [Triaged] T235496: EPIC: Adapt "did you mean suggestions" using the phrase suggester for wikibase

2019-10-17 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T235496 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Updated] T230862: Create a way to filter only WB-related changes from Commons recentchanges

2019-09-19 Thread EBernhardson
EBernhardson removed a project: Patch-For-Review. TASK DETAIL https://phabricator.wikimedia.org/T230862 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Cparle, EBernhardson Cc: EBernhardson, Cparle, dcausse, Anomie, daniel, Aklapper, Lydia_Pintscher

[Wikidata-bugs] [Maniphest] [Commented On] T230862: Create a way to filter only WB-related changes from Commons recentchanges

2019-09-19 Thread EBernhardson
EBernhardson added a comment. Indeed I've completely mixed the two, sorry for confusion! TASK DETAIL https://phabricator.wikimedia.org/T230862 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Cparle, EBernhardson Cc: EBernhardson, Cparle, dcausse

[Wikidata-bugs] [Maniphest] [Commented On] T230862: Create a way to filter only WB-related changes from Commons recentchanges

2019-09-19 Thread EBernhardson
EBernhardson added a comment. I started writing a patch for this, but got stuck trying to get mw vagrant back into working order. In this patch MediaInfo essentially always provides its fields to the NS_FILE namespace, but when no mediainfo is present it provides appropriate empty values

[Wikidata-bugs] [Maniphest] [Changed Project Column] T230175: Provide search functionality to find all files that have at least 1 structured data statement

2019-08-15 Thread EBernhardson
EBernhardson moved this task from watching / waiting to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T230175 WORKBOARD https

[Wikidata-bugs] [Maniphest] [Commented On] T229037: Phrase search should not match terms in different fields

2019-08-01 Thread EBernhardson
EBernhardson added a comment. Actually, it looks like wikidata has two separate statements on Q3007: - the use of social media by Donald Trump, 45th President of the United States - Donald Trump's use of social media... But when wikidata tells cirrussearch about

[Wikidata-bugs] [Maniphest] [Commented On] T229037: Phrase search should not match terms in different fields

2019-08-01 Thread EBernhardson
EBernhardson added a comment. For Trump travel ban (Q30949469) we have: english description: A travel ban by the 45th President of ***the United States, Donald*** J Trump", For Donald Trump on social media (Q3007) we have: text content: ...the use of social

[Wikidata-bugs] [Maniphest] [Commented On] T229027: search of related images on wikidata (for structured data on commons)

2019-07-30 Thread EBernhardson
EBernhardson added a comment. I suppose there is also the significant terms aggregation, it's similar to the aggregation above but this tries to take into account the frequency in the total document collection vs the frequency in the result set. Essentially this orders structured data
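
The idea in the comment above, comparing how often a term appears in the result set against how often it appears in the whole collection, can be sketched in a few lines. This is only a simplified illustration and not Elasticsearch's actual scoring heuristic; the function name, the statement strings, and the sample documents are all hypothetical.

```python
from collections import Counter


def rank_by_significance(result_docs, all_docs):
    """Rank terms by (share of result docs) / (share of all docs).

    Simplified illustration only; assumes every result doc is also
    part of all_docs, so each term has a non-zero background count.
    """
    foreground = Counter(term for doc in result_docs for term in doc)
    background = Counter(term for doc in all_docs for term in doc)
    scores = {}
    for term, fg_count in foreground.items():
        fg_share = fg_count / len(result_docs)
        bg_share = background[term] / len(all_docs)
        scores[term] = fg_share / bg_share
    # Highest ratio first: terms over-represented in the result set.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents, each a set of "property=value" statements.
all_docs = [
    {"P180=Q146", "P31=Q125191"},
    {"P180=Q146"},
    {"P31=Q125191"},
    {"P31=Q125191"},
]
result_docs = all_docs[:2]
ranked = rank_by_significance(result_docs, all_docs)
```

A term that is common everywhere scores below 1 here, while a term concentrated in the result set scores above 1, which is the ordering effect the comment describes.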

[Wikidata-bugs] [Maniphest] [Updated] T229027: search of related images on wikidata (for structured data on commons)

2019-07-30 Thread EBernhardson
EBernhardson added a comment. Script used to collect above results. Note this needs access to elasticsearch directly as cirrussearch does not yet support this query: P8829 <https://phabricator.wikimedia.org/P8829> TASK DETAIL https://phabricator.wikimedia.org/T229027 EMAIL PREFE

[Wikidata-bugs] [Maniphest] [Changed Project Column] T228290: Fatal on Watchlist: Call to a member function getAlphadecimal() on null

2019-07-18 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T228290 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Triaged] T162331: Provide tools for processing obfuscated Chinese geodata (GCJ-02, BD-09)

2019-07-11 Thread EBernhardson
EBernhardson triaged this task as "Low" priority. TASK DETAIL https://phabricator.wikimedia.org/T162331 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Esc3300, Nikki, PokestarFan, Pnorman, C933103, Aklapper, Arthur2e5,

[Wikidata-bugs] [Maniphest] [Changed Project Column] T222347: wbsearchentities now returns an error with type=lexeme

2019-05-02 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T222347 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T220823: Use ElasticSearch for bulk Wikidata entity term lookup

2019-04-12 Thread EBernhardson
EBernhardson added a comment. As a very rough comparison, I pulled `sum(irate(elasticsearch_indices_search_query_total[5m]))` from Prometheus, which gives the total shard queries executed per second across the cluster as 5 minute averages. We vary between about 12k and 21k
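
As a rough aid to reading that query: the Prometheus documentation describes `irate` as a per-second rate based on the last two samples in the range window, whereas `rate` averages over the whole window, and `sum(...)` then adds the per-series results together. The sketch below uses made-up counter samples for two hypothetical nodes and deliberately ignores counter resets and the extrapolation that the real `rate` performs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples in the window."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


def rate(samples: List[Sample]) -> float:
    """Per-second rate averaged over the whole window (simplified)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Made-up query counters for two hypothetical nodes.
node_a = [(0.0, 1000.0), (60.0, 1600.0), (120.0, 2800.0)]
node_b = [(0.0, 500.0), (60.0, 800.0), (120.0, 1400.0)]

# Equivalent of wrapping the per-series rates in sum(...).
cluster_irate = sum(irate(s) for s in (node_a, node_b))
cluster_rate = sum(rate(s) for s in (node_a, node_b))
```

With these samples the two functions disagree because traffic doubled in the most recent minute, which is why the choice between them matters when reading a graph.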

[Wikidata-bugs] [Maniphest] [Updated] T55652: Special:Search doesn't use labels and descriptions for suggestions but just the item ID

2019-04-09 Thread EBernhardson
EBernhardson edited projects, added Discovery-Search; removed Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T55652 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: gerritbot, Smalyshev, Wikidata-bugs

[Wikidata-bugs] [Maniphest] [Claimed] T124196: Fatal "cannot perform this operation with arrays" from CirrusSearch/ElasticaWrite (using JobQueueDB)

2019-04-01 Thread EBernhardson
EBernhardson claimed this task. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T124196 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc

[Wikidata-bugs] [Maniphest] [Commented On] T124196: Fatal "cannot perform this operation with arrays" from CirrusSearch/ElasticaWrite (using JobQueueDB)

2019-04-01 Thread EBernhardson
EBernhardson added a comment. > Until this ability exists, the code either needs to be disabled (e.g. not deployed on Wikitech), or the code needs to handle this error and respond in some way. E.g. avoid queuing updates of this type or this size (possibly configurable), or run t

[Wikidata-bugs] [Maniphest] [Merged] T219455: wikidata.org search uses tons of namespaces by default

2019-03-27 Thread EBernhardson
EBernhardson merged a task: T219466: Cirrussearch default namespaces destroyed. EBernhardson added a subscriber: IKhitron. TASK DETAIL https://phabricator.wikimedia.org/T219455 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: IKhitron

[Wikidata-bugs] [Maniphest] [Retitled] T219455: AdvancedSearch extension changes behaviour of default namespaces to be search for anon and logged in users

2019-03-27 Thread EBernhardson
EBernhardson renamed this task from "wikidata.org search uses tons of namespaces by default" to "AdvancedSearch extension changes behaviour of default namespaces to be search for anon and logged in users". TASK DETAIL https://phabricator.wikimedia.org/T219455 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-22 Thread EBernhardson
EBernhardson added a comment. Turns out WBCS wasn't deployed to testcommonswiki yet. Will be deployed monday (ish) via https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/498442/ TASK DETAIL https://phabricator.wikimedia.org/T218954 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-22 Thread EBernhardson
EBernhardson added a comment. Verified with a quick hack on mwdebug1001 that shutting off the hook brings standard image search back to test-commons.wikimedia.org. Will work up a patch for WikibaseCirrusSearch and duplicate to Wikibase. TASK DETAIL https://phabricator.wikimedia.org/T218954

[Wikidata-bugs] [Maniphest] [Updated] T194968: Enable search in all wikidata namespaces combined

2019-03-21 Thread EBernhardson
EBernhardson added a comment. Directly related to this is some documentation about technical details: https://www.mediawiki.org/wiki/Extension:CirrusSearch/Query_Construction use cases: https://www.mediawiki.org/wiki/Extension:CirrusSearch/Query_Construction/Use_cases

[Wikidata-bugs] [Maniphest] [Commented On] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-21 Thread EBernhardson
EBernhardson added a comment. From convo about way forward: 14:34 <+ebernhardson> SMalyshev: they basically want to deploy SDC without breaking commons. If that means they don't have entity search on day 1 that's ok 14:34 <+SMalyshev> w

[Wikidata-bugs] [Maniphest] [Claimed] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-21 Thread EBernhardson
EBernhardson claimed this task. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T218954 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc

[Wikidata-bugs] [Maniphest] [Commented On] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-21 Thread EBernhardson
EBernhardson added a comment. If necessary, we can completely disable entity search initially. In that case any SDC stuff that is surfaced would be through the copying that happens between the structured data and the `opening_text` field TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Edited] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-21 Thread EBernhardson
EBernhardson updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T218954 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Smalyshev, Lea_WMDE, Aklapper, Jdforrester-WMF, dcausse, Cparle, EBernhardson

[Wikidata-bugs] [Maniphest] [Created] T218954: Default to article search on commons + wikibase (aka SDC)

2019-03-21 Thread EBernhardson
EBernhardson created this task. EBernhardson added projects: Discovery-Search, Wikidata. TASK DESCRIPTION When searching on commons with SDC stuff enabled we get the warning: A warning has occurred while searching: Mixing entity and article namespaces in search is currently

[Wikidata-bugs] [Maniphest] [Triaged] T217768: The entity suggester should return properties

2019-03-07 Thread EBernhardson
EBernhardson triaged this task as "Normal" priority. TASK DETAIL https://phabricator.wikimedia.org/T217768 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, WiseWoman, Rehman, abian, Pintoch, Aklapper, alaa_wmde

[Wikidata-bugs] [Maniphest] [Commented On] T217768: The entity suggester should return properties

2019-03-07 Thread EBernhardson
EBernhardson added a comment. Is this specifically about Pxx, or should a search for `country` also return P17? TASK DETAIL https://phabricator.wikimedia.org/T217768 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T215870: Docker-compose failing at installation - ES patches need backporting & new image building

2019-02-28 Thread EBernhardson
EBernhardson added a subscriber: Smalyshev. EBernhardson added a comment. In T215870#4990654 <https://phabricator.wikimedia.org/T215870#4990654>, @Addshore wrote: > Okay, looks like we need to also investigate this one: > > Creating index...⧼Custom Analyzer [extr

[Wikidata-bugs] [Maniphest] [Commented On] T215870: Docker-compose failing at installation - ES patches need backporting & new image building

2019-02-26 Thread EBernhardson
EBernhardson added a comment. Since it didn't look like the CI was going to be fixed anytime soon I've force merged the patch to REL1_32 which should unblock this. TASK DETAIL https://phabricator.wikimedia.org/T215870 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel
