[Wikidata-bugs] [Maniphest] T348877: Lexeme searches prefer forms over lemmas

2023-11-29 Thread EBernhardson
EBernhardson moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. The example in the ticket looks to work as expected now TASK DETAIL https://phabricator.wikimedia.org/T348877 WORKBOARD https

[Wikidata-bugs] [Maniphest] T348877: Lexeme searches prefer forms over lemmas

2023-11-14 Thread EBernhardson

[Wikidata-bugs] [Maniphest] T348877: Lexeme searches prefer forms over lemmas

2023-11-13 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. EBernhardson added a comment. The UI for adding statements is using wbsearchentities <https://www.wikidata.org/wiki/Special:ApiSandbox#act

[Wikidata-bugs] [Maniphest] T349519: Determine if IGUANA and TFT would fit our query analysis needs

2023-10-23 Thread EBernhardson
EBernhardson set the point value for this task to "8". TASK DETAIL https://phabricator.wikimedia.org/T349519

[Wikidata-bugs] [Maniphest] T349095: Migrate staging rdf-streaming-updater to flink operator

2023-10-23 Thread EBernhardson
EBernhardson changed the point value for this task from "8" to "13". TASK DETAIL https://phabricator.wikimedia.org/T349095

[Wikidata-bugs] [Maniphest] T349095: Migrate staging rdf-streaming-updater to flink operator

2023-10-23 Thread EBernhardson
EBernhardson set the point value for this task to "8". TASK DETAIL https://phabricator.wikimedia.org/T349095

[Wikidata-bugs] [Maniphest] T349519: Determine if IGUANA and TFT would fit our query analysis needs

2023-10-23 Thread EBernhardson
EBernhardson moved this task from Incoming to Current work on the Wikidata-Query-Service board. EBernhardson added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T349519 WORKBOARD https://phabricator.wikimedia.org/project/board/891/

[Wikidata-bugs] [Maniphest] T349512: Collect multiple sets of SPARQL queries

2023-10-23 Thread EBernhardson
EBernhardson moved this task from Incoming to Current work on the Wikidata-Query-Service board. EBernhardson added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T349512 WORKBOARD https://phabricator.wikimedia.org/project/board/891/

[Wikidata-bugs] [Maniphest] T349095: Migrate staging rdf-streaming-updater to flink operator

2023-10-23 Thread EBernhardson
EBernhardson moved this task from Incoming to Current work on the Wikidata-Query-Service board. EBernhardson added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T349095 WORKBOARD https://phabricator.wikimedia.org/project/board/891/

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-27 Thread EBernhardson
EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. Reran 2023-09-21T16:00:00, which was previously failing, with memory overhead unconfigured and with the new patch to repartition the input. This has

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-27 Thread EBernhardson
EBernhardson added a comment. 8g was still insufficient, one of the failed jobs passed but the other three still had trouble. Increasing to 12g made it work, but if 8g is already excessive 12g is only more of the same. Returning to the earlier idea of forcing the job to be split up more

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-26 Thread EBernhardson
EBernhardson added a comment. Unfortunately the above patch doesn't seem to have worked. Spark turned the input into three tasks. They were all assigned to the same executor, the first two finished and the third caused the container to die after another ~45s due to memory constraints. Spark

[Wikidata-bugs] [Maniphest] T346456: Improve concurrency limits configuration of the wdqs updater

2023-09-18 Thread EBernhardson
EBernhardson set the point value for this task to "3". TASK DETAIL https://phabricator.wikimedia.org/T346456

[Wikidata-bugs] [Maniphest] T346456: Improve concurrency limits configuration of the wdqs updater

2023-09-18 Thread EBernhardson
EBernhardson added a project: Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T346456

[Wikidata-bugs] [Maniphest] T344284: Rename usages of whitelist to allowlist in query service rdf repo

2023-09-12 Thread EBernhardson
EBernhardson moved this task from Needs review to To Be Deployed on the Discovery-Search (Current work) board. EBernhardson added a comment. This should be ready for deployment now. The rdf package will need to be built and then deployed with the config updates above iiuc. TASK DETAIL

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-28 Thread EBernhardson
EBernhardson added a comment. New dataset for 20230821 has updated permissions as expected. TASK DETAIL https://phabricator.wikimedia.org/T342416

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-18 Thread EBernhardson
EBernhardson added a comment. In T342416#9101474 <https://phabricator.wikimedia.org/T342416#9101474>, @JAllemandou wrote: > In T342416#9091146 <https://phabricator.wikimedia.org/T342416#9091146>, @EBernhardson wrote: > >> Similarly we have other jobs that

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-17 Thread EBernhardson
EBernhardson moved this task from Needs review to To Be Deployed on the Discovery-Search (Current work) board. EBernhardson added a comment. Airflow instance has been updated. I manually changed the permissions of the existing files to 644 and dirs to 755 in `/wmf/data/discovery/wikidata/rdf

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-14 Thread EBernhardson
EBernhardson added a comment. It seems the CodeReviewBot doesn't update the ticket when changing the ticket in a patch on gitlab, the relevant patch is: ebernhardson opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/478 Make wikibase ttl

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-14 Thread EBernhardson
EBernhardson added a comment. I looked into these, the attached patch should fix it but it leaves an open question (@JAllemandou): The `core-site.xml`, along with puppet which writes it out, has the default umask of 027 since at least 2021, which prevents world readability. So why do
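For readers unfamiliar with how a umask of 027 removes world readability, the arithmetic can be sketched as below. This is only an illustration and not code from the task: it assumes the usual POSIX-style base modes (666 for new files, 777 for new directories), and the helper names `apply_umask` and `world_readable` are hypothetical.

```python
# Minimal sketch: how a umask turns a requested mode into the effective mode.
# Assumption: base modes of 0o666 (files) and 0o777 (directories).

def apply_umask(requested_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask


def world_readable(mode: int) -> bool:
    """True when the 'other' read bit (0o004) is set."""
    return bool(mode & 0o004)


UMASK = 0o027

file_mode = apply_umask(0o666, UMASK)  # rw-r-----
dir_mode = apply_umask(0o777, UMASK)   # rwxr-x---

print(oct(file_mode), world_readable(file_mode))
print(oct(dir_mode), world_readable(dir_mode))
```

With these assumptions a file comes out as 640 and a directory as 750, so neither is readable by "other" users. Modes of 644 and 755 would keep the world-read bit set.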

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-08-11 Thread EBernhardson
EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T342416

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-08-08 Thread EBernhardson
EBernhardson added a comment. In T339347#9078729 <https://phabricator.wikimedia.org/T339347#9078729>, @bking wrote: > @WolfgangFahl We've whitelisted the endpoints, but the query you linked above <https://w.wiki/6q2i> still does not work. Can you verify that is it work

[Wikidata-bugs] [Maniphest] T334470: Federated queries to Lingua Libre time out in the Commons query service

2023-05-18 Thread EBernhardson
EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. These queries look to be running as expected now. TASK DETAIL https://phabricator.wikimedia.org/T334470 WORKBOARD https

[Wikidata-bugs] [Maniphest] T335873: Special:Search broken on Beta Wikidata for entity namespaces

2023-05-15 Thread EBernhardson
EBernhardson moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. reindex complete, looks to have resolved the issue as expected. TASK DETAIL https://phabricator.wikimedia.org/T335873 WORKBOARD https

[Wikidata-bugs] [Maniphest] T335873: Special:Search broken on Beta Wikidata for entity namespaces

2023-05-15 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. EBernhardson added a comment. > Search backend error during entity_full_text search for 'test' after 35: Parse error on Cannot search on fi

[Wikidata-bugs] [Maniphest] T334470: Federated queries to Lingua Libre time out in the Commons query service

2023-05-11 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Dev -- SRE/Ops to Needs review on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T334470 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/

[Wikidata-bugs] [Maniphest] T334823: Add https://opendata.aragon.es/sparql to the list of federated endpoints for WDQS and WCQS

2023-05-09 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Dev -- SRE/Ops to Needs review on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T334823 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-04-24 Thread EBernhardson
EBernhardson set the point value for this task to "5". EBernhardson moved this task from Incoming to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T332314 WORKBOARD https://phabricator.wikimedia.org/project/

[Wikidata-bugs] [Maniphest] T332953: Migrate PipelineLib repos to GitLab

2023-04-24 Thread EBernhardson
EBernhardson moved this task from needs triage to Current work on the Discovery-Search board. EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T332953 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T328497: Remove unnecessary targets definitions

2023-03-30 Thread EBernhardson
EBernhardson removed a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T328497

[Wikidata-bugs] [Maniphest] T321170: Wikidata query service does not allow mwapi queries to incubator.wikimedia.org

2022-11-17 Thread EBernhardson
EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. Example query seems to work: SELECT * WHERE { SERVICE wikibase:mwapi { bd:serviceParam wikibase:endpoint

[Wikidata-bugs] [Maniphest] T321170: Wikidata query service does not allow mwapi queries to incubator.wikimedia.org

2022-11-14 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Dev -- SWE to Needs review on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T321170 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/

[Wikidata-bugs] [Maniphest] T317682: Make new Vector search navigate to search result URL when selecting search result using keyboard

2022-10-20 Thread EBernhardson
EBernhardson added a comment. Poking over the history and the related tests. There are tests in `tests/browser/SearchSatisfactionTests.php` that expect to log a -1 as the position when the user submits their own query and not something provided by the autocomplete. This seems to have been

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-10-18 Thread EBernhardson
EBernhardson updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T316236

[Wikidata-bugs] [Maniphest] T319136: Allow federated queries with the Eu Knowledge Graph

2022-10-13 Thread EBernhardson
EBernhardson moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. This has been deployed. If anything isn't working right please ping us here. TASK DETAIL https://phabricator.wikimedia.org/T319136 WORKBOARD

[Wikidata-bugs] [Maniphest] T319136: Allow federated queries with the Eu Knowledge Graph

2022-10-03 Thread EBernhardson
EBernhardson added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T319136

[Wikidata-bugs] [Maniphest] T317681: Make new Vector search navigate to item search results on Wikidata

2022-09-21 Thread EBernhardson
EBernhardson added a comment. I'm not sure why search results go back into the search engine to be redirected instead of going directly to the page. We return the full link in action=opensearch which is used in other contexts (browser go-bar, etc.). It has simply "always" bee

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-19 Thread EBernhardson
EBernhardson added a comment. To move this forward one of our SREs will need to run the following and let it go for a couple days. After that the sre.wdqs.data-transfer cookbook will need to be used. cookbook sre.wdqs.data-reload wcqs2001.codw.wmnet \ --task-id T316236

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-19 Thread EBernhardson
EBernhardson added a comment. The reload that was started on wcqs2001 didn't quite go right. We need to drop the reload scripts from the rdf deploy repo and only use the cookbooks going forward. TASK DETAIL https://phabricator.wikimedia.org/T316236

[Wikidata-bugs] [Maniphest] T317530: MediaInfo does seem to allow entities to share same statement IDs

2022-09-19 Thread EBernhardson
EBernhardson added a comment. The consumer has been updated to work, but the underlying RDFs should be fixed. Relaxing the consumer means we've disabled sanity checks and in the long term the database will take on inconsistencies. TASK DETAIL https://phabricator.wikimedia.org/T317530

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-15 Thread EBernhardson
EBernhardson moved this task from Ready for Development to In Progress on the Discovery-Search (Current work) board. EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T316236 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-15 Thread EBernhardson
EBernhardson added a comment. Also stopped wcqs-updater.service on wcqs2001, and disabled puppet so it won't be restarted. TASK DETAIL https://phabricator.wikimedia.org/T316236

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-15 Thread EBernhardson
EBernhardson added a comment. Started download/munge on wcqs2001 using the internal dumps.wikimedia.org; we can't use dumps.wikimedia.your.org as its dumps are two weeks out of date. The dumps are dated 20220911. TASK DETAIL https://phabricator.wikimedia.org/T316236

[Wikidata-bugs] [Maniphest] T316236: Reload WCQS from dumps

2022-09-15 Thread EBernhardson
EBernhardson added a comment. Started looking into this, first problem is that dumps.wikimedia.your.org has changed their path layouts, a minor change to the data reload script will be necessary to pull from the correct paths and not 404. As long as we are revisiting this script though

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-09-14 Thread EBernhardson
EBernhardson added a comment. Data cleanup now looks to have run successfully. TASK DETAIL https://phabricator.wikimedia.org/T303831

[Wikidata-bugs] [Maniphest] T307596: User documentation for authentication on WCQS

2022-09-12 Thread EBernhardson
EBernhardson added a comment. Proposed documentation: P34534 <https://phabricator.wikimedia.org/P34534> I'm intending to update the wiki page after WCQS deployment and re-verifying the updates work as expected. TASK DETAIL https://phabricator.wikimedia.org/T307596

[Wikidata-bugs] [Maniphest] T307596: User documentation for authentication on WCQS

2022-09-06 Thread EBernhardson
EBernhardson moved this task from Ready for Development to In Progress on the Discovery-Search (Current work) board. EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T307596 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-09-01 Thread EBernhardson
EBernhardson added a comment. Some quick testing makes this look successful. Using curl to perform a POST no longer 500's: curl 'https://commons-query.wikimedia.org/sparql' \ -XPOST \ -H 'cookie: wcqsOauth=; wcqsSession=' \ -d 'query=prefix%20schema:%20%3Chttp

[Wikidata-bugs] [Maniphest] T307596: User documentation for authentication on WCQS

2022-08-23 Thread EBernhardson
EBernhardson added a comment. I still can't see it worthwhile to document the existing workflow. It's so convoluted that I suspect anyone that's willing to follow it would simply monitor the connections in their web browser's development inspector and recreate what they see without any

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-08-22 Thread EBernhardson
EBernhardson added a comment. @JAllemandou The one remaining piece of this ticket is cleaning up the historical data, per T303831#8081172 <https://phabricator.wikimedia.org/T303831#8081172>. Any suggestions on how we should manage dropping old data from tables partitioned by a sn

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-08-05 Thread EBernhardson
EBernhardson added a comment. I've tracked down one source of 500 errors, unclear if the original report here is for same thing. Reproduction: curl -XPOST https://commons-query.wikimedia.org/any-url-doesnt-matter -d 'foo=bar' Reason: This request includes a `Content-Length

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-08-03 Thread EBernhardson
EBernhardson added a comment. Leaving the commons-query.wikimedia.org browser tab open for a few hours and re-running queries every 30-60 minutes or so reproduced a 500 after a few hours. Related js console errors. Timestamps are PDT. Unclear if the errors at 13:00 and 13:10 are directly

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-08-03 Thread EBernhardson
EBernhardson added a comment. In T306899#8128904 <https://phabricator.wikimedia.org/T306899#8128904>, @Dominicbm wrote: > Experienced the same error today again, here is an exact timestamp (of the response): `Wed, 03 Aug 2022 17:15:19 GMT`. This lines up nicely with a mes

[Wikidata-bugs] [Maniphest] T307391: Enable CORS support for WCQS SPARQL endpoint access

2022-07-26 Thread EBernhardson
EBernhardson added a comment. https://commons-query.wikimedia.org/sparql returns CORS headers in the same way that https://query.wikidata.org/sparql does. What doesn't work is CORS during the authentication flow, and I'm not sure this is something we can change. I can set up

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-25 Thread EBernhardson
EBernhardson removed a project: Patch-For-Review. EBernhardson added a comment. Double checked all linked patches, no patches remain for review. The work still to be done is to decide how to handle pruning data from the `snapshot=` partitioned tables TASK DETAIL https

[Wikidata-bugs] [Maniphest] T301336: EntitySchemas API Question

2022-07-18 Thread EBernhardson
EBernhardson removed a project: ApiFeatureUsage. EBernhardson added a comment. Removing ApiFeatureUsage; that project is specifically about recording information about requests made to api.php in MediaWiki. TASK DETAIL https://phabricator.wikimedia.org/T301336

[Wikidata-bugs] [Maniphest] T304070: API Endpoint to search for Schemas

2022-07-18 Thread EBernhardson
EBernhardson removed a project: ApiFeatureUsage. EBernhardson added a comment. Removing ApiFeatureUsage; that project is specifically about usage of api.php in MediaWiki. TASK DETAIL https://phabricator.wikimedia.org/T304070

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-15 Thread EBernhardson
EBernhardson added a comment. There is actually one piece remaining: we typically use `refinery-drop-older-than` to prune our tables. That worked when we used `date=...` as the partitioning scheme, but it doesn't support `snapshot=...`. It takes minimal work (I already have a working POC

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-12 Thread EBernhardson
EBernhardson added a comment. All dags are now enabled and have completed at least one full execution of each dag. - Increased partition count on map_subgraph_queries to 2048, the largest shuffle is ~600GB and this gets the per-executor work down into the desired 256-512M range
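The partition-count reasoning in this comment is simple arithmetic and can be checked with a short sketch. The figures (~600GB largest shuffle, 2048 partitions, 256-512M target) come from the comment; the helper name `per_partition_mib`, the use of binary units, and the assumption that the data spreads evenly across partitions are illustrative only.

```python
# Minimal sketch: average shuffle data per partition for a given partition count.
# Assumption: the shuffle is spread evenly, which real jobs rarely achieve exactly.

def per_partition_mib(shuffle_gib: float, partitions: int) -> float:
    """Average MiB of shuffle data that each partition has to process."""
    return shuffle_gib * 1024 / partitions


TARGET_RANGE_MIB = (256, 512)

for partitions in (1024, 2048, 4096):
    size = per_partition_mib(600, partitions)
    in_range = TARGET_RANGE_MIB[0] <= size <= TARGET_RANGE_MIB[1]
    print(f"{partitions} partitions: {size:.0f} MiB each, in target range: {in_range}")
```

Under these assumptions 2048 partitions gives about 300 MiB each, inside the stated range, while 1024 would overshoot it and 4096 would fall below it.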

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread EBernhardson
EBernhardson added a comment. Summary of what was done so far to deploy: - Tuned subgraph_mapping_weekly. Set spark parallelism to 4096, Increased memory to 24G (=6g per task) and reduced total executor count to keep total memory usage around 1TB. Changed `coalesce()` into `repartition

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread EBernhardson
EBernhardson added a comment. Stats on the final join building `topSubgraphTriples`. This is using 4096 partitions and repartition(). It works for now so probably not worth dealing with the skew, but these stats might be useful to compare against in the future if it starts failing

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread EBernhardson
EBernhardson added a comment. I tried a run with the three `coalesce()` calls in SubgraphMapper converted into repartitions. In this case instead of having 8 partitions where 7 finish and the 8th takes forever and then fails, now it has 200 partitions and 199 finish with the 200th taking forever
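The pattern described here, where one partition lags no matter how many partitions exist, is typical of key skew. The sketch below is a plain-Python illustration, not Spark code and not taken from the task. It assumes the slow partition is caused by a single hot key in a key-based shuffle, and the data sizes and the helper `partition_sizes` are invented for the example.

```python
# Minimal sketch: key-based partitioning cannot split a single hot key.
# Assumption: rows are assigned to partitions by key modulo the partition count.
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition under key-based assignment."""
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: one hot key with 1,000,000 rows, 10,000 other keys with 10 rows each.
keys = [0] * 1_000_000 + [k for k in range(1, 10_001) for _ in range(10)]

for n in (8, 200):
    sizes = partition_sizes(keys, n)
    median = sorted(sizes)[len(sizes) // 2]
    print(f"{n} partitions: largest={max(sizes)}, median={median}")
```

Raising the partition count shrinks the typical partition, but the partition holding the hot key still carries at least the full million rows. Under this assumption, that is why the last task dominates at both 8 and 200 partitions.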

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-07 Thread EBernhardson
EBernhardson added a comment. In T303831#8060472 <https://phabricator.wikimedia.org/T303831#8060472>, @AKhatun_WMF wrote: > In T303831#8058159 <https://phabricator.wikimedia.org/T303831#8058159>, @EBernhardson wrote: > >> the airflow patch is deployed but

[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis

2022-07-06 Thread EBernhardson
EBernhardson added a comment. The airflow patch is deployed but I only turned on *_init dags and subgraph_mapping_weekly today (ran out of time, will do rest tomorrow). subgraph_mapping_weekly failed the first time through. I updated executor memory from 8g to 12g but the second

[Wikidata-bugs] [Maniphest] T308741: Lexeme search results all have the current timestamp as last changed date

2022-06-06 Thread EBernhardson
EBernhardson moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. Link in report now correctly shows last edit timestamps. TASK DETAIL https://phabricator.wikimedia.org/T308741 WORKBOARD https

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-06-01 Thread EBernhardson
EBernhardson added a comment. Lacking better ideas on how to align the errors with some request that causes the error I've started up `tcpdump` on all the wcqs instances. They will store up to 100 1GB files per instance before starting to overwrite the initial files. The overall goal here

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-05-31 Thread EBernhardson
EBernhardson added a comment. Reviewed logs again looking for patterns. Not much, but at least logstash is now aggregating together logs from the various hosts. Can see that the `/oauth/check_auth java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 3/3

[Wikidata-bugs] [Maniphest] T308741: Lexeme search results all have the current timestamp as last changed date

2022-05-25 Thread EBernhardson
EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T308741

[Wikidata-bugs] [Maniphest] T308741: Lexeme search results all have the current timestamp as last changed date

2022-05-23 Thread EBernhardson
EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T308741

[Wikidata-bugs] [Maniphest] T308786: Track errors in the UI of commons-query.wikimedia.org

2022-05-19 Thread EBernhardson
EBernhardson created this task. EBernhardson added a project: Wikidata Query UI. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The UI used for the wiki commons query service currently collects no metrics, even though the UI has metric tracking built in. This looks

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-05-17 Thread EBernhardson
EBernhardson claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T306899

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-05-17 Thread EBernhardson
EBernhardson moved this task from Waiting to Needs review on the Discovery-Search (Current work) board. EBernhardson added a comment. Reports found in https://people.wikimedia.org/~ebernhardson/T306644/ Summary is that the tuning is either the same or slightly worse almost everywhere

[Wikidata-bugs] [Maniphest] T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0

2022-05-16 Thread EBernhardson
EBernhardson added a comment. In T209859#7903772 <https://phabricator.wikimedia.org/T209859#7903772>, @Lucas_Werkmeister_WMDE wrote: > In T209859#7881777 <https://phabricator.wikimedia.org/T209859#7881777>, @gerritbot wrote: > >> Change 786267 *

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-05-11 Thread EBernhardson
EBernhardson added a comment. Following the thread of something related to auth, I've found that the application server (jetty) which hosts the app has never properly had its logging set up. Logs only come from the embedded applications, the application server itself ends up with bare

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-05-09 Thread EBernhardson
EBernhardson added a comment. Not finding anything decisive yet, but will continue looking. It occurred to me that if it's happening consistently for an individual user but not in general that it could somehow be related to their authentication cookie. If seems plausible clearing the auth

[Wikidata-bugs] [Maniphest] T306899: WCQS 500 errors

2022-05-09 Thread EBernhardson
EBernhardson added a comment. Usually the first stop for this kind of error would be reviewing the `ATS Backends <-> Origin Servers Overview` which suggests a low rate of 5xxs, typically 1-5% of requests fail. In a quick review of the last few 500 requests on one of the servers they we

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-04 Thread EBernhardson
EBernhardson added a comment. Patch should resolve the issue. In terms of testing I would estimate that only integration testing would reliably catch this type of problem. We have some of that in CirrusSearch itself but nothing I'm aware of for the specialized wikidata extension. TASK

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-04 Thread EBernhardson
EBernhardson added a comment. There is a variety of churn in Cirrus right now related to a version upgrade, which likely caused this. Will look into what is causing the breakage today. TASK DETAIL https://phabricator.wikimedia.org/T307586

[Wikidata-bugs] [Maniphest] T305952: Update WDQS update lag SLO grafana page to new 95% SLO

2022-04-29 Thread EBernhardson
EBernhardson moved this task from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. Updated graph on wdqs-wcqs-lag-slo dashboard to use 95 instead of 99 for the threshold value. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-04-29 Thread EBernhardson
EBernhardson added a comment. Ran the previous AB testing report to get a preliminary look at the data and ensure it's collecting as expected. Everything seems reasonable, the new tuning isn't clearly better but not clearly worse either and we only have a few hundred events. As stated

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-04-26 Thread EBernhardson
EBernhardson added a comment. Profiles are deployed, they can be enabled for testing in a single page with a magic query string like wikidataCompletionSearchClicksBucket=T306644-fr <https://www.wikidata.org/wiki/Q2?wikidataCompletionSearchClicksBucket=T306644-fr>. Next steps

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-04-26 Thread EBernhardson
EBernhardson claimed this task. EBernhardson added a comment. Few ideas for future exploration: - Lots of the weights in the tuning report claim to have minimal influence on the final output, look into why. Do we need to collect more negative samples in the training set

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-04-26 Thread EBernhardson
EBernhardson added a comment. Reports generated and published: https://people.wikimedia.org/~ebernhardson/wbsearchentities_202203 TASK DETAIL https://phabricator.wikimedia.org/T306644 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EJoseph

[Wikidata-bugs] [Maniphest] T306054: Upgrade deployment-wdqs01 host to Buster

2022-04-25 Thread EBernhardson
EBernhardson moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. EBernhardson set the point value for this task to "1". TASK DETAIL https://phabricator.wikimedia.org/T306054 WORKBOARD https://phabricator.wikimedia.org/project/board/12

[Wikidata-bugs] [Maniphest] T305952: Update WDQS update lag SLO grafana page to new 95% SLO

2022-04-25 Thread EBernhardson
EBernhardson moved this task from Incoming to Ready for Development on the Discovery-Search (Current work) board. EBernhardson set the point value for this task to "1". TASK DETAIL https://phabricator.wikimedia.org/T305952 WORKBOARD https://phabricator.wikimedia.org/project/

[Wikidata-bugs] [Maniphest] T306156: New upstream release for jvmquake

2022-04-25 Thread EBernhardson
EBernhardson closed this task as "Resolved". EBernhardson claimed this task. EBernhardson added a comment. This is the already deployed version, pinged on first run of libup-bot for jvmquake TASK DETAIL https://phabricator.wikimedia.org/T306156 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T306644: re-run wbsearchentities optimization process

2022-04-21 Thread EBernhardson
EBernhardson created this task. EBernhardson added projects: Wikidata, Discovery-Search (Current work). TASK DESCRIPTION To support elasticsearch 7 the scoring equation for wbsearchentities needs some small shape changes. The weights we use in this search came from relforge_wbsearchentities

[Wikidata-bugs] [Maniphest] T304437: Allow federated queries with cellar endpoint of the Publication Office and European Commission

2022-04-15 Thread EBernhardson
EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. This should now be enabled TASK DETAIL https://phabricator.wikimedia.org/T304437 WORKBOARD https://phabricator.wikimedia.org/project/board/1227

[Wikidata-bugs] [Maniphest] T304437: Allow federated queries with cellar endpoint of the Publication Office and European Commission

2022-04-11 Thread EBernhardson
EBernhardson claimed this task. EBernhardson moved this task from Ready for Development to Needs review on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T304437 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T301650: WCQS "Application Connection Error" E009

2022-03-07 Thread EBernhardson
EBernhardson added a comment. I'm not convinced the patch here will fix anything, but the symptom reported has to do with re-using an old cached response. This is a simple enough change, and semantically correct regardless of whether it fixes this issue, so I will deploy it sometime this week. TASK

[Wikidata-bugs] [Maniphest] T280487: Redirect requests from wcqs-beta.wmflabs.org to the final URL for WCQS

2022-03-07 Thread EBernhardson
EBernhardson added a subtask: T303202: Redirect wcqs-beta.wmflabs.org to commons-query.wikimedia.org. TASK DETAIL https://phabricator.wikimedia.org/T280487 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: EBernhardson, WikiLucas00

[Wikidata-bugs] [Maniphest] T299062: Save stats from wcqs-beta

2022-03-01 Thread EBernhardson
EBernhardson moved this task from Waiting to Needs review on the Discovery-Search (Current work) board. EBernhardson added a comment. With wcqs-beta 1 shut down and redirected to beta 2, I suspect this is complete? Moving to needs review in case someone knows what steps are still necessary. TASK

[Wikidata-bugs] [Maniphest] T301650: WCQS "Application Connection Error" E009

2022-02-28 Thread EBernhardson
EBernhardson added a comment. After reviewing MDN's CORS docs and Stack Overflow posts about redirect-based auth combined with XMLHttpRequest, I'm not finding a simple way to do this that avoids changing the application. I suspect we will need some sort of hook or support within

[Wikidata-bugs] [Maniphest] T301650: WCQS "Application Connection Error" E009

2022-02-28 Thread EBernhardson
EBernhardson added a comment. While trying a few different things I found one way to cause this to fail, although it's going the opposite way of this ticket, so not certain it's related. In particular 1. Open commons-query and run an example query 2. Open browser settings and delete

[Wikidata-bugs] [Maniphest] T293462: Add user blocking in WCQS

2022-02-28 Thread EBernhardson
EBernhardson added a comment. I manually applied the fixes in the latest patch, to pass cookies on to blazegraph, and my username came through into the request logs. Hoping this will be resolved once the above is merged. TASK DETAIL https://phabricator.wikimedia.org/T293462 EMAIL

[Wikidata-bugs] [Maniphest] T299222: Properly configure logback for W[CD]QS streaming updater

2022-02-14 Thread EBernhardson
EBernhardson removed a project: Patch-For-Review. EBernhardson added a comment. doesn't look like there are any more patches here, removing patch-for-review TASK DETAIL https://phabricator.wikimedia.org/T299222 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T282117: WCQS needs to be exposed through a wikimedia.org domain

2022-02-08 Thread EBernhardson
EBernhardson removed a project: Patch-For-Review. EBernhardson moved this task from Waiting to Needs Reporting on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T282117 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL

[Wikidata-bugs] [Maniphest] T279541: Add a reconciliation strategy to the wdqs streaming updater

2022-02-08 Thread EBernhardson
EBernhardson added a comment. Airflow DAG has been deployed. I have left it turned off for now, when ready someone will need to enable it (and potentially update the start_date). TASK DETAIL https://phabricator.wikimedia.org/T279541 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T299222: Properly configure logback for W[CD]QS streaming updater

2022-02-07 Thread EBernhardson
EBernhardson added a comment. Logs themselves have been flowing for a while now, since the patch merge on Jan 26. I put up one more cleanup patch; after that I believe this should be complete. We don't need to do a deploy for this patch, it can run with whatever the next deployment

[Wikidata-bugs] [Maniphest] T293862: Investigate using jvmquake to limit the time a JVM is unusable due to GC overhead

2022-02-04 Thread EBernhardson
EBernhardson updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T293862 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson Cc: Aklapper, dcausse, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, Nandana
