[Wikidata-bugs] [Maniphest] [Edited] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-02 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239414 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, Hook696, Daryl-TTMG

[Wikidata-bugs] [Maniphest] [Created] T239687: Rework how value and reference changes are handled

2019-12-03 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Discovery-Search (Current work). Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION The current workflow of the updater requires loading the triples

[Wikidata-bugs] [Maniphest] [Claimed] T239687: Rework how value and reference changes are handled

2019-12-03 Thread dcausse
dcausse claimed this task. dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239687 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nan

[Wikidata-bugs] [Maniphest] [Edited] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239414 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, darthmon_wmde, DannyS712

[Wikidata-bugs] [Maniphest] [Edited] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239414 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, darthmon_wmde, DannyS712

[Wikidata-bugs] [Maniphest] [Created] T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

2019-12-03 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Discovery-Search (Current work). Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Seen on wdqs1004 after enabling async imports 20:29:03.200

[Wikidata-bugs] [Maniphest] [Triaged] T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

2019-12-03 Thread dcausse
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239750 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Commented On] T239687: Rework how value and reference changes are handled

2019-12-03 Thread dcausse
dcausse added a comment. References and values are identified by a hash computed over their properties. It is not a stable ID as it is always generated on the fly when extracting the entity data. The current RDF projection makes it a resource that is referenced from other triples. The
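The idea of an identifier derived purely from content can be sketched as follows. This is a minimal illustration, not Wikibase's actual hashing scheme: the `content_hash` helper, the use of SHA-1 over sorted JSON, and the sample property names are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(properties: dict) -> str:
    """Derive an identifier from content alone (hypothetical scheme)."""
    # Sort keys so that logically equal property sets serialize identically.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Two references with the same properties, attached to different entities
# (made-up property names and values).
ref_on_item_a = {"stated_in": "Q36578", "retrieved": "2019-12-03"}
ref_on_item_b = {"retrieved": "2019-12-03", "stated_in": "Q36578"}

# Same content, same ID: both entities would point at one shared resource.
shared = content_hash(ref_on_item_a) == content_hash(ref_on_item_b)

# Editing a property produces a different ID instead of updating the old one.
edited = dict(ref_on_item_a, retrieved="2019-12-04")
changed = content_hash(edited) != content_hash(ref_on_item_a)
```

Under this assumption, an edit never mutates a reference in place; it replaces one resource with another, which is why such changes may need handling of their own.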

[Wikidata-bugs] [Maniphest] [Commented On] T239687: Rework how value and reference changes are handled

2019-12-04 Thread dcausse
dcausse added a comment. Some numbers extracted from a dump: - number of values: 20,659,551 - number of unique values: 11,028,526 - number of references: 60,078,314 - number of unique references: 58,876,057 So to the question: > is it worthwhile to dedup values&ref
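The counts quoted above can be turned into duplicate shares with a few lines of arithmetic. The `duplicate_share` helper is a name introduced here for illustration; only the four counts come from the comment.

```python
def duplicate_share(total: int, unique: int) -> float:
    """Fraction of occurrences that repeat an already-seen item."""
    return (total - unique) / total


# Counts taken from the dump figures quoted in the comment.
values_dup = duplicate_share(20_659_551, 11_028_526)
refs_dup = duplicate_share(60_078_314, 58_876_057)

print(f"values:     {values_dup:.1%} of occurrences are duplicates")
print(f"references: {refs_dup:.1%} of occurrences are duplicates")
```

On these figures, roughly 47% of value occurrences are repeats, against about 2% for references, so any deduplication gain would come mostly from values.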

[Wikidata-bugs] [Maniphest] [Closed] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-12-05 Thread dcausse
dcausse moved this task from To Be Deployed to Done on the Discovery-Search (Current work) board. dcausse closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T101013 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] [Unblock] T234968: Measure performance impact of code optimization and/or blazegraph settings on real traffic data

2019-12-05 Thread dcausse
dcausse closed subtask T101013: Log Wikidata Query Service queries to the event gate infrastructure as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T234968 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: J

[Wikidata-bugs] [Maniphest] [Created] T239898: Investigate triple counts difference between dumps and what blazegraph reports

2019-12-05 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Seen in munged dumps: - Nov 6 munged dump: 5 909 445 794 - Nov 15 dump (lexeme): 21 591 655

[Wikidata-bugs] [Maniphest] [Created] T239908: Extract more metrics from blazegraph sparql update response

2019-12-05 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION As of today when we run an update to blazegraph we only extract the total number of mutations

[Wikidata-bugs] [Maniphest] [Created] T239931: Reduce the impact of the sanitizer on wikidata

2019-12-05 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata, CirrusSearch. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Discovery-Search. TASK DESCRIPTION The sanitizer seems to be a bit aggressive with wikidata causing significant load on the

[Wikidata-bugs] [Maniphest] [Edited] T239931: Reduce the impact of the sanitizer on wikidata

2019-12-05 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239931 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Ladsgroup, Addshore, Aklapper, dcausse, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Commented On] T239898: Investigate triple counts difference between dumps and what blazegraph reports

2019-12-06 Thread dcausse
dcausse added a comment. We should export the triples from a production journal to try to understand where the differences are. To do this we need to copy a journal and run some tools provided by blazegraph. The tool is ExportKB; to run it we need all the jars present in the war (the

[Wikidata-bugs] [Maniphest] [Commented On] T177453: Add wikibase client support for searching wikidata items

2018-05-25 Thread dcausse
dcausse added a comment. Does it mean that we would make WikibaseClient dependent on CirrusSearch and create all necessary query builders in this client? Have we considered the possibility of running an actual API call to wbsearchentit...@wikidata.org? I have no clue if the current API output would

[Wikidata-bugs] [Maniphest] [Commented On] T182717: Move fine tuning of search configs to mediawiki-config

2018-06-27 Thread dcausse
dcausse added a comment. moving back to in progress as the second patch generated some warnings on test servers: [Wed Jun 27 13:49:51 2018] [hphp] [482:7f0a5afff700:37030:01] [] \nWarning: Invalid argument supplied for foreach() in /srv/mediawiki/php-1.32.0-wmf.8/extensions/CirrusSearch

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T88534: [Story] Implement EntitySearch service on top of Elastic

2015-10-14 Thread dcausse
dcausse added a subscriber: dcausse. TASK DETAIL https://phabricator.wikimedia.org/T88534 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Ricordisamoa, Addshore, Deskana, Manybubbles, Christopher, Wikidata-bugs, hoo, daniel

[Wikidata-bugs] [Maniphest] [Commented On] T115482: [Task] Enable GeoData extension on wikidata.org

2015-10-14 Thread dcausse
dcausse added a comment. Looks like the mapping will be changed (GD should use the CirrusSearchMappingConfig hook no?) so yes I'm afraid that we'll have to rebuild the index with the new mapping. TASK DETAIL https://phabricator.wikimedia.org/T115482 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread dcausse
dcausse added a subscriber: dcausse. dcausse added a comment. //First of all: sorry for all the low level details in this comment but it's always complex to tackle such relevance issues.// I assume that `life` is the query. Wikidata already uses `incoming_link` to boost the top-N results

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread dcausse
dcausse added a comment. Yes, if you have numeric properties that are ready to use, we might be able to use them soon. With https://gerrit.wikimedia.org/r/#/c/249460/ we should be able to write a custom rescore profile for wikidata and try to work around the poor lucene scores we have today. PS

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread dcausse
dcausse added a comment. Sorry... I was completely wrong when analyzing lucene explain for Q3 (it's a pain to debug scoring issues <https://www.wikidata.org/w/index.php?title=Special:Search&limit=10&offset=850&profile=default&search=life&cirrusDumpResult&cirru

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-18 Thread dcausse
dcausse added a comment. A big +1. As far as I know it should be pretty straightforward: you just need to implement 2 hooks (`CirrusSearchMappingConfig` and `CirrusSearchBuildDocumentParse`). The profiles (we may want to create multiple profiles with different weights for testing purposes) can

[Wikidata-bugs] [Maniphest] [Updated] T78157: [Story] Implement label prefix search based on Elastic (resp Cirrus, Lucene)

2015-12-02 Thread dcausse
dcausse edited blocking tasks, added: T120089: Add an internal completion or suggestions API to core SearchEngine; removed: T112028: Implement completion suggester as a Beta Feature. TASK DETAIL https://phabricator.wikimedia.org/T78157 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. @aude I can help to write the rescore profiles when you are ready. Also I realized that the example profiles I wrote in Cirrus are wrong: they use "multiply" to combine the scores, which makes no sense: `(weight1 * score1) * (weight2 * score2)`. We might pre
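Why "multiply" defeats the weights can be seen in a small sketch. With a product, the weights collapse into a single constant factor `weight1 * weight2` applied to every document, so they cannot change the ordering. The `combine` and `ranking` helpers and the two sample documents are hypothetical and are not taken from CirrusSearch.

```python
import math


def combine(scores, weights, mode):
    """Combine weighted per-signal scores using the given mode."""
    terms = [w * s for w, s in zip(weights, scores)]
    if mode == "multiply":
        return math.prod(terms)
    if mode == "add":
        return sum(terms)
    raise ValueError(f"unknown mode: {mode}")


def ranking(docs, weights, mode):
    """Return document names ordered by combined score, best first."""
    return sorted(docs, key=lambda name: combine(docs[name], weights, mode),
                  reverse=True)


# Made-up documents: (text score, popularity score).
DOCS = {"A": (2.0, 1.0), "B": (1.0, 3.0)}

# With "multiply", changing the weights never changes the order.
same_order = ranking(DOCS, (1, 1), "multiply") == ranking(DOCS, (5, 1), "multiply")

# With "add", boosting the text signal moves A ahead of B.
add_default = ranking(DOCS, (1, 1), "add")
add_boosted = ranking(DOCS, (5, 1), "add")
```

In this toy setup only the additive combination lets a profile author trade one signal off against another.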

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. We can inhibit tf/idf by setting the weight of the main query to 0 and using either "max" or "add". Note that tf/idf will still play a role to extract the top-N results that will be rescored. N is 8196*7 (number of shards) so if shards are well bala
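The effect described above can be sketched in a few lines. This is a simplified model rather than the search engine's actual implementation: `final_score` and `rescore_window` are hypothetical helpers, and rescore scores are assumed to be non-negative.

```python
SHARDS = 7                # number of shards, from the comment
WINDOW_PER_SHARD = 8196   # per-shard rescore window, from the comment


def rescore_window(shards: int, per_shard: int) -> int:
    """Total number of top documents that get rescored across all shards."""
    return shards * per_shard


def final_score(query_score, rescore_score, query_weight, rescore_weight, mode):
    """Combine the main query score with the rescore score."""
    q = query_weight * query_score
    r = rescore_weight * rescore_score
    if mode == "max":
        return max(q, r)
    if mode == "add":
        return q + r
    raise ValueError(f"unknown mode: {mode}")


n = rescore_window(SHARDS, WINDOW_PER_SHARD)

# With a query weight of 0, the tf/idf score drops out in both modes.
with_add = final_score(12.5, 3.0, 0.0, 1.0, "add")
with_max = final_score(12.5, 3.0, 0.0, 1.0, "max")
```

The tf/idf score still decides which documents enter the window of `n` candidates; it just no longer contributes to their final order.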

[Wikidata-bugs] [Maniphest] [Claimed] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-04 Thread dcausse
dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T110648 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Sjoerddebruin, EBernhardson, aude, dcausse, Deskana, daniel, Mbch331, Aklapper, Lydia_Pintscher, Wikidata

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-04 Thread dcausse
dcausse moved this task to In progress on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-08 Thread dcausse
dcausse moved this task to Needs review on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-17 Thread dcausse
dcausse moved this task to Needs review on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-17 Thread dcausse
dcausse added a comment. moving back to needs-review as all patches needed in wikidata have been merged. TASK DETAIL https://phabricator.wikimedia.org/T110648 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: gerritbot, Sjoerddebruin

[Wikidata-bugs] [Maniphest] [Commented On] T249099: [WDQS Streaming Updater] Error during munging process

2020-04-02 Thread dcausse
dcausse added a comment. Are we emitting exceptions when the HTTP status is not what we expect, e.g. 404? If yes this is worrisome and we definitely need to look into what entity and revision is producing such RDF. TASK DETAIL https://phabricator.wikimedia.org/T249099 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Updated] T249260: SUPPORT: wikibase update from 1.33 to 1.34 error message elastic search

2020-04-06 Thread dcausse
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T249260 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, dcausse Cc: dcausse, Aklapper, DD063520, CBogen, Samantha_Alipio_WMDE

[Wikidata-bugs] [Maniphest] [Updated] T249260: SUPPORT: wikibase update from 1.33 to 1.34 error message elastic search

2020-04-06 Thread dcausse
dcausse added a comment. @DD063520 did you make any modification to the LocalSettings? I tried a fresh install with a fix for T249496 <https://phabricator.wikimedia.org/T249496> applied and `updateSearchIndexConfig.php` worked appropriately. The error `Unknown Similarity type

[Wikidata-bugs] [Maniphest] [Updated] T249196: Test the impact of the wdqs updater performance by disabling values cleanup

2020-04-07 Thread dcausse
dcausse edited projects, added Wikidata-Query-Service; removed WDQS-Optimizer. Restricted Application added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T249196 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc

[Wikidata-bugs] [Maniphest] [Claimed] T249196: Test the impact of the wdqs updater performance by disabling values cleanup

2020-04-07 Thread dcausse
dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T249196 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, CBogen, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Claimed] T248464: [WDQS Streaming Updater] Implement output format in Streaming Updater

2020-04-07 Thread dcausse
dcausse claimed this task. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T248464 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Zbyszko, CBogen, darthmon_wmde, Nandana

[Wikidata-bugs] [Maniphest] [Changed Project Column] T249196: Test the impact of the wdqs updater performance by disabling values cleanup

2020-04-07 Thread dcausse
dcausse moved this task from In Progress to To Be Deployed on the Discovery-Search (Current work) board. dcausse added a comment. The lag on wdqs1007 has been absorbed much faster than on other eqiad nodes. F31741086: lag_wdqs1007.png <https://phabricator.wikimedia.org/F31741

[Wikidata-bugs] [Maniphest] [Triaged] T248365: New keyword hasalias and inalias

2020-04-08 Thread dcausse
dcausse triaged this task as "Medium" priority. dcausse moved this task from needs triage to Wikidata Search on the Discovery-Search board. TASK DETAIL https://phabricator.wikimedia.org/T248365 WORKBOARD https://phabricator.wikimedia.org/project/board/1849/ EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] [Triaged] T248363: Haslabel treats aliases as labels

2020-04-08 Thread dcausse
dcausse triaged this task as "Medium" priority. dcausse moved this task from needs triage to Wikidata Search on the Discovery-Search board. dcausse added a comment. Aliases were put in the labels field for performance reasons; we need to investigate whether it's feasible o

[Wikidata-bugs] [Maniphest] [Created] T250140: icinga: WDQS high update lag should alert when the service times out

2020-04-14 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. TASK DESCRIPTION Currently when the check returns: `CHECK_NRPE STATE UNKNOWN: Socket timeout after 10 seconds. for Query Service HTTP Port and NaN for WDQS high update lag` we do not send an alert. Being is

[Wikidata-bugs] [Maniphest] [Declined] T239397: Wikibase RDF output does not link to the same blank node truthy statement values and the value from the reified statement

2020-04-15 Thread dcausse
dcausse closed this task as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T239397 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, darthmon_wmde, Nandana, L

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-16 Thread dcausse
dcausse added a comment. In T244341#6062795 <https://phabricator.wikimedia.org/T244341#6062795>, @Dipsacus_fullonum wrote: > Yes, `isLiteral` should still work for properties where the real values are literals. Without knowing the internal workings of Blazegraph I would guess

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-17 Thread dcausse
dcausse added a comment. In T244341#6064237 <https://phabricator.wikimedia.org/T244341#6064237>, @Dipsacus_fullonum wrote: > Many queries use the optimizer hint `hint:Prior hint:rangeSafe true. ` when e.g. comparing date or number values with constants in a filter as suggested

[Wikidata-bugs] [Maniphest] [Commented On] T242453: Deadlock in blazegraph blocking all queries and updates

2020-04-20 Thread dcausse
dcausse added a comment. In T242453#6071767 <https://phabricator.wikimedia.org/T242453#6071767>, @Addshore wrote: > This just happened again, so depooled and restarted 1006, and switched traffic over to codfw. > Seems to always be 1006? I don't think so, it happe

[Wikidata-bugs] [Maniphest] [Updated] T243603: Create a way to deploy WDQS artifacts to Archiva with Jenkins

2020-04-20 Thread dcausse
dcausse removed a parent task: T244590: EPIC: Rework the WDQS updater as an event driven application. TASK DETAIL https://phabricator.wikimedia.org/T243603 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Zbyszko, dcausse Cc: Aklapper, Zbyszko

[Wikidata-bugs] [Maniphest] [Updated] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-04-20 Thread dcausse
dcausse removed a subtask: T243603: Create a way to deploy WDQS artifacts to Archiva with Jenkins. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: revi, Mholloway, Ladsgroup

[Wikidata-bugs] [Maniphest] [Updated] T228348: Category graph includes deleted categories

2020-04-20 Thread dcausse
dcausse closed this task as a duplicate of T246568: Deepcategory returns only very few results. TASK DETAIL https://phabricator.wikimedia.org/T228348 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Mathew.onipe, Gehel

[Wikidata-bugs] [Maniphest] [Updated] T228348: Category graph includes deleted categories

2020-04-20 Thread dcausse
dcausse added a comment. merged in T246568 <https://phabricator.wikimedia.org/T246568> which is where we'll announce that the full reload has been done. TASK DETAIL https://phabricator.wikimedia.org/T228348 EMAIL PREFERENCES https://phabricator.wikimedia.org/set

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-04-21 Thread dcausse
dcausse added a comment. @ArielGlenn no, not yet; this is still blocked on T243292 <https://phabricator.wikimedia.org/T243292> which requires some investigation to determine which component (dump or the wdqs transformation process) is wrong. TASK DETAIL https://phabricator.wikimed

[Wikidata-bugs] [Maniphest] [Claimed] T245728: Add a component to generate a diff between two entity revisions

2020-04-21 Thread dcausse
dcausse claimed this task. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T245728 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Zbyszko, Aklapper, dcausse, CBogen

[Wikidata-bugs] [Maniphest] [Triaged] T245728: Add a component to generate a diff between two entity revisions

2020-04-21 Thread dcausse
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T245728 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Zbyszko, Aklapper, dcausse, CBogen, darthmon_wmde, Nandana, L

[Wikidata-bugs] [Maniphest] [Triaged] T248464: [WDQS Streaming Updater] Implement output format in Streaming Updater

2020-04-21 Thread dcausse
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T248464 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Zbyszko, Blissjay007, Oblanco79, Alter-paule, Beast1978, CBog

[Wikidata-bugs] [Maniphest] [Triaged] T250140: icinga: WDQS high update lag should alert when the service times out

2020-04-28 Thread dcausse
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T250140 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: William_Avery, Aklapper, Addshore, dcausse, darthmon_wmde, Nandana, L

[Wikidata-bugs] [Maniphest] [Updated] T250140: icinga: WDQS high update lag should alert when the service times out

2020-04-28 Thread dcausse
dcausse added a parent task: T251149: [epic] Ryan's onboarding to the Search Platform team. TASK DETAIL https://phabricator.wikimedia.org/T250140 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: William_Avery, Aklapper, Addshore, dc

[Wikidata-bugs] [Maniphest] [Triaged] T242453: Deadlock in blazegraph blocking all queries and updates

2020-04-28 Thread dcausse
dcausse triaged this task as "High" priority. dcausse added a comment. Raising to high, this issue might be hard to solve as it sounds related to the blazegraph design flaw of running with unbounded thread pools. We might perhaps at least try to add some debugging code to i

[Wikidata-bugs] [Maniphest] [Created] T251270: The streaming updater should produce its events to kafka

2020-04-28 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION AC: - the streaming updater should produce its events to kafka - the events should remain

[Wikidata-bugs] [Maniphest] [Updated] T251270: The streaming updater should produce its events to kafka

2020-04-28 Thread dcausse
dcausse added a parent task: T244590: EPIC: Rework the WDQS updater as an event driven application. TASK DETAIL https://phabricator.wikimedia.org/T251270 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Triaged] T251270: The streaming updater should produce its events to kafka

2020-04-28 Thread dcausse
dcausse triaged this task as "Medium" priority. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T251270 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcaus

[Wikidata-bugs] [Maniphest] [Updated] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-04-28 Thread dcausse
dcausse added a subtask: T251270: The streaming updater should produce its events to kafka. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: revi, Mholloway, Ladsgroup, Multichill

[Wikidata-bugs] [Maniphest] [Created] T251275: Add a new updater component to update blazegraph based on the content present in the streaming updater output kafka stream

2020-04-28 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION AC: - a component running close to blazegraph should read the content produced in T251270

[Wikidata-bugs] [Maniphest] [Updated] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-04-28 Thread dcausse
dcausse added a subtask: T251275: Add a new updater component to update blazegraph based on the content present in the streaming updater output kafka stream. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Updated] T251275: Add a new updater component to update blazegraph based on the content present in the streaming updater output kafka stream

2020-04-28 Thread dcausse
dcausse added a parent task: T244590: EPIC: Rework the WDQS updater as an event driven application. TASK DETAIL https://phabricator.wikimedia.org/T251275 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Retitled] T251275: Update blazegraph based on the content present in the streaming updater output kafka stream

2020-04-28 Thread dcausse
dcausse renamed this task from "Add a new updater component to update blazegraph based on the content present in the streaming updater output kafka stream" to "Update blazegraph based on the content present in the streaming updater output kafka stream". dcausse updated

[Wikidata-bugs] [Maniphest] [Created] T251387: Missing sitelinks for some wikibase items

2020-04-29 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added subscribers: Strainu, Cosine02, revi, Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Some sitelinks are missing for Q5084390. at the time of writing https

[Wikidata-bugs] [Maniphest] [Edited] T251387: Missing sitelinks for some wikibase items

2020-04-29 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T251387 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, revi, Cosine02, Strainu, dcausse, darthmon_wmde, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Edited] T251387: Missing sitelinks for some wikibase items

2020-04-29 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T251387 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, revi, Cosine02, Strainu, dcausse, darthmon_wmde, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Updated] T251387: Missing sitelinks for some wikibase items

2020-04-29 Thread dcausse
dcausse added a comment. @Peter_James thanks! The current update strategy assumes that entity <> sitelink pairs are unique and thus when a sitelink is removed it blindly assumes that it's not used elsewhere. Not doing so would require a much more costly update process that wo
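The uniqueness assumption described above can be illustrated with a toy triple store. This is a sketch under stated assumptions: triples are plain tuples, the `schema:about` and `schema:name` predicates and the sample data are chosen for illustration, and `naive_remove` and `careful_remove` are hypothetical helpers, not the updater's code.

```python
ABOUT = "schema:about"


def naive_remove(triples, entity, sitelink):
    """Assume the sitelink belongs to one entity: drop all of its triples.

    The entity argument is deliberately unused, mirroring the blind assumption.
    """
    return {t for t in triples if t[0] != sitelink}


def careful_remove(triples, entity, sitelink):
    """Drop the link for this entity, then check whether others still use it."""
    remaining = triples - {(sitelink, ABOUT, entity)}
    still_used = any(s == sitelink and p == ABOUT for s, p, _ in remaining)
    if still_used:
        return remaining
    return {t for t in remaining if t[0] != sitelink}


# Made-up data: two entities wrongly share the same sitelink.
PAGE = "https://example.org/wiki/Some_page"
STORE = {
    (PAGE, ABOUT, "Q1"),
    (PAGE, ABOUT, "Q2"),
    (PAGE, "schema:name", "Some page"),
}

# Removing the sitelink from Q1 also wipes it for Q2 in the naive version.
after_naive = naive_remove(STORE, "Q1", PAGE)
after_careful = careful_remove(STORE, "Q1", PAGE)
```

The careful variant needs an extra lookup against the store for every removal, which gives a sense of why checking for shared sitelinks makes the update process more expensive.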

[Wikidata-bugs] [Maniphest] [Updated] T251387: Missing sitelinks for some wikibase items

2020-04-30 Thread dcausse
dcausse added a comment. I think the best approach here is to wait for the cleanup in T249613 <https://phabricator.wikimedia.org/T249613> and its report, then make sure that true duplicates are removed, and then schedule a new full reload of all the servers. In the meantime ite

[Wikidata-bugs] [Maniphest] [Triaged] T251387: Missing sitelinks for some wikibase items

2020-04-30 Thread dcausse
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T251387 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Peter_James, Aklapper, revi, Cosine02, Strainu, dcausse, darthmon_wmde, Nan

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints

2020-04-30 Thread dcausse
dcausse added a comment. @Multichill the discussion <https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#Blank_node_deprecation_in_WDQS_&_Wikibase_RDF_model> seems to have stalled. Thanks to Peter the pros and cons have been well summari

[Wikidata-bugs] [Maniphest] [Retitled] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-04-30 Thread dcausse
dcausse renamed this task from "Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints" to "Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS". dcausse updated the task description. TASK DETAIL https://phabricator.w

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-04-30 Thread dcausse
dcausse added a comment. In T244341#6097277 <https://phabricator.wikimedia.org/T244341#6097277>, @Pfps wrote: > I don't understand why it was considered necessary to make a breaking change the RDF dump to improve WDQS performance when there is a solution that does not m

[Wikidata-bugs] [Maniphest] [Edited] T245541: Add a new munge option to do blank node skolemization

2020-04-30 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T245541 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Addshore, Aklapper, Lucas_Werkmeister_WMDE, mkroetzsch, Daniel_Mietchen, Jheald, dcausse

[Wikidata-bugs] [Maniphest] [Edited] T245541: Add a new munge option to do blank node skolemization

2020-04-30 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T245541 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Addshore, Aklapper, Lucas_Werkmeister_WMDE, mkroetzsch, Daniel_Mietchen, Jheald, dcausse

[Wikidata-bugs] [Maniphest] [Commented On] T249099: [WDQS Streaming Updater] Error during munging process

2020-05-05 Thread dcausse
dcausse added a comment. happened a couple of times on a test run: FailedOp(FullImport(Q93246620,2020-05-04T12:57:49Z,1173447691),org.wikidata.query.rdf.tool.exception.ContainedException: Didn't get a revision id for []) FailedOp(FullImport(Q12439094,2020-05-04T13:2

[Wikidata-bugs] [Maniphest] [Declined] T249097: [WDQS Streaming Updater] Fix pipeline checkpointing

2020-05-05 Thread dcausse
dcausse closed this task as "Declined". dcausse added a comment. checkpointing works as expected now TASK DETAIL https://phabricator.wikimedia.org/T249097 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklappe

[Wikidata-bugs] [Maniphest] [Unblock] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-05-05 Thread dcausse
dcausse closed subtask T249097: [WDQS Streaming Updater] Fix pipeline checkpointing as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: revi, Mholloway,

[Wikidata-bugs] [Maniphest] [Claimed] T251275: [WDQS Streaming Updater] Update blazegraph based on the content present in the streaming updater output kafka stream

2020-05-05 Thread dcausse
dcausse claimed this task. dcausse triaged this task as "Medium" priority. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T251275 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To:

[Wikidata-bugs] [Maniphest] [Commented On] T249260: SUPPORT: wikibase update from 1.33 to 1.34 error message elastic search

2020-05-06 Thread dcausse
dcausse added a comment. In T249260#6112298 <https://phabricator.wikimedia.org/T249260#6112298>, @DD063520 wrote: > Is there a way to reindex? If the index already exists perhaps forcing a reindex might help. For this you need to run: `php updateSearchIndexC

[Wikidata-bugs] [Maniphest] [Declined] T169798: Create UDFs for analyzing SPARQL queries

2020-05-06 Thread dcausse
dcausse added subscribers: JAllemandou, dcausse. dcausse closed this task as "Declined". dcausse added a comment. Closing this, @JAllemandou has done plenty of work on this already. TASK DETAIL https://phabricator.wikimedia.org/T169798 EMAIL PREFERENCES https://phabricator.wik

[Wikidata-bugs] [Maniphest] [Unblock] T143819: Data request for logs from SparQL interface at query.wikidata.org

2020-05-06 Thread dcausse
dcausse closed subtask T169798: Create UDFs for analyzing SPARQL queries as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T143819 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Andrawaag, Esc3300, JAllemandou, mpop

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-05-11 Thread dcausse
dcausse added a comment. In T244341#6113236 <https://phabricator.wikimedia.org/T244341#6113236>, @Lucas_Werkmeister_WMDE wrote: >> Is anyone proposing a change to Wikibase (or Wikidata)? > > Yes – the goal is that the RDF in the query service, the RDF dumps,

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-05-11 Thread dcausse
dcausse added a comment. In T244341#6124638 <https://phabricator.wikimedia.org/T244341#6124638>, @Pfps wrote: > If 'unskolemizing' is a trivial step then that should be implemented by WDQS, instead of pushing it to every consumer (including indirect consumers) of W

[Wikidata-bugs] [Maniphest] [Commented On] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-05-12 Thread dcausse
dcausse added a comment. In T244341#6124894 <https://phabricator.wikimedia.org/T244341#6124894>, @Pfps wrote: > I was completely unaware that WDQS is so integrated into the inner workings of Wikidata. Where is this described? Was this mentioned in the announcement of the

[Wikidata-bugs] [Maniphest] [Updated] T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS

2020-05-12 Thread dcausse
dcausse added a comment. In T244341#6129321 <https://phabricator.wikimedia.org/T244341#6129321>, @Pfps wrote: > I thus view it misleading to state in this Phabricator ticket that "performance issues [of the WDQS] cause edits on wikidata to be throttled", which gives

[Wikidata-bugs] [Maniphest] [Assigned] T243292: Fix the munger to support commons RDF dump

2020-05-14 Thread dcausse
dcausse assigned this task to Zbyszko. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T243292 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Zbyszko, dcausse Cc: Mahir256, Physikerwelt

[Wikidata-bugs] [Maniphest] [Closed] T230754: WDQS labs role role::wdqs::labs fails when not finding /srv/wdqs

2020-05-19 Thread dcausse
dcausse assigned this task to EBernhardson. dcausse closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T230754 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: EBernhardson, dcausse Cc: Aklapper, Gehel, Smalysh

[Wikidata-bugs] [Maniphest] [Commented On] T251497: Adapt munging process for SDoC

2020-05-25 Thread dcausse
dcausse added a comment. The munger should exclude rdf:type statements by default: `SELECT ?o { wd:M19705716 a ?o . }` returns schema:ImageObject, schema:MediaObject and wikibase:Mediainfo; a similar query on query.wikidata.org does not return such statements
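A minimal sketch of the filtering described in the comment above, assuming triples are held as plain (subject, predicate, object) tuples. The function name `drop_rdf_type` and the `ex:caption` predicate are hypothetical, chosen for illustration; this is not the munger's actual code.

```python
# Full IRI of rdf:type, which SPARQL and Turtle abbreviate as "a".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def drop_rdf_type(triples):
    """Return only the triples whose predicate is not rdf:type."""
    return [t for t in triples if t[1] != RDF_TYPE]


# The three rdf:type objects come from the comment; the last triple is made up.
triples = [
    ("wd:M19705716", RDF_TYPE, "schema:ImageObject"),
    ("wd:M19705716", RDF_TYPE, "schema:MediaObject"),
    ("wd:M19705716", RDF_TYPE, "wikibase:Mediainfo"),
    ("wd:M19705716", "ex:caption", "example"),  # hypothetical non-type statement
]

filtered = drop_rdf_type(triples)
```

After filtering, only the hypothetical `ex:caption` triple remains, so a query for `?o` with predicate `a` over this data would return nothing.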

[Wikidata-bugs] [Maniphest] [Created] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-05-27 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Analytics. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Generating the initial state of the wdqs streaming update requires parsing the TTL dumps

[Wikidata-bugs] [Maniphest] [Updated] T244590: EPIC: Rework the WDQS updater as an event driven application

2020-05-27 Thread dcausse
dcausse added a subtask: T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: revi, Mholloway

[Wikidata-bugs] [Maniphest] [Updated] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-05-27 Thread dcausse
dcausse added a parent task: T244590: EPIC: Rework the WDQS updater as an event driven application. TASK DETAIL https://phabricator.wikimedia.org/T253753 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Ottomata, dcausse, Aklapper, CBogen

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-05-27 Thread dcausse
dcausse added a comment. @ArielGlenn we plan to make a subtle change to the dump (prefixes), this won't be technically a breaking change but could cause some confusion if users start to assume the presence of some prefixes. Would it be possible to pause the publication of the dumps whi

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-05-27 Thread dcausse
dcausse added a comment. Just a note on the current problem: the prefixes defined in ttl dumps are identical to the ones used by wikidata e.g.: @prefix wdt: <http://commons.wikimedia.org/prop/direct/> . This is perfectly valid but might cause some confusion because when

[Wikidata-bugs] [Maniphest] [Created] T253798: Commons RDF dump should use specific prefixes not the ones used by wikidata

2020-05-27 Thread dcausse
dcausse created this task. dcausse added projects: WikibaseMediaInfo, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Currently the RDF output of commons is using the same set of prefixes as the ones used by wikidata. This is confusing as someone reading

[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2020-05-27 Thread dcausse
dcausse added a subscriber: CBogen. dcausse added a comment. Looks like it was decided not to use wikidata specific prefixes for MediaInfo exports but to use a more specific `sdc` for these (see: T222995 <https://phabricator.wikimedia.org/T222995>). The code does still look to be har

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-05-27 Thread dcausse
dcausse added a comment. @WMDE-leszek oops, sorry I replied before reading your comment and was reading an old code base... if this is just a config change it can hopefully be merged soon. Thanks! TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Declined] T253798: Commons RDF dump should use specific prefixes not the ones used by wikidata

2020-05-27 Thread dcausse
dcausse closed this task as "Declined". dcausse added a comment. Wikibase now has a way to override the default namespaces; for commons it should happen thanks to https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/569260 . TASK DETAIL https://phabricator.wikimedia.o

[Wikidata-bugs] [Maniphest] [Commented On] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster

2020-05-28 Thread dcausse
dcausse added a comment. @JAllemandou I think that is an option as well, the thing is that it is transitional, to help bootstrap a test of the full pipeline. In the end we won't be using jumbo and thus won't be able to rely on a 30-day retention on main so hopefully we
