[Wikidata-bugs] [Maniphest] [Lowered Priority] T219364: Wikidata search lagging behind
dcausse lowered the priority of this task from "Unbreak Now!" to "High". dcausse edited projects, added Discovery-Search (Current work), CirrusSearch, Operations; removed Discovery. dcausse added a comment. Restricted Application edited projects, added Discovery-Search; removed Discovery-Search (Current work). The backlog of updates is being processed; once we catch up on these updates we will run a maint script to reindex lost updates. Lowering to High as the immediate actions were taken; it may now take a few days to fully sync the index and the database for the affected wikis. TASK DETAIL https://phabricator.wikimedia.org/T219364 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Smalyshev, Lea_Lacroix_WMDE, Gehel, dcausse, TerraCodes, Liuxinyu970226, Aklapper, Addshore, alaa_wmde, Legado_Shulgin, Nandana, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, LawExplorer, Zppix, _jensen, rosalieper, Wong128hk, Wikidata-bugs, aude, jayvdb, faidon, Mbch331, Jay8g, fgiunchedi, jeremyb, ET4Eva, Darkminds3113, Avner, FloNight ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Retitled] T219364: Elasticsearch indices went read-only causing huge lag
dcausse renamed this task from "Wikidata search lagging behind" to "Elasticsearch indices went read-only causing huge lag". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T219364
[Wikidata-bugs] [Maniphest] [Updated] T219364: Elasticsearch indices went read-only causing huge lag
dcausse edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T219364
[Wikidata-bugs] [Maniphest] [Changed Project Column] T219364: Elasticsearch indices went read-only causing huge lag
dcausse moved this task from in progress to Done on the Discovery-Search (Current work) board. dcausse added a comment. The backlog of updates is now completely absorbed and a script has been run to catch up on lost updates. There is nothing more we can do at this point except wait for the maint script to finish; moving to Done. TASK DETAIL https://phabricator.wikimedia.org/T219364 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] [Commented On] T124196: Fatal "cannot perform this operation with arrays" from CirrusSearch/ElasticaWrite (using JobQueueDB)
dcausse added a comment.
> E.g. avoid queuing updates of this type or this size (possibly configurable), or run them differently, or to try it as today and then catch/suppress the failure - maybe logging a warning in its stead.
IMO the JobQueue should raise an error if it is not able to save the message correctly. Since the queue owns the way the message is serialized, it is hard for an extension to determine the actual size of the stored message. TASK DETAIL https://phabricator.wikimedia.org/T124196
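The suggestion above, that the queue itself should reject a message it cannot store, can be illustrated with a minimal sketch. This is not MediaWiki's JobQueue API: `SketchJobQueue`, `MessageTooLargeError`, the JSON serialization, and the `max_bytes` limit are all hypothetical names and choices made only for illustration.

```python
import json


class MessageTooLargeError(Exception):
    """Raised when a serialized job does not fit in the queue's storage."""


class SketchJobQueue:
    """Hypothetical queue that owns serialization and enforces its own size limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.stored = []

    def push(self, job_params):
        # The queue, not the caller, decides how the job is serialized,
        # so only the queue can know the size of what is actually stored.
        payload = json.dumps(job_params).encode("utf-8")
        if len(payload) > self.max_bytes:
            # Fail loudly instead of silently truncating or corrupting the job.
            raise MessageTooLargeError(
                f"serialized job is {len(payload)} bytes, limit is {self.max_bytes}"
            )
        self.stored.append(payload)
        return len(payload)
```

With this shape, the caller only needs to catch one well-defined error and never has to guess the serialized size ahead of time.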
[Wikidata-bugs] [Maniphest] [Updated] T220823: Use ElasticSearch for bulk Wikidata entity term lookup
dcausse edited projects, added Discovery-Search; removed Discovery. TASK DETAIL https://phabricator.wikimedia.org/T220823
[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive
dcausse added a comment. @Smalyshev switching the main field for statements to `lowercase_keyword` won't break anything; as with a new field, it will only be taken into account after the next reindex. I would advise against adding a new field here, as the cardinality would nearly double. TASK DETAIL https://phabricator.wikimedia.org/T206613
[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive
dcausse added a comment. We should also note that we index this data in the main filter field, which means that for searches that are unlikely to be ambiguous (IDs and such) one could simply search for 10.1371/journal.pcbi.1002947 <https://www.wikidata.org/w/index.php?search=10.1371/journal.pcbi.1002947&title=Special%3ASearch&profile=default&fulltext=1>. The benefit is that it is tolerant of small variations in punctuation and also accepts partial searches like journal.pcbi.1002947 <https://www.wikidata.org/w/index.php?search=journal.pcbi.1002947&title=Special%3ASearch&profile=default&fulltext=1>, or even small variations such as journal pcbi 1002947 <https://www.wikidata.org/w/index.php?search=journal pcbi 1002947&title=Special%3ASearch&profile=default&fulltext=1>. So instead of giving up with no results, this kind of search could be tried if a human is there to select/accept/validate a result. TASK DETAIL https://phabricator.wikimedia.org/T206613
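Why a tokenized field tolerates punctuation changes and partial input can be shown with a rough sketch. The tokenizer below (lowercase, split on anything that is not a letter or digit) is an assumption chosen for illustration; the analyzers actually used by CirrusSearch and Elasticsearch are more elaborate, and `tokenize` and `matches` are hypothetical helper names.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches(query, indexed_value):
    """True when the query tokens appear contiguously, in order, in the indexed value."""
    q, v = tokenize(query), tokenize(indexed_value)
    if not q:
        return False
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


doi = "10.1371/journal.pcbi.1002947"
full = matches("10.1371/journal.pcbi.1002947", doi)   # complete identifier
partial = matches("journal.pcbi.1002947", doi)        # partial identifier
spaced = matches("journal pcbi 1002947", doi)         # punctuation replaced by spaces
```

All three queries reduce to the same token sequence (or a contiguous part of it), which is why they can still find the item where an exact keyword match would not.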
[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive
dcausse added a comment. @Smalyshev I totally agree. I was suggesting a UX where a first search attempt would try to match using the haswbstatement keyword (switched to case-insensitive), and a second attempt could then be made using the fulltext mode if the first one is unsuccessful. TASK DETAIL https://phabricator.wikimedia.org/T206613
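The two-step flow described above could be sketched as follows. The two search callables are placeholders supplied by the caller, not real CirrusSearch or Wikibase functions, and the returned mode labels are invented for the example.

```python
def search_with_fallback(value, exact_search, fulltext_search):
    """Try an exact statement match first, then fall back to fulltext.

    exact_search and fulltext_search are hypothetical callables that take the
    search value and return a list of results.
    """
    results = exact_search(value)
    if results:
        # The precise match succeeded, so no human validation step is implied.
        return "haswbstatement", results
    # Nothing matched exactly: return fuzzier candidates for a human to review.
    return "fulltext", fulltext_search(value)
```

Returning the mode alongside the results lets a UI signal that fulltext hits are only candidates that still need to be confirmed.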
[Wikidata-bugs] [Maniphest] [Merged] T215615: Stop using negative scores for deboosting statements
dcausse closed this task as a duplicate of T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0. TASK DETAIL https://phabricator.wikimedia.org/T215615
[Wikidata-bugs] [Maniphest] [Merged] T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0
dcausse merged a task: T215615: Stop using negative scores for deboosting statements. TASK DETAIL https://phabricator.wikimedia.org/T209859
[Wikidata-bugs] [Maniphest] [Changed Status] T202254: Use ExtensionRegistry instead of class_exists to check for CirrusSearch in Wikibase
dcausse changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T202254
[Wikidata-bugs] [Maniphest] [Claimed] T186037: Need mvn build mode that does not build gui
dcausse claimed this task. dcausse moved this task from Backlog to In progress on the Discovery-Wikidata-Query-Service-Sprint board. TASK DETAIL https://phabricator.wikimedia.org/T186037 WORKBOARD https://phabricator.wikimedia.org/project/board/1239/
[Wikidata-bugs] [Maniphest] [Updated] T186037: Need mvn build mode that does not build gui
dcausse added a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAIL https://phabricator.wikimedia.org/T186037
[Wikidata-bugs] [Maniphest] [Commented On] T186037: Need mvn build mode that does not build gui
dcausse added a comment. We could also use `mvn -pl -gui`, which does not require any changes. TASK DETAIL https://phabricator.wikimedia.org/T186037
[Wikidata-bugs] [Maniphest] [Commented On] T173248: Convert blank nodes to “unknown value”
dcausse added a comment. I see that the response is t1514691780 t1514691780 Would it work if the API returned a blank node again instead of trying to deal with the string? t1514691780 t1514691780 The UI could do something special when it encounters blank nodes. TASK DETAIL https://phabricator.wikimedia.org/T173248
[Wikidata-bugs] [Maniphest] [Created] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata.
TASK DESCRIPTION
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg %mdc%n
13:09:09.985 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.2-SNAPSHOT (2629afc6287b660a4576d795debea6781879afff <https://phabricator.wikimedia.org/rWDQR2629afc6287b660a4576d795debea6781879afff>)
13:09:10.699 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
13:09:10.699 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
13:09:10.854 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
13:09:10.855 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2019-07-30T12:06:13Z
13:09:10.881 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Creating consumer wdqs1009
13:09:11.157 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Subscribed to 6 topics
13:09:11.158 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.revision-create-0 to (timestamp=1564488373000, offset=56972315)
13:09:11.158 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.page-undelete-0 to (timestamp=1564488373000, offset=190052)
13:09:11.158 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.page-undelete-0 to (timestamp=1564488373000, offset=6581)
13:09:11.158 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.revision-create-0 to (timestamp=1564488373000, offset=1346546990)
13:09:11.158 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.page-delete-0 to (timestamp=1564488373000, offset=409877)
13:09:11.159 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.page-delete-0 to (timestamp=1564488373000, offset=9775382)
13:09:12.626 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Found 903 changes
13:09:12.809 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during updater run.
java.lang.StringIndexOutOfBoundsException: String index out of range: -8
    at java.lang.String.substring(String.java:1931)
    at org.wikidata.query.rdf.tool.Updater.getRevisionUpdates(Updater.java:289)
    at org.wikidata.query.rdf.tool.Updater.handleChanges(Updater.java:228)
    at org.wikidata.query.rdf.tool.Updater.run(Updater.java:150)
    at org.wikidata.query.rdf.tool.Update.run(Update.java:173)
    at org.wikidata.query.rdf.tool.Update.main(Update.java:97)
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg %mdc%n
13:09:24.357 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.2-SNAPSHOT (2629afc6287b660a4576d795debea6781879afff <https://phabricator.wikimedia.org/rWDQR2629afc6287b660a4576d795debea6781879afff>)
13:09:25.141 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
13:09:25.141 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
13:09:25.258 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
13:09:25.258 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2019-07-30T12:06:13Z
13:09:25.279 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Creating consumer wdqs1009
13:09:25.523 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Subscribed to 6 topics
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.revision-create-0 to (timestamp=1564488373000, offset=56972315)
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.page-undelete-0 to (timestamp=1564488373000, offset=190052)
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.page-undelete-0 to (timestamp=1564488373000, offset=6581)
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.revision-create-0 to (timestamp=1564488373000, offset=1346546990)
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic codfw.mediawiki.page-delete-0 to (timestamp=1564488373000, offset=409877)
13:09:25.524 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Set topic eqiad.mediawiki.page-delete-0 to (timestamp=1564488373000, offset=9775382)
13:09:27.043 [main] INFO o.w.q.rdf.tool.change.KafkaPoller - Found 903 changes
13:09:27.229 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during updater run.
java.lang.StringIndexOutOfBoundsException: String index out of range: -8
    at java.lang.String.substring(String.java:1931)
    at org.wikidata.query.rdf.tool.Updater.getRevisionUpdates(Updater.java:289
[Wikidata-bugs] [Maniphest] [Edited] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T229329
[Wikidata-bugs] [Maniphest] [Commented On] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8
dcausse added a comment. If `uris.entity().length()` is greater than `entityId.length()` by 8 characters, it will cause this exception. Since this is a test server, it is perhaps misconfigured. TASK DETAIL https://phabricator.wikimedia.org/T229329
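The arithmetic behind the `-8` can be reproduced with a small model. The sketch below mimics, in Python, the bounds checks of Java 8's `String.substring(int)`; it assumes (this is not confirmed by the thread) that the updater strips the entity URI prefix with something like `entityId.substring(uris.entity().length())`. Both URI values are made up for the example.

```python
def java8_substring(s, begin_index):
    """Model of the bounds checks in Java 8's String.substring(int beginIndex)."""
    if begin_index < 0:
        raise IndexError(f"String index out of range: {begin_index}")
    sub_len = len(s) - begin_index
    if sub_len < 0:
        # Java 8 reports the negative remaining length, not the requested index.
        raise IndexError(f"String index out of range: {sub_len}")
    return s[begin_index:]


# Hypothetical values: the configured prefix is 8 characters longer than the
# entity URI actually received, as could happen on a misconfigured test server.
entity_prefix = "http://www.wikidata.org/entity/"
entity_id = "http://test.wiki/ent/Q1"

try:
    java8_substring(entity_id, len(entity_prefix))
    message = None
except IndexError as exc:
    message = str(exc)
```

When the prefix really is a prefix of the URI, the same call simply returns the bare ID, so the exception points at a mismatch between the configured entity URI and the data rather than at malformed input.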
[Wikidata-bugs] [Maniphest] [Created] T240334: \Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties
dcausse created this task. dcausse added projects: Wikidata, CirrusSearch. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Discovery-Search. TASK DESCRIPTION It appears that all textual properties are being removed from the indexed text search content. As far as I understand, `repo/config/Wikibase.searchindex.php` is responsible for this, and it looks like it is no longer registered as a `WikibaseTextForSearchIndex` hook. Original report <https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Search_doesn't_include_subtitle_field?> TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Triaged] T240334: \Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Updated] T240334: \Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties
dcausse added a project: Regression. TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Retitled] T240334: Evaluate adding all/more textual properties to the text field
dcausse renamed this task from "\Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties" to "Evaluate adding all/more textual properties to the text field". dcausse lowered the priority of this task from "High" to "Medium". dcausse removed a project: Regression. dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Edited] T240334: Evaluate adding all/more textual properties to the text field
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Retitled] T240334: Evaluate adding all/some textual properties to the text field
dcausse renamed this task from "Evaluate adding all/more textual properties to the text field" to "Evaluate adding all/some textual properties to the text field". TASK DETAIL https://phabricator.wikimedia.org/T240334
[Wikidata-bugs] [Maniphest] [Closed] T239898: Investigate triple counts difference between dumps and what blazegraph reports
dcausse closed this task as "Invalid". dcausse added a comment. I recounted the triples from the dump properly (using an RDF parser) after the munge operation and found 8.9B triples; closing as invalid. TASK DETAIL https://phabricator.wikimedia.org/T239898
[Wikidata-bugs] [Maniphest] [Commented On] T240328: Slow indexing for wbsearchentities
dcausse added a comment. Could you specify which search string you are using? `wbsearchentities` should be using the MySQL database when searching by entity ID, so the lag should be relatively small. On the other hand, the search index will take some time to update (job queue lag + Elasticsearch refresh), so searches based on labels/aliases may not react immediately after an entity is added. TASK DETAIL https://phabricator.wikimedia.org/T240328
[Wikidata-bugs] [Maniphest] [Unassigned] T105427: Need a way for WDQS updater to become aware of suppressed deletes
dcausse removed Smalyshev as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T105427
[Wikidata-bugs] [Maniphest] [Updated] T240453: EPIC: Improve completion search on wikidata
dcausse added a project: Wikidata. dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T240453
[Wikidata-bugs] [Maniphest] [Updated] T240328: Slow indexing for wbsearchentities
dcausse added a comment. @Fnielsen thanks for letting us know. If search by entity ID is slow again, please re-open this issue with a link to the entity you created so that we can correlate it with the metrics we monitor. For label search, we are currently experiencing recurrent lag on the job queue that could make it rather bad (several minutes, per T224425 <https://phabricator.wikimedia.org/T224425>). TASK DETAIL https://phabricator.wikimedia.org/T240328
[Wikidata-bugs] [Maniphest] [Claimed] T239338: Manually purge obsolete entites from WDQS
dcausse claimed this task. dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T239338
[Wikidata-bugs] [Maniphest] [Triaged] T239338: Manually purge obsolete entites from WDQS
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239338
[Wikidata-bugs] [Maniphest] [Closed] T239338: Manually purge obsolete entites from WDQS
dcausse closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T239338
[Wikidata-bugs] [Paste] [Updated] P9859: Number of blank nodes used as object and grouped by predicate (wdqs2006)
dcausse changed the title of this paste from "Blank node grouped by predicate (wdqs2006)" to "Number of blank nodes used as object and grouped by predicate (wdqs2006)". dcausse added a project: Wikidata-Query-Service. PASTE DETAIL https://phabricator.wikimedia.org/P9859
[Wikidata-bugs] [Maniphest] [Updated] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs
dcausse added a comment. P9859 <https://phabricator.wikimedia.org/P9859> contains the output of `select ?p (count(*) as ?cnt) { ?s ?p ?o . filter (isBlank(?o)) } group by ?p` run on wdqs2006. I will run the same query with a filter on the subject instead, as asked; the expectation is to find only blank nodes related to `owl:complementOf`, around 42K. TASK DETAIL https://phabricator.wikimedia.org/T239414
[Wikidata-bugs] [Maniphest] [Updated] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs
dcausse added a comment. The output of `select ?p (count(*) as ?cnt) { ?s ?p ?o . filter (isBlank(?s)) } group by ?p` is at P9862 <https://phabricator.wikimedia.org/P9862>. As expected, we only see the corresponding subjects of the OWL constraint on `owl:complementOf` (`rdf:type`, `owl:onProperty` and `owl:someValuesFrom`), as exported by Wikibase today:
  wdno:P31 a owl:Class ;
    owl:complementOf _:genid1 .
  _:genid1 a owl:Restriction ;
    owl:onProperty wdt:P31 ;
    owl:someValuesFrom owl:Thing .
@Igorkim78 could you take a look? TASK DETAIL https://phabricator.wikimedia.org/T239414
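The two counting queries can be illustrated on the five triples quoted above. This is only a sketch in plain Python, not how WDQS evaluates SPARQL: the triples are hand-copied tuples, and the helper names `is_blank` and `count_by_predicate` are hypothetical.

```python
from collections import Counter

# The triples of the owl:complementOf constraint quoted in the comment,
# written as (subject, predicate, object) tuples.
TRIPLES = [
    ("wdno:P31", "rdf:type", "owl:Class"),
    ("wdno:P31", "owl:complementOf", "_:genid1"),
    ("_:genid1", "rdf:type", "owl:Restriction"),
    ("_:genid1", "owl:onProperty", "wdt:P31"),
    ("_:genid1", "owl:someValuesFrom", "owl:Thing"),
]


def is_blank(term):
    # Assumption: blank nodes are the terms written with the "_:" label prefix.
    return term.startswith("_:")


def count_by_predicate(triples, position):
    """Count triples per predicate where the term at `position`
    (0 = subject, 2 = object) is a blank node."""
    return Counter(t[1] for t in triples if is_blank(t[position]))


# Mirrors filter(isBlank(?o)) ... group by ?p
blank_objects = dict(count_by_predicate(TRIPLES, 2))
# Mirrors filter(isBlank(?s)) ... group by ?p
blank_subjects = dict(count_by_predicate(TRIPLES, 0))
```

On this sample the object-side count contains only `owl:complementOf`, and the subject-side count contains the three predicates of the restriction, which is the shape reported in P9859 and P9862.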
[Wikidata-bugs] [Maniphest] [Unassigned] T239908: Extract more metrics from blazegraph sparql update response
dcausse removed Zbyszko as the assignee of this task. dcausse added a subscriber: Zbyszko. dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239908
[Wikidata-bugs] [Maniphest] [Updated] T239908: Extract more metrics from blazegraph sparql update response
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T239908
[Wikidata-bugs] [Maniphest] [Assigned] T239908: Extract more metrics from blazegraph sparql update response
dcausse assigned this task to Zbyszko. TASK DETAIL https://phabricator.wikimedia.org/T239908
[Wikidata-bugs] [Maniphest] [Triaged] T239908: Extract more metrics from blazegraph sparql update response
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239908
[Wikidata-bugs] [Maniphest] [Commented On] T238002: WDQS Munger should be multi threaded
dcausse added a comment. Separating parsing, munging and writing into multiple threads almost doubled the speed of the munger (about 1.9x by wall-clock time):
  old: real 1371m34.618s  user 1854m48.672s  sys 24m44.480s
  new: real 731m20.495s   user 1798m42.176s  sys 30m7.888s
I should have linked https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/553758 to this task. Since the RDF parser is the limiting factor, I think we will have to do the entity delimitation without an RDF parser if we want to further improve the speed of this step. We could also consider switching to the `nt` format, which I'm sure will be a lot faster to parse, if the size overhead is acceptable. TASK DETAIL https://phabricator.wikimedia.org/T238002
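The split into parsing, munging and writing stages can be sketched as a three-thread pipeline connected by bounded queues. This is a structural illustration only, not the code of the linked Gerrit change: the munger is written in Java, all names here (`stage`, `run_pipeline`, `SENTINEL`) are hypothetical, and in CPython the interpreter lock limits how much CPU-bound work actually runs in parallel.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(work, inbox, outbox):
    """Apply `work` to every item read from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(raw_chunks, parse, munge, write):
    """Run parse -> munge -> write, each stage in its own thread."""
    # Bounded queues keep a fast stage from running far ahead of a slow one.
    to_parse = queue.Queue(maxsize=64)
    to_munge = queue.Queue(maxsize=64)
    to_write = queue.Queue(maxsize=64)
    written = queue.Queue()  # unbounded, so the writer never blocks on it

    threads = [
        threading.Thread(target=stage, args=(parse, to_parse, to_munge)),
        threading.Thread(target=stage, args=(munge, to_munge, to_write)),
        threading.Thread(target=stage, args=(write, to_write, written)),
    ]
    for thread in threads:
        thread.start()

    for chunk in raw_chunks:
        to_parse.put(chunk)
    to_parse.put(SENTINEL)

    for thread in threads:
        thread.join()

    results = []
    while True:
        item = written.get()
        if item is SENTINEL:
            break
        results.append(item)
    return results


# Toy stages standing in for the real parser, munger and writer.
result = run_pipeline(
    ["a b", "c d"],
    parse=str.split,
    munge=lambda tokens: [t.upper() for t in tokens],
    write=" ".join,
)
```

Because each stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered it. With this layout the slowest stage bounds the throughput, which matches the observation that the parser becomes the limiting factor.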
[Wikidata-bugs] [Maniphest] [Claimed] T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T239750
[Wikidata-bugs] [Maniphest] [Updated] T240540: Investigate usage of the query service & queries that are run
dcausse added a comment. Also T239852 <https://phabricator.wikimedia.org/T239852> TASK DETAIL https://phabricator.wikimedia.org/T240540
[Wikidata-bugs] [Maniphest] [Created] T241125: Import wikidata RDF dump to hadoop
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION We currently have no easy way to run large-scale analysis on the Wikidata graph. WDQS and Blazegraph are not suited for this scenario; Hadoop seems to be a better fit. After discussing with @JAllemandou, we believe that a simple Parquet file with quads might be sufficient for now. TASK DETAIL https://phabricator.wikimedia.org/T241125
[Wikidata-bugs] [Maniphest] [Created] T241128: EPIC: Reduce the time needed to do the initial WDQS import
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Tracking task to collect all the efforts made in this direction.
| start | dump | node | munge time | import time | initial lag | time to catch up |
| 2019-12-04 | wikidata-20191202-all-BETA.ttl.bz2 | wdqs1010 | 22.85h [1] | 191h (8 days) | 2 weeks | //in progress// |
[1] munge time improved to 12.18 hours in T238002 <https://phabricator.wikimedia.org/T238002>
TASK DETAIL https://phabricator.wikimedia.org/T241128
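The munge times in the table can be cross-checked against the wall-clock (`real`) figures reported on T238002, `1371m34.618s` before and `731m20.495s` after. A minimal sketch of the conversion, assuming durations in the `<minutes>m<seconds>s` form printed by `time`; the helper name `real_time_hours` is hypothetical:

```python
import re


def real_time_hours(value):
    """Convert a `time`-style duration such as '1371m34.618s' to hours."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected duration format: {value!r}")
    minutes = int(match.group(1))
    seconds = float(match.group(2))
    return (minutes * 60 + seconds) / 3600


old_hours = real_time_hours("1371m34.618s")  # munge time before T238002
new_hours = real_time_hours("731m20.495s")   # munge time after T238002
speedup = old_hours / new_hours
```

Both values agree with the 22.85h and 12.18h in the table to within about a hundredth of an hour, and the ratio is a little under 1.9.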
[Wikidata-bugs] [Maniphest] [Edited] T241128: EPIC: Reduce the time needed to do the initial WDQS import
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T241128
[Wikidata-bugs] [Maniphest] [Commented On] T241213: Organize and improve integration test coverage for WDQS Updater
dcausse added a comment. The most annoying integration test (and probably the slowest) is org.wikidata.query.rdf.tool.wikibase.WikibaseRepositoryIntegrationTest:
- It generates anonymous edits to test.wikidata.org in order to test the RecentChange API.
- Concurrent runs of this test will cause failures. The test expects to see the timestamp of the edits it makes; if it is run concurrently (two patches in CI), it is a race and can fail.
- It adds a lot of complexity to test the robustness (retries) by launching a custom proxy prior to running the integration tests (start-proxy and org.wikidata.query.rdf.tool.Proxy).
TASK DETAIL https://phabricator.wikimedia.org/T241213
[Wikidata-bugs] [Maniphest] [Triaged] T240453: EPIC: Improve completion search on wikidata
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T240453
[Wikidata-bugs] [Maniphest] [Created] T242453: wdqs1005 stopped to handle updates
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Apparently a deadlock inside Blazegraph itself:
  Found one Java-level deadlock:
  =
  "GASEngine4": waiting for ownable synchronizer 0x7fcbf9dbc3c0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync), which is held by "com.bigdata.journal.Journal.executorService1539347"
  "com.bigdata.journal.Journal.executorService1539347": waiting to lock monitor 0x7fc555798e18 (object 0x7fcfda000320, a java.lang.Object), which is held by "GASEngine2"
  "GASEngine2": waiting to lock monitor 0x7fc57c22e358 (object 0x7fcbf9b97710, a java.lang.Object), which is held by "com.bigdata.journal.Journal.executorService1539347"
Full stack: P10117 <https://phabricator.wikimedia.org/P10117>
The problem went undetected by monitoring; it started around 2020-01-10T15:44. The machine stopped handling updates and queries, and the lag stopped being reported as well. Blazegraph was restarted around 19:44. TASK DETAIL https://phabricator.wikimedia.org/T242453
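The thread dump above can be read as a wait-for graph, where each blocked thread points at the thread holding the lock it needs. The sketch below walks that graph to find the cycle; the thread names are copied from the dump, while the `find_cycle` helper is hypothetical and assumes each thread waits on at most one other thread.

```python
# Wait-for graph from the thread dump: each thread maps to the thread that
# holds the lock or monitor it is waiting for.
WAITS_FOR = {
    "GASEngine4": "com.bigdata.journal.Journal.executorService1539347",
    "com.bigdata.journal.Journal.executorService1539347": "GASEngine2",
    "GASEngine2": "com.bigdata.journal.Journal.executorService1539347",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle reached, or None."""
    path = []
    position = {}  # thread name -> index in `path`
    current = start
    while current in waits_for:
        if current in position:
            # Seen before: everything from its first visit onwards is the cycle.
            return path[position[current]:]
        position[current] = len(path)
        path.append(current)
        current = waits_for[current]
    return None  # reached a thread that is not waiting on anything


cycle = find_cycle(WAITS_FOR, "GASEngine4")
```

The cycle consists of the journal executor thread and `GASEngine2`, which wait on each other; `GASEngine4` is not part of the cycle but is blocked behind it.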
[Wikidata-bugs] [Maniphest] [Created] T242640: query/wikidata/gui jenkins build broken
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Seen on https://gerrit.wikimedia.org/r/c/wikidata/query/gui/+/564056
  17:41:06 + npm install --no-progress
  17:41:07 npm WARN deprecated vis@4.21.0: Please consider using https://github.com/visjs
  17:41:10 npm WARN deprecated grunt-filerev@2.3.1: Deprecated
  17:41:23 npm WARN deprecated hawk@3.1.3: This module moved to @hapi/hawk. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues.
  17:41:23 npm WARN deprecated boom@2.10.1: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
  17:41:23 npm WARN deprecated sntp@1.0.9: This module moved to @hapi/sntp. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues.
  17:41:23 npm WARN deprecated hoek@2.16.3: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
  17:41:23 npm WARN deprecated cryptiles@2.0.5: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
  17:41:25 npm WARN deprecated jscs-preset-wikimedia@1.0.1: No longer maintained. We recomment migrating to ESLint with eslint-config-wikimedia.
  17:41:26 npm WARN deprecated core-js@2.6.11: core-js@<3 is no longer maintained and not recommended for usage due to the number of issues. Please, upgrade your dependencies to the actual version of core-js@3.
  17:41:26 npm WARN deprecated nomnom@1.8.1: Package no longer supported. Contact supp...@npmjs.com for more info.
  17:41:26 npm ERR! Linux 4.9.0-11-amd64
  17:41:26 npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install" "--no-progress"
  17:41:26 npm ERR! node v6.11.0
  17:41:26 npm ERR! npm v3.8.3
  17:41:26 npm ERR! code EMISSINGARG
  17:41:26
  17:41:26 npm ERR! typeerror Error: Missing required argument #1
  17:41:26 npm ERR! typeerror at andLogAndFinish (/usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:31:3)
  17:41:26 npm ERR! typeerror at fetchPackageMetadata (/usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:51:22)
  17:41:26 npm ERR! typeerror at resolveWithNewModule (/usr/local/lib/node_modules/npm/lib/install/deps.js:455:12)
  17:41:26 npm ERR! typeerror at /usr/local/lib/node_modules/npm/lib/install/deps.js:456:7
  17:41:26 npm ERR! typeerror at /usr/local/lib/node_modules/npm/node_modules/iferr/index.js:13:50
  17:41:26 npm ERR! typeerror at /usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:37:12
  17:41:26 npm ERR! typeerror at addRequestedAndFinish (/usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:82:5)
  17:41:26 npm ERR! typeerror at returnAndAddMetadata (/usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:117:7)
  17:41:26 npm ERR! typeerror at pickVersionFromRegistryDocument (/usr/local/lib/node_modules/npm/lib/fetch-package-metadata.js:134:20)
  17:41:26 npm ERR! typeerror at /usr/local/lib/node_modules/npm/node_modules/iferr/index.js:13:50
  17:41:26 npm ERR! typeerror This is an error with npm itself. Please report this error at:
  17:41:26 npm ERR! typeerror <http://github.com/npm/npm/issues>
  17:41:26
  17:41:26 npm ERR! Please include the following file with any support request:
  17:41:26 npm ERR! /src/npm-debug.log
TASK DETAIL https://phabricator.wikimedia.org/T242640
[Wikidata-bugs] [Maniphest] [Triaged] T242640: query/wikidata/gui jenkins build broken
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T242640
[Wikidata-bugs] [Maniphest] [Updated] T242640: query/wikidata/gui jenkins build broken
dcausse added a comment. Very similar to T242587 <https://phabricator.wikimedia.org/T242587>. TASK DETAIL https://phabricator.wikimedia.org/T242640
[Wikidata-bugs] [Maniphest] [Updated] T242453: wdqs1005 stopped to handle updates
dcausse added a comment. The icinga checks showed `CHECK_NRPE STATE UNKNOWN: Socket timeout after 10 seconds.` for `Query Service HTTP Port` and `NaN` for `WDQS high update lag`. We should probably alert in case of timeouts. Stack dumps from Blazegraph: P10185 <https://phabricator.wikimedia.org/P10185> TASK DETAIL https://phabricator.wikimedia.org/T242453
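The suggestion to alert on timeouts amounts to treating a missing measurement as a failure instead of as "no problem". A minimal sketch of that rule, not the actual icinga configuration: the function name `lag_check_state`, the state labels and the threshold value are all assumptions made for illustration.

```python
import math

OK = "OK"
CRITICAL = "CRITICAL"


def lag_check_state(lag_seconds, threshold_seconds):
    """Return a check state for one update-lag sample.

    `lag_seconds` is None when the probe timed out and NaN when the host
    reported no lag value at all.
    """
    if lag_seconds is None or math.isnan(lag_seconds):
        # Missing data is treated as a failure rather than silently ignored.
        return CRITICAL
    return CRITICAL if lag_seconds > threshold_seconds else OK


# Hypothetical threshold of one hour.
states = [
    lag_check_state(float("nan"), 3600),  # no lag reported
    lag_check_state(None, 3600),          # probe timed out
    lag_check_state(120.0, 3600),         # healthy
    lag_check_state(7200.0, 3600),        # lagging
]
```

Under this rule, the `NaN` lag and the socket timeout seen during the incident would both have raised an alert.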
[Wikidata-bugs] [Maniphest] [Updated] T243270: Test commons RDF dumps on sdcquery.wmflabs.org
dcausse added projects: Wikidata-Query-Service, Discovery-Search (Current work). Restricted Application added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T243270
[Wikidata-bugs] [Maniphest] [Created] T243292: Fix the munger to support commons RDF dump
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a project: Wikidata. TASK DESCRIPTION When trying to munge the dumps, the process filters out many triples, logging:
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized statement: s:http://commons.wikimedia.org/entity/statement/M51372-16FD5B4C-7B40-4FCC-984C-4DAA9A8D00CA p:http://wikiba.se/ontology#rank o:http://wikiba.se/ontology#NormalRank
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized statement: s:http://commons.wikimedia.org/entity/statement/M51372-16FD5B4C-7B40-4FCC-984C-4DAA9A8D00CA p:http://www.wikidata.org/prop/statement/P7482 o:http://www.wikidata.org/entity/Q66458942
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized subjects: [http://commons.wikimedia.org/entity/statement/M51376-4B8D8CD4-0783-433F-B0A2-1DD667F8FBAB] while processing http://commons.wikimedia.org/entity/M51376. Expected only sitelinks and subjects starting with http://commons.wikimedia.org/wiki/Special:EntityData/ and [http://www.wikidata.org/entity/, http://commons.wikimedia.org/entity/]
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized statement: s:http://commons.wikimedia.org/entity/statement/M51376-4B8D8CD4-0783-433F-B0A2-1DD667F8FBAB p:http://www.w3.org/1999/02/22-rdf-syntax-ns#type o:http://wikiba.se/ontology#BestRank
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized statement: s:http://commons.wikimedia.org/entity/statement/M51376-4B8D8CD4-0783-433F-B0A2-1DD667F8FBAB p:http://wikiba.se/ontology#rank o:http://wikiba.se/ontology#NormalRank
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized statement: s:http://commons.wikimedia.org/entity/statement/M51376-4B8D8CD4-0783-433F-B0A2-1DD667F8FBAB p:http://www.wikidata.org/prop/statement/P7482 o:http://www.wikidata.org/entity/Q66458942
  15:03:28.962 [org.wikidata.query.rdf.tool.rdf.AsyncRDFHandler$RDFActionsReplayer] INFO o.wikidata.query.rdf.tool.rdf.Munger - Unrecognized subjects: [http://commons.wikimedia.org/entity/statement/M51389-FE3B5391-E9F2-45E2-B353-84FD0ED8FDC8] while processing http://commons.wikimedia.org/entity/M51389. Expected only sitelinks and subjects starting with http://commons.wikimedia.org/wiki/Special:EntityData/ and [http://www.wikidata.org/entity/, http://commons.wikimedia.org/entity/]
The munger is run with the following options: `-w commons.wikimedia.org -U http://www.wikidata.org --commonsUri http://commons.wikimedia.org`.
TASK DETAIL https://phabricator.wikimedia.org/T243292
[Wikidata-bugs] [Maniphest] [Created] T243431: Grant more rights to wikidata/query/rdf for the group wikidata/query (similar to search)
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Gerrit-Privilege-Requests. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION In order to use the mvn release plugin on `wikidata/query/service` we need special rights to the repo. For search <https://gerrit.wikimedia.org/r/admin/projects/search,access> we ALLOW:
- Create Signed Tag
- Create Annotated Tag
- Push (no-force-push)
TASK DETAIL https://phabricator.wikimedia.org/T243431
[Wikidata-bugs] [Maniphest] [Updated] T243431: Grant more rights to wikidata/query/rdf for the group wikidata/query (similar to search)
dcausse added a project: Release-Engineering-Team. TASK DETAIL https://phabricator.wikimedia.org/T243431
[Wikidata-bugs] [Maniphest] [Retitled] T243431: Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search)
dcausse renamed this task from "Grant more rights to wikidata/query/rdf for the group wikidata/query (similar to search)" to "Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search)". TASK DETAIL https://phabricator.wikimedia.org/T243431 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Zbyszko, Mstyles, Gehel, dcausse, darthmon_wmde, DannyS712, Nandana, NebulousIris, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Liudvikas, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Legoktm ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The use of blank nodes always makes the update process a challenging operation (http://www.aidanhogan.com/docs/blank_nodes_jws.pdf). The use of blank nodes by wikibase is very limited and thus I propose to remove them to simplify the WDQS update strategy. In wikibase we use blank nodes for two purposes: - denote an //unknown value// (originally discussed in T95441 <https://phabricator.wikimedia.org/T95441>) - owl constraints of wdno property For the unknown value use-case we seem to only use the blank node as a way to //filter// such unknown values. For the OWL constraints it's unclear if it is actually used/useful. For unknown values I suggest: wd:Q3 a wikibase:Item wdunk:P2 . wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement wdunk:P2; wikibase:rank wikibase:NormalRank . A query like SELECT ?human WHERE { ?human wdt:P106 ?o FILTER isBLANK(?o) . } would become SELECT ?human WHERE { ?human a wdunk:P106 } And SELECT ?human WHERE { ?human wdt:P106 ?o } would now mean: //All entities with a known occupation// as opposed to //All entities with a known or unknown occupation// which should be written as: SELECT ?human WHERE { {?human wdt:P106 ?o} union {?human a wdunk:P106} } For OWL constraints I simply suggest to remove them or materialize the blank node. wdno:P109 a owl:Class ; owl:complementOf wdowl:P109 . wdowl:P109 a owl:Restriction ; owl:onProperty wdt:P109 ; owl:someValuesFrom owl:Thing . This is a breaking change to https://www.mediawiki.org/w/index.php?title=Wikibase/Indexing/RDF_Dump_Format. If this is accepted, I suggest a transition period where blank nodes would be kept; the use of //isBlank// from the query service could start emitting a deprecation warning.
TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
dcausse added a comment. In T244341#5852014 <https://phabricator.wikimedia.org/T244341#5852014>, @Lucas_Werkmeister_WMDE wrote: > If the problem is just the blank nodes themselves, why not use this new `wdunk:P2` in the same way, as in `wd:Q3 wdt:P2 wdunk:P2`? That’s still worse than the blank nodes (multiple “unknown value” statements collapse into one triple, just as is currently the case for “no value” statements), but at least it shouldn’t break as many queries. Yes, the problem is the blank nodes themselves, as there is no way to mutate the graph without querying it. I'm OK with your suggestion but this makes two unrelated unknown values equal. Would something like wd:Q2 wdt:P2 wdunk:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 be acceptable? This would be very similar to the previous approach using blank nodes. No different unknown values could be collapsed; the drawback is that to extract unknown values one would have to rely on a URI prefix filter using `STRSTARTS`. SELECT ?human WHERE { ?human wdt:P106 ?o FILTER isBLANK(?o) . } would become PREFIX wdunk: <http://www.wikidata.org/prop/unknown/> SELECT ?human WHERE { ?human wdt:P106 ?o FILTER STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/unknown/' ) . } Any other suggestions? Ideally I'd like to find a structure that does not require running filters. TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Triaged] T221709: scap service restarts for WDQS are inconsistent
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T221709 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Gehel, Aklapper, Smalyshev, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, thcipriani, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Jay8g, Krenair, fgiunchedi, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
dcausse added a comment. Yes the issue with blank nodes is that they are not "reference-able" and thus point delete queries are impossible which is what we want to achieve with the next gen updater. I did some tests and isBlank is a lot faster (I suppose because this information is inlined as opposed to the IRI that has to be fetched from its dictionary). So materializing the unknown value with the statement identifier we risk to encounter timeouts more frequently. So unless we have a third alternative we have two choices: - use a constant value: probably very fast but we now say: all unknown values are equal. - use the statement identifier: very close to the previous semantic but a lot slower I think I prefer the first approach you suggested, dealing with perf issues seems more annoying than a less precise graph. The usecases that I can think of that could be affected are: - queries based on equality: find entities which share the same value. Such queries will have to filter out explicitly the "unknown value" - queries based on the number of unknown values on a particular property? Examples would help here I think. - other usecases? TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
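The difference between the two choices discussed above can be illustrated with a minimal Python sketch. It is not part of the original thread: it models triples as plain tuples in a set, reuses the `http://www.wikidata.org/prop/unknown/` prefix quoted earlier in the task, and the statement identifiers are hypothetical.

```python
# Prefix quoted in the thread for the proposed wdunk: namespace.
WDUNK = "http://www.wikidata.org/prop/unknown/"


def constant_placeholder(property_id: str, statement_id: str) -> str:
    """Option 1: one IRI per property, shared by every unknown value."""
    return WDUNK + property_id


def statement_placeholder(property_id: str, statement_id: str) -> str:
    """Option 2: one IRI per statement, so unknown values stay distinct."""
    return WDUNK + statement_id


def truthy_triples(entity, property_id, statement_ids, make_placeholder):
    """Build the set of (s, p, o) triples for unknown-value statements."""
    predicate = "wdt:" + property_id
    return {
        (entity, predicate, make_placeholder(property_id, sid))
        for sid in statement_ids
    }


# Two hypothetical "unknown value" statements on the same item and property.
statements = ["Q4115189-aaaa", "Q4115189-bbbb"]
collapsed = truthy_triples("wd:Q4115189", "P451", statements, constant_placeholder)
distinct = truthy_triples("wd:Q4115189", "P451", statements, statement_placeholder)
```

With the constant placeholder both statements produce the same triple, so the set holds a single entry; with the statement identifier the two unknown values remain separate triples.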
[Wikidata-bugs] [Maniphest] [Edited] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T244590: EPIC: Rework the WDQS updater as an event driven application
dcausse created this task. dcausse added projects: Epic, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION The current merging strategy for applying updates requires sending all the entity data on every update. The goal of this task is to design a new updater that will be able to send the minimal number of triples to the RDF store to synchronize the graph with the state of wikibase. Note: this is a very rough plan and many details will probably change as implementation specific requirements pop up. The proposed approach relies on a system able to do stateful computation over data streams (flink). F31553995: updater_v2.png <https://phabricator.wikimedia.org/F31553995> Based on a set of source event streams populated by mediawiki and change propagation the steps are: 1. filter: filter events related to wikibase and its entities 2. event time reordering: reorder the events and assemble them into a single partitioned stream 3. rev state evaluation: determine what command needs to be applied to mutate the graph - this step requires holding a state of previously seen revisions and other actions (e.g. visibility change) - the output of this is a simple event without any data saying: do a diff between rev X and Y, fully delete entity QXYZ, ... - the initial state will be populated using the revisions present in the RDF dump - seen revisions (after a fresh import) will be easy to discard 4. rdf diff generation: materialize the command and fetch the data from wikibase and send it over an RDF stream - it's probable that in some cases (suppressed delete) the exact set of triples to be deleted will be unknown and thus will require a special delete command to be applied to the backend 5. 
rdf import: The components reading this stream will be very similar to the current updater: a process running locally on the wdqs nodes pushing data to blazegraph. For the first iteration no cleanups will be performed; orphaned values & references will remain in the RDF store. This will be mitigated by more frequent reloads of the dump. Since such a system is prone to deviation, frequent reloads will be important. Note that the state of step 3 is tightly coupled with its dump and thus we will have to instantiate a new stream per imported dump. In other words a wdqs system imported using dump Y will have to consume the RDF stream generated from an initial state based on this same dump. This means that the RDF stream will be named after a particular dump instance. Note on event time reordering: - seems to be relatively easy in flink: e.g. https://github.com/ververica/flink-training-exercises/blob/master/src/main/java/com/ververica/flinktraining/examples/datastream_java/process/CarEventSort.java Note on state management: - RocksDB offers incremental checkpoints and seems to support high cardinality quite well (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend); the operation where we need a large state seems to be partitionable and thus the state can be split into multiple buckets. 
Note on initial state: - seems to be allowed by flink using its state-processor-api: https://flink.apache.org/feature/2019/09/13/state-processor-api.html TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Zbyszko, Gehel, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
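As an illustration of steps 2 and 3 of the plan above (event time reordering and rev state evaluation), here is a minimal in-memory Python sketch. It is not Flink code and not the actual updater: the `Event` and `Command` types, their field names, and the command kinds `import`, `diff` and `delete` are assumptions made for this example, and the per-entity state is a plain dictionary of last seen revisions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical source event: an entity changed at some event time."""
    entity: str
    revision: int
    event_time: float
    deleted: bool = False


@dataclass(frozen=True)
class Command:
    """Data-free output event telling the next step what to do."""
    kind: str  # "import", "diff" or "delete"
    entity: str
    from_rev: Optional[int] = None
    to_rev: Optional[int] = None


def evaluate(events: Iterable[Event], state: Dict[str, int]) -> List[Command]:
    """Reorder events by event time, then derive commands from the state.

    `state` maps an entity id to the last revision already applied; it
    would initially be populated from the revisions present in the dump.
    """
    commands: List[Command] = []
    for ev in sorted(events, key=lambda e: e.event_time):
        last = state.get(ev.entity)
        if ev.deleted:
            if last is not None:
                del state[ev.entity]
                commands.append(Command("delete", ev.entity, from_rev=last))
        elif last is None:
            state[ev.entity] = ev.revision
            commands.append(Command("import", ev.entity, to_rev=ev.revision))
        elif ev.revision > last:
            state[ev.entity] = ev.revision
            commands.append(Command("diff", ev.entity, last, ev.revision))
        # Otherwise the revision was already seen and is discarded.
    return commands
```

A revision that is not newer than the one held in the state produces no command, which corresponds to the remark that seen revisions are easy to discard after a fresh import.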
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints
dcausse added a comment. Thanks for all the feedback. I'll discard the "constant" option. A note on the motivations: we plan to redesign the update process as a set of trivial mutations to the graph, as far as I can see updating a graph with blank nodes cannot be a "trivial operation", citing http://www.aidanhogan.com/docs/blank_nodes_jws.pdf (page 10 //Issues with blank nodes//): > Given a fixed, serialised RDF graph (i.e., a document), labelling of blank nodes can vary across parsers and across time. Checking if two representations originate from the same data thus often requires an isomorphism check, for which in general, no polynomial algorithms are known. By making some assumptions on the wikibase RDF model I believe that generating a diff between two entity revisions should be relatively easy even if blank nodes are involved, the problem is when applying this diff to the RDF backend, if it involves blank nodes it cannot be a set of trivial mutations (here trivial means using `INSERT|DELETE DATA` statements). E.g. if the diff indicates that we need to remove: wd:Q2 wdt:P576 _:genid1 because `DELETE DATA` is not possible with blank nodes we have to send something like DELETE { ?s ?p ?o } WHERE { wd:Q2 wdt:P576 ?o . FILTER(isBlank(?o)) ?s ?p ?o } Which will delete all blank nodes attached to `wd:Q2` by `wdt:P576`. I haven't checked but I hope that at most one blank node can be attached to the same subject/predicate, if not this makes the sync algorithm a bit more complex. 
TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
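The asymmetry described in the comment above can be sketched in Python as a small helper that chooses the update statement for removing one triple. The helper name `delete_update` and the string representation of terms (blank nodes written as `_:label`) are assumptions for illustration; the two SPARQL shapes are the ones quoted in the comment.

```python
def is_blank(term: str) -> bool:
    """Blank node labels are written `_:label` in Turtle/SPARQL syntax."""
    return term.startswith("_:")


def delete_update(subject: str, predicate: str, obj: str) -> str:
    """Return a SPARQL update that removes the given triple.

    Ground triples can use DELETE DATA. A blank node label cannot be
    referenced from an update, so the store has to be queried instead.
    """
    if not is_blank(obj):
        return f"DELETE DATA {{ {subject} {predicate} {obj} }}"
    return (
        "DELETE { ?s ?p ?o } WHERE { "
        f"{subject} {predicate} ?o . FILTER(isBlank(?o)) ?s ?p ?o }}"
    )


print(delete_update("wd:Q2", "wdt:P576", "wd:Q1"))
print(delete_update("wd:Q2", "wdt:P576", "_:genid1"))
```

The second form removes every blank node attached to the subject/predicate pair, which is why the number of such nodes matters for the sync algorithm.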
[Wikidata-bugs] [Maniphest] [Retitled] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints
dcausse renamed this task from "Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints" to "Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T203397: Provide more useful redirect for statement nodes (wds:…)
dcausse added a project: Discovery-Search (Current work). dcausse added a comment. @Lea_Lacroix_WMDE no, we just need to deploy it, sorry for the delay. TASK DETAIL https://phabricator.wikimedia.org/T203397 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Lea_Lacroix_WMDE, Lucas_Werkmeister_WMDE, Aklapper, Beast1978, Un1tY, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T196165: Commons image: when pasting the exact title, get the correct file first in the suggester
dcausse added a comment. I believe that because the file name has many words the score on the tokenized text fields is very high (since we sum all token scores). The exact match field has only one token and, despite its high weight, its score is not enough to make up for the text matches lost because of the negation. In general I suggest using autocomplete APIs (opensearch/prefixsearch) for type-ahead searches; this is faster and the list of results does not change unexpectedly as you type. What's done in the mobile app is a two-step search: first send a prefixsearch, then a fulltext search if no results are found. When using the fulltext search (list=search), if the user is not aware that it's using the fulltext engine, the UI should escape the search syntax, otherwise some chars may trigger a special syntax (negation in this case). The proper way to fix this issue is imo to: - use a completion API + fulltext search fallback - escape the fulltext search syntax from the UI: AND, OR, NOT, ||, &&, -, !, ", :, \?, * TASK DETAIL https://phabricator.wikimedia.org/T196165 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Silvan_WMDE, dcausse Cc: hoo, EBernhardson, TJones, dcausse, Ladsgroup, Silvan_WMDE, Addshore, Bencemac, Aklapper, Ayack, Liuxinyu970226, Smalyshev, Lydia_Pintscher, Lea_Lacroix_WMDE, Beast1978, Un1tY, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, Iflorez, darthmon_wmde, alaa_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
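The two-step search suggested above could look roughly like the following Python sketch. The two search backends are passed in as plain callables, so no real API client is assumed; `contains_search_syntax` only detects the tokens listed in the comment and deliberately leaves open how a UI would escape them, since the escaping rules are not described in the thread.

```python
from typing import Callable, List

# Tokens listed in the comment as triggering fulltext search syntax.
KEYWORDS = {"AND", "OR", "NOT"}
SYMBOLS = ("||", "&&", "-", "!", '"', ":", "\\?", "*")


def contains_search_syntax(text: str) -> bool:
    """Tell whether the input would trigger special fulltext syntax."""
    if any(word in KEYWORDS for word in text.split()):
        return True
    return any(symbol in text for symbol in SYMBOLS)


def suggest(
    text: str,
    prefix_search: Callable[[str], List[str]],
    fulltext_search: Callable[[str], List[str]],
) -> List[str]:
    """Try the completion API first, fall back to fulltext when empty."""
    results = prefix_search(text)
    if results:
        return results
    return fulltext_search(text)
```

A file name containing a hyphen is flagged by `contains_search_syntax`, which matches the negation problem described in this task.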
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints
dcausse added a comment. In T244341#5890517 <https://phabricator.wikimedia.org/T244341#5890517>, @Lucas_Werkmeister_WMDE wrote: >> I haven't checked but I hope that at most one blank node can be attached to the same subject/predicate, if not this makes the sync algorithm a bit more complex. > > At least currently, this is not the case. I added a second “partner: unknown value” statement to the sandbox item <https://www.wikidata.org/wiki/Q4115189>, and now wd:Q4115189 wdt:P451 ?v <https://query.wikidata.org/#SELECT%20%2a%20%7B%20wd%3AQ4115189%20wdt%3AP451%20%3Fv.%20%7D> produces two blank nodes as result. Thanks for checking, this makes the diff process and the update query a bit more complex as now we need to track the number of blank nodes attached to a particular subject/predicate. As for the update query I believe this is still possible with: DELETE { ?s ?p ?o } WHERE { SELECT ?s ?p ?o { wd:Q4115189 wdt:P451 ?o . FILTER(isBlank(?o)) ?s ?p ?o } LIMIT 1 # number of blank nodes to keep } But overall this makes updating a triple with a blank node a completely separate operation that cannot be batched with statements like `INSERT DATA` or `DELETE DATA`. > Once we stop using blank nodes for OWL constraints, though, I believe you can at least assume that blank nodes are never the subject of a triple – would that help? (I feel like this ought to eliminate the need for a full isomorphism check from your quote.) Indeed, this and the fact that for SomeValue all blank nodes are unique: even the same statement "SomeValue" used as wdt and ps is currently different <https://query.wikidata.org/#SELECT%20%2a%20%7B%0A%20%20%7B%20wd%3AQ4115189%20wdt%3AP451%20%3Fv.%20%7D%0A%20%20UNION%0A%20%20%7B%0A%20%20%20%20wd%3AQ4115189%20p%3AP451%20%3Fs%20.%0A%20%20%20%20%3Fs%20ps%3AP451%20%3Fv%0A%20%20%7D%0A%7D>. From the point of view of a "simple diff operation" this is a fortunate situation as it makes the update process simpler in the scenario where we decline this task and stick with blank nodes. 
In the case we decide to move forward with IRIs placeholders the object of wdt and ps predicates of the same statement will become identical for SomeValue. TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
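Tracking the number of blank nodes per subject/predicate, as mentioned above, can be sketched in Python as follows. This is an illustration only: triples are plain tuples, blank nodes are recognised by their `_:` label, and the sketch assumes that the `LIMIT` of the sub-select bounds the number of triples that get removed, so it is set to the number of surplus blank nodes.

```python
from collections import Counter


def blank_counts(triples):
    """Count blank-node objects per (subject, predicate)."""
    return Counter((s, p) for s, p, o in triples if o.startswith("_:"))


def blank_node_deletes(old_triples, new_triples):
    """Build one update per subject/predicate that lost blank nodes."""
    old, new = blank_counts(old_triples), blank_counts(new_triples)
    updates = []
    for (s, p), n_old in sorted(old.items()):
        surplus = n_old - new.get((s, p), 0)
        if surplus > 0:
            updates.append(
                "DELETE { ?s ?p ?o } WHERE { SELECT ?s ?p ?o { "
                f"{s} {p} ?o . FILTER(isBlank(?o)) ?s ?p ?o }} LIMIT {surplus} }}"
            )
    return updates
```

Blank node labels are never compared between the two revisions; only their count per subject/predicate is, which is what lets the diff avoid an isomorphism check.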
[Wikidata-bugs] [Maniphest] [Commented On] T244590: EPIC: Rework the WDQS updater as an event driven application
dcausse added a comment. In T244590#5893018 <https://phabricator.wikimedia.org/T244590#5893018>, @Ottomata wrote: > COOL! :) > >> it's important to note that the state of step 3 is tightly coupled with its dump and thus we will have to instantiate a new stream per imported dump. In other words a wdqs system imported using dump Y will have to consume the RDF stream generated from an initial state based on this same dump. This means that the RDF stream will be named against a particular dump instance. > > Hm. Would it be possible instead to lambda architecture this part? Instead of having to reload from a full dump and then recreate a new stream, could accomplish the same cleanups by backfilling from a batch job in Hadoop? I'm not sure I fully understand the 'cleanups' here. Are they not do-able with the stream because events representing some of the state changes don't exist (yet)? Yes, I hope that in the future, once the stream has been stabilized, reloading the system might become less necessary and that a fresh and consistent dump can be reconstructed (daily?) using the stream itself. Reloading from the dump generated by MW is something we need anyway in order to bootstrap the system, and at the beginning it will be needed to work around: - bug fixes (bugs where the data is simply lost) - lost events (undetected failures or bugs in MW) - cleanup The cleanup operation mentioned here is a sort of "garbage collection": to simplify, we need to detect unused resources (subgraphs) in the graph, and the stream itself does not know this unless we keep another large state doing reference counting. The solution proposed here is to simply spawn a new system from time to time (the dump generated by MW is clean) so that we do cleanup and fix lost events at the same time, but I agree with you that this is not ideal and that leveraging more batch jobs and/or more states in the stream will help minimize the need to do a full reload. 
TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Ottomata, JAllemandou, Aklapper, Zbyszko, Gehel, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
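The "another large state doing reference counting" mentioned above could be sketched in Python like this. It is a hypothetical illustration of the idea, not a planned component: the `RefCounter` class and the `wdref:` node label used below are made up for the example.

```python
from collections import Counter


class RefCounter:
    """Tracks how many statements point at each shared node."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, node: str) -> None:
        """Record one more reference to a shared value/reference node."""
        self.counts[node] += 1

    def remove(self, node: str) -> bool:
        """Drop one reference; return True when the node became orphaned."""
        self.counts[node] -= 1
        if self.counts[node] <= 0:
            del self.counts[node]
            return True
        return False


refs = RefCounter()
refs.add("wdref:abc")
refs.add("wdref:abc")
still_used = not refs.remove("wdref:abc")  # one reference is left
orphaned = refs.remove("wdref:abc")        # last reference removed
```

The cost is one counter per shared node kept for the lifetime of the stream, which is the large state the comment would prefer to avoid in the first iteration.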
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints
dcausse added a comment. To move this forward I propose the following plan: 1. add a `wikibase:isSomeValue` custom function configurable to work as a proxy to `isBlank()` or `STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/somevalue/' )` and announce it 2. instead of changing the RDF representation generated by wikibase add a new option to the updater/munger to transform (on the fly) blank nodes into placeholder IRIs 3. set up a test instance of the query service using this proposal and ask for feedback 4. if no major blockers are encountered we can announce that the RDF representation is about to change 5. start emitting deprecation warnings when seeing `isBlank` 6. after a deprecation period activate placeholder IRIs everywhere 7. change the wikibase RDF representation TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T245533: Add a custom wikibase:isSomeValue() function
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. TASK DESCRIPTION In order to allow a "smooth" transition from blank nodes to IRI placeholders the `wikibase:isSomeValue` function will be added to the set of custom functions offered by the //query service//. A new option, read at blazegraph startup, will instruct this function to behave like: - `isBlank()` - or `STRSTARTS( STR(?o), 'http://www.wikidata.org/prop/somevalue/' )` TASK DETAIL https://phabricator.wikimedia.org/T245533 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
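The configurable behaviour described in this task can be modelled with a short Python sketch. It is not the Blazegraph implementation: the factory name `make_is_some_value` and the mode names `blank` and `iri` are assumptions, and RDF terms are represented as plain strings (blank nodes as `_:label`, IRIs in full).

```python
SOMEVALUE_PREFIX = "http://www.wikidata.org/prop/somevalue/"


def make_is_some_value(mode: str):
    """Return the predicate selected by a startup option.

    "blank" mimics isBlank(); "iri" mimics the STRSTARTS prefix check.
    """
    if mode == "blank":
        return lambda term: term.startswith("_:")
    if mode == "iri":
        return lambda term: term.startswith(SOMEVALUE_PREFIX)
    raise ValueError(f"unknown mode: {mode!r}")


is_some_value = make_is_some_value("iri")
print(is_some_value(SOMEVALUE_PREFIX + "Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658"))
```

Queries written against the function keep working when the option is switched, which is the point of introducing it before the RDF representation changes.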
[Wikidata-bugs] [Maniphest] [Created] T245541: Add a new munge option to do blank node skolemization
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. TASK DESCRIPTION This munge option will transform all blank nodes as placeholder IRIs using the following rules: wdno:P109 a owl:Class ; owl:complementOf _:1 . _:1 a owl:Restriction ; owl:onProperty wdt:P109 ; owl:someValuesFrom owl:Thing . to: wdno:P109 a owl:Class ; owl:complementOf wdowl:P109 . wdowl:P109 a owl:Restriction ; owl:onProperty wdt:P109 ; owl:someValuesFrom owl:Thing . Introducing a new prefix: @prefix wdowl: <http://www.wikidata.org/owl/> . - wd:Q2 wdt:P576 _:genid1 ; p:P576 s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 . s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:NormalRank ; ps:P576 _:genid2 ; pq:P805 wd:Q2003654 . to wd:Q2 wdt:P576 wdsome:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ; p:P576 s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 . s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:NormalRank ; ps:P576 wdsome:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ; pq:P805 wd:Q2003654 . introducing a new prefix: @prefix wdsome: <http://www.wikidata.org/prop/somevalue/> . Question: https://www.w3.org/2011/rdf-wg/wiki/Skolemisation mentions using //well known// IRIs (rfc5785 <https://tools.ietf.org/html/rfc5785>) but since this proposal is not finished (and unlikely to be ever finished? stalled since 2011) I wonder if we should follow it? 
The placeholder has not been decided but arbitrarily choosing //bnode// the placeholder IRIs would become: `http://www.wikidata.org/.well-known/bnode/Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658` TASK DETAIL https://phabricator.wikimedia.org/T245541 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Lucas_Werkmeister_WMDE, mkroetzsch, Daniel_Mietchen, Jheald, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
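The rewriting rules of this munge option can be illustrated with a minimal Python sketch over triples represented as tuples. It is not the munger code: how the blank node of a `wdt:`/`ps:` triple is matched to its statement identifier is left to the caller here, through a hypothetical `replacements` mapping, and only the two prefixes introduced in the task description are used.

```python
WDSOME = "http://www.wikidata.org/prop/somevalue/"
WDOWL = "http://www.wikidata.org/owl/"


def somevalue_iri(statement_id: str) -> str:
    """Placeholder IRI for the SomeValue of one statement."""
    return WDSOME + statement_id


def owl_iri(property_id: str) -> str:
    """Placeholder IRI for the OWL restriction of one property."""
    return WDOWL + property_id


def skolemize(triples, replacements):
    """Replace blank node labels in subject and object positions."""
    def swap(term):
        return replacements.get(term, term)

    return [(swap(s), p, swap(o)) for s, p, o in triples]


sid = "Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658"
triples = [
    ("wd:Q2", "wdt:P576", "_:genid1"),
    ("s:" + sid, "ps:P576", "_:genid2"),
    ("wdno:P109", "owl:complementOf", "_:1"),
    ("_:1", "owl:onProperty", "wdt:P109"),
]
replacements = {
    "_:genid1": somevalue_iri(sid),
    "_:genid2": somevalue_iri(sid),
    "_:1": owl_iri("P109"),
}
munged = skolemize(triples, replacements)
```

After the rewrite the `wdt:` and `ps:` triples of the same statement share one object, and the OWL restriction is addressable as a subject.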
[Wikidata-bugs] [Maniphest] [Commented On] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints
dcausse added a comment. In T244341#5893723 <https://phabricator.wikimedia.org/T244341#5893723>, @Lucas_Werkmeister_WMDE wrote: > Well, I’d like to see what the IRIs for unknown value in qualifiers and references look like before we move ahead with this plan. Sure, I tried to add some but could not find my way in the UI. Could you try to update the sandbox item so that we can have a look? TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Edited] T245533: Add a custom wikibase:isSomeValue() function
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T245533 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Edited] T245533: Add a custom function to identify wikibase "somevalue"
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T245533 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Retitled] T245533: Add a custom function to identify wikibase "somevalue"
dcausse renamed this task from "Add a custom wikibase:isSomeValue() function" to "Add a custom function to identify wikibase "somevalue"". dcausse added a subscriber: Lucas_Werkmeister_WMDE. dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T245533 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Project Column] T239687: Rework how value and reference changes are handled
dcausse moved this task from In Progress to Done on the Discovery-Search (Current work) board. dcausse added a comment. The munger has been reworked so that it does not deal with this cleanup. The next gen updater will address this cleanup in a different way. For the current updater, one thing to keep in mind is that the ref cleanup was disabled some time ago (while investigating T194325 <https://phabricator.wikimedia.org/T194325>: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/437362) and has never been re-enabled since. We could imagine disabling values cleanup as well; this could give us some room with the current updater. TASK DETAIL https://phabricator.wikimedia.org/T239687 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Daniel_Mietchen, Lucas_Werkmeister_WMDE, dcausse, Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T241125: Import wikidata RDF dump to hadoop
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T241125 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Daniel_Mietchen, Aklapper, dcausse, JAllemandou, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Project Column] T239908: Extract more metrics from blazegraph sparql update response
dcausse moved this task from To Be Deployed to Done on the Discovery-Search (Current work) board. dcausse added a comment. Dashboard created here: https://grafana.wikimedia.org/d/dSksY08Zk/wikidata-query-service-updater?orgId=1 TASK DETAIL https://phabricator.wikimedia.org/T239908 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Zbyszko, dcausse Cc: Daniel_Mietchen, Zbyszko, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Claimed] T203397: Provide more useful redirect for statement nodes (wds:…)
dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T203397 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Lea_Lacroix_WMDE, Lucas_Werkmeister_WMDE, Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Merged] T244590: EPIC: Rework the WDQS updater as an event driven application
dcausse merged a task: T229544: Create RDF diff for WDQS updating. dcausse added subscribers: Smalyshev, Iamamz3. TASK DETAIL https://phabricator.wikimedia.org/T244590 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Iamamz3, Smalyshev, Ottomata, JAllemandou, Aklapper, Zbyszko, Gehel, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T229544: Create RDF diff for WDQS updating
dcausse closed this task as a duplicate of T244590: EPIC: Rework the WDQS updater as an event driven application. TASK DETAIL https://phabricator.wikimedia.org/T229544 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Iamamz3, Aklapper, Smalyshev, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints
dcausse added a comment. @Lucas_Werkmeister_WMDE thanks! Indeed this becomes a bit more challenging, as the statement identifier alone cannot be used to identify a bnode under a particular statement. I'll continue discussing this specific issue in T245541 <https://phabricator.wikimedia.org/T245541> to limit noise on this ticket. @Jheald about blank node usage: in T239414 <https://phabricator.wikimedia.org/T239414> we investigated how blank nodes are currently used and extracted some numbers here: P9859 <https://phabricator.wikimedia.org/P9859> (count per predicate where a blank node is used as an object). Sadly such counts won't be faster using this new proposed approach. TASK DETAIL https://phabricator.wikimedia.org/T244341 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jheald, Daniel_Mietchen, mkroetzsch, Denny, Lucas_Werkmeister_WMDE, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Retitled] T231515: Duplicate blank nodes on edited properties
dcausse renamed this task from "Duplicate wdno: clauses on edited properties" to "Duplicate blank nodes on edited properties". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T231515 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Igorkim78, Gehel, Aklapper, Smalyshev, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T245541: Add a new munge option to do blank node skolemization
dcausse added a comment. In https://www.wikidata.org/wiki/Q4115189#Q4115189$7d68afee-408d-1c1e-946b-43d8d37a17b5 @Lucas_Werkmeister_WMDE added more "somevalue" to the graph (references and qualifiers), which outputs the following graph: wd:Q4115189 p:P370 s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 . s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:NormalRank ; ps:P370 _:genid6 ; pq:P2315 "this is a demo for T244341, if possible please don’t remove it before, say, 2020-02-26 :)"@en ; pq:P370 _:genid7 ; pq:P1106 _:genid8 ; prov:wasDerivedFrom ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 . ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 a wikibase:Reference ; pr:P370 _:genid9 ; pr:P855 _:genid10 . //A first observation is that our current update strategy is not able to do a clean change on this entity: existing blank nodes are leaked (updated T231515 <https://phabricator.wikimedia.org/T231515>).// The proposed solution for encoding bnodes, as currently stated, does not work well because it will conflate all bnodes attached to a statement. One obvious solution would be to encode more information into this made-up IRI by prefixing/suffixing the predicate: wd:Q4115189 p:P370 s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 . s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 a wikibase:Statement, wikibase:BestRank ; wikibase:rank wikibase:NormalRank ; ps:P370 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PS-P370 ; pq:P2315 "this is a demo for T244341, if possible please don’t remove it before, say, 2020-02-26 :)"@en ; pq:P370 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PQ-P370 ; pq:P1106 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PQ-P1106 ; prov:wasDerivedFrom ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 . 
ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 a wikibase:Reference ; pr:P370 wdsome:ref-6c8b1cd1c3cd814ab99e3c40580f12024ceff994-PR-P370 ; pr:P855 wdsome:ref-6c8b1cd1c3cd814ab99e3c40580f12024ceff994-PR-P855 . This is a bit ugly, but it would ensure uniqueness of the IRIs. I'm also not a big fan of propagating information into IDs, as I'm afraid that some process may want to make assumptions about the structure of the ID itself. Here the only information we want to encode is: - the uniqueness of the node - a common IRI prefix to detect that these are skolem IRIs. I wonder if we could simply hash things: `wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PS-P370` would become `wdsome:e81da6d67fa0cbf0e1daf440c31cf138ffe565c8` TASK DETAIL https://phabricator.wikimedia.org/T245541 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Lucas_Werkmeister_WMDE, mkroetzsch, Daniel_Mietchen, Jheald, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
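The hashing idea from the comment above could look something like the sketch below. The comment does not say which hash function or input encoding would be used, so SHA-1 over the UTF-8 bytes of the structured label is purely an assumption here (chosen because it yields 40 hex characters, like the example), and `skolem_iri` is a hypothetical helper name. The sketch is not expected to reproduce the exact hash quoted above.

```python
import hashlib

# Prefix taken from the task description (@prefix wdsome: ...).
WDSOME = "http://www.wikidata.org/prop/somevalue/"


def skolem_iri(node_id: str, role: str, prop: str) -> str:
    """Build an opaque skolem IRI for one 'somevalue' occurrence.

    node_id: statement or reference identifier, e.g. "Q4115189-7d68afee-..."
    role:    "PS", "PQ" or "PR", as in the structured IRIs above
    prop:    property id, e.g. "P370"
    """
    label = f"{node_id}-{role}-{prop}"
    # Assumption: SHA-1 of the UTF-8 label; any stable hash would do.
    digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
    return WDSOME + digest


stmt = "Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5"
main_value = skolem_iri(stmt, "PS", "P370")
qualifier_value = skolem_iri(stmt, "PQ", "P370")
```

The two IRIs share the common prefix and differ from each other, but their local part no longer exposes the statement id or the predicate, so consumers cannot rely on the ID structure.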
[Wikidata-bugs] [Maniphest] [Created] T245727: Create a streaming-updater submodule under query/wikidata/rdf
dcausse created this task. dcausse added projects: Epic, Wikidata-Query-Service, Wikidata. TASK DESCRIPTION Using flink and scala with ideally a small test case. TASK DETAIL https://phabricator.wikimedia.org/T245727 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Zbyszko, dcausse Cc: Gehel, Zbyszko, Aklapper, JAllemandou, Ottomata, Smalyshev, Iamamz3, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T245728: Add a component to generate a diff between two entity revisions
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Wikidata. TASK DESCRIPTION This component will take the list of triples of an entity at revisions X and Y and generate a diff between the two. The diff should be the list of triples to add and the list of triples to delete. For now we assume that no blank nodes are present. Depending on the outcome of the discussion about blank nodes, whether diffing happens before or after munging might vary. It might be safer to consider that this component takes the unmunged version of the triples, even if for now the very first step will be to munge the inputs. TASK DETAIL https://phabricator.wikimedia.org/T245728 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Zbyszko, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
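Under the stated assumption that no blank nodes are present, the diff described in this task reduces to two set differences. A minimal Python sketch, with triples modelled as plain tuples and `diff_triples` as a hypothetical name (the real component is expected to live in the Java/Scala code base):

```python
def diff_triples(old_revision, new_revision):
    """Return (to_add, to_delete) between two revisions of an entity.

    Both inputs are iterables of (subject, predicate, object) tuples.
    Assumes no blank nodes: triples are compared by plain equality.
    """
    old_set = set(old_revision)
    new_set = set(new_revision)
    to_add = sorted(new_set - old_set)      # only in the new revision
    to_delete = sorted(old_set - new_set)   # only in the old revision
    return to_add, to_delete


rev_x = [
    ("wd:Q2", "wdt:P31", "wd:Q3504248"),
    ("wd:Q2", "rdfs:label", '"Earth"@en'),
]
rev_y = [
    ("wd:Q2", "wdt:P31", "wd:Q3504248"),
    ("wd:Q2", "rdfs:label", '"Planet Earth"@en'),
]
to_add, to_delete = diff_triples(rev_x, rev_y)
```

Triples present in both revisions appear in neither list. With blank nodes this equality-based comparison would no longer be reliable, which is why the munging and skolemization discussion matters here.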
[Wikidata-bugs] [Maniphest] [Updated] T245727: Create a streaming-updater submodule under query/wikidata/rdf
dcausse removed a project: Epic. TASK DETAIL https://phabricator.wikimedia.org/T245727 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Zbyszko, dcausse Cc: Gehel, Zbyszko, Aklapper, JAllemandou, Ottomata, Smalyshev, Iamamz3, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Dinoguy1000 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. TASK DESCRIPTION It would be nice to have an idea of the percentage of queries that use the `isBlank` function. It might also be interesting to know whether we can identify the tools using this function, in order to contact their maintainers if we were to introduce a new function to replace `isBlank`. TASK DETAIL https://phabricator.wikimedia.org/T246237 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
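As a rough illustration of the kind of statistic requested, the sketch below counts queries that call `isBlank` in a list of SPARQL strings. It is a plain-Python approximation with a hypothetical function name; the real extraction would run against the wdqs query logs, whose schema is not described here. The match is case-insensitive, since SPARQL function names are, and it is a textual match only, so it would also count occurrences inside comments or string literals.

```python
import re

# Matches isBlank( / ISBLANK( etc.; requires the opening parenthesis so that
# a variable such as ?isBlankFlag is not counted.
ISBLANK_CALL = re.compile(r"\bisblank\s*\(", re.IGNORECASE)


def isblank_usage_percent(queries):
    """Percentage of queries containing at least one isBlank() call."""
    queries = list(queries)
    if not queries:
        return 0.0
    hits = sum(1 for query in queries if ISBLANK_CALL.search(query))
    return 100.0 * hits / len(queries)


sample = [
    "SELECT ?s WHERE { ?s ps:P20 ?v . FILTER(ISBLANK(?v)) }",
    "SELECT ?s WHERE { ?s wdt:P31 wd:Q5 }",
    "SELECT ?s WHERE { ?s ps:P570 ?v . FILTER(isBlank(?v)) }",
    "SELECT ?isBlankFlag WHERE { ?s wdt:P31 ?isBlankFlag }",
]
usage = isblank_usage_percent(sample)
```

Grouping the matching queries by user agent would be the natural next step for identifying the tools involved.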
[Wikidata-bugs] [Maniphest] [Edited] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T246237 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Assigned] T246238: Investigate common qualifiers for “unknown value” statement main snaks
dcausse assigned this task to JAllemandou. dcausse added a subscriber: JAllemandou. dcausse added a comment. @JAllemandou did some work and could extract some numbers from a dump imported to hadoop:

SELECT ?property (COUNT(*) AS ?count) WHERE {
  ?statement ps:P20 ?unknown.
  FILTER(ISBLANK(?unknown))
  ?statement ?pq ?qualifier.
  ?property wikibase:qualifier ?pq.
} GROUP BY ?property ORDER BY DESC(?count)

+------------------------------------+-----+
|s1                                  |count|
+------------------------------------+-----+
|http://www.wikidata.org/entity/P1319|7    |
|http://www.wikidata.org/entity/P17  |5    |
|http://www.wikidata.org/entity/P131 |4    |
|http://www.wikidata.org/entity/P1476|1    |
+------------------------------------+-----+

SELECT ?mainProperty ?qualifierProperty (COUNT(*) AS ?count) WHERE {
  ?mainProperty wikibase:claim ?p; wikibase:statementProperty ?ps.
  ?qualifierProperty wikibase:qualifier ?pq.
  ?subject ?p ?statement.
  ?statement ?ps ?unknown.
  FILTER(isBlank(?unknown))
  ?statement ?pq ?qualifier.
} GROUP BY ?mainProperty ?qualifierProperty

+------------------------------------+------------------------------------+-----+
|mp                                  |qp                                  |count|
+------------------------------------+------------------------------------+-----+
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P1810|77855|
|http://www.wikidata.org/entity/P123 |http://www.wikidata.org/entity/P1932|56081|
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P459 |29573|
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P3519|24827|
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P3382|15795|
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P352 |9593 |
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P2926|7310 |
|http://www.wikidata.org/entity/P570 |http://www.wikidata.org/entity/P1319|4143 |
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P973 |2233 |
|http://www.wikidata.org/entity/P98  |http://www.wikidata.org/entity/P1932|2192 |
|http://www.wikidata.org/entity/P98  |http://www.wikidata.org/entity/P1545|1630 |
|http://www.wikidata.org/entity/P393 |http://www.wikidata.org/entity/P1932|1356 |
|http://www.wikidata.org/entity/P110 |http://www.wikidata.org/entity/P1932|1334 |
|http://www.wikidata.org/entity/P110 |http://www.wikidata.org/entity/P1545|1302 |
|http://www.wikidata.org/entity/P559 |http://www.wikidata.org/entity/P131 |957  |
|http://www.wikidata.org/entity/P655 |http://www.wikidata.org/entity/P1932|952  |
|http://www.wikidata.org/entity/P655 |http://www.wikidata.org/entity/P1545|879  |
|http://www.wikidata.org/entity/P570 |http://www.wikidata.org/entity/P1326|864  |
|http://www.wikidata.org/entity/P569 |http://www.wikidata.org/entity/P1326|690  |
|http://www.wikidata.org/entity/P5202|http://www.wikidata.org/entity/P1932|627  |
|http://www.wikidata.org/entity/P5202|http://www.wikidata.org/entity/P1545|615  |
|http://www.wikidata.org/entity/P50  |http://www.wikidata.org/entity/P1932|542  |
|http://www.wikidata.org/entity/P569 |http://www.wikidata.org/entity/P1319|384  |
|http://www.wikidata.org/entity/P2093|http://www.wikidata.org/entity/P1545|380  |
|http://www.wikidata.org/entity/P1343|http://www.wikidata.org/entity/P3523|369  |
|http://www.wikidata.org/entity/P571 |http://www.wikidata.org/entity/P1326|336  |
|http://www.wikidata.org/entity/P571 |http://www.wikidata.org/entity/P1319|273  |
|http://www.wikidata.org/entity/P3872|http://www.wikidata.org/entity/P137 |223  |
|http://www.wikidata.org/entity/P3872|http://www.wikidata.org/entity/P580 |206  |
|http://www.wikidata.org/entity/P3872|http://www.wikidata.org/entity/P582 |206  |
|http://www.wikidata.org/entity/P26  |http://www.wikidata.org/entity/P580 |199  |
|http://www.wikidata.org/entity/P816 |http://www.wikidata.org/entity/P817 |192  |
|http://www.wikidata.org/entity/P921 |http://www.wikidata.org/entity/P1545|158  |
|http://www.wikidata.org/entity/P816 |http://www.wikidata.org/entity/P1107|139  |
|http://www.wikidata.org/entity/P2679|http://www.wikidata.org/entity/P1932|137  |
|http://www.wikidata.org/entity/P3383|http://www.wikidata.org/entity/P805 |135  |
|http://www.wikidata.org/entity/P921 |http://www.wikidata.org/entity/P1559|131  |
|http://www.wikidata.org/entity/P625 |http://www.wikidata.org/entity/P828 |126  |
|http://www.wikidata.org/entity/P629 |http://www.wikidata.org/entity/P407 |114  |
|http://www.wikidata.org/entity/P1181
[Wikidata-bugs] [Maniphest] [Updated] T246238: Investigate common qualifiers for “unknown value” statement main snaks
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T246238 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, dcausse Cc: JAllemandou, Lea_Lacroix_WMDE, Gehel, Aklapper, dcausse, Igorkim78, Lucas_Werkmeister_WMDE, Jheald, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Assigned] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs
dcausse assigned this task to JAllemandou. dcausse added a project: Discovery-Search (Current work). dcausse added a subscriber: Lea_Lacroix_WMDE. dcausse added a comment. @Lea_Lacroix_WMDE the use of `isBlank` seems pretty low, do you think we should still try to identify bots by grouping by user-agent and see if something is identifiable? TASK DETAIL https://phabricator.wikimedia.org/T246237 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, dcausse Cc: Lea_Lacroix_WMDE, JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Triaged] T246237: Extract some statistics on the use of the isBlank() function in wdqs query logs
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T246237 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, dcausse Cc: Lea_Lacroix_WMDE, JAllemandou, Aklapper, Lucas_Werkmeister_WMDE, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs