[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. It almost sounds like rather than having a live stream of edit events, or at least acting entirely on a live stream of edit events, the updater should instead do internal batching I thought about it, the problem here is that the starting point can be anything, so batching is a bit problematic. That said, the above is not exactly true, we have dump as natural starting point, and then we have starting points based off where the previous update finished. So if we create an aggregated stream which can be synchronized with dumps (e.g. make dump issue an event to the same stream and make aggregator never cross dump events) then it could work. Note however the batched stream does not really solve the problem of stale updates - even with batches update can still read old data. Right now we detect this situation and re-queue the update, but ideally we'd use ChronologyProtector to fetch the proper data.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Addshore, MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Addshore added a comment. No, this has horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 10 updates with each of the intermediate IDs. I think we have had this exact conversation on another ticket before In any case, stale reads are discussed in T212550. I was going to comment on T212550. but it didn't quite fit so will comment here It almost sounds like rather than having a live stream of edit events, or at least acting entirely on a live stream of edit events, the updater should instead do internal batching (waiting for a few seconds before trying to get the updated data to see if there are subsequent writes) or it should be listening to an event stream that already does this (events batches in mediawiki). Such a batched event stream probably only makes sense for wikibase edits? it could look something like (entity:Q123,latestrev:123,changecount:5) for example. I could be making assumptions about the finomation the updater needs here. I'm going to stop thinking on this topic now (unless poked)TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, AddshoreCc: Addshore, MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. It might make sense to remove this and instead pass in either the oldid or revision param No, this has horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 10 updates with each of the intermediate IDs.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Addshore, MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Addshore added a comment. In T210044#4767023, @Smalyshev wrote: Maybe the Wikidata replica Updater is reading from is lagging behind the updates and returns an old revision? Not sure if that is possible. If it's possible to read old data after we've got Kafka message with new data, that would be a lot of trouble for us. That seems totally possible to me, it's a shame the maxlag can not be requested by the updater while using Special:EntityData. Most of the time the replicas are not lagged, but if your requesting the entity data sub 1 second hitting old data could be possible. I went and looked at the requests of the updater. "uri_path":"/wiki/Special:EntityData/Q60305784.ttl","uri_query":"?nocache=1550568436299=dump" If I'm correct nocache here is just the current ts of the request added to avoid cache hits? Although this will avoid hitting the varnish cache, you could still be getting old data in some situations. It might make sense to remove this and instead pass in either the oldid or revision param (if this data is available in kafka). You'll then get a 404 if the revision is not yet retrievable and should probably wait a few seconds and try again, the correct amount of time to wait could probably be determined by maxlag, but maxlag doesn't currently work with Special:EntityData at all.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, AddshoreCc: Addshore, MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. @Floatingpurr I have fixed some problematic data but still didn't find the root cause yet. Do you have some specific issues you think are related to this?TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Floatingpurr added a comment. Hey guys! Any news about this problem?TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, FloatingpurrCc: MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
doctaxon added a comment. @Smalyshev What's the status to this task? There are still problems, -> https://www.wikidata.org/wiki/Wikidata:Request_a_query#SPARQL_query_result_erroneousTASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, doctaxonCc: doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Fnielsen added a comment. May clearing of deleted items be related? https://www.wikidata.org/wiki/Lexeme:L31707 has been deleted since 14 November 2018, but is still in WDQS: https://query.wikidata.org/#DESCRIBE%20wd%3AL31707TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, FnielsenCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. The query is captured in F27383365.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. SPARQL dumps show that data is present in SPARQL but not in the database. Filed https://github.com/blazegraph/database/issues/109 with upstream and will dig into it further to see what we can find out there.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:35:30Z] Finished deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigation (duration: 10m 36s)TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, StashbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:24:53Z] Started deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigationTASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, StashbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 477429 merged by Gehel: [operations/puppet@production] Enable SPARQL logging to a separate file https://gerrit.wikimedia.org/r/477429TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 477429 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable SPARQL logging to a separate file https://gerrit.wikimedia.org/r/477429TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 477410 merged by Gehel: [operations/puppet@production] Stop RDF dumps https://gerrit.wikimedia.org/r/477410TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 477410 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Stop RDF dumps https://gerrit.wikimedia.org/r/477410TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. RDF dumps confirm that data is coming fine through This also happens for single-update items, which haven't been touched for a while, so it's not some kind of update race. Weird thing is it happens on multiple servers in the same way. E.g. check on Q3601865 reveals: ['wdq21', 'wdq22', 'wdq23', 'wdq24', 'wdq25', 'wdq26'] - are fine ['wdq6', 'wdq7', 'wdq8', 'wdq9', 'wdq10'] - are missing rank statements for Q36018657-1D11B168-9B90-44EF-88EE-4B8B09EE08A8 - all for the same statement ID! ['wdq5', 'wdq4', 'wdq3'] - are missing rank statements and P2860 statement for the same statement. This does not look completely random data corruption - something is going on. I'll try to dump SPARQL we're sending next to see what's up there.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. @Lea_Lacroix_WMDE yes, this is possible.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 475243 merged by Gehel: [operations/puppet@production] Enable dumping RDF on test & internal https://gerrit.wikimedia.org/r/475243TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 475241 merged by Gehel: [operations/puppet@production] Enable dumping RDF data for debugging purposes https://gerrit.wikimedia.org/r/475241TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. timings for the last one: wdq5: 2018-11-21T23:35:56Z wdq4: 2018-11-21T23:35:04Z wdq6: 2018-11-21T23:34:44Z wdq21: 2018-11-21T23:34:55Z wdq22: 2018-11-21T23:34:49Z wdq23: 2018-11-21T23:35:07Z wdq3: 2018-11-21T23:35:03Z wdq7: 2018-11-21T23:34:51Z wdq8: 2018-11-21T23:34:56Z wdq24: 2018-11-21T23:34:47Z wdq25: 2018-11-21T23:34:56Z wdq26: 2018-11-21T23:34:56Z wdq9: 2018-11-21T23:34:40Z wdq10: 2018-11-21T23:34:51Z So wdq5 is actually updating very late compared to other hosts, yet it gets stale data! Very strange.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. I also discover for some items the data is not the latest revision: e.g. for Q57529925 we have all servers except wdq5 on 795730255 but wdq5 on 795729753. This seems to be related to bursts of robotic edits on the same entry, which may suggest there's some kind of race condition. Maybe the Wikidata replica Updater is reading from is lagging behind the updates and returns an old revision? Not sure if that is possible. If it's possible to read old data after we've got Kafka message with new data, that would be a lot of trouble for us.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 475243 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable dumping RDF on test & internal https://gerrit.wikimedia.org/r/475243TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
gerritbot added a comment. Change 475241 had a related patch set uploaded (by Smalyshev; owner: Smalyshev): [operations/puppet@production] Enable dumping RDF data for debugging purposes https://gerrit.wikimedia.org/r/475241TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, mhl20, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, merbst, LawExplorer, _jensen, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS
Smalyshev added a comment. Timestamps for data updates: wdq10: 2018-11-20T05:49:19Z wdq6: 2018-11-20T05:49:25Z wdq26: 2018-11-20T05:49:31Z wdq21: 2018-11-20T05:49:32Z wdq22: 2018-11-20T05:49:32Z wdq3: 2018-11-20T05:49:35Z wdq7: 2018-11-20T05:49:40Z wdq9: 2018-11-20T05:49:39Z wdq8: 2018-11-20T05:49:35Z wdq5: 2018-11-20T05:49:39Z wdq25: 2018-11-20T05:49:39Z wdq4: 2018-11-20T05:49:48Z wdq24: 2018-11-20T05:49:53Z wdq23: 2018-11-20T05:50:13Z Looks like the group that has the good data was updated earlier than the one that has bad data, even though timestamps overlap a bit. But the groups appear to be distinct. So it's possible something happened in the meantime... Maybe we need to capture updates RDF to see what the servers are seeing.TASK DETAILhttps://phabricator.wikimedia.org/T210044EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Aklapper, Smalyshev, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, D3r1ck01, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs