[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-20 Thread Smalyshev
Smalyshev added a comment.
It almost sounds like rather than having a live stream of edit events, or at least acting entirely on a live stream of edit events, the updater should instead do internal batching

I thought about it; the problem here is that the starting point can be anything, so batching is a bit problematic. That said, the above is not exactly true: we have the dump as a natural starting point, and then we have starting points based on where the previous update finished. So if we create an aggregated stream which can be synchronized with dumps (e.g. make the dump issue an event to the same stream and make the aggregator never cross dump events), then it could work. Note, however, that the batched stream does not really solve the problem of stale updates: even with batches, an update can still read old data. Right now we detect this situation and re-queue the update, but ideally we'd use ChronologyProtector to fetch the proper data.

TASK DETAIL
https://phabricator.wikimedia.org/T210044

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Addshore, MichaelSchoenitzer, doctaxon, Nikki, Lydia_Pintscher, hoo, Multichill, KaMan, Fnielsen, abian, Lea_Lacroix_WMDE, MisterSynergy, ValterVB, gerritbot, Graemebp, Gstupp, Tarrow, Jane023, EBjune, Floatingpurr, Wikidata-Query-Service, Oravrattas, Lucas_Werkmeister_WMDE, Stashbot, Alexsdutton, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-20 Thread Addshore
Addshore added a comment.
No, this has a horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 10 updates with each of the intermediate IDs.

I think we have had this exact conversation on another ticket before

In any case, stale reads are discussed in T212550.

I was going to comment on T212550, but it didn't quite fit, so I will comment here.

It almost sounds like rather than having a live stream of edit events, or at least acting entirely on a live stream of edit events, the updater should instead do internal batching (waiting for a few seconds before trying to get the updated data to see if there are subsequent writes), or it should be listening to an event stream that already does this (event batches in MediaWiki).
Such a batched event stream probably only makes sense for Wikibase edits? It could look something like (entity:Q123,latestrev:123,changecount:5), for example.
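To make the batching idea concrete, here is a minimal sketch of collapsing a short window of edit events into one summary per entity, using the (entity, latestrev, changecount) shape suggested above. The function name `coalesce_edits` and the tuple-based event format are hypothetical; this is not the updater's actual code, and it assumes revision IDs for one entity only increase.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable, List, Tuple


def coalesce_edits(events: Iterable[Tuple[str, int]]) -> List[Dict[str, Any]]:
    """Collapse (entity_id, revision_id) events into one summary per entity.

    Hypothetical helper: keeps the highest revision seen for each entity
    and counts how many edits were folded into it.
    """
    summary: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()
    for entity, rev in events:
        entry = summary.setdefault(
            entity, {"entity": entity, "latestrev": rev, "changecount": 0}
        )
        entry["latestrev"] = max(entry["latestrev"], rev)
        entry["changecount"] += 1
    return list(summary.values())


# A made-up window of events: five edits to Q123 and one to Q42.
window = [("Q123", 119), ("Q123", 120), ("Q42", 77),
          ("Q123", 121), ("Q123", 122), ("Q123", 123)]
batched = coalesce_edits(window)
for item in batched:
    print(item)
```

With a window like this, the consumer would fetch Q123 once for revision 123 instead of five times, which is the saving the batching proposal is after.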

I could be making assumptions about the information the updater needs here.
I'm going to stop thinking on this topic now (unless poked).


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-19 Thread Smalyshev
Smalyshev added a comment.
It might make sense to remove this and instead pass in either the oldid or revision param

No, this has a horrible performance impact if several edits happen in a row, since it only fetches the oldid, not the latest one, so instead of one update with the latest ID you get 10 updates with each of the intermediate IDs.


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-19 Thread Addshore
Addshore added a comment.

In T210044#4767023, @Smalyshev wrote:
Maybe the Wikidata replica that the Updater is reading from is lagging behind the updates and returns an old revision? Not sure if that is possible. If it's possible to read old data after we've got a Kafka message with new data, that would be a lot of trouble for us.


That seems totally possible to me; it's a shame that maxlag cannot be requested by the updater while using Special:EntityData.
Most of the time the replicas are not lagged, but if you're requesting the entity data less than a second after the edit, hitting old data could be possible.

I went and looked at the requests of the updater.

"uri_path":"/wiki/Special:EntityData/Q60305784.ttl","uri_query":"?nocache=1550568436299=dump"

If I'm correct, nocache here is just the current timestamp of the request, added to avoid cache hits?
Although this will avoid hitting the Varnish cache, you could still be getting old data in some situations.
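As a quick sanity check of that reading, the nocache value from the logged request can be decoded as a Unix timestamp in milliseconds. Treating it as epoch milliseconds is an assumption, and `decode_epoch_millis` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def decode_epoch_millis(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    seconds, millis = divmod(ms, 1000)
    # Integer arithmetic avoids float rounding in the millisecond part.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


nocache = 1550568436299  # value taken from the logged uri_query
print(decode_epoch_millis(nocache).isoformat())
# → 2019-02-19T09:27:16.299000+00:00
```

The decoded time falls on the day the request was inspected, which is consistent with the value being a request timestamp rather than a revision or cache key.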

It might make sense to remove this and instead pass in either the oldid or revision param (if this data is available in Kafka).
You'll then get a 404 if the revision is not yet retrievable and should probably wait a few seconds and try again. The correct amount of time to wait could probably be determined by maxlag, but maxlag doesn't currently work with Special:EntityData at all.
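The wait-and-retry behaviour described here could look roughly like the sketch below. It is written against an injected `fetch` callable (hypothetical, returning an HTTP status and body) so it runs without network access. The `revision` query parameter is the one named in this comment, but the exact URL shape, the fixed two-second wait, and the attempt count are assumptions, not the updater's real settings.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical transport: takes a URL, returns (http_status, body).
FetchFn = Callable[[str], Tuple[int, str]]


def entity_url(entity_id: str, revision: int) -> str:
    """Build a Special:EntityData URL pinned to one revision (assumed shape)."""
    return (
        "https://www.wikidata.org/wiki/Special:EntityData/"
        f"{entity_id}.ttl?revision={revision}"
    )


def fetch_revision(
    fetch: FetchFn,
    entity_id: str,
    revision: int,
    attempts: int = 3,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the RDF body, retrying while the revision still gives 404."""
    url = entity_url(entity_id, revision)
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        if attempt + 1 < attempts:
            sleep(wait_seconds)  # revision not visible yet, wait and retry
    return None  # caller can re-queue the update


# Offline demonstration with a fake transport: two 404s, then success.
# Revision 123 is a placeholder, not a real revision of this item.
responses = iter([(404, ""), (404, ""), (200, "# turtle for the requested revision")])
waits = []
result = fetch_revision(lambda url: next(responses), "Q60305784", 123, sleep=waits.append)
print(result, waits)
```

Returning `None` after the last attempt leaves the decision to re-queue with the caller. As noted in the replies, fetching every intermediate revision this way can multiply the number of updates when edits arrive in bursts.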


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-18 Thread Smalyshev
Smalyshev added a comment.
@Floatingpurr I have fixed some problematic data but have not found the root cause yet. Do you have some specific issues you think are related to this?


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-02-08 Thread Floatingpurr
Floatingpurr added a comment.
Hey guys! Any news about this problem?


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2019-01-10 Thread doctaxon
doctaxon added a comment.
@Smalyshev What's the status of this task? There are still problems: https://www.wikidata.org/wiki/Wikidata:Request_a_query#SPARQL_query_result_erroneous


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-05 Thread Fnielsen
Fnielsen added a comment.
Could the clearing of deleted items be related? https://www.wikidata.org/wiki/Lexeme:L31707 has been deleted since 14 November 2018, but is still in WDQS:

https://query.wikidata.org/#DESCRIBE%20wd%3AL31707


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Smalyshev
Smalyshev added a comment.
The query is captured in F27383365.


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Smalyshev
Smalyshev added a comment.
SPARQL dumps show that data is present in SPARQL but not in the database. Filed https://github.com/blazegraph/database/issues/109 with upstream and will dig into it further to see what we can find out there.


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:35:30Z] Finished deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigation (duration: 10m 36s)


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-12-04T19:24:53Z] Started deploy [wdqs/wdqs@81dac18]: Install new Updater for T210044 investigation


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-04 Thread gerritbot
gerritbot added a comment.
Change 477429 merged by Gehel:
[operations/puppet@production] Enable SPARQL logging to a separate file

https://gerrit.wikimedia.org/r/477429


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment.
Change 477429 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Enable SPARQL logging to a separate file

https://gerrit.wikimedia.org/r/477429


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment.
Change 477410 merged by Gehel:
[operations/puppet@production] Stop RDF dumps

https://gerrit.wikimedia.org/r/477410


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread gerritbot
gerritbot added a comment.
Change 477410 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Stop RDF dumps

https://gerrit.wikimedia.org/r/477410


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-12-03 Thread Smalyshev
Smalyshev added a comment.
RDF dumps confirm that the data is coming through fine.

This also happens for single-update items, which haven't been touched for a while, so it's not some kind of update race. Weird thing is it happens on multiple servers in the same way. E.g. check on Q3601865 reveals:

['wdq21', 'wdq22', 'wdq23', 'wdq24', 'wdq25', 'wdq26'] - are fine
['wdq6', 'wdq7', 'wdq8', 'wdq9', 'wdq10'] - are missing rank statements for Q36018657-1D11B168-9B90-44EF-88EE-4B8B09EE08A8 - all for the same statement ID!
['wdq5', 'wdq4', 'wdq3'] - are missing rank statements and P2860 statement for the same statement.

This does not look like completely random data corruption; something is going on. I'll try to dump the SPARQL we're sending next to see what's up there.
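The per-host check above amounts to bucketing servers by which data they are missing. A small sketch of that grouping follows, using the three observations reported in this comment. The helper name `group_by_symptoms` and the short labels "rank" and "P2860" are illustrative only, not output of a real tool.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List

# What each host was observed to be missing for the affected statement.
observed_missing: Dict[str, Iterable[str]] = {}
for host in ["wdq21", "wdq22", "wdq23", "wdq24", "wdq25", "wdq26"]:
    observed_missing[host] = []                      # fine
for host in ["wdq6", "wdq7", "wdq8", "wdq9", "wdq10"]:
    observed_missing[host] = ["rank"]                # missing rank statements
for host in ["wdq5", "wdq4", "wdq3"]:
    observed_missing[host] = ["rank", "P2860"]       # missing rank and P2860


def group_by_symptoms(observed: Dict[str, Iterable[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group hosts that show exactly the same set of missing data."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for host, missing in observed.items():
        groups[frozenset(missing)].append(host)
    return dict(groups)


groups = group_by_symptoms(observed_missing)
for symptoms, hosts in groups.items():
    print(sorted(symptoms) or "ok", hosts)
```

Fourteen hosts falling into only three distinct buckets is what makes the corruption look systematic rather than random.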


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-28 Thread Smalyshev
Smalyshev added a comment.
@Lea_Lacroix_WMDE yes, this is possible.


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-26 Thread gerritbot
gerritbot added a comment.
Change 475243 merged by Gehel:
[operations/puppet@production] Enable dumping RDF on test & internal

https://gerrit.wikimedia.org/r/475243


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-22 Thread gerritbot
gerritbot added a comment.
Change 475241 merged by Gehel:
[operations/puppet@production] Enable dumping RDF data for debugging purposes

https://gerrit.wikimedia.org/r/475241


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread Smalyshev
Smalyshev added a comment.
Timings for the last one:

wdq5: 2018-11-21T23:35:56Z
wdq4: 2018-11-21T23:35:04Z
wdq6: 2018-11-21T23:34:44Z
wdq21: 2018-11-21T23:34:55Z
wdq22: 2018-11-21T23:34:49Z
wdq23: 2018-11-21T23:35:07Z
wdq3: 2018-11-21T23:35:03Z
wdq7: 2018-11-21T23:34:51Z
wdq8: 2018-11-21T23:34:56Z
wdq24: 2018-11-21T23:34:47Z
wdq25: 2018-11-21T23:34:56Z
wdq26: 2018-11-21T23:34:56Z
wdq9: 2018-11-21T23:34:40Z
wdq10: 2018-11-21T23:34:51Z

So wdq5 is actually updating very late compared to other hosts, yet it gets stale data! Very strange.
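The ordering claim can be checked mechanically from the timestamps listed above. The sketch below parses them and reports the earliest and latest host and the spread between them; the variable names are illustrative.

```python
from datetime import datetime

# Update timestamps per host, copied from the list above.
timings = {
    "wdq5": "2018-11-21T23:35:56Z", "wdq4": "2018-11-21T23:35:04Z",
    "wdq6": "2018-11-21T23:34:44Z", "wdq21": "2018-11-21T23:34:55Z",
    "wdq22": "2018-11-21T23:34:49Z", "wdq23": "2018-11-21T23:35:07Z",
    "wdq3": "2018-11-21T23:35:03Z", "wdq7": "2018-11-21T23:34:51Z",
    "wdq8": "2018-11-21T23:34:56Z", "wdq24": "2018-11-21T23:34:47Z",
    "wdq25": "2018-11-21T23:34:56Z", "wdq26": "2018-11-21T23:34:56Z",
    "wdq9": "2018-11-21T23:34:40Z", "wdq10": "2018-11-21T23:34:51Z",
}


def parse(ts: str) -> datetime:
    """Parse the 'YYYY-MM-DDTHH:MM:SSZ' format used in the list."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


ordered = sorted(timings, key=lambda host: parse(timings[host]))
earliest, latest = ordered[0], ordered[-1]
spread = parse(timings[latest]) - parse(timings[earliest])
print(earliest, latest, spread.total_seconds())
# → wdq9 wdq5 76.0
```

wdq5 is the last host to update, 76 seconds after the first one, so a plain "fetched too early" explanation does not fit this case.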


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread Smalyshev
Smalyshev added a comment.
I also discovered that for some items the data is not at the latest revision: e.g. for Q57529925 we have all servers except wdq5 on 795730255, but wdq5 on 795729753. This seems to be related to bursts of robotic edits on the same entry, which may suggest there's some kind of race condition.

Maybe the Wikidata replica that the Updater is reading from is lagging behind the updates and returns an old revision? Not sure if that is possible. If it's possible to read old data after we've got a Kafka message with new data, that would be a lot of trouble for us.
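A check like the one described for Q57529925 can be expressed as a comparison of each host's stored revision against the newest one seen. The sketch below uses the two revision IDs quoted above. Only three hosts are listed as a sample, and the helper name `lagging_hosts` is hypothetical.

```python
from typing import Dict


def lagging_hosts(revisions: Dict[str, int]) -> Dict[str, int]:
    """Return hosts whose stored revision is older than the newest one seen,
    mapped to how far behind (in revision-ID units) they are."""
    newest = max(revisions.values())
    return {host: newest - rev for host, rev in revisions.items() if rev < newest}


# Sample subset of hosts; revision IDs for Q57529925 are from the comment above.
revisions = {"wdq4": 795730255, "wdq5": 795729753, "wdq6": 795730255}
print(lagging_hosts(revisions))
# → {'wdq5': 502}
```

Revision IDs on Wikidata are allocated wiki-wide, so the gap of 502 does not mean 502 edits to this item. It only shows that wdq5 holds an older revision than the other hosts.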


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread gerritbot
gerritbot added a comment.
Change 475243 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Enable dumping RDF on test & internal

https://gerrit.wikimedia.org/r/475243


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-21 Thread gerritbot
gerritbot added a comment.
Change 475241 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Enable dumping RDF data for debugging purposes

https://gerrit.wikimedia.org/r/475241


[Wikidata-bugs] [Maniphest] [Commented On] T210044: Data corruption when loading RDF data into WDQS

2018-11-20 Thread Smalyshev
Smalyshev added a comment.
Timestamps for data updates:

wdq10: 2018-11-20T05:49:19Z
wdq6: 2018-11-20T05:49:25Z
wdq26: 2018-11-20T05:49:31Z
wdq21: 2018-11-20T05:49:32Z
wdq22: 2018-11-20T05:49:32Z
wdq3: 2018-11-20T05:49:35Z
wdq7: 2018-11-20T05:49:40Z

wdq9: 2018-11-20T05:49:39Z

wdq8: 2018-11-20T05:49:35Z
wdq5: 2018-11-20T05:49:39Z
wdq25: 2018-11-20T05:49:39Z
wdq4: 2018-11-20T05:49:48Z
wdq24: 2018-11-20T05:49:53Z
wdq23: 2018-11-20T05:50:13Z

Looks like the group that has the good data was updated earlier than the one that has bad data, even though the timestamps overlap a bit. But the groups appear to be distinct. So it's possible something happened in the meantime... Maybe we need to capture the update RDF to see what the servers are seeing.