[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Anja Jentzsch anja.jentz...@wikimedia.de changed: What|Removed |Added Summary|Secondary and primary |Secondary and primary |storage go out of sync on |storage go out of sync on |constraint violation in |constraint violation in |secondary storage (8) |secondary storage Whiteboard||storypoints: 8 -- You are receiving this mail because: You are watching all bug changes. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage (8)
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Anja Jentzsch anja.jentz...@wikimedia.de changed: What|Removed |Added Status|RESOLVED|VERIFIED -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage (8)
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #10 from Anja Jentzsch anja.jentz...@wikimedia.de 2012-11-29 12:37:07 UTC --- Verified in Wikidata demo time for sprint 8 -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage (8)
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Bug 36431 depends on bug 36519, which changed state. Bug 36519 Summary: Validate data structure https://bugzilla.wikimedia.org/show_bug.cgi?id=36519 What|Old Value |New Value Status|NEW |RESOLVED Resolution||FIXED -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage (8)
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 denny vrandecic denny.vrande...@wikimedia.de changed: What|Removed |Added Summary|Secondary and primary |Secondary and primary |storage go out of sync on |storage go out of sync on |constraint violation in |constraint violation in |secondary storage |secondary storage (8) -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Daniel Kinzler daniel.kinz...@wikimedia.de changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #9 from Daniel Kinzler daniel.kinz...@wikimedia.de 2012-06-27 20:33:37 UTC --- fixed (fir sitelinks) in https://gerrit.wikimedia.org/r/13119 -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 denny vrandecic denny.vrande...@wikimedia.de changed: What|Removed |Added Status|NEW |ASSIGNED -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 denny vrandecic denny.vrande...@wikimedia.de changed: What|Removed |Added Priority|High|Highest -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #8 from jeb...@gmail.com 2012-06-20 22:44:52 UTC --- There was an erroneous additional save that lead to the race condition. Seems like it is fixed. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #7 from jeb...@gmail.com 2012-06-12 20:19:52 UTC --- Constraint violation in secondary storage, is that solved now? That is, is it possible to close this bug or should it be merged with bug 36519 - Validate data structure? -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 jeb...@gmail.com changed: What|Removed |Added See Also||https://bugzilla.wikimedia. ||org/show_bug.cgi?id=36519 -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 denny vrandecic denny.vrande...@wikimedia.de changed: What|Removed |Added Depends on||36519 -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #4 from Daniel Kinzler daniel.kinz...@wikimedia.de 2012-05-07 12:50:23 UTC --- I agree that we need a Content::isValid() method (or maybe better: getErrors(), returning a Status object or false). Checking this before save is something that would need to be done in core, I guess. (By the way - being able to cache the result of this check would be a reason to have non-mutable Content objects. Caching would be useful since the check may be called multiple times during a save, on different levels of processing) I think doing this kind of check before saving is the first thing we need to do, no matter how we try to enforce database consistency. By itself, it introduces a race condition of course. But one that would rarely be hit. While we should try to avoid inconsistent database states, it would be fine to just show a database error message or some such in case we hit a conflict. As to keeping the database consistent, here's the key points: * checking validity, saving the primary blob and storing secondary data must all happen in the same transaction. * if everything should be consistent, there's no distinction (in this context) between primary and secondary data. Indeed, it would be handy to perform the primary update the same way the secondary updates are performed: using a list of objects representing updates jobs. * we should consider the case of having secondary data on several different database systems. * full ACID conformance always bares the risk of operations blocking for a long time (because of retires in case of communication failure in the commit phase). Eventually, I think it should work this way (in core): * The initial validity check, writing the primary blob and updating the secondary data stores should all be modeled as update objects. * When a page is to be saved, a list of update object is constructed. Each update object can open/commit and abort a transaction, and perform the actual update. * The update is then performed in 3 stages: open all transactions, do all updates, then commit all transactions. It would be very hard to make the commit phase truly atomic. I believe we will have to live with the risk of inconsistencies introduces by connections failing the the middle of a transaction. Anyway, for now, I think it's sufficient to implement a pre-save check without any transactional logic. We should keep the transaction stuff for later. Maybe in a separate ticket. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #5 from Jeroen De Dauw jeroen_ded...@yahoo.com 2012-05-07 13:07:30 UTC --- Yeah, +1, that's basically what I said, but broken down better :) using a list of objects representing updates jobs As long as the jobs get run immediately - would be very bad to have them end up in the jobque and not seeing the changes made right after save. By the way - being able to cache the result of this check would be a reason to have non-mutable Content objects. Disagree. You can easily cache the validity and set it to unknown when a change that can impact it is made to the object. This would be even more effective since you can ignore changes that don't impact it (you'd need to pass $isValid to the constructor in the immutable case, which seems evil), and you don't have all the overhead of constantly creating new instances. ... And it would not work in the first place since you need to check right before doing the save, unless you don't mind making the transaction significantly less atomic :) -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #6 from Daniel Kinzler daniel.kinz...@wikimedia.de 2012-05-07 13:18:35 UTC --- (In reply to comment #5) Yeah, +1, that's basically what I said, but broken down better :) using a list of objects representing updates jobs As long as the jobs get run immediately - would be very bad to have them end up in the jobque and not seeing the changes made right after save. Yes, of course. Such job objects may also be used for deferred updates, but it must be very clear which jobs have to run when. By the way - being able to cache the result of this check would be a reason to have non-mutable Content objects. Disagree. You can easily cache the validity and set it to unknown when a change that can impact it is made to the object. This would be even more effective since you can ignore changes that don't impact it yes, but it's much more error prone and harder to maintain. but I don't insist on imutable objects :) (you'd need to pass $isValid to the constructor in the immutable case, which seems evil), no - immutable only says that isValid() (and all other getters) will always return the same value. Which can be cached internally, but may be initialized lazily. That's of course problematic in cases like this, when isValid() depends on external state (the database). and you don't have all the overhead of constantly creating new instances. We could have a mutator object, analogous to a StringBuffer in java. But whatever. ... And it would not work in the first place since you need to check right before doing the save, unless you don't mind making the transaction significantly less atomic :) Well, if you want to avoid a race condition, yes. Then you can't cache the value anyway. But as I said, I think the race condition is acceptable, as long as it doesn't lead to permanent data corruption or loss. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Jeroen De Dauw jeroen_ded...@yahoo.com changed: What|Removed |Added CC||jeroen_ded...@yahoo.com --- Comment #2 from Jeroen De Dauw jeroen_ded...@yahoo.com 2012-05-06 15:04:33 UTC --- Two different issues here: * We need to not save invalid data to the page table if it was rejected from the secondary tables. Might be better to have a Content::isValid() method that is called before save (but can also be used at other places, such as preview or whatever) and does all the needed checks (in case of WikibaseItem this includes some reads to the db to see if no duplicate stuff is entered). This method can then remove the invalid data and continue the save or abort it, and also return the encountered issues to the caller. One thing to keep in mind is that any changes to this kind of info should always be written to both the page table and the secondary tables, preferably in the same transaction. * We need some way of dealing with race conditions. If a save happens and we do a read, and it turns out all is well, we must still be sure that no write that creates a previously absent conflict does not happen before the save completes. Not sure how to do this in MediaWiki. Probably needs some investigation. Let's not put all the storage related stuff into a single bug, so I suggest splitting up the second issue into it's own bug, so this one can be closed once the first one has been addressed. check one database and then continue, leaving the remaining databases What databases are you talking about? AFAIK we will only have a single master db for the Wikidata (repo) wiki. This db will hold both the page table and the secondary tables needed for duplication checks, so we won't need to care about any additional stuff we set up for answering queries or other stuff I think. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 --- Comment #3 from jeb...@gmail.com 2012-05-06 15:33:17 UTC --- If secondary storage is within the same database in the repo there should be no problem to use transactions. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 jeb...@gmail.com changed: What|Removed |Added CC||jeb...@gmail.com --- Comment #1 from jeb...@gmail.com 2012-05-06 02:45:39 UTC --- Unless the secondary storage is under direct control by the primary storage it would be hard to avoid any race conflict or getting a large delay. A working solution could be to check one database and then continue, leaving the remaining databases to use rollback for sorting out remaining conflicts. Those will normally be edit conflicts within a short time frame. -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36431] Secondary and primary storage go out of sync on constraint violation in secondary storage
https://bugzilla.wikimedia.org/show_bug.cgi?id=36431 Mark A. Hershberger m...@everybody.org changed: What|Removed |Added Priority|Unprioritized |High CC||m...@everybody.org -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l