Hi Tomek,

Trying to wrap my head around this… so this is just a thought dump :)
First off, my example of the root document was probably a bad one, as direct root modifications will be rare. The root node will mostly be modified by the background thread. A better example might be a property index’s root. Is that correct? (Not that it matters a lot - just for understanding the problem better.)

I wondered if we could find optimal parameters through tests, i.e. find the value at which applying the fallback right away is overall cheaper than re-trying bulk updates 3 times. The problem of course is that I imagine this depends heavily on the write pattern.

Related to this: do you have numbers on the performance difference between a) going to the fallback directly and b) trying 3 (failing) bulk updates first? My point being: I wonder how much value there is in tweaking the exact parameters.

Cheers
Michael

On 15/12/15 14:04, "Tomek Rekawek" <[email protected]> wrote:

>Hi Michael,
>
>The algorithm forgets history after 1h, so yes, it’ll include the root
>document again once it no longer has 20 fresh records about failures/successes.
>
>Let’s assume that there are 5 bulk operations every minute and root conflicts
>in 4 of them:
>
>12:00 - root failed 4 times (successes: 1, failures: 4)
>12:01 - root failed 4 times (s: 2, f: 8)
>12:02 - root failed 4 times (s: 3, f: 12)
>12:03 - root failed 4 times (s: 4, f: 16)
>
>At this point root won’t be included in the bulk update (as we have 20 samples
>with an 80% failure rate). At 13:00 we’ll forget about the 5 samples from 12:00.
>The history will be too small (15 entries) to make a decision, so the root will
>be included again in the bulk update.
>
>I thought that there may be cases in which “being a hotspot” is a temporary
>condition, which is why I didn’t want to block documents forever. We can improve
>this by increasing the history TTL depending on the failure rate. For instance,
>a document failing in 100% of cases may be blocked for 3 hours, not just one.
>
>Also, it’s worth mentioning that a conflicting document doesn’t cause the
>whole bulk update to fail. The batch result contains a list of successful and
>failed modifications, and we try to re-apply only the latter. There are 3
>iterations of the bulk updates and after that there’s a sequential fallback
>for the remaining ones. The above algorithm redirects hotspots directly to the
>fallback.
>
>Best regards,
>Tomek
>
>On 15/12/15 12:47, "Michael Marth" <[email protected]> wrote:
>
>>Hi Tomek,
>>
>>I like the statistical approach to finding the hotspot documents.
>>However, I have a question about the criterion “conflicted in more than 50%
>>of cases”:
>>
>>Let’s say root conflicts often (more than 50%). In the proposed algorithm you
>>would then remove it from bulk updates. So for the next 1h there would not be
>>conflicts on root in bulk updates. But after that: would the algorithm
>>basically start with fresh data, find that there are no conflicts on root and
>>therefore re-add it to bulk updates? Meaning that conflicting documents would
>>move in and out of bulk updates periodically?
>>Or do you envision that, once a document is removed from bulk updates, the
>>removal would be permanent?
>>
>>Michael
>>
>>On 15/12/15 11:35, "Tomek Rekawek" <[email protected]> wrote:
>>
>>>Hello,
>>>
>>>OAK-2066 contains a number of patches, which will finally lead to using the
>>>batch insert/update operations available in RDB and Mongo. This will increase
>>>the performance of applying a commit, especially when we have many small
>>>updates of different documents.
>>>
>>>There are some documents that shouldn’t be included in the batch update,
>>>because they change too often (like root). Otherwise, they’ll cause a
>>>conflict and we’ll need to send another bulk update containing only the
>>>failing documents, and so on (a detailed description can be found in
>>>OAK-3748). It would be good to find such documents, extract them from the
>>>bulk operation and update them sequentially, one after another.
>>>
>>>I prepared OAK-3748, which uses the following way to find the hotspots: if
>>>the document was included in at least 20 bulk operations during the last 1h
>>>and it conflicted in more than 50% of cases, it should be extracted from
>>>future bulk updates. The first two constraints make it self-refreshing -
>>>after a while the number of bulk operations in which the “blocked” document
>>>was included during the last hour will be less than 20 (all constants are
>>>configurable).
>>>
>>>I’d appreciate feedback, both on the “algorithm” and on the implementation
>>>in OAK-3748.
>>>
>>>Best regards,
>>>Tomek
>>>
>>>--
>>>Tomek Rękawek | Adobe Research | www.adobe.com
>>>[email protected]
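
For readers following the thread, here is a minimal sketch of the sliding-window heuristic Tomek describes above: keep per-document success/failure samples for 1h and treat a document as a hotspot once there are at least 20 fresh samples with more than 50% conflicts. The class, constant, and method names are illustrative assumptions, not the actual OAK-3748 code.

    // Illustrative sketch only - not the OAK-3748 implementation.
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    class HotspotTracker {

        private static final long TTL_MILLIS = 60 * 60 * 1000; // forget samples after 1h
        private static final int MIN_SAMPLES = 20;              // need 20 fresh samples
        private static final double FAILURE_THRESHOLD = 0.5;    // >50% conflicts => hotspot

        /** Per document id: a queue of (timestamp, wasConflict) samples. */
        private final Map<String, Deque<long[]>> samples = new HashMap<>();

        /** Record the outcome of including a document in a bulk update. */
        synchronized void record(String docId, boolean conflicted, long now) {
            samples.computeIfAbsent(docId, k -> new ArrayDeque<>())
                   .addLast(new long[] { now, conflicted ? 1 : 0 });
        }

        /** True if the document should be skipped in bulk updates right now. */
        synchronized boolean isHotspot(String docId, long now) {
            Deque<long[]> history = samples.get(docId);
            if (history == null) {
                return false;
            }
            // drop samples older than the TTL, so the decision "self-refreshes"
            while (!history.isEmpty() && now - history.peekFirst()[0] > TTL_MILLIS) {
                history.removeFirst();
            }
            if (history.size() < MIN_SAMPLES) {
                return false; // not enough fresh data to call it a hotspot
            }
            long failures = history.stream().filter(s -> s[1] == 1).count();
            return (double) failures / history.size() > FAILURE_THRESHOLD;
        }
    }

With the constants above, the 12:00-12:03 example works out as in Tomek's mail: 20 samples with 16 failures is an 80% failure rate, so the document is skipped; once the 12:00 samples expire, only 15 remain and it is admitted to bulk updates again.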
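
And a rough sketch of the surrounding retry flow Tomek mentions: up to 3 bulk iterations, each re-applying only the modifications that conflicted, followed by a sequential fallback for whatever is left, with known hotspots routed straight to the fallback. It reuses the HotspotTracker sketched above; the Update type and the sendBulk/applySequentially methods are placeholders, not the real DocumentStore API.

    // Illustrative sketch only - placeholder types and methods.
    import java.util.ArrayList;
    import java.util.List;

    class BulkUpdateFlow {

        private static final int BULK_RETRIES = 3;

        private final HotspotTracker tracker; // from the previous sketch

        BulkUpdateFlow(HotspotTracker tracker) {
            this.tracker = tracker;
        }

        void apply(List<Update> updates, long now) {
            List<Update> bulk = new ArrayList<>();
            List<Update> fallback = new ArrayList<>();
            for (Update u : updates) {
                // hotspots skip the bulk path entirely
                (tracker.isHotspot(u.docId, now) ? fallback : bulk).add(u);
            }

            List<Update> remaining = bulk;
            for (int i = 0; i < BULK_RETRIES && !remaining.isEmpty(); i++) {
                // sendBulk() is a placeholder: it applies the batch and returns
                // only the updates that conflicted and still need re-applying
                List<Update> failed = sendBulk(remaining);
                for (Update u : remaining) {
                    tracker.record(u.docId, failed.contains(u), now);
                }
                remaining = failed;
            }

            // anything still failing after 3 bulk attempts goes one-by-one
            fallback.addAll(remaining);
            for (Update u : fallback) {
                applySequentially(u); // placeholder for the per-document update
            }
        }

        // --- placeholders standing in for the real store operations ---
        static class Update { String docId; }
        private List<Update> sendBulk(List<Update> batch) { return new ArrayList<>(); }
        private void applySequentially(Update u) { }
    }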
