Hi Michael,

The algorithm forgets history after 1h, so yes, it’ll include the root document again once it no longer has 20 fresh records about its failures/successes.

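To make the mechanics a bit more concrete, here's a minimal sketch of how such a per-document history could look. ConflictHistory and all names/constants in it are made up for illustration, this is not the code from the OAK-3748 patch (there the constants are configurable):

import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window conflict history for a single document (illustrative only).
class ConflictHistory {

    private static final long TTL_MILLIS = 60 * 60 * 1000; // forget entries after 1h
    private static final int MIN_SAMPLES = 20;             // need 20 fresh records to decide
    private static final double FAILURE_THRESHOLD = 0.5;   // "conflicted in more than 50% cases"

    // one entry per bulk update the document took part in: [timestamp, 0=success/1=conflict]
    private final Deque<long[]> samples = new ArrayDeque<>();

    synchronized void record(boolean conflicted) {
        samples.addLast(new long[] { System.currentTimeMillis(), conflicted ? 1 : 0 });
    }

    // true -> exclude the document from bulk updates and handle it sequentially
    synchronized boolean isHotspot() {
        long now = System.currentTimeMillis();
        while (!samples.isEmpty() && now - samples.peekFirst()[0] > TTL_MILLIS) {
            samples.removeFirst(); // drop records older than the TTL
        }
        if (samples.size() < MIN_SAMPLES) {
            return false; // not enough fresh history to make a decision
        }
        long conflicts = samples.stream().filter(s -> s[1] == 1).count();
        return (double) conflicts / samples.size() > FAILURE_THRESHOLD;
    }
}

With the numbers from the example below, isHotspot() flips to true at 12:03 (20 samples, 16 conflicts) and back to false at 13:00, when the 12:00 entries expire and only 15 samples remain.
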
Let’s assume that there are 5 bulk operations every minute and root conflicts in 4 of them:

12:00 - root included in 5 bulk updates, conflicted in 4 (successes: 1, failures: 4)
12:01 - root included in 5 bulk updates, conflicted in 4 (s: 2, f: 8)
12:02 - root included in 5 bulk updates, conflicted in 4 (s: 3, f: 12)
12:03 - root included in 5 bulk updates, conflicted in 4 (s: 4, f: 16)

At this point root won’t be included in the bulk update any more (we have 20 samples with an 80% failure rate). At 13:00 we’ll forget the 5 entries from 12:00. The history will then be too small (15 entries) to make a decision, so root will be included in the bulk update again.

I thought there may be cases in which “being a hotspot” is a temporary condition, which is why I didn’t want to block documents forever. We could improve this by increasing the history TTL depending on the failure rate. For instance, a document failing 100% of the time could be blocked for 3 hours, not just one.

Also, it’s worth mentioning that a conflicting document doesn’t cause the whole bulk update to fail. The batch result contains a list of successful and failed modifications and we try to re-apply only the latter. There are 3 iterations of the bulk updates and after that there’s a sequential fallback for the remaining ones (a rough sketch of this flow can be found at the very end of this mail, below the quoted thread). The above algorithm redirects hotspots directly to the fallback.

Best regards,
Tomek

On 15/12/15 12:47, "Michael Marth" <[email protected]> wrote:

>Hi Tomek,
>
>I like the statistical approach to finding the hotspot documents.
>However, I have a question about the criterion “conflicted in more than
>50% cases”:
>
>Let’s say root conflicts often (more than 50%). In the proposed algorithm
>you would then remove it from bulk updates. So for the next 1h there would
>not be conflicts on root in bulk updates. But, after that: would the
>algorithm basically start with fresh data, find that there are no
>conflicts in root and therefore re-add it to bulk updates? Meaning that
>conflicting documents would move in and out of bulk updates periodically?
>Or do you envision that removal from bulk updates would be forever, once a
>document is removed?
>
>Michael
>
>
>On 15/12/15 11:35, "Tomek Rekawek" <[email protected]> wrote:
>
>>Hello,
>>
>>OAK-2066 contains a number of patches which will finally lead to using
>>the batch insert/update operations available in RDB and Mongo. It’ll
>>increase the performance of applying a commit, especially when we have
>>many small updates of different documents.
>>
>>There are some documents that shouldn’t be included in the batch update,
>>because they change too often (like root). Otherwise, they’ll cause a
>>conflict and we’ll need to send another bulk update containing only the
>>failing documents, etc. (a detailed description can be found in
>>OAK-3748). It would be good to find such documents, extract them from the
>>bulk operation and update them sequentially, one after another.
>>
>>I prepared OAK-3748, which uses the following way to find the hotspots:
>>if a document was included in at least 20 bulk operations during the last
>>1h and it conflicted in more than 50% of the cases, it should be
>>extracted from future bulk updates. The first two constraints make it
>>self-refreshing - after a while the number of bulk operations in which
>>the “blocked” document was included during the last hour will drop below
>>20 (all constants are configurable).
>>
>>I’d appreciate feedback, both on the “algorithm” and on the
>>implementation in OAK-3748.
>>
>>Best regards,
>>Tomek
>>
>>--
>>Tomek Rękawek | Adobe Research | www.adobe.com
>>[email protected]
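
PS: since the thread is easier to follow with something concrete, below is a rough, self-contained sketch of the "3 bulk rounds + sequential fallback" flow mentioned above. Store, BatchResult and DocumentUpdate are stand-ins I invented for this sketch, they are not the actual API from the OAK-3748 patch:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative flow only: three bulk rounds, then a sequential fallback.
class BulkFlowSketch {

    interface Store {
        BatchResult bulkUpdate(List<DocumentUpdate> updates); // one batch round-trip
        void update(DocumentUpdate update);                   // sequential update of one document
    }

    interface BatchResult {
        List<DocumentUpdate> failedUpdates(); // the conflicting subset of the batch
    }

    interface DocumentUpdate {
        boolean isHotspot(); // decided by a per-document conflict history (see the sketch above)
    }

    static void apply(Store store, List<DocumentUpdate> updates) {
        List<DocumentUpdate> pending = new ArrayList<>(updates);
        List<DocumentUpdate> sequential = new ArrayList<>();

        // hotspots skip the bulk phase and go straight to the fallback
        for (Iterator<DocumentUpdate> it = pending.iterator(); it.hasNext();) {
            DocumentUpdate u = it.next();
            if (u.isHotspot()) {
                it.remove();
                sequential.add(u);
            }
        }

        // up to 3 bulk rounds, each one retrying only the updates that conflicted
        for (int round = 0; round < 3 && !pending.isEmpty(); round++) {
            pending = new ArrayList<>(store.bulkUpdate(pending).failedUpdates());
        }

        // whatever still conflicts after that is applied one document at a time
        sequential.addAll(pending);
        for (DocumentUpdate u : sequential) {
            store.update(u);
        }
    }
}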
