Re: Repairing without -pr shows unexpected out-of-sync ranges
> is (2) a direct consequence of a repair on the full token range (and thus
> anti-compaction ran only on a subset of the RF nodes)?

Not necessarily. Even with -pr enabled, the nodes are responsible for different ranges, so they will flush and compact at different times. The effect on long-running repairs is that data marked as repaired on one replica may be compacted on another replica, so it is not marked as repaired there (the limitation tracked in CASSANDRA-9143), which will cause a mismatch in the next repair. This could probably be alleviated by CASSANDRA-6696.

2016-10-03 12:16 GMT-03:00 Stefano Ortolani:

> I was wondering: is (2) a direct consequence of a repair on the full
> token range (and thus anti-compaction ran only on a subset of the RF
> nodes)? If I understand correctly, a repair with -pr should fix this,
> at the cost of all nodes performing the anticompaction phase?
>
> Cheers,
> Stefano
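The race described in this reply can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra code: the Replica class, its methods, and the sstable names (a1, b1, b2) are all hypothetical, and "anticompaction" is reduced to moving an sstable from an unrepaired dict to a repaired dict if it still exists when the repair finishes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: sstable id -> set of partition keys (hypothetical model)."""
    name: str
    unrepaired: dict = field(default_factory=dict)
    repaired: dict = field(default_factory=dict)

    def compact(self, ids, new_id):
        """Merge the given unrepaired sstables into one new unrepaired sstable."""
        merged = set()
        for sstable_id in ids:
            merged |= self.unrepaired.pop(sstable_id)
        self.unrepaired[new_id] = merged

    def anticompact(self, validated_ids):
        """Mark as repaired only the validated sstables that still exist."""
        for sstable_id in validated_ids:
            if sstable_id in self.unrepaired:
                self.repaired[sstable_id] = self.unrepaired.pop(sstable_id)

    def unrepaired_keys(self):
        """All partition keys that are still in the unrepaired set."""
        return set().union(*self.unrepaired.values())


def simulate():
    # Both replicas hold the same, fully consistent data.
    a = Replica("A", unrepaired={"a1": {"k1", "k2"}})
    b = Replica("B", unrepaired={"b1": {"k1", "k2"}})

    # Repair starts: each replica validates the sstables it has right now.
    validated = {"A": ["a1"], "B": ["b1"]}

    # While the repair is still running, B happens to compact b1 away.
    b.compact(["b1"], "b2")

    # Repair finishes: A can mark a1 as repaired, B no longer has b1.
    a.anticompact(validated["A"])
    b.anticompact(validated["B"])
    return a.unrepaired_keys(), b.unrepaired_keys()


if __name__ == "__main__":
    a_keys, b_keys = simulate()
    # The next incremental repair only compares unrepaired data, so these
    # two sets differ even though the replicas hold identical rows.
    print("A unrepaired:", sorted(a_keys))
    print("B unrepaired:", sorted(b_keys))
```

In this model replica A ends up with nothing unrepaired while replica B still has k1 and k2 in its unrepaired set, which is the kind of difference the next incremental repair would report as out of sync.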
Re: Repairing without -pr shows unexpected out-of-sync ranges
I was wondering: is (2) a direct consequence of a repair on the full token range (and thus anti-compaction ran only on a subset of the RF nodes)? If I understand correctly, a repair with -pr should fix this, at the cost of all nodes performing the anticompaction phase?

Cheers,
Stefano

On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani wrote:
> Didn't know about (2), and I actually have a time drift between the nodes.
> Thanks a lot Paulo!
>
> Regards,
> Stefano
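To make the difference between a full-range repair and a -pr repair concrete, here is a rough sketch of which token ranges get covered when repair is run on every node in turn. It assumes a hypothetical 4-node ring with one token per node (no vnodes), RF=3, and simple clockwise replica placement; RING, ranges_for, and coverage are illustrative names only. It says nothing about anticompaction itself, only about range coverage.

```python
from collections import Counter

RF = 3
# Hypothetical 4-node ring, one token per node (no vnodes), sorted by token.
RING = [("n1", 0), ("n2", 25), ("n3", 50), ("n4", 75)]


def ranges_for(node_index, primary_only):
    """Ranges (start, end] that a repair started on this node would cover."""
    n = len(RING)
    # The range ending at node i's token is replicated on nodes i .. i+RF-1,
    # so node j replicates the ranges owned by nodes j, j-1, ..., j-(RF-1).
    if primary_only:
        owners = [node_index]
    else:
        owners = [(node_index - k) % n for k in range(RF)]
    return [(RING[(i - 1) % n][1], RING[i][1]) for i in owners]


def coverage(primary_only):
    """How many times each range is repaired when repair runs on every node."""
    counts = Counter()
    for j in range(len(RING)):
        counts.update(ranges_for(j, primary_only))
    return counts


if __name__ == "__main__":
    print("with -pr:   ", dict(coverage(primary_only=True)))
    print("without -pr:", dict(coverage(primary_only=False)))
```

Under these assumptions every range is repaired once when each node repairs only its primary range, and RF times when each node repairs all the ranges it replicates.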
Re: Repairing without -pr shows unexpected out-of-sync ranges
Didn't know about (2), and I actually have a time drift between the nodes.
Thanks a lot Paulo!

Regards,
Stefano

On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta wrote:
> There are a couple of things that could be happening here:
> - Nodes participating in a repair flush at slightly different times, so
>   in write-heavy tables there will always be minor differences during
>   validation. Those can be accentuated by low-resolution Merkle trees,
>   which mostly affects larger tables.
> - SSTables compacted during incremental repair will not be marked as
>   repaired, so nodes with different compaction cadences will have
>   different data in their unrepaired sets, which will cause mismatches
>   in subsequent incremental repairs. CASSANDRA-9143 will hopefully fix
>   that limitation.
Re: Repairing without -pr shows unexpected out-of-sync ranges
There are a couple of things that could be happening here:
- Nodes participating in a repair flush at slightly different times, so in write-heavy tables there will always be minor differences during validation. Those can be accentuated by low-resolution Merkle trees, which mostly affects larger tables.
- SSTables compacted during incremental repair will not be marked as repaired, so nodes with different compaction cadences will have different data in their unrepaired sets, which will cause mismatches in subsequent incremental repairs. CASSANDRA-9143 will hopefully fix that limitation.

2016-09-22 7:10 GMT-03:00 Stefano Ortolani:

> Hi,
>
> I am seeing something weird while running repairs.
> I am testing 3.0.9 so I am running the repairs manually, node after node,
> on a cluster with RF=3. I am using a standard repair command (incremental,
> parallel, full range), and I just noticed that the third node detected some
> ranges out of sync with one of the nodes that just finished repairing.
>
> Since there was no dropped mutation, that sounds weird to me considering
> that the repairs are supposed to operate on the whole range.
>
> Any idea why?
> Maybe I am missing something?
>
> Cheers,
> Stefano
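The first point above (a small difference looking bigger with a low-resolution Merkle tree) can be sketched with a toy model. This is not how Cassandra builds its trees; it only assumes a tiny token space (TOKEN_SPACE), equal-width leaf ranges, and one SHA-256 digest per leaf, and all names here are made up for illustration.

```python
import hashlib

TOKEN_SPACE = 2 ** 16  # toy token space, far smaller than a real partitioner's


def leaf_hashes(rows, depth):
    """Hash rows (token -> value) into 2**depth equal-width leaf ranges."""
    n_leaves = 2 ** depth
    width = TOKEN_SPACE // n_leaves
    digests = [hashlib.sha256() for _ in range(n_leaves)]
    for token in sorted(rows):
        digests[token // width].update(f"{token}={rows[token]};".encode())
    return [d.hexdigest() for d in digests]


def out_of_sync(rows_a, rows_b, depth):
    """Return (mismatched leaf count, rows on A inside those leaves)."""
    width = TOKEN_SPACE // 2 ** depth
    hashes_a = leaf_hashes(rows_a, depth)
    hashes_b = leaf_hashes(rows_b, depth)
    bad = {i for i, (x, y) in enumerate(zip(hashes_a, hashes_b)) if x != y}
    affected = sum(1 for token in rows_a if token // width in bad)
    return len(bad), affected


# Two replicas with 4096 evenly spread rows; B is missing one recent write.
replica_a = {token: "v1" for token in range(0, TOKEN_SPACE, 16)}
replica_b = dict(replica_a)
replica_b[160] = "v0"  # the single stale row on B

if __name__ == "__main__":
    for depth in (4, 8, 12):
        leaves, rows = out_of_sync(replica_a, replica_b, depth)
        print(f"depth={depth:2d}: {leaves} leaf out of sync, {rows} rows in it")
```

With this data a single stale row puts 256 rows inside the mismatched range at depth 4, 16 rows at depth 8, and 1 row at depth 12, so the coarser the tree, the more data one small difference drags into the out-of-sync range.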
Repairing without -pr shows unexpected out-of-sync ranges
Hi,

I am seeing something weird while running repairs.
I am testing 3.0.9 so I am running the repairs manually, node after node, on a cluster with RF=3. I am using a standard repair command (incremental, parallel, full range), and I just noticed that the third node detected some ranges out of sync with one of the nodes that just finished repairing.

Since there was no dropped mutation, that sounds weird to me considering that the repairs are supposed to operate on the whole range.

Any idea why?
Maybe I am missing something?

Cheers,
Stefano