Hi, Aaron

Just the sum of the total volume of all streams between the nodes.
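
For example, one place where these per-stream volumes show up is nodetool netstats on each node while the repair is streaming (just an illustration of where such numbers can come from, not necessarily the only way):

    nodetool -h 10.254.180.2 netstats

The per-file byte counts listed under the streaming sections can then be summed up.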

But it seems I understand what happened: after the repair, my column family went through several minor compactions, and during these compactions new tombstones were created (my CF contains data with TTL, so each minor compaction can discover newly expired data and mark it). Since these tombstones are arranged and created differently on each node (the sstables have different sizes and so on, so size-tiered compaction works slightly differently), each subsequent repair discovers new ranges to sync.
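
A rough way to see this difference between nodes (just a sketch; the grep pattern assumes the 1.0 cfstats output labels the section 'Column Family: ids', and the -A window is arbitrary):

    nodetool -h 10.254.180.2 cfstats | grep -A 15 'Column Family: ids'
    nodetool -h 10.254.191.2 cfstats | grep -A 15 'Column Family: ids'

If the 'SSTable count' and sizes differ noticeably between nodes, size-tiered compaction will pick different sets of files on each node, and so will turn expired TTL columns into tombstones at different moments.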

When I tried to run a *major* compaction and then run repair, it went in minutes (versus hours). As far as I understand, that's because after the major compaction the tombstones on all nodes are almost the same.
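
For clarity, the sequence I mean is roughly this (a sketch, exact invocations may differ: major compaction of this CF on every node first, then the same primary-range repair as before):

    nodetool -h 10.254.180.2 compact meter ids
    nodetool -h 10.254.180.2 repair -pr meter ids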

Does this sound reasonable?

I'll try to find the best strategy to minimize repair streams, as I'm afraid of major compactions for other, possibly large, CFs.

On 04/23/2012 12:34 PM, aaron morton wrote:
What is strange: when the streams for the second repair start, they have the same or even bigger total volume,
What measure are you using?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/04/2012, at 10:16 PM, Igor wrote:

But after a repair all nodes should be in sync, regardless of whether the new files have been compacted or not.
Do you suggest a major compaction after repair? I'd like to avoid it.

On 04/22/2012 11:52 AM, Philippe wrote:

Repairs generate new files that then need to be compacted.
Maybe that's where the temporary extra volume comes from?
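
One way to check that hypothesis (assuming nodetool compactionstats is available on your 1.0.x build) is to look at pending compactions right after the repair finishes:

    nodetool -h 10.254.180.2 compactionstats

A large pending count right after the repair would suggest the streamed files simply haven't been compacted away yet.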

On 21 Apr 2012 20:43, "Igor" <i...@4friends.od.ua> wrote:

    Hi

    I can't understand the repair behavior in my case. I have a
    12-node ring (all 1.0.7):

    Address         DC  Rack        Status  State   Load      Owns    Token
    10.254.237.2    LA  ADS-LA-1    Up      Normal  50.92 GB  0.00%   0
    10.254.238.2    TX  TX-24-RACK  Up      Normal  33.29 GB  0.00%   1
    10.254.236.2    VA  ADS-VA-1    Up      Normal  50.07 GB  0.00%   2
    10.254.93.2     IL  R1          Up      Normal  49.29 GB  0.00%   3
    10.253.4.2      AZ  R1          Up      Normal  37.83 GB  0.00%   5
    10.254.180.2    GB  GB-1        Up      Normal  42.86 GB  50.00%  85070591730234615865843651857942052863
    10.254.191.2    LA  ADS-LA-1    Up      Normal  47.64 GB  0.00%   85070591730234615865843651857942052864
    10.254.221.2    TX  TX-24-RACK  Up      Normal  43.42 GB  0.00%   85070591730234615865843651857942052865
    10.254.217.2    VA  ADS-VA-1    Up      Normal  38.44 GB  0.00%   85070591730234615865843651857942052866
    10.254.94.2     IL  R1          Up      Normal  49.31 GB  0.00%   85070591730234615865843651857942052867
    10.253.5.2      AZ  R1          Up      Normal  49.01 GB  0.00%   85070591730234615865843651857942052869
    10.254.179.2    GB  GB-1        Up      Normal  27.08 GB  50.00%  170141183460469231731687303715884105727

    I have a single keyspace 'meter' and two column families (one,
    'ids', is small; the second is bigger). The strange thing
    happened today when I tried to run
    "nodetool -h 10.254.180.2 repair -pr meter ids"
    two times, one after another. The first repair finished successfully:

     INFO 16:33:02,492 [repair #db582370-8bba-11e1-0000-5b777f708bff] ids is fully synced
     INFO 16:33:02,526 [repair #db582370-8bba-11e1-0000-5b777f708bff] session completed successfully

    after moving nearly 50 GB of data, and I started the second
    session one hour later:

    INFO 17:44:37,842 [repair #aa415d00-8bd9-11e1-0000-5b777f708bff]
    new session: will sync localhost/10.254.180.2, /10.254.221.2,
    /10.254.191.2, /10.254.217.2, /10.253.5.2, /10.254.94.2
    on range (5,85070591730234615865843651857942052863] for meter.[ids]

    What is strange: when the streams for the second repair start,
    they have the same or even bigger total volume, while I expected
    the second run to move less data (or even no data at all).

    Is it OK? Or should I fix something?

    Thanks!



