Re: [ceph-users] object size changing after a pg repair
Hi Greg,

Opened this one: http://tracker.ceph.com/issues/16567

Let us see what they say.

Cheers
G.

On 07/01/2016 04:09 AM, Gregory Farnum wrote:
> On Wed, Jun 29, 2016 at 10:50 PM, Goncalo Borges wrote:
>> Hi Shinobu
>>
>>> Sorry, probably I don't understand your question properly. Is what
>>> you're worried about that an object mapped to a specific pg could be
>>> overwritten on different osds?
>>
>> Not really. I was worried by seeing object sizes changing on the fly.
>>
>> I will try to clarify. We are enabling cephfs for our user community.
>> My mind was set to a context where data is not changing. I got scared
>> because I did not find it plausible for an object to change size and
>> content; I thought this was a consequence of a bad repair.
>>
>> However, thinking it over, if we have some application overwriting the
>> same file over and over again (which I think we have), that means we
>> will see the same objects change size and content over time. In cephfs,
>> the names of the objects are directly related to the file inode and how
>> it is striped, so the object names do not actually change if a file is
>> overwritten. Right?! So, in summary, in this scenario, it is normal for
>> objects to change size and content all the time.
>
> That's true...
>
>> A consequence of this is that the very fast overwrite of files / objects
>> could raise some scrub errors if, by chance, ceph is scrubbing pgs with
>> objects which are changing on the fly.
>
> ...but that's definitely not the case. Scrubbing and client IO shouldn't
> race with each other. If you're seeing size errors, it could be some
> minor scrub race, but it's definitely a bug. You should discuss on irc
> and/or open a ticket for the RADOS team to look at.
> -Greg

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
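As an aside on the inode-to-object naming discussed above: the mapping can be checked from the filesystem side. A minimal sketch, assuming a CephFS mount, a hypothetical file path, and a data pool named `cephfs_data` (substitute your own):

```shell
#!/bin/sh
# CephFS names its data objects <inode-in-hex>.<stripe-block-index>,
# which is why an overwritten file keeps the same object names.
FILE=/cephfs/some/file                 # hypothetical file on a CephFS mount
INO=$(stat -c %i "$FILE")              # decimal inode number
PREFIX=$(printf '%x' "$INO")           # hex inode == object name prefix
echo "object name prefix: $PREFIX"
# List the RADOS objects backing this file (pool name is an assumption):
rados -p cephfs_data ls | grep "^$PREFIX\."
```

For instance, a file with inode 19543101 would map to objects prefixed `12a343d.`, matching the object name seen in this thread.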
Re: [ceph-users] object size changing after a pg repair
On Wed, Jun 29, 2016 at 10:50 PM, Goncalo Borges wrote:
> Hi Shinobu
>
>> Sorry, probably I don't understand your question properly. Is what
>> you're worried about that an object mapped to a specific pg could be
>> overwritten on different osds?
>
> Not really. I was worried by seeing object sizes changing on the fly.
>
> I will try to clarify. We are enabling cephfs for our user community.
> My mind was set to a context where data is not changing. I got scared
> because I did not find it plausible for an object to change size and
> content; I thought this was a consequence of a bad repair.
>
> However, thinking it over, if we have some application overwriting the
> same file over and over again (which I think we have), that means we
> will see the same objects change size and content over time. In cephfs,
> the names of the objects are directly related to the file inode and how
> it is striped, so the object names do not actually change if a file is
> overwritten. Right?! So, in summary, in this scenario, it is normal for
> objects to change size and content all the time.

That's true...

> A consequence of this is that the very fast overwrite of files / objects
> could raise some scrub errors if, by chance, ceph is scrubbing pgs with
> objects which are changing on the fly.

...but that's definitely not the case. Scrubbing and client IO shouldn't race with each other. If you're seeing size errors, it could be some minor scrub race, but it's definitely a bug. You should discuss on irc and/or open a ticket for the RADOS team to look at.

-Greg
Re: [ceph-users] object size changing after a pg repair
Thank you for your clarification.

On Thu, Jun 30, 2016 at 2:50 PM, Goncalo Borges <goncalo.bor...@sydney.edu.au> wrote:
> Hi Shinobu
>
>> Sorry, probably I don't understand your question properly. Is what
>> you're worried about that an object mapped to a specific pg could be
>> overwritten on different osds?
>
> Not really. I was worried by seeing object sizes changing on the fly.
>
> I will try to clarify. We are enabling cephfs for our user community.
> My mind was set to a context where data is not changing. I got scared
> because I did not find it plausible for an object to change size and
> content; I thought this was a consequence of a bad repair.
>
> However, thinking it over, if we have some application overwriting the
> same file over and over again (which I think we have), that means we
> will see the same objects change size and content over time. In cephfs,
> the names of the objects are directly related to the file inode and how
> it is striped, so the object names do not actually change if a file is
> overwritten. Right?! So, in summary, in this scenario, it is normal for
> objects to change size and content all the time.
>
> A consequence of this is that the very fast overwrite of files / objects
> could raise some scrub errors if, by chance, ceph is scrubbing pgs with
> objects which are changing on the fly.

IMHO, in that situation the clients' write operations to rados would be held off (maybe `held off` is not the appropriate word in this sentence) until the full epoch before touching the same object, since clients must have the latest OSD map. Does it make sense? Anyway, in case I've been missing something, someone will add more.

> Does this make sense?
> G.
> From: Shinobu Kinjo [shinobu...@gmail.com]
> Sent: 30 June 2016 15:10
> To: Goncalo Borges
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] object size changing after a pg repair
>
> On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges
> <goncalo.bor...@sydney.edu.au> wrote:
>
> Hi Shinobu
>
> I've run
>
> # ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query > 6.263query2.txt
>
> and I am sending the full 6.263query1.txt output as well as the results of
>
> # diff -Nua 6.263query1.txt 6.263query2.txt
>
> Actually the sizes of the objects are fixed if the data isn't changed,
> right?! I am actually thinking of a scenario where a user application is
> overwriting the same file, meaning that the objects will be overwritten
> all the time. If by chance scrub runs on one of these objects, there is
> a chance that it finds differences depending on how fast the file /
> object is overwritten and replicated to the different osds.
>
> Sorry, probably I don't understand your question properly. Is what
> you're worried about that an object mapped to a specific pg could be
> overwritten on different osds?
>
> Is this possible?
> Cheers
> Goncalo
>
> # cat 6.263query1.txt
> {
>     "state": "active+clean",
>     "snap_trimq": "[]",
>     "epoch": 1005,
>     "up": [56, 39, 6],
>     "acting": [56, 39, 6],
>     "actingbackfill": ["6", "39", "56"],
>     "info": {
>         "pgid": "6.263",
>         "last_update": "1005'2273061",
>         "last_complete": "1005'2273061",
>         "log_tail": "1005'227",
>         "last_user_version": 2273061,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 0,
>         "purged_snaps": "[]",
>         "history": {
>             "epoch_created": 341,
>             "last_epoch_started": 996,
>             "last_epoch_clean": 996,
>             "last_epoch_split": 0,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 994,
>             "same_interval_since": 995,
>             "same_primary_since": 995,
>             "last_scrub": "1005'2076134",
>             "last_scrub_stamp": "2016-06-30 02:13:00.455256",
>             "last_deep_scrub": "1005'2076134",
>             "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
Re: [ceph-users] object size changing after a pg repair
Hi Shinobu

> Sorry, probably I don't understand your question properly. Is what
> you're worried about that an object mapped to a specific pg could be
> overwritten on different osds?

Not really. I was worried by seeing object sizes changing on the fly.

I will try to clarify. We are enabling cephfs for our user community. My mind was set to a context where data is not changing. I got scared because I did not find it plausible for an object to change size and content; I thought this was a consequence of a bad repair.

However, thinking it over, if we have some application overwriting the same file over and over again (which I think we have), that means we will see the same objects change size and content over time. In cephfs, the names of the objects are directly related to the file inode and how it is striped, so the object names do not actually change if a file is overwritten. Right?! So, in summary, in this scenario, it is normal for objects to change size and content all the time.

A consequence of this is that the very fast overwrite of files / objects could raise some scrub errors if, by chance, ceph is scrubbing pgs with objects which are changing on the fly.

Does this make sense?
G.

From: Shinobu Kinjo [shinobu...@gmail.com]
Sent: 30 June 2016 15:10
To: Goncalo Borges
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] object size changing after a pg repair

On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges <goncalo.bor...@sydney.edu.au> wrote:

Hi Shinobu

I've run

# ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query > 6.263query2.txt

and I am sending the full 6.263query1.txt output as well as the results of

# diff -Nua 6.263query1.txt 6.263query2.txt

Actually the sizes of the objects are fixed if the data isn't changed, right?! I am actually thinking of a scenario where a user application is overwriting the same file, meaning that the objects will be overwritten all the time. If by chance scrub runs on one of these objects, there is a chance that it finds differences depending on how fast the file / object is overwritten and replicated to the different osds.

Sorry, probably I don't understand your question properly. Is what you're worried about that an object mapped to a specific pg could be overwritten on different osds?

Is this possible?

Cheers
Goncalo

# cat 6.263query1.txt
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 1005,
    "up": [56, 39, 6],
    "acting": [56, 39, 6],
    "actingbackfill": ["6", "39", "56"],
    "info": {
        "pgid": "6.263",
        "last_update": "1005'2273061",
        "last_complete": "1005'2273061",
        "log_tail": "1005'227",
        "last_user_version": 2273061,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": "[]",
        "history": {
            "epoch_created": 341,
            "last_epoch_started": 996,
            "last_epoch_clean": 996,
            "last_epoch_split": 0,
            "last_epoch_marked_full": 0,
            "same_up_since": 994,
            "same_interval_since": 995,
            "same_primary_since": 995,
            "last_scrub": "1005'2076134",
            "last_scrub_stamp": "2016-06-30 02:13:00.455256",
            "last_deep_scrub": "1005'2076134",
            "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
            "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256"
        },
        "stats": {
            "version": "1005'2273061",
            "reported_seq": "2937682",
            "reported_epoch": "1005",
            "state": "active+clean",
            "last_fresh": "2016-06-30 04:38:13.270047",
            "last_change": "2016-06-30 02:13:00.455293",
            "last_active": "2016-06-30 04:38:13.270047",
            "last_peered": "2016-06-30 04:38:13.270047",
            "last_clean": "2016-06-30 04:38:13.270047",
            "last_became_active": "2016-06-27 04:57:36.949798",
            "last_became_peered": "2016-06-27 04:57:36.949798",
            "last_unstale": "2016-06-30 04:38:13.270047"
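The repeated `ls -l` checks from the original post can also be scripted. A minimal sketch; `OBJ` is a stand-in path, since the real `_head` filename is specific to each OSD's filestore layout:

```shell
#!/bin/sh
# Sample an object's on-disk size a few times; a changing size means the
# object is being actively rewritten. OBJ is a placeholder -- point it at
# the real file under .../current/<pgid>_head/ on your OSD.
OBJ=/var/lib/ceph/osd/ceph-56/current/6.263_head/some_object_head_file
for i in 1 2 3 4 5; do
    stat -c '%s bytes' "$OBJ"    # print current size in bytes
    sleep 2
done
```

If the printed sizes differ between iterations, the object is being overwritten while you watch, as Goncalo observed.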
Re: [ceph-users] object size changing after a pg repair
shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0,
> "num_bytes_hit_set_archive": 0,
> "num_flush": 0,
> "num_flush_kb": 0,
> "num_evict": 0,
> "num_evict_kb": 0,
> "num_promote": 0,
> "num_flush_mode_high": 0,
> "num_flush_mode_low": 0,
> "num_evict_mode_some": 0,
> "num_evict_mode_full": 0,
> "num_objects_pinned": 0
> },
> "up": [56, 39, 6],
> "acting": [56, 39, 6],
> "blocked_by": [],
> "up_primary": 56,
> "acting_primary": 56
> },
> "empty": 0,
> "dne": 0,
> "incomplete": 0,
> "last_epoch_started": 996,
> "hit_set_history": {
>     "current_last_update": "0'0",
>     "history": []
> }
> }
> ],
> "recovery_state": [
>     {
>         "name": "Started\/Primary\/Active",
>         "enter_time": "2016-06-27 04:57:36.876639",
>         "might_have_unfound": [],
>         "recovery_progress": {
>             "backfill_targets": [],
>             "waiting_on_backfill": [],
>             "last_backfill_started": "MIN",
>             "backfill_info": {
>                 "begin": "MIN",
>                 "end": "MIN",
>                 "objects": []
>             },
>             "peer_backfill_info": [],
>             "backfills_in_flight": [],
>             "recovering": [],
>             "pg_backend": {
>                 "pull_from_peer": [],
>                 "pushing": []
>             }
>         },
>         "scrub": {
>             "scrubber.epoch_start": "995",
>             "scrubber.active": 0,
>             "scrubber.state": "INACTIVE",
>             "scrubber.start": "MIN",
>             "scrubber.end": "MIN",
>             "scrubber.subset_last_update": "0'0",
>             "scrubber.deep": false,
>             "scrubber.seed": 0,
>             "scrubber.waiting_on": 0,
>             "scrubber.waiting_on_whom": []
>         }
>     },
>     {
>         "name": "Started",
>         "enter_time": "2016-06-27 04:57:35.828117"
>     }
> ],
> "agent_state": {}
> }
>
> --- * ---
>
> # diff -Nua 6.263query1.txt 6.263query2.txt
> --- 6.263query1.txt 2016-06-30 04:38:13.290371200 +
> +++ 6.263query2.txt 2016-06-30 04:38:43.412642932 +
> @@ -19,10 +19,10 @@
>  ],
>  "info": {
>  "pgid": "6.263",
> -"last_update": "1005'2273061",
> -"last_complete": "1005'2273061",
> -"log_tail": "1005'227",
> -"last_user_version": 2273061,
> +"last_update": "1005'2273745",
> +"last_complete": "1005'2273745",
> +"log_tail": "1005'2270700",
> +"last_user_version": 2273745,
>  "last_backfill": "MAX",
>  "last_backfill_bitwise": 0,
>  "purged_snaps": "[]",
> @@ -42,23 +42,23 @@
>  "last_clean_scrub_stamp"
Re: [ceph-users] object size changing after a pg repair
"waiting_on_backfill": [],
"last_backfill_started": "MIN",
"backfill_info": {
    "begin": "MIN",
    "end": "MIN",
    "objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
    "pull_from_peer": [],
    "pushing": []
}
},
"scrub": {
    "scrubber.epoch_start": "995",
    "scrubber.active": 0,
    "scrubber.state": "INACTIVE",
    "scrubber.start": "MIN",
    "scrubber.end": "MIN",
    "scrubber.subset_last_update": "0'0",
    "scrubber.deep": false,
    "scrubber.seed": 0,
    "scrubber.waiting_on": 0,
    "scrubber.waiting_on_whom": []
}
},
{
    "name": "Started",
    "enter_time": "2016-06-27 04:57:35.828117"
}
],
"agent_state": {}
}

--- * ---

# diff -Nua 6.263query1.txt 6.263query2.txt
--- 6.263query1.txt 2016-06-30 04:38:13.290371200 +
+++ 6.263query2.txt 2016-06-30 04:38:43.412642932 +
@@ -19,10 +19,10 @@
 ],
 "info": {
 "pgid": "6.263",
-"last_update": "1005'2273061",
-"last_complete": "1005'2273061",
-"log_tail": "1005'227",
-"last_user_version": 2273061,
+"last_update": "1005'2273745",
+"last_complete": "1005'2273745",
+"log_tail": "1005'2270700",
+"last_user_version": 2273745,
 "last_backfill": "MAX",
 "last_backfill_bitwise": 0,
 "purged_snaps": "[]",
@@ -42,23 +42,23 @@
 "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256"
 },
 "stats": {
-"version": "1005'2273061",
-"reported_seq": "2937682",
+"version": "1005'2273745",
+"reported_seq": "2938345",
 "reported_epoch": "1005",
 "state": "active+clean",
-"last_fresh": "2016-06-30 04:38:13.270047",
+"last_fresh": "2016-06-30 04:38:43.320788",
 "last_change": "2016-06-30 02:13:00.455293",
-"last_active": "2016-06-30 04:38:13.270047",
-"last_peered": "2016-06-30 04:38:13.270047",
-"last_clean": "2016-06-30 04:38:13.270047",
+"last_active": "2016-06-30 04:38:43.320788",
+"last_peered": "2016-06-30 04:38:43.320788",
+"last_clean": "2016-06-30 04:38:43.320788",
 "last_became_active": "2016-06-27 04:57:36.949798",
 "last_became_peered": "2016-06-27 04:57:36.949798",
-"last_unstale": "2016-06-30 04:38:13.270047",
-"last_undegraded": "2016-06-30 04:38:13.270047",
-"last_fullsized": "2016-06-30 04:38:13.270047",
+"last_unstale": "2016-06-30 04:38:43.320788",
+"last_undegraded": "2016-06-30 04:38:43.320788",
+"last_fullsized": "2016-06-30 04:38:43.320788",
 "mapping_epoch": 994,
-"log_start": "1005'227",
-"ondisk_log_start": "1005'227",
+"log_start": "1005'2270700",
+"ondisk_log_start": "1005'2270700",
 "created": 341,
 "last_epoch_clean": 996,
 "parent": "0.0",
@@ -68,8 +68,8 @@
 "last_deep_scrub": "1005'2076134",
 "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
 "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256",
-"log_size": 3061,
-"ondisk_log_size"
Re: [ceph-users] object size changing after a pg repair
What does `ceph pg 6.263 query` show you?

On Thu, Jun 30, 2016 at 12:02 PM, Goncalo Borges <goncalo.bor...@sydney.edu.au> wrote:

> Dear Cephers...
>
> Today our ceph cluster gave us a couple of scrub errors regarding
> inconsistent pgs. We just upgraded from 9.2.0 to 10.2.2 two days ago.
>
> # ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; crush map has legacy
> tunables (require bobtail, min is firefly)
> pg 6.39c is active+clean+inconsistent, acting [2,60,32]
> pg 6.263 is active+clean+inconsistent, acting [56,39,6]
> 2 scrub errors
> crush map has legacy tunables (require bobtail, min is firefly); see
> http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> We started by looking at pg 6.263. Errors were only appearing in the
> osd.56 logs, not in the others.
>
> # cat ceph-osd.56.log-20160629 | grep -Hn 'ERR'
> (standard input):8569:2016-06-29 08:09:50.952397 7fd023322700 -1
> log_channel(cluster) log [ERR] : scrub 6.263
> 6:c645f18e:::12a343d.:head on disk size (1836) does not match
> object info size (41242) adjusted for ondisk to (41242)
> (standard input):8602:2016-06-29 08:11:11.227865 7fd023322700 -1
> log_channel(cluster) log [ERR] : 6.263 scrub 1 errors
>
> So, we did a 'ceph pg repair 6.263'. Eventually, that pg went back to
> 'active+clean':
>
> # ceph pg dump | grep ^6.263
> dumped all in format plain
> 6.263 10845 0 0 0 0 39592671010 3037 3037 active+clean
> 2016-06-30 02:13:00.455293 1005'2126237 1005:2795768 [56,39,6] 56
> [56,39,6] 56 1005'2076134 2016-06-30 02:13:00.455256
> 1005'2076134 2016-06-30 02:13:00.455256
>
> However, in the logs I found:
>
> 2016-06-30 02:03:03.992240 osd.56 192.231.127.226:6801/21569 278 :
> cluster [INF] 6.263 repair starts
> 2016-06-30 02:13:00.455237 osd.56 192.231.127.226:6801/21569 279 :
> cluster [INF] 6.263 repair ok, 0 fixed
>
> I did not like the '0 fixed'.
>
> Inspecting a bit more, I found that the object inside the pg on all
> involved osds is changing size.
> For example in osd.56 (but the same thing is true in 39 and 6) I found
> in consecutive 'ls -l' commands:
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 8602 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> [root@rccephosd8 ceph]# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 170 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 15436 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 26044 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 14076 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 31110 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 20230 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 23392 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
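As an aside on investigating scrub errors like the one quoted above: from Jewel (10.2.x), the findings of the last scrub can be listed directly, which is gentler than grepping OSD logs. A sketch, where the pool name is an assumption and a completed scrub of the pg is required:

```shell
#!/bin/sh
# List pgs with scrub errors in a pool, then dump details for one pg.
PG=6.263
POOL_ID=${PG%%.*}        # a pgid is <pool-id>.<pg-hash>, so this pg is in pool 6
echo "inspecting pg $PG (pool id $POOL_ID)"
rados list-inconsistent-pg cephfs_data                   # pool name assumed
rados list-inconsistent-obj "$PG" --format=json-pretty   # per-object errors
```

The JSON output names each inconsistent object and the per-shard errors (size mismatch, digest mismatch, and so on), which helps decide whether `ceph pg repair` is safe.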
[ceph-users] object size changing after a pg repair
Dear Cephers...

Today our ceph cluster gave us a couple of scrub errors regarding inconsistent pgs. We just upgraded from 9.2.0 to 10.2.2 two days ago.

# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; crush map has legacy tunables (require bobtail, min is firefly)
pg 6.39c is active+clean+inconsistent, acting [2,60,32]
pg 6.263 is active+clean+inconsistent, acting [56,39,6]
2 scrub errors
crush map has legacy tunables (require bobtail, min is firefly); see http://ceph.com/docs/master/rados/operations/crush-map/#tunables

We started by looking at pg 6.263. Errors were only appearing in the osd.56 logs, not in the others.

# cat ceph-osd.56.log-20160629 | grep -Hn 'ERR'
(standard input):8569:2016-06-29 08:09:50.952397 7fd023322700 -1 log_channel(cluster) log [ERR] : scrub 6.263 6:c645f18e:::12a343d.:head on disk size (1836) does not match object info size (41242) adjusted for ondisk to (41242)
(standard input):8602:2016-06-29 08:11:11.227865 7fd023322700 -1 log_channel(cluster) log [ERR] : 6.263 scrub 1 errors

So, we did a 'ceph pg repair 6.263'. Eventually, that pg went back to 'active+clean':

# ceph pg dump | grep ^6.263
dumped all in format plain
6.263 10845 0 0 0 0 39592671010 3037 3037 active+clean 2016-06-30 02:13:00.455293 1005'2126237 1005:2795768 [56,39,6] 56 [56,39,6] 56 1005'2076134 2016-06-30 02:13:00.455256 1005'2076134 2016-06-30 02:13:00.455256

However, in the logs I found:

2016-06-30 02:03:03.992240 osd.56 192.231.127.226:6801/21569 278 : cluster [INF] 6.263 repair starts
2016-06-30 02:13:00.455237 osd.56 192.231.127.226:6801/21569 279 : cluster [INF] 6.263 repair ok, 0 fixed

I did not like the '0 fixed'.

Inspecting a bit more, I found that the object inside the pg on all involved osds is changing size.
For example in osd.56 (but the same thing is true in 39 and 6) I found in consecutive 'ls -l' commands:

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 8602 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

[root@rccephosd8 ceph]# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 170 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 15436 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 26044 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 14076 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 31110 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 20230 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 23392 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l
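One detail worth noting about the paths above: the `__head_718FA263__6` suffix encodes the object's 32-bit hash (718FA263) and the pool id (6), and the pg a replica lives in is the pool id plus the low bits of that hash. A sketch of the arithmetic, where the 0x3ff mask assumes pg_num = 1024 for this pool (an assumption; check with `ceph osd pool get <pool> pg_num`):

```shell
#!/bin/sh
# Derive the pg id from the on-disk filename suffix __head_718FA263__6:
# pool id 6, object hash 0x718FA263, pg = <pool>.(hash & (pg_num - 1)).
HASH=0x718FA263
PGNUM=1024                               # assumption, see lead-in
printf 'pg 6.%x\n' $(( HASH & (PGNUM - 1) ))   # -> pg 6.263
# The DIR_3/DIR_6/DIR_2/DIR_A path components are the trailing hash
# nibbles (3, 6, 2, A) in reverse order -- how FileStore shards directories.
```

That the computed pg id matches the pg 6.263 under investigation is a quick consistency check that you are looking at the right on-disk file.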