Re: [ceph-users] object size changing after a pg repair

2016-06-30 Thread Goncalo Borges

Hi Greg

Opened this one

http://tracker.ceph.com/issues/16567

Let us see what they say.

Cheers

G.


On 07/01/2016 04:09 AM, Gregory Farnum wrote:

On Wed, Jun 29, 2016 at 10:50 PM, Goncalo Borges wrote:

Hi Shinobu


Sorry, I probably don't understand your question properly.
Is what you're worried about that an object mapped to a specific pg could be
overwritten on different osds?

Not really. I was worried by seeing object sizes changing on the fly.

I will try to clarify.

We are enabling cephfs for our user community.

My mind was set on a context where the data is not changing. I got scared because
I did not find it plausible for an object to change size and content. I thought
this was a consequence of a bad repair.

However, thinking it over, if we have some application overwriting the same
file over and over again (which I think we have), we will see the same objects
change size and content over time. In cephfs, the names of the objects are
directly derived from the file inode and how the file is striped, so the object
names do not actually change when a file is overwritten. Right?!  So, in
summary, in this scenario, it is normal for objects to change size and content
all the time.

That's true...


A consequence of this is that very fast overwrites of files / objects could
raise scrub errors if, by chance, ceph is scrubbing pgs whose objects are
changing on the fly.

...but that's definitely not the case. Scrubbing and client IO
shouldn't race with each other. If you're seeing size errors, it could
be some minor scrub race, but it's definitely a bug. You should
discuss on irc and/or open a ticket for the RADOS team to look at.
-Greg


--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937



Re: [ceph-users] object size changing after a pg repair

2016-06-30 Thread Gregory Farnum
On Wed, Jun 29, 2016 at 10:50 PM, Goncalo Borges wrote:
> Hi Shinobu
>
>> Sorry, I probably don't understand your question properly.
>> Is what you're worried about that an object mapped to a specific pg could be
>> overwritten on different osds?
>
> Not really. I was worried by seeing object sizes changing on the fly.
>
> I will try to clarify.
>
> We are enabling cephfs for our user community.
>
> My mind was set on a context where the data is not changing. I got scared
> because I did not find it plausible for an object to change size and content.
> I thought this was a consequence of a bad repair.
>
> However, thinking it over, if we have some application overwriting the same
> file over and over again (which I think we have), we will see the same
> objects change size and content over time. In cephfs, the names of the
> objects are directly derived from the file inode and how the file is striped,
> so the object names do not actually change when a file is overwritten. Right?!
> So, in summary, in this scenario, it is normal for objects to change size and
> content all the time.

That's true...

> A consequence of this is that very fast overwrites of files / objects
> could raise scrub errors if, by chance, ceph is scrubbing pgs whose objects
> are changing on the fly.

...but that's definitely not the case. Scrubbing and client IO
shouldn't race with each other. If you're seeing size errors, it could
be some minor scrub race, but it's definitely a bug. You should
discuss on irc and/or open a ticket for the RADOS team to look at.
-Greg


Re: [ceph-users] object size changing after a pg repair

2016-06-30 Thread Shinobu Kinjo
Thank you for your clarification.

On Thu, Jun 30, 2016 at 2:50 PM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Hi Shinobu
>
> > Sorry, I probably don't understand your question properly.
> > Is what you're worried about that an object mapped to a specific pg could be
> > overwritten on different osds?
>
> Not really. I was worried by seeing object sizes changing on the fly.
>
> I will try to clarify.
>
> We are enabling cephfs for our user community.
>
> My mind was set on a context where the data is not changing. I got scared
> because I did not find it plausible for an object to change size and
> content. I thought this was a consequence of a bad repair.
>

> However, thinking it over, if we have some application overwriting the
> same file over and over again (which I think we have), we will see the same
> objects change size and content over time. In cephfs, the names of the
> objects are directly derived from the file inode and how the file is
> striped, so the object names do not actually change when a file is
> overwritten. Right?!  So, in summary, in this scenario, it is normal for
> objects to change size and content all the time.
>
> A consequence of this is that very fast overwrites of files / objects
> could raise scrub errors if, by chance, ceph is scrubbing pgs whose objects
> are changing on the fly.
>

IMHO, in that situation the clients' write operations to rados would be
cancelled (maybe `cancelled` is not the appropriate word in this sentence)
until they have the full epoch before touching the same object, since clients
must have the latest OSD map.

Does it make sense?

Anyway, in case I've been missing something, someone will add more.


>
> Does this make sense?
> G.
>
>
>
> 
> From: Shinobu Kinjo [shinobu...@gmail.com]
> Sent: 30 June 2016 15:10
> To: Goncalo Borges
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] object size changing after a pg repair
>
> On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges <
> goncalo.bor...@sydney.edu.au> wrote:
>
> Hi Shinobu
>
> I've run
>
>    # ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query > 6.263query2.txt
>
> and I am sending the full 6.263query1.txt output as well as the results of
>
>    # diff -Nua 6.263query1.txt 6.263query2.txt
>
> Actually the sizes of the objects are fixed if the data isn't changed,
> right?! I am actually thinking of a scenario where a user application is
> overwriting the same file, meaning that the objects will be overwritten all
> the time. If, by chance, scrub runs on one of these objects, there is a
> chance that it finds differences depending on how fast the file / object
> are overwritten and replicated to the different osds.
>
> Sorry, I probably don't understand your question properly.
> Is what you're worried about that an object mapped to a specific pg could be
> overwritten on different osds?
>
>
> Is this possible?
>
> Cheers
> Goncalo
>
>
> # cat 6.263query1.txt
> {
> "state": "active+clean",
> "snap_trimq": "[]",
> "epoch": 1005,
> "up": [
> 56,
> 39,
> 6
> ],
> "acting": [
> 56,
> 39,
> 6
> ],
> "actingbackfill": [
> "6",
> "39",
> "56"
> ],
> "info": {
> "pgid": "6.263",
> "last_update": "1005'2273061",
> "last_complete": "1005'2273061",
> "log_tail": "1005'227",
> "last_user_version": 2273061,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 341,
> "last_epoch_started": 996,
> "last_epoch_clean": 996,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 994,
> "same_interval_since": 995,
> "same_primary_since": 995,
> "last_scrub": "1005'2076134",
> "last_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_deep_scrub": "1005'2076134",
> "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
>

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Goncalo Borges
Hi Shinobu

> Sorry, I probably don't understand your question properly.
> Is what you're worried about that an object mapped to a specific pg could be
> overwritten on different osds?

Not really. I was worried by seeing object sizes changing on the fly. 

I will try to clarify.

We are enabling cephfs for our user community. 

My mind was set on a context where the data is not changing. I got scared because
I did not find it plausible for an object to change size and content. I thought
this was a consequence of a bad repair.

However, thinking it over, if we have some application overwriting the same
file over and over again (which I think we have), we will see the same objects
change size and content over time. In cephfs, the names of the objects are
directly derived from the file inode and how the file is striped, so the object
names do not actually change when a file is overwritten. Right?!  So, in
summary, in this scenario, it is normal for objects to change size and content
all the time.
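
Just to make the mapping concrete, this is roughly how I picture a cephfs file
turning into rados objects. The path and the pool name 'cephfs_data' below are
made up, so please treat it as a sketch rather than output from our cluster:

   # stat -c %i /cephfs/some/file          <-- inode number of the file
   1234567
   # printf '%x\n' 1234567                 <-- objects are named <inode-hex>.<stripe-index>
   12d687
   # rados -p cephfs_data ls | grep '^12d687\.'
   12d687.00000000
   12d687.00000001

Overwriting the file just keeps reusing these same object names.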

A consequence of this is that very fast overwrites of files / objects could
raise scrub errors if, by chance, ceph is scrubbing pgs whose objects are
changing on the fly.

Does this make sense?
G.
 

 

From: Shinobu Kinjo [shinobu...@gmail.com]
Sent: 30 June 2016 15:10
To: Goncalo Borges
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] object size changing after a pg repair

On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges
<goncalo.bor...@sydney.edu.au> wrote:

Hi Shinobu

I've run

   # ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query > 6.263query2.txt

and I am sending the full 6.263query1.txt output as well as the results of

   # diff -Nua  6.263query1.txt  6.263query2.txt

Actually the sizes of the objects are fixed if the data isn't changed, right?!
I am actually thinking of a scenario where a user application is overwriting
the same file, meaning that the objects will be overwritten all the time. If,
by chance, scrub runs on one of these objects, there is a chance that it finds
differences depending on how fast the file / object are overwritten and
replicated to the different osds.

Sorry, I probably don't understand your question properly.
Is what you're worried about that an object mapped to a specific pg could be
overwritten on different osds?


Is this possible?

Cheers
Goncalo


# cat 6.263query1.txt
{
"state": "active+clean",
"snap_trimq": "[]",
"epoch": 1005,
"up": [
56,
39,
6
],
"acting": [
56,
39,
6
],
"actingbackfill": [
"6",
"39",
"56"
],
"info": {
"pgid": "6.263",
"last_update": "1005'2273061",
"last_complete": "1005'2273061",
"log_tail": "1005'227",
"last_user_version": 2273061,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 341,
"last_epoch_started": 996,
"last_epoch_clean": 996,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 994,
"same_interval_since": 995,
"same_primary_since": 995,
"last_scrub": "1005'2076134",
"last_scrub_stamp": "2016-06-30 02:13:00.455256",
"last_deep_scrub": "1005'2076134",
"last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
"last_clean_scrub_stamp": "2016-06-30 02:13:00.455256"
},
"stats": {
"version": "1005'2273061",
"reported_seq": "2937682",
"reported_epoch": "1005",
"state": "active+clean",
"last_fresh": "2016-06-30 04:38:13.270047",
"last_change": "2016-06-30 02:13:00.455293",
"last_active": "2016-06-30 04:38:13.270047",
"last_peered": "2016-06-30 04:38:13.270047",
"last_clean": "2016-06-30 04:38:13.270047",
"last_became_active": "2016-06-27 04:57:36.949798",
"last_became_peered": "2016-06-27 04:57:36.949798",
"last_unstale": "2016-06-30 04:38:13.270047"

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Shinobu Kinjo
shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0,
> "num_bytes_hit_set_archive": 0,
> "num_flush": 0,
> "num_flush_kb": 0,
> "num_evict": 0,
> "num_evict_kb": 0,
> "num_promote": 0,
> "num_flush_mode_high": 0,
> "num_flush_mode_low": 0,
> "num_evict_mode_some": 0,
> "num_evict_mode_full": 0,
> "num_objects_pinned": 0
> },
> "up": [
> 56,
> 39,
> 6
> ],
> "acting": [
> 56,
> 39,
> 6
> ],
> "blocked_by": [],
> "up_primary": 56,
> "acting_primary": 56
> },
> "empty": 0,
> "dne": 0,
> "incomplete": 0,
> "last_epoch_started": 996,
> "hit_set_history": {
> "current_last_update": "0'0",
> "history": []
> }
> }
> ],
> "recovery_state": [
> {
> "name": "Started\/Primary\/Active",
> "enter_time": "2016-06-27 04:57:36.876639",
> "might_have_unfound": [],
> "recovery_progress": {
> "backfill_targets": [],
> "waiting_on_backfill": [],
> "last_backfill_started": "MIN",
>         "backfill_info": {
> "begin": "MIN",
> "end": "MIN",
> "objects": []
> },
> "peer_backfill_info": [],
> "backfills_in_flight": [],
> "recovering": [],
> "pg_backend": {
> "pull_from_peer": [],
> "pushing": []
> }
> },
> "scrub": {
> "scrubber.epoch_start": "995",
> "scrubber.active": 0,
> "scrubber.state": "INACTIVE",
> "scrubber.start": "MIN",
> "scrubber.end": "MIN",
> "scrubber.subset_last_update": "0'0",
> "scrubber.deep": false,
> "scrubber.seed": 0,
> "scrubber.waiting_on": 0,
> "scrubber.waiting_on_whom": []
> }
> },
> {
> "name": "Started",
> "enter_time": "2016-06-27 04:57:35.828117"
> }
> ],
> "agent_state": {}
> }
>
> --- * ---
>
> # diff -Nua 6.263query1.txt 6.263query2.txt
> --- 6.263query1.txt 2016-06-30 04:38:13.290371200 +
> +++ 6.263query2.txt 2016-06-30 04:38:43.412642932 +
> @@ -19,10 +19,10 @@
>  ],
>  "info": {
>  "pgid": "6.263",
> -"last_update": "1005'2273061",
> -"last_complete": "1005'2273061",
> -"log_tail": "1005'227",
> -"last_user_version": 2273061,
> +"last_update": "1005'2273745",
> +"last_complete": "1005'2273745",
> +"log_tail": "1005'2270700",
> +"last_user_version": 2273745,
>  "last_backfill": "MAX",
>  "last_backfill_bitwise": 0,
>  "purged_snaps": "[]",
> @@ -42,23 +42,23 @@
>  "last_clean_scrub_stamp"

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Goncalo Borges
 "waiting_on_backfill": [],
"last_backfill_started": "MIN",
"backfill_info": {
"begin": "MIN",
"end": "MIN",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "995",
"scrubber.active": 0,
"scrubber.state": "INACTIVE",
"scrubber.start": "MIN",
"scrubber.end": "MIN",
"scrubber.subset_last_update": "0'0",
    "scrubber.deep": false,
        "scrubber.seed": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2016-06-27 04:57:35.828117"
}
],
"agent_state": {}
}

--- * ---

# diff -Nua 6.263query1.txt 6.263query2.txt
--- 6.263query1.txt 2016-06-30 04:38:13.290371200 +
+++ 6.263query2.txt 2016-06-30 04:38:43.412642932 +
@@ -19,10 +19,10 @@
 ],
 "info": {
 "pgid": "6.263",
-"last_update": "1005'2273061",
-"last_complete": "1005'2273061",
-"log_tail": "1005'227",
-"last_user_version": 2273061,
+"last_update": "1005'2273745",
+"last_complete": "1005'2273745",
+"log_tail": "1005'2270700",
+"last_user_version": 2273745,
 "last_backfill": "MAX",
 "last_backfill_bitwise": 0,
 "purged_snaps": "[]",
@@ -42,23 +42,23 @@
 "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256"
 },
 "stats": {
-"version": "1005'2273061",
-"reported_seq": "2937682",
+"version": "1005'2273745",
+"reported_seq": "2938345",
 "reported_epoch": "1005",
 "state": "active+clean",
-"last_fresh": "2016-06-30 04:38:13.270047",
+"last_fresh": "2016-06-30 04:38:43.320788",
 "last_change": "2016-06-30 02:13:00.455293",
-"last_active": "2016-06-30 04:38:13.270047",
-"last_peered": "2016-06-30 04:38:13.270047",
-"last_clean": "2016-06-30 04:38:13.270047",
+"last_active": "2016-06-30 04:38:43.320788",
+"last_peered": "2016-06-30 04:38:43.320788",
+"last_clean": "2016-06-30 04:38:43.320788",
 "last_became_active": "2016-06-27 04:57:36.949798",
 "last_became_peered": "2016-06-27 04:57:36.949798",
-"last_unstale": "2016-06-30 04:38:13.270047",
-"last_undegraded": "2016-06-30 04:38:13.270047",
-"last_fullsized": "2016-06-30 04:38:13.270047",
+"last_unstale": "2016-06-30 04:38:43.320788",
+"last_undegraded": "2016-06-30 04:38:43.320788",
+"last_fullsized": "2016-06-30 04:38:43.320788",
 "mapping_epoch": 994,
-"log_start": "1005'227",
-"ondisk_log_start": "1005'227",
+"log_start": "1005'2270700",
+"ondisk_log_start": "1005'2270700",
 "created": 341,
 "last_epoch_clean": 996,
 "parent": "0.0",
@@ -68,8 +68,8 @@
 "last_deep_scrub": "1005'2076134",
 "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
 "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256",
-"log_size": 3061,
-"ondisk_log_size"

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Shinobu Kinjo
What does `ceph pg 6.263 query` show you?


On Thu, Jun 30, 2016 at 12:02 PM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Dear Cephers...
>
> Today our ceph cluster gave us a couple of scrub errors regarding
> inconsistent pgs. We just upgraded from 9.2.0 to 10.2.2 two days ago.
>
> # ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; crush map has legacy
> tunables (require bobtail, min is firefly)
> pg 6.39c is active+clean+inconsistent, acting [2,60,32]
> pg 6.263 is active+clean+inconsistent, acting [56,39,6]
> 2 scrub errors
> crush map has legacy tunables (require bobtail, min is firefly); see
> http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> We started by looking at pg 6.263. Errors were only appearing in the osd.56
> logs, not in the others.
>
> # cat  ceph-osd.56.log-20160629 | grep -Hn 'ERR'
> (standard input):8569:2016-06-29 08:09:50.952397 7fd023322700 -1
> log_channel(cluster) log [ERR] : scrub 6.263
> 6:c645f18e:::12a343d.:head on disk size (1836) does not match
> object info size (41242) adjusted for ondisk to (41242)
> (standard input):8602:2016-06-29 08:11:11.227865 7fd023322700 -1
> log_channel(cluster) log [ERR] : 6.263 scrub 1 errors
>
> So, we did a 'ceph pg repair  6.263'.
>
> Eventually, that pg went back to 'active+clean'
>
> # ceph pg dump | grep ^6.263
> dumped all in format plain
> 6.263  10845  0  0  0  0  39592671010  3037  3037  active+clean
> 2016-06-30 02:13:00.455293  1005'2126237  1005:2795768  [56,39,6]  56
> [56,39,6]  56  1005'2076134  2016-06-30 02:13:00.455256  1005'2076134
> 2016-06-30 02:13:00.455256
>
> However, in the logs I found
>
> 2016-06-30 02:03:03.992240 osd.56 192.231.127.226:6801/21569 278 :
> cluster [INF] 6.263 repair starts
> 2016-06-30 02:13:00.455237 osd.56 192.231.127.226:6801/21569 279 :
> cluster [INF] 6.263 repair ok, 0 fixed
>
> I did not like the '0 fixed'.
>
> Inspecting a bit more, I found that the object inside the pg is changing size
> on all the involved osds. For example, on osd.56 (but the same is true on 39
> and 6), consecutive 'ls -l' commands showed:
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 8602 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> [root@rccephosd8 ceph]# ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 170 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 15436 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 26044 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 14076 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 31110 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 20230 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 23392 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> 

[ceph-users] object size changing after a pg repair

2016-06-29 Thread Goncalo Borges
Dear Cephers...

Today our ceph cluster gave us a couple of scrub errors regarding inconsistent 
pgs. We just upgraded from 9.2.0 to 10.2.2 two days ago.

# ceph health detail 
HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; crush map has legacy tunables 
(require bobtail, min is firefly)
pg 6.39c is active+clean+inconsistent, acting [2,60,32]
pg 6.263 is active+clean+inconsistent, acting [56,39,6]
2 scrub errors
crush map has legacy tunables (require bobtail, min is firefly); see 
http://ceph.com/docs/master/rados/operations/crush-map/#tunables

We started by looking at pg 6.263. Errors were only appearing in the osd.56
logs, not in the others.

# cat  ceph-osd.56.log-20160629 | grep -Hn 'ERR' 
(standard input):8569:2016-06-29 08:09:50.952397 7fd023322700 -1 
log_channel(cluster) log [ERR] : scrub 6.263 
6:c645f18e:::12a343d.:head on disk size (1836) does not match 
object info size (41242) adjusted for ondisk to (41242)
(standard input):8602:2016-06-29 08:11:11.227865 7fd023322700 -1 
log_channel(cluster) log [ERR] : 6.263 scrub 1 errors
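
Side note: since we are now on jewel, I believe the scrub findings can also be
dumped directly with the rados tool. I have not actually tried it on this
cluster, so take the commands below as a sketch, with 'cephfs_data' standing in
for the data pool name:

   # rados list-inconsistent-pg cephfs_data
   # rados list-inconsistent-obj 6.263 --format=json-pretty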

So, we did a 'ceph pg repair  6.263'.

Eventually, that pg went back to 'active+clean'

# ceph pg dump | grep ^6.263
dumped all in format plain
6.263  10845  0  0  0  0  39592671010  3037  3037  active+clean
2016-06-30 02:13:00.455293  1005'2126237  1005:2795768  [56,39,6]  56
[56,39,6]  56  1005'2076134  2016-06-30 02:13:00.455256  1005'2076134
2016-06-30 02:13:00.455256

However, in the logs I found

2016-06-30 02:03:03.992240 osd.56 192.231.127.226:6801/21569 278 : cluster 
[INF] 6.263 repair starts
2016-06-30 02:13:00.455237 osd.56 192.231.127.226:6801/21569 279 : cluster 
[INF] 6.263 repair ok, 0 fixed

I did not like the '0 fixed'. 

Inspecting a bit more, I found that the object inside the pg is changing size on
all the involved osds. For example, on osd.56 (but the same is true on 39 and 6),
consecutive 'ls -l' commands showed:

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 8602 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
[root@rccephosd8 ceph]# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 170 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 15436 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 26044 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 14076 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 31110 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 20230 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 23392 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
-rw-r--r-- 1 ceph ceph 0 Jun 30 02:53 
/var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6

# ls -l