Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-19 Thread yu2xiangyang


Hi John, all,


I have been testing the patched ceph-fuse build
(http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-basic/ref/wip-17270).

With a writing IO workload, ceph-fuse no longer crashes when an OSD is added.

But with a reading IO workload, the fuse client still crashes when an OSD is added.
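
The read workload is also driven by smallfile; the exact invocation is not in this
mail, but a representative command (assuming the same parameters as the create test
quoted later in this thread, with the operation switched to read) would be:

    python smallfile_cli.py --top /mnt/test --threads 8 --files 20 --file-size 10240 --record-size 512 --operation read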


A detailed log has been attached at http://tracker.ceph.com/issues/17270.

Cheers,
xiangyang




At 2016-09-13 18:08:19, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone,
>>
>> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>>
>> I am writing data through ceph-fuse, then I add one OSD to the pool; after
>> less than 30 s, the ceph-fuse process crashes.
>>
>> The ceph-fuse client is 10.2.2 and the Ceph OSDs are 0.94.3, details below:
>
>I missed this version mismatch until someone pointed it out (thanks Brad)
>
>In theory the newer fuse client should still work with the older OSD,
>but it would be very interesting to know if this issue is still
>reproducible if you use all Jewel packages.
>
>John
>

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang

I have tried all Jewel packages and everything runs correctly, so I think the problem
is in osdc in ceph 0.94.3.
There must be some existing commits since then that already solved the problem.
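
One way to look for that fix (just a suggestion, assuming the standard ceph.git
release tags) is to walk the osdc history between the two releases:

    git log --oneline v0.94.3..v10.2.2 -- src/osdc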

At 2016-09-13 18:08:19, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone,
>>
>> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>>
>> I am writing data through ceph-fuse, then I add one OSD to the pool; after
>> less than 30 s, the ceph-fuse process crashes.
>>
>> The ceph-fuse client is 10.2.2 and the Ceph OSDs are 0.94.3, details below:
>
>I missed this version mismatch until someone pointed it out (thanks Brad)
>
>In theory the newer fuse client should still work with the older OSD,
>but it would be very interesting to know if this issue is still
>reproducible if you use all Jewel packages.
>
>John
>

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
> Hello everyone,
>
> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>
> I am writing data through ceph-fuse, then I add one OSD to the pool; after
> less than 30 s, the ceph-fuse process crashes.
>
> The ceph-fuse client is 10.2.2 and the Ceph OSDs are 0.94.3, details below:

I missed this version mismatch until someone pointed it out (thanks Brad)

In theory the newer fuse client should still work with the older OSD,
but it would be very interesting to know if this issue is still
reproducible if you use all Jewel packages.

John

>
> [root@localhost ~]# rpm -qa | grep ceph
> libcephfs1-10.2.2-0.el7.centos.x86_64
> python-cephfs-10.2.2-0.el7.centos.x86_64
> ceph-common-0.94.3-0.el7.x86_64
> ceph-fuse-10.2.2-0.el7.centos.x86_64
> ceph-0.94.3-0.el7.x86_64
> ceph-mds-10.2.2-0.el7.centos.x86_64
> [root@localhost ~]#
> [root@localhost ~]#
> [root@localhost ~]# rpm -qa | grep rados
> librados2-devel-0.94.3-0.el7.x86_64
> librados2-0.94.3-0.el7.x86_64
> libradosstriper1-0.94.3-0.el7.x86_64
> python-rados-0.94.3-0.el7.x86_64
>
> ceph stat:
>
> [root@localhost ~]# ceph status
> cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
>  health HEALTH_WARN
> clock skew detected on mon.2, mon.0
> 19 pgs stale
> 19 pgs stuck stale
> Monitor clock skew detected
>  monmap e3: 3 mons at
> {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
> election epoch 26, quorum 0,1,2 1,2,0
>  mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
>  osdmap e324: 9 osds: 9 up, 9 in
>   pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
> 23373 MB used, 68695 MB / 92069 MB avail
>  301 active+clean
>   19 stale+active+clean
>
> ceph osd stat:
> [root@localhost ~]# ceph osd dump
> epoch 324
> fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
> created 2016-09-13 11:08:34.629245
> modified 2016-09-13 16:21:53.285729
> flags
> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool
> stripe_width 0
> max_osd 9
> osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242
> last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780
> 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up
> 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
> osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186
> last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228
> 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up
> 3f3ad2fa-52b1-46fd-af6c-05178b814e25
> osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186
> last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259
> 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up
> 9199193e-9928-4c5d-8adc-2c32a4c8716b
> osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303
> last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592
> 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up
> 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
> osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567
> 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up
> 0c719e5e-f8fc-46e0-926d-426bf6881ee0
> osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval
> [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678
> 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up
> 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
> osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887
> 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up
> dea24f0f-4666-40af-98af-5ab8d42c37c6
> osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040
> 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up
> 0e238745-0091-4790-9b39-c9d36f4ebbee
> osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval
> [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206
> 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up
> 59637f86-f283-4397-a63b-474976ee8047
> [root@localhost ~]#
> [root@localhost ~]# ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 9.0 root default
> -5 3.0 host yxy02
>  1 1.0 osd.1   up  1.0  1.0
>  2 1.0 osd.2   up  1.0  1.0
>  0 1.0 osd.0   up  1.0  1

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang
I have submitted the issue at http://tracker.ceph.com/issues/17270.


At 2016-09-13 17:01:09, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone,
>>
>> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>>
>> I am writing data through ceph-fuse, then I add one OSD to the pool; after
>> less than 30 s, the ceph-fuse process crashes.
>
>It looks like this could be an ObjectCacher bug that is only being
>exposed because of an unusual timing caused by the cluster slowing
>down during PG creation.  Was this reproducible or a one-off
>occurrence?
>
>Please could you create a ticket on tracker.ceph.com with all this info.
>
>Thanks,
>John
>

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread yu2xiangyang


This problem was reproducible.


I remove one OSD from the OSD tree and, after one minute, I add the same OSD back to
the OSD pool; the fuse client then crashes.


Ceph-fuse is also being written to through smallfile; the command is:

    python smallfile_cli.py --top /mnt/test --threads 8 --files 20 --file-size 10240 --record-size 512 --operation create


My remove-OSD steps are:

1. kill -9 $pid_num
2. ceph osd out $id
3. ceph osd down $id
4. ceph osd crush remove osd.$id
5. ceph auth del osd.$id
6. ceph osd rm osd.$id


My add-OSD steps are (the whole remove/re-add cycle is also sketched as one script below):

1. mkfs.xfs and remount the data directory of the removed OSD
2. ceph osd create
3. ceph-osd -i $id --mkfs --osd-data=/data/osd/osd.$id --mkkey
4. ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /data/osd/osd.$id/keyring
5. ceph osd crush create-or-move osd.$id 1.0 host=
6. ceph-osd -i $id
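
A minimal consolidation of the two sequences above into one shell run; $id, $pid_num,
$dev and $host are placeholders I am filling in for illustration (the hostname
argument did not survive in the archived mail):

    # stop the daemon and remove osd.$id from the cluster
    kill -9 $pid_num
    ceph osd out $id
    ceph osd down $id
    ceph osd crush remove osd.$id
    ceph auth del osd.$id
    ceph osd rm osd.$id

    # recreate it: fresh filesystem, new id, key, CRUSH location, then start it
    mkfs.xfs -f /dev/$dev && mount /dev/$dev /data/osd/osd.$id
    ceph osd create                      # returns the new id (reused as $id here)
    ceph-osd -i $id --mkfs --osd-data=/data/osd/osd.$id --mkkey
    ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /data/osd/osd.$id/keyring
    ceph osd crush create-or-move osd.$id 1.0 host=$host
    ceph-osd -i $id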






At 2016-09-13 17:01:09, "John Spray"  wrote:
>On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
>> Hello everyone, 
>>
>> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>>
>> I am writing data through ceph-fuse, then I add one OSD to the pool; after
>> less than 30 s, the ceph-fuse process crashes.
>
>It looks like this could be an ObjectCacher bug that is only being
>exposed because of an unusual timing caused by the cluster slowing
>down during PG creation.  Was this reproducible or a one-off
>occurrence?
>
>Please could you create a ticket on tracker.ceph.com with all this info.
>
>Thanks,
>John
>

Re: [ceph-users] [cephfs] fuse client crash when adding a new osd

2016-09-13 Thread John Spray
On Tue, Sep 13, 2016 at 2:12 PM, yu2xiangyang  wrote:
> Hello everyone,
>
> I have met a ceph-fuse crash when I add an OSD to the OSD pool.
>
> I am writing data through ceph-fuse, then I add one OSD to the pool; after
> less than 30 s, the ceph-fuse process crashes.

It looks like this could be an ObjectCacher bug that is only being
exposed because of an unusual timing caused by the cluster slowing
down during PG creation.  Was this reproducible or a one-off
occurrence?

Please could you create a ticket on tracker.ceph.com with all this info.

Thanks,
John
