Re: [ceph-users] MDS in read-only mode

2016-08-08 Thread Dmitriy Lysenko
08.08.2016 13:51, Wido den Hollander wrote:
> 
>> On 8 August 2016 at 12:49, John Spray wrote:
>>
>>
>> On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko  wrote:
>>> Good day.
>>>
>>> My CephFS switched to read-only.
>>> This problem previously occurred on Hammer; I recreated the CephFS and
>>> upgraded to Jewel, which seemed to solve it, but it reappeared after some time.
>>>
>>> ceph.log
>>> 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster 
>>> [INF] HEALTH_WARN; mds0: MDS in read-only mode
>>>
>>> ceph-mds.log:
>>> 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe) 
>>> commit error -22 v 1
>>> 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] : 
>>> failed to commit dir 1000afe object, errno -22
>>> 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error 
>>> (22) Invalid argument, force readonly...
>>> 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system 
>>> read-only
>>> 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] : 
>>> force file system read-only
>>
>> The MDS is going read only because it received an error (22, aka
>> EINVAL) from an OSD when trying to write a metadata object.  You need
>> to investigate why the error occurred.  Are your OSDs using the same
>> Ceph version as your MDS?  Look in the OSD logs for the time at which
>> the error happened to see if there is more detail about why.
>>

All OSDs are running the same version:

# ceph tell osd.* version
osd.0: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.1: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.2: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.3: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.4: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.5: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.6: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.7: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.8: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.9: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.10: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.11: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.12: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}
osd.14: {
"version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
}

I did not find any errors in the OSD logs from 18:00 to 19:00 (the window
covered by the log excerpts in this message).
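
For reference, this is roughly how I searched on each OSD host (a sketch,
assuming the default log location under /var/log/ceph/):

# grep the failure window for error returns such as EINVAL (-22)
grep '2016-08-07 18:' /var/log/ceph/ceph-osd.*.log | grep -iE 'error|einval|-22'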

> 
> You might want to add this to the mds config:
> 
> debug_rados = 20
> 
> That should show you which RADOS operations it is performing and you can also 
> figure out which one failed.
> 
> Like John said, it might be an issue with a specific OSD.
> 
> Wido
I added debug_rados to the [mds] section in ceph.conf.
However, I had already fixed the error by following
http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/:

cephfs-journal-tool event recover_dentries summary
cephfs-table-tool all reset session
cephfs-journal-tool journal reset
cephfs-data-scan init
cephfs-data-scan scan_extents data
cephfs-data-scan scan_inodes data
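
To clear the read-only flag after the repair, the MDS still has to be
restarted; roughly like this (a sketch, assuming a systemd deployment and the
MDS name drop-03 from the mds stat output below):

systemctl restart ceph-mds@drop-03   # on the MDS host
ceph mds stat                        # should report up:active again
ceph -s                              # the read-only HEALTH_WARN should be gone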


> 
>> The readonly flag will clear if you restart your MDS (but it will get
>> set again if it keeps encountering errors writing to OSDs)
>>
>> John
>>
>>> I found this object:
>>> $ rados --pool metadata ls | grep 1000afe
>>> 1000afe.
>>>
>>> and successfully got it:
>>> $ rados --pool metadata get 1000afe. obj
>>> $ echo $?
>>> 0
>>>
>>> How do I switch the MDS out of read-only mode?
>>> Are there any tools to test the CephFS system for errors?
>>>
>>> $ ceph -v
>>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>
>>> $ ceph fs ls
>>> name: cephfs, metadata pool: metadata, data pools: [data ]
>>>
>>> $ ceph mds stat
>>> e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
>>>
>>> $ ceph osd lspools
>>> 0 data,1 metadata,6 one,
>>>
>>> $ ceph osd dump | grep 'replicated size'
>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
>>> rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45 
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
>>> rjenkins pg_num 256 pgp_num 256 last_change 45649 
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>>> pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
>>> rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool 
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0

Re: [ceph-users] MDS in read-only mode

2016-08-08 Thread Wido den Hollander

> On 8 August 2016 at 12:49, John Spray wrote:
> 
> 
> On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko  wrote:
> > Good day.
> >
> > My CephFS switched to read-only.
> > This problem previously occurred on Hammer; I recreated the CephFS and
> > upgraded to Jewel, which seemed to solve it, but it reappeared after some time.
> >
> > ceph.log
> > 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster 
> > [INF] HEALTH_WARN; mds0: MDS in read-only mode
> >
> > ceph-mds.log:
> > 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe) 
> > commit error -22 v 1
> > 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] : 
> > failed to commit dir 1000afe object, errno -22
> > 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error 
> > (22) Invalid argument, force readonly...
> > 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system 
> > read-only
> > 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] : 
> > force file system read-only
> 
> The MDS is going read only because it received an error (22, aka
> EINVAL) from an OSD when trying to write a metadata object.  You need
> to investigate why the error occurred.  Are your OSDs using the same
> Ceph version as your MDS?  Look in the OSD logs for the time at which
> the error happened to see if there is more detail about why.
> 

You might want to add this to the mds config:

debug_rados = 20

That should show you which RADOS operations it is performing and you can also 
figure out which one failed.
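
In ceph.conf that would look something like this (a sketch; either restart the
MDS afterwards or set it at runtime via the admin socket, using the MDS name
drop-03 from the output below):

[mds]
    debug rados = 20

# or, without a restart, on the MDS host:
ceph daemon mds.drop-03 config set debug_rados 20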

Like John said, it might be an issue with a specific OSD.

Wido

> The readonly flag will clear if you restart your MDS (but it will get
> set again if it keeps encountering errors writing to OSDs)
> 
> John
> 
> > I found this object:
> > $ rados --pool metadata ls | grep 1000afe
> > 1000afe.
> >
> > and successfully got it:
> > $ rados --pool metadata get 1000afe. obj
> > $ echo $?
> > 0
> >
> > How do I switch the MDS out of read-only mode?
> > Are there any tools to test the CephFS system for errors?
> >
> > $ ceph -v
> > ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> >
> > $ ceph fs ls
> > name: cephfs, metadata pool: metadata, data pools: [data ]
> >
> > $ ceph mds stat
> > e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
> >
> > $ ceph osd lspools
> > 0 data,1 metadata,6 one,
> >
> > $ ceph osd dump | grep 'replicated size'
> > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
> > rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45 
> > min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
> > rjenkins pg_num 256 pgp_num 256 last_change 45649 
> > min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> > pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
> > rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool 
> > min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> >
> >
> > Thank you for help.
> >
> > --
> > Dmitry Lysenko
> > ISP Sovtest, Kursk, Russia
> > jabber: t...@jabber.sovtest.ru


Re: [ceph-users] MDS in read-only mode

2016-08-08 Thread John Spray
On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko  wrote:
> Good day.
>
> My CephFS switched to read-only.
> This problem previously occurred on Hammer; I recreated the CephFS and
> upgraded to Jewel, which seemed to solve it, but it reappeared after some time.
>
> ceph.log
> 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster [INF] 
> HEALTH_WARN; mds0: MDS in read-only mode
>
> ceph-mds.log:
> 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe) 
> commit error -22 v 1
> 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] : 
> failed to commit dir 1000afe object, errno -22
> 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error 
> (22) Invalid argument, force readonly...
> 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system 
> read-only
> 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] : 
> force file system read-only

The MDS is going read only because it received an error (22, aka
EINVAL) from an OSD when trying to write a metadata object.  You need
to investigate why the error occurred.  Are your OSDs using the same
Ceph version as your MDS?  Look in the OSD logs for the time at which
the error happened to see if there is more detail about why.
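
A quick way to compare versions, for example:

ceph tell osd.* version   # version reported by every OSD
ceph -v                   # locally installed version, run on the MDS host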

The readonly flag will clear if you restart your MDS (but it will get
set again if it keeps encountering errors writing to OSDs)

John

> I found this object:
> $ rados --pool metadata ls | grep 1000afe
> 1000afe.
>
> and successfully got it:
> $ rados --pool metadata get 1000afe. obj
> $ echo $?
> 0
>
> How do I switch the MDS out of read-only mode?
> Are there any tools to test the CephFS system for errors?
>
> $ ceph -v
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>
> $ ceph fs ls
> name: cephfs, metadata pool: metadata, data pools: [data ]
>
> $ ceph mds stat
> e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
>
> $ ceph osd lspools
> 0 data,1 metadata,6 one,
>
> $ ceph osd dump | grep 'replicated size'
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45 
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
> rjenkins pg_num 256 pgp_num 256 last_change 45649 
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool 
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>
>
> Thank you for help.
>
> --
> Dmitry Lysenko
> ISP Sovtest, Kursk, Russia
> jabber: t...@jabber.sovtest.ru


[ceph-users] MDS in read-only mode

2016-08-08 Thread Dmitriy Lysenko
Good day.

My CephFS switched to read-only.
This problem previously occurred on Hammer; I recreated the CephFS and upgraded
to Jewel, which seemed to solve it, but it reappeared after some time.

ceph.log
2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster [INF] 
HEALTH_WARN; mds0: MDS in read-only mode

ceph-mds.log:
2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe) commit 
error -22 v 1
2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] : 
failed to commit dir 1000afe object, errno -22
2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error 
(22) Invalid argument, force readonly...
2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system 
read-only
2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] : 
force file system read-only

I found this object:
$ rados --pool metadata ls | grep 1000afe
1000afe.

and successfully got it:
$ rados --pool metadata get 1000afe. obj
$ echo $?
0
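
Since the MDS keeps a directory's entries in that object's omap, the keys can
also be listed as a quick sanity check (using the same truncated object name
as above):

$ rados --pool metadata listomapkeys 1000afe.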

How do I switch the MDS out of read-only mode?
Are there any tools to test the CephFS system for errors?

$ ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]

$ ceph mds stat
e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby

$ ceph osd lspools
0 data,1 metadata,6 one,

$ ceph osd dump | grep 'replicated size'
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45 
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 45649 min_read_recency_for_promote 
1 min_write_recency_for_promote 1 stripe_width 0
pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 512 pgp_num 512 last_change 53462 flags hashpspool 
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0


Thank you for help.

--
Dmitry Lysenko
ISP Sovtest, Kursk, Russia
jabber: t...@jabber.sovtest.ru