Re: [ceph-users] MDS in read-only mode
On 08.08.2016 13:51, Wido den Hollander wrote:
>
>> On 8 August 2016 at 12:49, John Spray wrote:
>>
>> On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko wrote:
>>> Good day.
>>>
>>> My CephFS switched to read-only.
>>> This problem occurred before on Hammer; I recreated the CephFS and
>>> upgraded to Jewel, which solved it, but it reappeared after some time.
>>>
>>> ceph.log:
>>> 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster
>>> [INF] HEALTH_WARN; mds0: MDS in read-only mode
>>>
>>> ceph-mds.log:
>>> 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe)
>>> commit error -22 v 1
>>> 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] :
>>> failed to commit dir 1000afe object, errno -22
>>> 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error
>>> (22) Invalid argument, force readonly...
>>> 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system
>>> read-only
>>> 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] :
>>> force file system read-only
>>
>> The MDS is going read-only because it received an error (22, aka EINVAL)
>> from an OSD when trying to write a metadata object. You need to
>> investigate why the error occurred. Are your OSDs using the same Ceph
>> version as your MDS? Look in the OSD logs for the time at which the
>> error happened to see if there is more detail about why.
All OSDs are using the same version:

# ceph tell osd.* version
osd.0: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.1: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.2: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.3: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.4: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.5: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.6: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.7: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.8: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.9: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.10: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.11: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.12: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.14: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }

I did not find any errors in the OSD logs from 18:00 to 19:00 (included
in this message).

> You might want to add this to the mds config:
>
> debug_rados = 20
>
> That should show you which RADOS operations it is performing, and you
> can also figure out which one failed.
>
> Like John said, it might be an issue with a specific OSD.
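Eyeballing fourteen identical version lines is error-prone. A small, hypothetical helper (not part of Ceph; the function name and sample text are illustrative) can confirm that the `ceph tell osd.* version` output contains exactly one distinct version string:

```python
# Hypothetical helper: verify that every OSD in the captured output of
# `ceph tell osd.* version` reports the same Ceph version string.
import re


def unique_versions(tell_output: str) -> set:
    """Extract the distinct version strings from `ceph tell osd.* version` output."""
    return set(re.findall(r'"version":\s*"([^"]+)"', tell_output))


sample = '''
osd.0: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
osd.1: { "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)" }
'''

versions = unique_versions(sample)
assert len(versions) == 1, f"version mismatch across OSDs: {versions}"
print(versions)
```

With the real fourteen-OSD output pasted in, a set of size one confirms John's uniform-version hypothesis; a larger set points at the odd OSD out.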
> Wido

I added debug_rados to the [mds] section in ceph.conf, but I had already
fixed the error by following
http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/:

cephfs-journal-tool event recover_dentries summary
cephfs-table-tool all reset session
cephfs-journal-tool journal reset
cephfs-data-scan init
cephfs-data-scan scan_extents data
cephfs-data-scan scan_inodes data

>> The readonly flag will clear if you restart your MDS (but it will get
>> set again if it keeps encountering errors writing to OSDs)
>>
>> John
>>
>>> I found this object:
>>> $ rados --pool metadata ls | grep 1000afe
>>> 1000afe.
>>>
>>> and successfully got it:
>>> $ rados --pool metadata get 1000afe. obj
>>> $ echo $?
>>> 0
>>>
>>> How do I switch the MDS out of read-only mode?
>>> Are there any tools to check the CephFS file system for errors?
>>>
>>> $ ceph -v
>>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>
>>> $ ceph fs ls
>>> name: cephfs, metadata pool: metadata, data pools: [data ]
>>>
>>> $ ceph mds stat
>>> e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
>>>
>>> $ ceph osd lspools
>>> 0 data,1 metadata,6 one,
>>>
>>> $ ceph osd dump | grep 'replicated size'
>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash
>>> rjenkins pg_num 256 pgp_num 256 last_change 45649
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>>> pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool
>>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
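The six recovery commands above come verbatim from the Jewel disaster-recovery doc and are destructive when run against a live cluster. A hypothetical wrapper (not a Ceph tool; the function and list names are illustrative) that defaults to a dry run might look like this:

```python
# Sketch of the recovery sequence as a dry-run-by-default runner.
# The command list is copied verbatim from the disaster-recovery steps;
# only pass dry_run=False against a cluster you intend to repair.
import subprocess

RECOVERY_STEPS = [
    ["cephfs-journal-tool", "event", "recover_dentries", "summary"],
    ["cephfs-table-tool", "all", "reset", "session"],
    ["cephfs-journal-tool", "journal", "reset"],
    ["cephfs-data-scan", "init"],
    ["cephfs-data-scan", "scan_extents", "data"],
    ["cephfs-data-scan", "scan_inodes", "data"],
]


def run_recovery(dry_run=True):
    """Return the commands in order; execute them only when dry_run=False."""
    rendered = [" ".join(step) for step in RECOVERY_STEPS]
    if not dry_run:
        for step in RECOVERY_STEPS:
            # check=True aborts the sequence at the first failing step
            subprocess.run(step, check=True)
    return rendered


print("\n".join(run_recovery()))
```

Keeping the steps as data preserves their order, which matters: the journal must be recovered and reset before the data-scan passes rebuild metadata.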
Re: [ceph-users] MDS in read-only mode
> On 8 August 2016 at 12:49, John Spray wrote:
>
> On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko wrote:
>> Good day.
>>
>> My CephFS switched to read-only.
>> This problem occurred before on Hammer; I recreated the CephFS and
>> upgraded to Jewel, which solved it, but it reappeared after some time.
>>
>> ceph.log:
>> 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster
>> [INF] HEALTH_WARN; mds0: MDS in read-only mode
>>
>> ceph-mds.log:
>> 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe)
>> commit error -22 v 1
>> 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] :
>> failed to commit dir 1000afe object, errno -22
>> 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error
>> (22) Invalid argument, force readonly...
>> 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system
>> read-only
>> 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] :
>> force file system read-only
>
> The MDS is going read-only because it received an error (22, aka EINVAL)
> from an OSD when trying to write a metadata object. You need to
> investigate why the error occurred. Are your OSDs using the same Ceph
> version as your MDS? Look in the OSD logs for the time at which the
> error happened to see if there is more detail about why.

You might want to add this to the mds config:

debug_rados = 20

That should show you which RADOS operations it is performing, and you can
also figure out which one failed.

Like John said, it might be an issue with a specific OSD.

Wido

> The readonly flag will clear if you restart your MDS (but it will get
> set again if it keeps encountering errors writing to OSDs)
>
> John
>
>> I found this object:
>> $ rados --pool metadata ls | grep 1000afe
>> 1000afe.
>>
>> and successfully got it:
>> $ rados --pool metadata get 1000afe. obj
>> $ echo $?
>> 0
>>
>> How do I switch the MDS out of read-only mode?
>> Are there any tools to check the CephFS file system for errors?
>>
>> $ ceph -v
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>
>> $ ceph fs ls
>> name: cephfs, metadata pool: metadata, data pools: [data ]
>>
>> $ ceph mds stat
>> e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
>>
>> $ ceph osd lspools
>> 0 data,1 metadata,6 one,
>>
>> $ ceph osd dump | grep 'replicated size'
>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45
>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash
>> rjenkins pg_num 256 pgp_num 256 last_change 45649
>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>> pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool
>> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>>
>> Thank you for help.
>>
>> --
>> Dmitry Lysenko
>> ISP Sovtest, Kursk, Russia
>> jabber: t...@jabber.sovtest.ru

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] MDS in read-only mode
On Mon, Aug 8, 2016 at 9:26 AM, Dmitriy Lysenko wrote:
> Good day.
>
> My CephFS switched to read-only.
> This problem occurred before on Hammer; I recreated the CephFS and
> upgraded to Jewel, which solved it, but it reappeared after some time.
>
> ceph.log:
> 2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster
> [INF] HEALTH_WARN; mds0: MDS in read-only mode
>
> ceph-mds.log:
> 2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe)
> commit error -22 v 1
> 2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] :
> failed to commit dir 1000afe object, errno -22
> 2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error
> (22) Invalid argument, force readonly...
> 2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system
> read-only
> 2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] :
> force file system read-only

The MDS is going read-only because it received an error (22, aka EINVAL)
from an OSD when trying to write a metadata object. You need to
investigate why the error occurred. Are your OSDs using the same Ceph
version as your MDS? Look in the OSD logs for the time at which the error
happened to see if there is more detail about why.

The readonly flag will clear if you restart your MDS (but it will get set
again if it keeps encountering errors writing to OSDs)

John

> I found this object:
> $ rados --pool metadata ls | grep 1000afe
> 1000afe.
>
> and successfully got it:
> $ rados --pool metadata get 1000afe. obj
> $ echo $?
> 0
>
> How do I switch the MDS out of read-only mode?
> Are there any tools to check the CephFS file system for errors?
>
> $ ceph -v
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>
> $ ceph fs ls
> name: cephfs, metadata pool: metadata, data pools: [data ]
>
> $ ceph mds stat
> e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby
>
> $ ceph osd lspools
> 0 data,1 metadata,6 one,
>
> $ ceph osd dump | grep 'replicated size'
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash
> rjenkins pg_num 256 pgp_num 256 last_change 45649
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
> pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool
> min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
>
> Thank you for help.
>
> --
> Dmitry Lysenko
> ISP Sovtest, Kursk, Russia
> jabber: t...@jabber.sovtest.ru
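John's diagnosis turns on the numeric code: the MDS log shows `commit error -22` and `(22) Invalid argument`. The kernel-style negative errno maps back to EINVAL, which can be checked with Python's standard `errno` and `os` modules:

```python
# Confirm the errno mapping behind the MDS log messages:
# "commit error -22" and "(22) Invalid argument, force readonly..."
import errno
import os

mds_errno = -22          # as logged by the MDS; negative by kernel convention
code = -mds_errno        # positive errno code

print(code == errno.EINVAL)       # True
print(os.strerror(code))          # Invalid argument
print(errno.errorcode[code])      # EINVAL
```

The same lookup works for any OSD write error the MDS surfaces, which helps when the log only prints the raw number.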
[ceph-users] MDS in read-only mode
Good day.

My CephFS switched to read-only.
This problem occurred before on Hammer; I recreated the CephFS and
upgraded to Jewel, which solved it, but it reappeared after some time.

ceph.log:
2016-08-07 18:11:31.226960 mon.0 192.168.13.100:6789/0 148601 : cluster
[INF] HEALTH_WARN; mds0: MDS in read-only mode

ceph-mds.log:
2016-08-07 18:10:58.699731 7f9fa2ba6700  1 mds.0.cache.dir(1000afe)
commit error -22 v 1
2016-08-07 18:10:58.699755 7f9fa2ba6700 -1 log_channel(cluster) log [ERR] :
failed to commit dir 1000afe object, errno -22
2016-08-07 18:10:58.699763 7f9fa2ba6700 -1 mds.0.2271 unhandled write error
(22) Invalid argument, force readonly...
2016-08-07 18:10:58.699773 7f9fa2ba6700  1 mds.0.cache force file system
read-only
2016-08-07 18:10:58.699777 7f9fa2ba6700  0 log_channel(cluster) log [WRN] :
force file system read-only

I found this object:
$ rados --pool metadata ls | grep 1000afe
1000afe.

and successfully got it:
$ rados --pool metadata get 1000afe. obj
$ echo $?
0

How do I switch the MDS out of read-only mode?
Are there any tools to check the CephFS file system for errors?
$ ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]

$ ceph mds stat
e2283: 1/1/1 up {0=drop-03=up:active}, 3 up:standby

$ ceph osd lspools
0 data,1 metadata,6 one,

$ ceph osd dump | grep 'replicated size'
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 45647 crash_replay_interval 45
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 45649
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 6 'one' replicated size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 53462 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0

Thank you for help.

--
Dmitry Lysenko
ISP Sovtest, Kursk, Russia
jabber: t...@jabber.sovtest.ru

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com