Re: [ceph-users] rados rm: device or resource busy
All,

This is why I wrote https://github.com/Mosibi/ceph_stripe_fixer.

With regards,
Richard.

On 06/09/2017 10:47 AM, Jan Kasprzak wrote:
> Hello,
>
> Brad Hubbard wrote:
> : I can reproduce this.
> [...]
> : That's here where you will notice it is returning EBUSY which is error
> : code 16, "Device or resource busy".
> :
> : https://github.com/badone/ceph/blob/wip-ceph_test_admin_socket_output/src/cls/lock/cls_lock.cc#L189
> :
> : In order to remove the existing parts of the file you should be able
> : to just run "rados --pool testpool ls" and remove the listed objects
> : belonging to "testfile".
> :
> : Example:
> : rados --pool testpool ls
> : testfile.0004
> : testfile.0001
> : testfile.
> : testfile.0003
> : testfile.0005
> : testfile.0002
> :
> : rados --pool testpool rm testfile.
> : rados --pool testpool rm testfile.0001
> : ...
>
> This works for me, thanks!
>
> : Please open a tracker for this so it can be investigated further.
>
> Done: http://tracker.ceph.com/issues/20233
>
> -Yenya

--
With regards,
Richard Arends.
Snow BV / http://snow.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rados rm: device or resource busy
Hello,

Brad Hubbard wrote:
: I can reproduce this.
[...]
: That's here where you will notice it is returning EBUSY which is error
: code 16, "Device or resource busy".
:
: https://github.com/badone/ceph/blob/wip-ceph_test_admin_socket_output/src/cls/lock/cls_lock.cc#L189
:
: In order to remove the existing parts of the file you should be able
: to just run "rados --pool testpool ls" and remove the listed objects
: belonging to "testfile".
:
: Example:
: rados --pool testpool ls
: testfile.0004
: testfile.0001
: testfile.
: testfile.0003
: testfile.0005
: testfile.0002
:
: rados --pool testpool rm testfile.
: rados --pool testpool rm testfile.0001
: ...

This works for me, thanks!

: Please open a tracker for this so it can be investigated further.

Done: http://tracker.ceph.com/issues/20233

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
> That's why this kind of vulnerability is a concern: deploying stuff is <
> often about collecting an obscene number of .jar files and pushing them <
> up to the application server. --pboddie at LWN <
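The removal procedure quoted above (list the pool, then remove every object belonging to "testfile") can be scripted. This is a minimal sketch under two assumptions: the leftover parts are named "testfile." followed by an empty or hexadecimal suffix, as in the example listing, and nothing unrelated in the pool matches that pattern. The helper names and the DRY_RUN switch are hypothetical; only "rados --pool ... ls" and "rados --pool ... rm" come from the thread. By default the script only prints the rm commands it would run.

```shell
#!/bin/sh
# Sketch only. Assumption: the parts of a striped object NAME are stored as
# RADOS objects called "NAME." plus an empty or hex-digit suffix, as in the
# "rados --pool testpool ls" example listing in this thread.

# Hypothetical helper: read object names on stdin, print the parts of "$1".
# The prefix is passed through the environment so awk neither treats it as
# a regex nor interprets backslash escapes in it.
list_stripe_parts() {
    PREFIX="$1." awk '
        index($0, ENVIRON["PREFIX"]) == 1 &&
        substr($0, length(ENVIRON["PREFIX"]) + 1) ~ /^[0-9a-fA-F]*$/
    '
}

# Hypothetical helper: cleanup_striped_object POOL NAME
# Prints the "rados rm" commands it would run; with DRY_RUN=0 it runs them.
cleanup_striped_object() {
    pool=$1
    name=$2
    # Collect the whole listing first, so nothing is deleted mid-listing.
    parts=$(rados --pool "$pool" ls | list_stripe_parts "$name")
    printf '%s\n' "$parts" | while IFS= read -r obj; do
        [ -n "$obj" ] || continue   # skip the empty line when nothing matched
        if [ "${DRY_RUN:-1}" = 0 ]; then
            rados --pool "$pool" rm "$obj"
        else
            echo "rados --pool $pool rm $obj"
        fi
    done
}
```

For example, "cleanup_striped_object testpool testfile" shows what would be removed, and "DRY_RUN=0 cleanup_striped_object testpool testfile" removes it. Brad's debug log shows the striper lock stored as the "lock.striper.lock" xattr on the first part ("testfile."), so deleting that object should also drop the stale lock.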
Re: [ceph-users] rados rm: device or resource busy
I can reproduce this. The key is to look at debug logging on the primary.

2017-06-09 09:30:14.776355 7f9cf26a4700 20 /home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:247: lock_op
2017-06-09 09:30:14.776359 7f9cf26a4700 20 /home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:162: requested lock_type=exclusive fail_if_exists=1
2017-06-09 09:30:14.776363 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] do_osd_op 0:6d521d9c:::testfile.:head [getxattr lock.striper.lock]
2017-06-09 09:30:14.776372 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] do_osd_op getxattr lock.striper.lock
2017-06-09 09:30:14.776383 7f9cf26a4700 15 filestore(/home/brad/working/src/ceph3/build/dev/osd0) getattr 0.6_head/#0:6d521d9c:::testfile.:head# '_lock.striper.lock'
2017-06-09 09:30:14.776408 7f9cf26a4700 10 filestore(/home/brad/working/src/ceph3/build/dev/osd0) getattr 0.6_head/#0:6d521d9c:::testfile.:head# '_lock.striper.lock' = 126
2017-06-09 09:30:14.776419 7f9cf26a4700 20 /home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:189: cannot take lock on object, conflicting tag
2017-06-09 09:30:14.776422 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] method called response length=0
2017-06-09 09:30:14.776432 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] dropping ondisk_read_lock
2017-06-09 09:30:14.776445 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] op order client.4122 tid 1 (first)
2017-06-09 09:30:14.776453 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] execute_ctx update_log_only -- result=-16
2017-06-09 09:30:14.776468 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] record_write_error r=-16
2017-06-09 09:30:14.776478 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] submit_log_entries 10'32 (0'0) error 0:6d521d9c:::testfile.:head by client.4122.0:1 0.00 -16
2017-06-09 09:30:14.776490 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] new_repop: repgather(0x565246704a80 10'32 rep_tid=33 committed?=0 applied?=0 r=-16)
2017-06-09 09:30:14.776502 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6( v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f 9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27 active+clean] merge_new_log_entries 10'32 (0'0) error 0:6d521d9c:::testfile.:head by client.4122.0:1 0.00 -16
2017-06-09 09:30:14.776514 7f9cf26a4700 20 update missing, append 10'32 (0'0) error 0:6d521d9c:::testfile.:head by client.4122.0:1 0.00 -16

Specifically this.

/home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:189: cannot take lock on object, conflicting tag

That's where you will notice it is returning EBUSY, which is error code 16, "Device or resource busy".

https://github.com/badone/ceph/blob/wip-ceph_test_admin_socket_output/src/cls/lock/cls_lock.cc#L189

In order to remove the existing parts of the file, you should be able to just run "rados --pool testpool ls" and remove the listed objects belonging to "testfile".

Example:

rados --pool testpool ls
testfile.0004
testfile.0001
testfile.
testfile.0003
testfile.0005
testfile.0002

rados --pool testpool rm testfile.
rados --pool testpool rm testfile.0001
...

Please open a tracker for this so it can be investigated further.

On Fri, Jun 9, 2017 at 1:43 AM, Jan Kasprzak wrote:
> Hello,
>
> David Turner wrote:
> : How long have you waited?
>
> About a day.
>
> : I don't do much with rados objects directly. I usually use RBDs and
> : cephfs. If you just need to clean things up, you can delete the pool and
> : recreate it since it looks like it's testing.
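Brad's first step, reading the debug log of the object's primary OSD, might be reproduced along these lines. This is a sketch, not his exact procedure: it assumes the ceph CLI and access to the OSD's log file, osd.0 stands in for whatever primary "ceph osd map" reports, and the helper name find_lock_conflicts is hypothetical. The grep patterns are taken from the log lines quoted above.

```shell
#!/bin/sh
# Sketch of the diagnosis step (assumptions noted inline).
#
# 1. Find the primary OSD of the striper's first object; the "acting"
#    set in the output ends with "pN", where N is the primary:
#      ceph osd map testpool testfile.
# 2. Raise that OSD's log levels (osd.0 assumed here), then retry the
#    failing "rados --pool testpool --striper rm testfile":
#      ceph tell osd.0 injectargs '--debug_osd 20 --debug_objclass 20'
# 3. Search the OSD log with the helper below.

# Hypothetical helper: print cls_lock conflicts and ops that failed with
# -16 (EBUSY). Reads the given log files, or stdin if none are given.
find_lock_conflicts() {
    grep -E 'cls_lock\.cc:[0-9]+: cannot take lock on object|result=-16' "$@"
}
```

For example, "find_lock_conflicts /var/log/ceph/ceph-osd.0.log", assuming the default log location, should print lines like the cls_lock.cc:189 "conflicting tag" message and the "result=-16" line from the log above.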
Re: [ceph-users] rados rm: device or resource busy
Hello,

David Turner wrote:
: How long have you waited?

About a day.

: I don't do much with rados objects directly. I usually use RBDs and
: cephfs. If you just need to clean things up, you can delete the pool and
: recreate it since it looks like it's testing. However this is probably a
: prime time to figure out how to get past this in case it happens in the
: future in production.

Yes. This is why I am asking now.

-Yenya

: On Thu, Jun 8, 2017 at 11:04 AM Jan Kasprzak wrote:
: > I have created a RADOS striped object using
: >
: > $ dd someargs | rados --pool testpool --striper put testfile -
: >
: > and interrupted it in the middle of writing. Now I cannot remove this
: > object:
: >
: > $ rados --pool testpool --striper rm testfile
: > error removing testpool>testfile: (16) Device or resource busy
: >
: > How can I tell CEPH that the writer is no longer around and does not come
: > back, so that I can remove the object "testfile"?
Re: [ceph-users] rados rm: device or resource busy
How long have you waited? Watchers of objects in Ceph time out after a while, and then you should be able to delete it. I'm talking about something in the range of 30 minutes, so this likely isn't the problem if you've been wrestling with it long enough to write in about it.

I don't do much with rados objects directly. I usually use RBDs and CephFS. If you just need to clean things up, you can delete the pool and recreate it, since it looks like it's testing. However, this is probably a prime time to figure out how to get past this in case it happens in production in the future. Hopefully someone with more experience manually creating and removing rados objects will chime in.

On Thu, Jun 8, 2017 at 11:04 AM Jan Kasprzak wrote:
> Hello,
>
> I have created a RADOS striped object using
>
> $ dd someargs | rados --pool testpool --striper put testfile -
>
> and interrupted it in the middle of writing. Now I cannot remove this
> object:
>
> $ rados --pool testpool --striper rm testfile
> error removing testpool>testfile: (16) Device or resource busy
>
> How can I tell CEPH that the writer is no longer around and does not come
> back, so that I can remove the object "testfile"?
>
> Thanks,
>
> -Yenya
[ceph-users] rados rm: device or resource busy
Hello,

I have created a RADOS striped object using

$ dd someargs | rados --pool testpool --striper put testfile -

and interrupted it in the middle of writing. Now I cannot remove this object:

$ rados --pool testpool --striper rm testfile
error removing testpool>testfile: (16) Device or resource busy

How can I tell Ceph that the writer is no longer around and will not come back, so that I can remove the object "testfile"?

Thanks,

-Yenya