Re: [ceph-users] RBD Mirror DR Testing

2019-11-25 Thread Jason Dillaman
On Mon, Nov 25, 2019 at 12:24 PM Vikas Rana  wrote:
>
> Hi All,
> I believe we forgot to take the snapshot in the previous test. Here's the
> output from the current test, where we took a snapshot on the primary side but
> the snapshot was not replicated to the DR side.
> VTIER1 is the Primary box with cluster ceph. Vtier2a is the DR box with 
> cluster name cephdr.
>
> root@VTIER1:~# rbd ls -l nfs
> NAME   SIZE PARENT FMT PROT LOCK
> dir_research 200TiB  2  excl
> dir_research@dr_test 200TiB  2
> test01   100MiB  2
> root@VTIER1:~#
>
>
> root@vtier2a:~# rbd ls -l nfs
> NAME   SIZE PARENT FMT PROT LOCK
> dir_research 200TiB  2  excl
> test01   100MiB  2  excl
>
> root@vtier2a:~# rbd mirror pool status nfs --verbose --cluster=cephdr
> health: OK
> images: 2 total
> 2 replaying
>
> dir_research:
>   global_id:   92f46320-d43d-48eb-8a09-b68a1945cc77
>   state:   up+replaying
>   description: replaying, master_position=[object_number=597902, tag_tid=3, 
> entry_tid=705172054], mirror_position=[object_number=311129, tag_tid=3, 
> entry_tid=283416457], entries_behind_master=421755597
>   last_update: 2019-11-25 12:14:52

The "entries_behind_master=421755597" value is telling me that your
"rbd-mirror" daemon is *very* far behind. Assuming each entry is a
4KiB IO, that would be over 1.5 TiB behind.
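
As a rough sanity check (the 4KiB-per-entry figure is only an assumption -- real
journal entries track your actual write sizes), the arithmetic in a shell:

$ echo $(( 421755597 * 4096 / 1024**3 )) GiB
1608 GiB

i.e. roughly 1.6 TiB of journal left to replay. Until the DR side has replayed
past the point in the journal where the snapshot was created, the snapshot will
not appear there -- which is almost certainly why "dr_test" is missing on the
DR image.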


> test01:
>   global_id:   06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
>   state:   up+replaying
>   description: replaying, master_position=[object_number=3, tag_tid=1, 
> entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], 
> entries_behind_master=0
>   last_update: 2019-11-25 12:14:50
>
> root@vtier2a:~# rbd-nbd --cluster=cephdr map nfs/dir_research@dr_test
> 2019-11-25 12:17:45.764091 7f8bd73c5dc0 -1 asok(0x55fd9a7202a0) 
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
> bind the UNIX domain socket to '/var/run/ceph/cephdr-client.admin.asok': (17) 
> File exists
>
>
>
> Did we miss anything, and why wasn't the snapshot replicated to the DR side?
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Thursday, November 21, 2019 10:24 AM
> To: Vikas Rana 
> Cc: dillaman ; ceph-users 
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 10:16 AM Vikas Rana  wrote:
> >
> > Thanks Jason.
> > We are just mounting it and verifying the directory structure to make sure it
> > looks good.
> >
> > My understanding was, in 12.2.10, we can't mount the DR snapshot as the RBD 
> > image is non-primary. Is this wrong?
>
> You have always been able to access non-primary images for read-only 
> operations (only writes are prevented):
>
> $ rbd info test
> rbd image 'test':
> <... snip ...>
> mirroring primary: false
>
> $ rbd device --device-type nbd map test@1
> /dev/nbd0
> $ mount /dev/nbd0 /mnt/
> mount: /mnt: WARNING: device write-protected, mounted read-only.
> $ ll /mnt/
> total 0
> -rw-r--r--. 1 root root 0 Nov 21 10:20 hello.world
>
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Thursday, November 21, 2019 9:58 AM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] RBD Mirror DR Testing
> >
> > On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman  wrote:
> > >
> > > On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana  wrote:
> > > >
> > > > Thanks Jason for such a quick response. We are on 12.2.10.
> > > >
> > > > Checksumming a 200TB image will take a long time.
> > >
> > > How would mounting an RBD image and scanning the image be faster?
> > > Are you only using a small percentage of the image?
> >
> > ... and of course, you can mount an RBD snapshot in read-only mode.
> >
> > > > To test the DR copy by mounting it, these are the steps I'm planning to follow:
> > > > 1. Demote the Prod copy and promote the DR copy
> > > > 2. Do we have to recreate the rbd mirror relationship going from DR to primary?
> > > > 3. Mount and validate the data
> > > > 4. Demote the DR copy and promote the Prod copy
> > > > 5. Revert the peer relationship if required?
> > > >
> > > > Did I do it right or miss anything?
> > >
> > > You cannot change the peers or you will lose the relationship. If
> > > you insist on your course of action, you just need to be configured
> > > for two-way mirroring and leave it that way.

Re: [ceph-users] RBD Mirror DR Testing

2019-11-22 Thread Jason Dillaman
On Fri, Nov 22, 2019 at 11:16 AM Vikas Rana  wrote:
>
> Hi All,
>
> We have an XFS filesystem on the Prod side, and when we try to mount the DR
> copy, we get a superblock error:
>
> root@:~# rbd-nbd map nfs/dir
> /dev/nbd0
> root@:~# mount /dev/nbd0 /mnt
> mount: /dev/nbd0: can't read superblock

Doesn't look like you are mapping at a snapshot.
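
A minimal sketch of what I mean -- the snapshot name here is only a placeholder,
and it must already exist and have been fully replayed on the DR side:

$ rbd-nbd map nfs/dir@dr_test            # map a snapshot, not the image head
/dev/nbd0
$ mount -o ro,norecovery,nouuid /dev/nbd0 /mnt

"norecovery" usually matters for XFS here, since a crash-consistent, read-only
snapshot cannot have its log replayed at mount time; "nouuid" is only needed if
the same filesystem UUID is already mounted on that host.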

>
> Any suggestions for testing the DR copy another way, or am I doing something
> wrong?
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Thursday, November 21, 2019 10:24 AM
> To: Vikas Rana 
> Cc: dillaman ; ceph-users 
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 10:16 AM Vikas Rana  wrote:
> >
> > Thanks Jason.
> > We are just mounting it and verifying the directory structure to make sure it
> > looks good.
> >
> > My understanding was, in 12.2.10, we can't mount the DR snapshot as the RBD 
> > image is non-primary. Is this wrong?
>
> You have always been able to access non-primary images for read-only 
> operations (only writes are prevented):
>
> $ rbd info test
> rbd image 'test':
> <... snip ...>
> mirroring primary: false
>
> $ rbd device --device-type nbd map test@1
> /dev/nbd0
> $ mount /dev/nbd0 /mnt/
> mount: /mnt: WARNING: device write-protected, mounted read-only.
> $ ll /mnt/
> total 0
> -rw-r--r--. 1 root root 0 Nov 21 10:20 hello.world
>
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Thursday, November 21, 2019 9:58 AM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] RBD Mirror DR Testing
> >
> > On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman  wrote:
> > >
> > > On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana  wrote:
> > > >
> > > > Thanks Jason for such a quick response. We are on 12.2.10.
> > > >
> > > > Checksumming a 200TB image will take a long time.
> > >
> > > How would mounting an RBD image and scanning the image be faster?
> > > Are you only using a small percentage of the image?
> >
> > ... and of course, you can mount an RBD snapshot in read-only mode.
> >
> > > > To test the DR copy by mounting it, these are the steps I'm planning to follow:
> > > > 1. Demote the Prod copy and promote the DR copy
> > > > 2. Do we have to recreate the rbd mirror relationship going from DR to primary?
> > > > 3. Mount and validate the data
> > > > 4. Demote the DR copy and promote the Prod copy
> > > > 5. Revert the peer relationship if required?
> > > >
> > > > Did I do it right or miss anything?
> > >
> > > You cannot change the peers or you will lose the relationship. If
> > > you insist on your course of action, you just need to be configured
> > > for two-way mirroring and leave it that way.
> > >
> > > >
> > > > Thanks,
> > > > -Vikas
> > > >
> > > > -Original Message-
> > > > From: Jason Dillaman 
> > > > Sent: Thursday, November 21, 2019 8:33 AM
> > > > To: Vikas Rana 
> > > > Cc: ceph-users 
> > > > Subject: Re: [ceph-users] RBD Mirror DR Testing
> > > >
> > > > On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > >
> > > > >
> > > > > We have a 200TB RBD image which we are replicating using RBD 
> > > > > mirroring.
> > > > >
> > > > > We want to test the DR copy and make sure that we have a consistent 
> > > > > copy in case primary site is lost.
> > > > >
> > > > >
> > > > >
> > > > > We did it previously and promoted the DR copy, which broke the DR
> > > > > copy's relationship with the primary, and we had to resync the whole 200TB of data.
> > > > >
> > > > >
> > > > >
> > > > > Is there any correct way of doing it so we don’t have to resync all 
> > > > > 200TB again?
> > > >
> > > > Yes, create a snapshot on the primary site and let it propagate to the 
> > > > non-primary site. Then you can compare checksums at the snapshot w/o 
> > > > having to worry about the data changing. Once you have finished, delete 
> > > > the snapshot on the primary site and it will propagate over to the non-primary site.

Re: [ceph-users] RBD Mirror DR Testing

2019-11-21 Thread Jason Dillaman
On Thu, Nov 21, 2019 at 10:16 AM Vikas Rana  wrote:
>
> Thanks Jason.
> We are just mounting it and verifying the directory structure to make sure it
> looks good.
>
> My understanding was, in 12.2.10, we can't mount the DR snapshot as the RBD 
> image is non-primary. Is this wrong?

You have always been able to access non-primary images for read-only
operations (only writes are prevented):

$ rbd info test
rbd image 'test':
<... snip ...>
mirroring primary: false

$ rbd device --device-type nbd map test@1
/dev/nbd0
$ mount /dev/nbd0 /mnt/
mount: /mnt: WARNING: device write-protected, mounted read-only.
$ ll /mnt/
total 0
-rw-r--r--. 1 root root 0 Nov 21 10:20 hello.world

> Thanks,
> -Vikas
>
> -----Original Message-
> From: Jason Dillaman 
> Sent: Thursday, November 21, 2019 9:58 AM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman  wrote:
> >
> > On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana  wrote:
> > >
> > > Thanks Jason for such a quick response. We are on 12.2.10.
> > >
> > > Checksumming a 200TB image will take a long time.
> >
> > How would mounting an RBD image and scanning the image be faster? Are
> > you only using a small percentage of the image?
>
> ... and of course, you can mount an RBD snapshot in read-only mode.
>
> > > To test the DR copy by mounting it, these are the steps I'm planning to follow:
> > > 1. Demote the Prod copy and promote the DR copy
> > > 2. Do we have to recreate the rbd mirror relationship going from DR to primary?
> > > 3. Mount and validate the data
> > > 4. Demote the DR copy and promote the Prod copy
> > > 5. Revert the peer relationship if required?
> > >
> > > Did I do it right or miss anything?
> >
> > You cannot change the peers or you will lose the relationship. If you
> > insist on your course of action, you just need to be configured for
> > two-way mirroring and leave it that way.
> >
> > >
> > > Thanks,
> > > -Vikas
> > >
> > > -Original Message-
> > > From: Jason Dillaman 
> > > Sent: Thursday, November 21, 2019 8:33 AM
> > > To: Vikas Rana 
> > > Cc: ceph-users 
> > > Subject: Re: [ceph-users] RBD Mirror DR Testing
> > >
> > > On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana  wrote:
> > > >
> > > > Hi all,
> > > >
> > > >
> > > >
> > > > We have a 200TB RBD image which we are replicating using RBD mirroring.
> > > >
> > > > We want to test the DR copy and make sure that we have a consistent 
> > > > copy in case primary site is lost.
> > > >
> > > >
> > > >
> > > > We did it previously and promoted the DR copy, which broke the DR copy's
> > > > relationship with the primary, and we had to resync the whole 200TB of data.
> > > >
> > > >
> > > >
> > > > Is there any correct way of doing it so we don’t have to resync all 
> > > > 200TB again?
> > >
> > > Yes, create a snapshot on the primary site and let it propagate to the 
> > > non-primary site. Then you can compare checksums at the snapshot w/o 
> > > having to worry about the data changing. Once you have finished, delete 
> > > the snapshot on the primary site and it will propagate over to the 
> > > non-primary site.
> > >
> > > >
> > > >
> > > > Can we demote current primary and then promote the DR copy and test and 
> > > > then revert back? Will that require the complete 200TB sync?
> > > >
> > >
> > > It's only the forced-promotion that causes split-brain. If you gracefully 
> > > demote from site A and promote site B, and then demote site B and promote 
> > > site A, that will not require a sync. However, again, it's probably just 
> > > easier to use a snapshot.
> > >
> > > >
> > > > Thanks in advance for your help and suggestions.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > -Vikas
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > >
> > >
> > > --
> > > Jason
> > >
> > >
> >
> >
> > --
> > Jason
>
>
>
> --
> Jason
>
>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror DR Testing

2019-11-21 Thread Jason Dillaman
On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman  wrote:
>
> On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana  wrote:
> >
> > Thanks Jason for such a quick response. We are on 12.2.10.
> >
> > Checksumming a 200TB image will take a long time.
>
> How would mounting an RBD image and scanning the image be faster? Are
> you only using a small percentage of the image?

... and of course, you can mount an RBD snapshot in read-only mode.

> > To test the DR copy by mounting it, these are the steps I'm planning to 
> > follow
> > 1. Demote the Prod copy and promote the DR copy
> > 2. Do we have to recreate the rbd mirror relationship going from DR to 
> > primary?
> > 3. Mount and validate the data
> > 4. Demote the DR copy and promote the Prod copy
> > 5. Revert the peer relationship if required?
> >
> > Did I do it right or miss anything?
>
> You cannot change the peers or you will lose the relationship. If you
> insist on your course of action, you just need to be configured for
> two-way mirroring and leave it that way.
>
> >
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Thursday, November 21, 2019 8:33 AM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] RBD Mirror DR Testing
> >
> > On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana  wrote:
> > >
> > > Hi all,
> > >
> > >
> > >
> > > We have a 200TB RBD image which we are replicating using RBD mirroring.
> > >
> > > We want to test the DR copy and make sure that we have a consistent copy 
> > > in case primary site is lost.
> > >
> > >
> > >
> > > We did it previously and promoted the DR copy, which broke the DR copy's
> > > relationship with the primary, and we had to resync the whole 200TB of data.
> > >
> > >
> > >
> > > Is there any correct way of doing it so we don’t have to resync all 200TB 
> > > again?
> >
> > Yes, create a snapshot on the primary site and let it propagate to the 
> > non-primary site. Then you can compare checksums at the snapshot w/o having 
> > to worry about the data changing. Once you have finished, delete the 
> > snapshot on the primary site and it will propagate over to the non-primary 
> > site.
> >
> > >
> > >
> > > Can we demote current primary and then promote the DR copy and test and 
> > > then revert back? Will that require the complete 200TB sync?
> > >
> >
> > It's only the forced-promotion that causes split-brain. If you gracefully 
> > demote from site A and promote site B, and then demote site B and promote 
> > site A, that will not require a sync. However, again, it's probably just 
> > easier to use a snapshot.
> >
> > >
> > > Thanks in advance for your help and suggestions.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > -Vikas
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > --
> > Jason
> >
> >
>
>
> --
> Jason



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror DR Testing

2019-11-21 Thread Jason Dillaman
On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana  wrote:
>
> Thanks Jason for such a quick response. We are on 12.2.10.
>
> Checksumming a 200TB image will take a long time.

How would mounting an RBD image and scanning the image be faster? Are
you only using a small percentage of the image?

> To test the DR copy by mounting it, these are the steps I'm planning to follow
> 1. Demote the Prod copy and promote the DR copy
> 2. Do we have to recreate the rbd mirror relationship going from DR to 
> primary?
> 3. Mount and validate the data
> 4. Demote the DR copy and promote the Prod copy
> 5. Revert the peer relationship if required?
>
> Did I do it right or miss anything?

You cannot change the peers or you will lose the relationship. If you
insist on your course of action, you just need to be configured for
two-way mirroring and leave it that way.
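
For reference, two-way mirroring just means each cluster also lists the other as
a peer of the mirrored pool -- a rough sketch, where the pool, cluster, and
client names are only placeholders for your environment:

$ rbd mirror pool peer add nfs client.admin@cephdr --cluster ceph
$ rbd mirror pool peer add nfs client.admin@ceph --cluster cephdr
$ rbd mirror pool info nfs --cluster ceph      # should now list the peer

With peers configured in both directions (and an rbd-mirror daemon running at
each site), you can demote/promote back and forth without ever touching the
peer relationship again.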

>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Thursday, November 21, 2019 8:33 AM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana  wrote:
> >
> > Hi all,
> >
> >
> >
> > We have a 200TB RBD image which we are replicating using RBD mirroring.
> >
> > We want to test the DR copy and make sure that we have a consistent copy in 
> > case primary site is lost.
> >
> >
> >
> > We did it previously and promoted the DR copy, which broke the DR copy's
> > relationship with the primary, and we had to resync the whole 200TB of data.
> >
> >
> >
> > Is there any correct way of doing it so we don’t have to resync all 200TB 
> > again?
>
> Yes, create a snapshot on the primary site and let it propagate to the 
> non-primary site. Then you can compare checksums at the snapshot w/o having 
> to worry about the data changing. Once you have finished, delete the snapshot 
> on the primary site and it will propagate over to the non-primary site.
>
> >
> >
> > Can we demote current primary and then promote the DR copy and test and 
> > then revert back? Will that require the complete 200TB sync?
> >
>
> It's only the forced-promotion that causes split-brain. If you gracefully 
> demote from site A and promote site B, and then demote site B and promote 
> site A, that will not require a sync. However, again, it's probably just 
> easier to use a snapshot.
>
> >
> > Thanks in advance for your help and suggestions.
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
>
>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror DR Testing

2019-11-21 Thread Jason Dillaman
On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana  wrote:
>
> Hi all,
>
>
>
> We have a 200TB RBD image which we are replicating using RBD mirroring.
>
> We want to test the DR copy and make sure that we have a consistent copy in 
> case primary site is lost.
>
>
>
> We did it previously and promoted the DR copy, which broke the DR copy's
> relationship with the primary, and we had to resync the whole 200TB of data.
>
>
>
> Is there any correct way of doing it so we don’t have to resync all 200TB 
> again?

Yes, create a snapshot on the primary site and let it propagate to the
non-primary site. Then you can compare checksums at the snapshot w/o
having to worry about the data changing. Once you have finished,
delete the snapshot on the primary site and it will propagate over to
the non-primary site.
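
A rough sketch of that flow -- the pool ("nfs"), image ("dir_research"), and
snapshot ("dr_test") names are placeholders, adjust for your environment:

# on the primary cluster
$ rbd snap create nfs/dir_research@dr_test

# on the DR cluster: wait for the snapshot to appear and for the journal to be
# replayed past it (entries_behind_master drops to 0 once the image is quiesced)
$ rbd snap ls nfs/dir_research --cluster cephdr
$ rbd mirror image status nfs/dir_research --cluster cephdr

# compare checksums of the snapshot on both sides, e.g. run on each cluster:
$ rbd export nfs/dir_research@dr_test - | md5sum

# when finished, back on the primary
$ rbd snap rm nfs/dir_research@dr_test

Exporting a 200TB image to checksum it will of course take a while; spot-checking
a read-only mount of the snapshot is the lighter-weight alternative if you only
need a sanity check.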

>
>
> Can we demote current primary and then promote the DR copy and test and then 
> revert back? Will that require the complete 200TB sync?
>

It's only the forced-promotion that causes split-brain. If you
gracefully demote from site A and promote site B, and then demote site
B and promote site A, that will not require a sync. However, again,
it's probably just easier to use a snapshot.
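
If you do want to exercise the failover path, the graceful sequence is roughly
the following (pool/image names are placeholders, and the demotion must have
been replayed on the other site before you promote there):

# site A (current primary)
$ rbd mirror image demote nfs/dir_research

# site B, once rbd-mirror has replayed the demotion
$ rbd mirror image promote nfs/dir_research --cluster cephdr

# ... validate ... then demote on site B and promote on site A the same way

Note there is no "--force" anywhere above; a forced promotion is what creates
the split-brain that then requires a full resync.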

>
> Thanks in advance for your help and suggestions.
>
>
>
> Thanks,
>
> -Vikas
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:42 PM Florian Haas  wrote:
>
> On 19/11/2019 22:34, Jason Dillaman wrote:
> >> Oh totally, I wasn't arguing it was a bad idea for it to do what it
> >> does! I just got confused by the fact that our mon logs showed what
> >> looked like a (failed) attempt to blacklist an entire client IP address.
> >
> > There should have been an associated client nonce after the IP address
> > to uniquely identify which client connection is blacklisted --
> > something like "1.2.3.4:0/5678". Let me know if that's not the case
> > since that would definitely be wrong.
>
> English lacks a universally understood way to answer a negated question
> in the affirmative, so this is tricky to get right, but I'll try: No,
> that *is* the case, thus nothing is wrong. :)

Haha -- thanks!

> Cheers,
> Florian
>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:31 PM Florian Haas  wrote:
>
> On 19/11/2019 22:19, Jason Dillaman wrote:
> > On Tue, Nov 19, 2019 at 4:09 PM Florian Haas  wrote:
> >>
> >> On 19/11/2019 21:32, Jason Dillaman wrote:
> >>>> What, exactly, is the "reasonably configured hypervisor" here, in other
> >>>> words, what is it that grabs and releases this lock? It's evidently not
> >>>> Nova that does this, but is it libvirt, or Qemu/KVM, and if so, what
> >>>> magic in there makes this happen, and what "reasonable configuration"
> >>>> influences this?
> >>>
> >>> librbd and krbd perform this logic when the exclusive-lock feature is
> >>> enabled.
> >>
> >> Right. So the "reasonable configuration" applies to the features they
> >> enable when they *create* an image, rather than what they do to the
> >> image at runtime. Is that fair to say?
> >
> > The exclusive-lock ownership is enforced at image use (i.e. when the
> > feature is a property of the image, not specifically just during the
> > action of enabling the property) -- so this implies "what they do to
> > the image at runtime"
>
> OK, gotcha.
>
> >>> In this case, librbd sees that the previous lock owner is
> >>> dead / missing, but before it can steal the lock (since librbd did not
> >>> cleanly close the image), it needs to ensure it cannot come back from
> >>> the dead to issue future writes against the RBD image by blacklisting
> >>> it from the cluster.
> >>
> >> Thanks. I'm probably sounding dense here, sorry for that, but yes, this
> >> makes perfect sense to me when I want to fence a whole node off —
> >> however, how exactly does this work with VM recovery in place?
> >
> > How would librbd / krbd know under what situation a VM was being
> > "recovered"? Should librbd be expected to integrate w/ IPMI devices
> > where the VM is being run or w/ Zabbix alert monitoring to know that
> > this was a power failure so don't expect that the lock owner will come
> > back up? The safe and generic thing for librbd / krbd to do in this
> > situation is to just blacklist the old lock owner to ensure it cannot
> > talk to the cluster. Obviously in the case of a physically failed
> > node, that won't ever happen -- but I think we can all agree this is
> > the sane recovery path that covers all bases.
>
> Oh totally, I wasn't arguing it was a bad idea for it to do what it
> does! I just got confused by the fact that our mon logs showed what
> looked like a (failed) attempt to blacklist an entire client IP address.

There should have been an associated client nonce after the IP address
to uniquely identify which client connection is blacklisted --
something like "1.2.3.4:0/5678". Let me know if that's not the case
since that would definitely be wrong.
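
(For reference, you can inspect the current entries with:

$ ceph osd blacklist ls

Each entry is of the form addr:port/nonce plus an expiration time, so only that
one client instance is fenced; a client that recovers in place reconnects with a
new nonce and is unaffected.)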

> > Yup, with the correct permissions librbd / rbd will be able to
> > blacklist the lock owner, break the old lock, and acquire the lock
> > themselves for R/W operations -- and the operator would not need to
> > intervene.
>
> Ack. Thanks!
>
> Cheers,
> Florian
>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:09 PM Florian Haas  wrote:
>
> On 19/11/2019 21:32, Jason Dillaman wrote:
> >> What, exactly, is the "reasonably configured hypervisor" here, in other
> >> words, what is it that grabs and releases this lock? It's evidently not
> >> Nova that does this, but is it libvirt, or Qemu/KVM, and if so, what
> >> magic in there makes this happen, and what "reasonable configuration"
> >> influences this?
> >
> > librbd and krbd perform this logic when the exclusive-lock feature is
> > enabled.
>
> Right. So the "reasonable configuration" applies to the features they
> enable when they *create* an image, rather than what they do to the
> image at runtime. Is that fair to say?

The exclusive-lock ownership is enforced at image use (i.e. when the
feature is a property of the image, not specifically just during the
action of enabling the property) -- so this implies "what they do to
the image at runtime"

> > In this case, librbd sees that the previous lock owner is
> > dead / missing, but before it can steal the lock (since librbd did not
> > cleanly close the image), it needs to ensure it cannot come back from
> > the dead to issue future writes against the RBD image by blacklisting
> > it from the cluster.
>
> Thanks. I'm probably sounding dense here, sorry for that, but yes, this
> makes perfect sense to me when I want to fence a whole node off —
> however, how exactly does this work with VM recovery in place?

How would librbd / krbd know under what situation a VM was being
"recovered"? Should librbd be expected to integrate w/ IPMI devices
where the VM is being run or w/ Zabbix alert monitoring to know that
this was a power failure so don't expect that the lock owner will come
back up? The safe and generic thing for librbd / krbd to do in this
situation is to just blacklist the old lock owner to ensure it cannot
talk to the cluster. Obviously in the case of a physically failed
node, that won't ever happen -- but I think we can all agree this is
the sane recovery path that covers all bases.

> From further upthread:
>
> > Semi-relatedly, as I understand it OSD blacklisting happens based either
> > on an IP address, or on a socket address (IP:port). While this comes in
> > handy in host evacuation, it doesn't in in-place recovery (see question
> > 4 in my original message).
> >
> > - If the blacklist happens based on IP address alone (and that's what
> > seems to be what the client attempts to be doing, based on our log
> > messages), then it would break recovery-in-place after a hard reboot
> > altogether.
> >
> > - Even if the client would blacklist based on an address:port pair, it
> > would be just very unlikely that an RBD client used the same source port
> > to connect after the node recovers in place, but not impossible.
>
> Clearly though, if people set their permissions correctly then this
> blacklisting seems to work fine even for recovery-in-place, so no reason
> for me to doubt that, I'd just really like to understand the mechanics. :)

Yup, with the correct permissions librbd / rbd will be able to
blacklist the lock owner, break the old lock, and acquire the lock
themselves for R/W operations -- and the operator would not need to
intervene.

> Thanks again!
>
> Cheers,
> Florian
>

-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 2:49 PM Florian Haas  wrote:
>
> On 19/11/2019 20:03, Jason Dillaman wrote:
> > On Tue, Nov 19, 2019 at 1:51 PM shubjero  wrote:
> >>
> >> Florian,
> >>
> >> Thanks for posting about this issue. This is something that we have
> >> been experiencing (stale exclusive locks) with our OpenStack and Ceph
> >> cloud more frequently as our datacentre has had some reliability
> >> issues recently with power and cooling causing several unexpected
> >> shutdowns.
> >>
> >> At this point we are on Ceph Mimic 13.2.6 and reading through this
> >> thread and related links I just wanted to confirm if I have the
> >> correct caps for cinder clients as listed below as we have upgraded
> >> through many major Ceph versions over the years and I'm sure a lot of
> >> our configs and settings still contain deprecated options.
> >>
> >> client.cinder
> >> key: sanitized==
> >> caps: [mgr] allow r
> >> caps: [mon] profile rbd
> >> caps: [osd] allow class-read object_prefix rbd_children, profile rbd
> >> pool=volumes, profile rbd pool=vms, profile rbd pool=images
> >
> > Only use "profile rbd" for 'mon' and 'osd' caps -- it's documented
> > here [1]. Once you use 'profile rbd', you don't need the extra "allow
> > class-read object_prefix rbd_children" since it is included within the
> > profile (along with other things like support for clone v2). Octopus
> > will also include "profile rbd" for the 'mgr' cap to support the new
> > functionality in the "rbd_support" manager module (like running "rbd
> > perf image top" w/o the admin caps).
> >
> >> From what I read, the blacklist permission was something that was
> >> supposed to be applied pre-Luminous upgrade but once you are on
> >> Luminous or later, it's no longer needed assuming you have switched to
> >> using the rbd profile.
> >
> > Correct. The "blacklist" permission was an intermediate state
> > pre-upgrade since your older OSDs wouldn't have support for "profile
> > rbd" yet but Luminous OSDs started to enforce caps on the 'blacklist
> > add' op so that rogue users w/ read-only permissions couldn't just
> > blacklist all clients. Once you are at Luminous or later, you can just
> > use the profile.
>
> OK, great. This gives me something to start with for a doc patch.
> Thanks! However, I'm still curious about this bit:
>
> >> On Fri, Nov 15, 2019 at 11:05 AM Paul Emmerich  
> >> wrote:
> >>> * This is unrelated to openstack and will happen with *any* reasonably
> >>> configured hypervisor that uses exclusive locking
>
> What, exactly, is the "reasonably configured hypervisor" here, in other
> words, what is it that grabs and releases this lock? It's evidently not
> Nova that does this, but is it libvirt, or Qemu/KVM, and if so, what
> magic in there makes this happen, and what "reasonable configuration"
> influences this?

librbd and krbd perform this logic when the exclusive-lock feature is
enabled. In this case, librbd sees that the previous lock owner is
dead / missing, but before it can steal the lock (since librbd did not
cleanly close the image), it needs to ensure it cannot come back from
the dead to issue future writes against the RBD image by blacklisting
it from the cluster.

> Thanks again!
>
> Cheers,
> Florian
>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 1:51 PM shubjero  wrote:
>
> Florian,
>
> Thanks for posting about this issue. This is something that we have
> been experiencing (stale exclusive locks) with our OpenStack and Ceph
> cloud more frequently as our datacentre has had some reliability
> issues recently with power and cooling causing several unexpected
> shutdowns.
>
> At this point we are on Ceph Mimic 13.2.6 and reading through this
> thread and related links I just wanted to confirm if I have the
> correct caps for cinder clients as listed below as we have upgraded
> through many major Ceph versions over the years and I'm sure a lot of
> our configs and settings still contain deprecated options.
>
> client.cinder
> key: sanitized==
> caps: [mgr] allow r
> caps: [mon] profile rbd
> caps: [osd] allow class-read object_prefix rbd_children, profile rbd
> pool=volumes, profile rbd pool=vms, profile rbd pool=images

Only use "profile rbd" for 'mon' and 'osd' caps -- it's documented
here [1]. Once you use 'profile rbd', you don't need the extra "allow
class-read object_prefix rbd_children" since it is included within the
profile (along with other things like support for clone v2). Octopus
will also include "profile rbd" for the 'mgr' cap to support the new
functionality in the "rbd_support" manager module (like running "rbd
perf image top" w/o the admin caps).
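
For the client in question that ends up looking something like the following
(pool names are copied from your keyring listing -- adjust as needed):

$ ceph auth caps client.cinder \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images' \
    mgr 'allow r'

Per the note above, on Octopus the 'mgr' cap would become 'profile rbd' as well.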

> From what I read, the blacklist permission was something that was
> supposed to be applied pre-Luminous upgrade but once you are on
> Luminous or later, it's no longer needed assuming you have switched to
> using the rbd profile.

Correct. The "blacklist" permission was an intermediate state
pre-upgrade since your older OSDs wouldn't have support for "profile
rbd" yet but Luminous OSDs started to enforce caps on the 'blacklist
add' op so that rogue users w/ read-only permissions couldn't just
blacklist all clients. Once you are at Luminous or later, you can just
use the profile.

> On Fri, Nov 15, 2019 at 11:05 AM Paul Emmerich  wrote:
> >
> > To clear up a few misconceptions here:
> >
> > * RBD keyrings should use the "profile rbd" permissions, everything
> > else is *wrong* and should be fixed asap
> > * Manually adding the blacklist permission might work but isn't
> > future-proof, fix the keyring instead
> > * The suggestion to mount them elsewhere to fix this only works
> > because "elsewhere" probably has an admin keyring, this is a bad
> > work-around, fix the keyring instead
> > * This is unrelated to openstack and will happen with *any* reasonably
> > configured hypervisor that uses exclusive locking
> >
> > This problem usually happens after upgrading to Luminous without
> > reading the change log. The change log tells you to adjust the keyring
> > permissions accordingly
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> > On Fri, Nov 15, 2019 at 4:56 PM Joshua M. Boniface  
> > wrote:
> > >
> > > Thanks Simon! I've implemented it, I guess I'll test it out next time my 
> > > homelab's power dies :-)
> > >
> > > On 2019-11-15 10:54 a.m., Simon Ironside wrote:
> > >
> > > On 15/11/2019 15:44, Joshua M. Boniface wrote:
> > >
> > > Hey All:
> > >
> > > I've also quite frequently experienced this sort of issue with my Ceph 
> > > RBD-backed QEMU/KVM
> > >
> > > cluster (not OpenStack specifically). Should this workaround of allowing 
> > > the 'osd blacklist'
> > >
> > > command in the caps help in that scenario as well, or is this an 
> > > OpenStack-specific
> > >
> > > functionality?
> > >
> > > Yes, my use case is RBD backed QEMU/KVM too, not Openstack. It's
> > > required for all RBD clients.
> > >
> > > Simon
> > >
> > > ___
> > >
> > > ceph-users mailing list
> > >
> > > ceph-users@lists.ceph.com
> > >
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] 
https://docs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication

-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iscsi resize -vmware datastore cannot increase size

2019-10-25 Thread Jason Dillaman
On Fri, Oct 25, 2019 at 9:49 AM Steven Vacaroaia  wrote:
>
> Thanks for your prompt response.
> Unfortunately, still no luck.
> The device shows the correct size under "Device backing" but does not show up
> at all under "Increase Datastore Capacity".
>
> resize rbd.rep01 7T
> ok
> /disks> ls
> o- disks 
> .
>  [13.0T, Disks: 2]
>   o- rbd.rep01 
> 
>  [rep01 (7T)]

Did you rescan the LUNs in VMware after this latest resize attempt?
What kernel and tcmu-runner version are you using?

> On Fri, 25 Oct 2019 at 09:24, Jason Dillaman  wrote:
>>
>> On Fri, Oct 25, 2019 at 9:13 AM Steven Vacaroaia  wrote:
>> >
>> > Hi,
>> > I am trying to increase size of a datastore made available through ceph 
>> > iscsi rbd
>> > The steps I followed are depicted below
>> > Basically gwcli reports the correct data and even the VMware device capacity is
>> > correct, but when I try to increase it there is no device listed.
>> >
>> > I am using ceph-iscsi-config-2.6-42.gccca57d.el7 and ceph 13.2.2
>> >
>> > Any guidance/help will be appreciated
>> >
>> > 1. increase rbd size
>> > rbd -p rbd resize --size 6T rep01
>> > Resizing image: 100% complete...done.
>>
>> Never resize the RBD images backing a LUN via the "rbd" CLI -- use
>> "gwcli" to resize the images and it will handle resizing the LUNs.
>>
>> > 2. restart gwcli
>> >  systemctl restart rbd-target-gw &&  systemctl restart rbd-target-api
>> >
>> > 3 check size
>> >  gwcli
>> > /iscsi-target...go-ceph/hosts> ls
>> > o- hosts 
>> > 
>> >  [Hosts: 8: Auth: CHAP]
>> >   o- iqn.1998-01.com.vmware:vsan5-66c18541 
>> >  [LOGGED-IN, Auth: CHAP, 
>> > Disks: 2(12.0T)]
>> >   | o- lun 0 
>> > 
>> >  [rbd.vmware01(6.0T), Owner: osd01]
>> >   | o- lun 1 
>> > ...
>> >  [rbd.rep01(6.0T), Owner: osd02]
>> >
>> > 4. VMware rescan devices
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Jason
>>


-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iscsi resize -vmware datastore cannot increase size

2019-10-25 Thread Jason Dillaman
On Fri, Oct 25, 2019 at 9:13 AM Steven Vacaroaia  wrote:
>
> Hi,
> I am trying to increase size of a datastore made available through ceph iscsi 
> rbd
> The steps I followed are depicted below
> Basically gwcli reports the correct data and even the VMware device capacity is
> correct, but when I try to increase it there is no device listed.
>
> I am using ceph-iscsi-config-2.6-42.gccca57d.el7 and ceph 13.2.2
>
> Any guidance/help will be appreciated
>
> 1. increase rbd size
> rbd -p rbd resize --size 6T rep01
> Resizing image: 100% complete...done.

Never resize the RBD images backing a LUN via the "rbd" CLI -- use
"gwcli" to resize the images and it will handle resizing the LUNs.

> 2. restart gwcli
>  systemctl restart rbd-target-gw &&  systemctl restart rbd-target-api
>
> 3 check size
>  gwcli
> /iscsi-target...go-ceph/hosts> ls
> o- hosts 
> 
>  [Hosts: 8: Auth: CHAP]
>   o- iqn.1998-01.com.vmware:vsan5-66c18541 
>  [LOGGED-IN, Auth: CHAP, 
> Disks: 2(12.0T)]
>   | o- lun 0 
> 
>  [rbd.vmware01(6.0T), Owner: osd01]
>   | o- lun 1 
> ...
>  [rbd.rep01(6.0T), Owner: osd02]
>
> 4. VMware rescan devices
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph iscsi question

2019-10-17 Thread Jason Dillaman
Have you updated your "/etc/multipath.conf" as documented here [1]?
You should have ALUA configured but it doesn't appear that's the case
w/ your provided output.
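
The key piece is a device stanza for the LIO-ORG TCMU devices along these lines
(treat this as a sketch from memory and take the authoritative values from the
linked page [1]):

devices {
        device {
                vendor                 "LIO-ORG"
                hardware_handler       "1 alua"
                path_grouping_policy   "failover"
                path_selector          "queue-length 0"
                path_checker           tur
                prio                   alua
                prio_args              exclusive_pref_bit
                failback               60
                fast_io_fail_tmo       25
                no_path_retry          queue
        }
}

Reload multipathd after editing, and "multipath -ll" should then report
hwhandler='1 alua' and non-zero path priorities instead of the prio=0 /
hwhandler='0' shown in your output.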

On Wed, Oct 16, 2019 at 11:36 PM 展荣臻(信泰)  wrote:
>
>
>
>
> > -----Original Message-----
> > From: "Jason Dillaman" 
> > Sent: 2019-10-17 09:54:30 (Thursday)
> > To: "展荣臻(信泰)" 
> > Cc: dillaman , ceph-users 
> > Subject: Re: [ceph-users] ceph iscsi question
> >
> > On Wed, Oct 16, 2019 at 9:52 PM 展荣臻(信泰)  wrote:
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: "Jason Dillaman" 
> > > > Sent: 2019-10-16 20:33:47 (Wednesday)
> > > > To: "展荣臻(信泰)" 
> > > > Cc: ceph-users 
> > > > Subject: Re: [ceph-users] ceph iscsi question
> > > >
> > > > On Wed, Oct 16, 2019 at 2:35 AM 展荣臻(信泰)  
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >   We deploy Ceph with ceph-ansible. The OSDs, MONs and iSCSI daemons run in Docker.
> > > > >   I created an iSCSI target according to
> > > > > https://docs.ceph.com/docs/luminous/rbd/iscsi-target-cli/.
> > > > >   I discovered and logged in to the iSCSI target on another host, as shown below:
> > > > >
> > > > > [root@node1 tmp]# iscsiadm -m discovery -t sendtargets -p 
> > > > > 192.168.42.110
> > > > > 192.168.42.110:3260,1 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> > > > > 192.168.42.111:3260,2 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> > > > > [root@node1 tmp]# iscsiadm -m node -T 
> > > > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw -p 192.168.42.110 -l
> > > > > Logging in to [iface: default, target: 
> > > > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, portal: 
> > > > > 192.168.42.110,3260] (multiple)
> > > > > Login to [iface: default, target: 
> > > > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, portal: 
> > > > > 192.168.42.110,3260] successful.
> > > > >
> > > > >  /dev/sde is mapped; when I run mkfs.xfs -f /dev/sde, an error occurs:
> > > > >
> > > > > [root@node1 tmp]# mkfs.xfs -f /dev/sde
> > > > > meta-data=/dev/sde   isize=512agcount=4, 
> > > > > agsize=1966080 blks
> > > > >  =   sectsz=512   attr=2, projid32bit=1
> > > > >  =   crc=1finobt=0, sparse=0
> > > > > data =   bsize=4096   blocks=7864320, 
> > > > > imaxpct=25
> > > > >  =   sunit=0  swidth=0 blks
> > > > > naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> > > > > log  =internal log   bsize=4096   blocks=3840, version=2
> > > > >  =   sectsz=512   sunit=0 blks, 
> > > > > lazy-count=1
> > > > > realtime =none   extsz=4096   blocks=0, rtextents=0
> > > > > existing superblock read failed: Input/output error
> > > > > mkfs.xfs: pwrite64 failed: Input/output error
> > > > >
> > > > > message in /var/log/messages:
> > > > > Oct 16 14:01:44 localhost kernel: Dev sde: unable to read RDB block 0
> > > > > Oct 16 14:01:44 localhost kernel: sde: unable to read partition table
> > > > > Oct 16 14:02:17 localhost kernel: Dev sde: unable to read RDB block 0
> > > > > Oct 16 14:02:17 localhost kernel: sde: unable to read partition table
> > > > >
> > > > > We use Luminous Ceph.
> > > > > What causes this error? How can I debug it? Any suggestion is appreciated.
> > > >
> > > > Please use the associated multipath device, not the raw block device.
> > > >
> > > Hi Jason,
> > >   Thanks for your reply.
> > >   The multipath device gives the same error as the raw block device.
> > >
> >
> > What does "multipath -ll" show?
> >
> [root@node1 ~]# multipath -ll
> mpathf (36001405366100aeda2044f286329b57a) dm-2 LIO-ORG ,TCMU device
> size=30G features='0' hwhandler='0' wp=rw
> |-+- policy='service-time 0' prio=0 status=enabled
> | `- 13:0:0:0 sde 8:64 failed faulty running
> `-+- policy='service-time 0' prio=0 status=enabled
>   `- 14:0:0:0 sdf 8:80 failed faulty running
> [root@node1 ~]#
>
> I don't know if it is related to the fact that all our daemons run in Docker
> while Docker runs on KVM.
>
>
>
>
>
>
>

[1] https://docs.ceph.com/ceph-prs/30912/rbd/iscsi-initiator-linux/

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph iscsi question

2019-10-16 Thread Jason Dillaman
On Wed, Oct 16, 2019 at 9:52 PM 展荣臻(信泰)  wrote:
>
>
>
>
> > -----Original Message-----
> > From: "Jason Dillaman" 
> > Sent: 2019-10-16 20:33:47 (Wednesday)
> > To: "展荣臻(信泰)" 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] ceph iscsi question
> >
> > On Wed, Oct 16, 2019 at 2:35 AM 展荣臻(信泰)  wrote:
> > >
> > > Hi all,
> > > We deploy Ceph with ceph-ansible. The OSDs, MONs and iSCSI daemons run in Docker.
> > > I created an iSCSI target according to
> > > https://docs.ceph.com/docs/luminous/rbd/iscsi-target-cli/.
> > > I discovered and logged in to the iSCSI target on another host, as shown below:
> > >
> > > [root@node1 tmp]# iscsiadm -m discovery -t sendtargets -p 192.168.42.110
> > > 192.168.42.110:3260,1 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> > > 192.168.42.111:3260,2 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> > > [root@node1 tmp]# iscsiadm -m node -T 
> > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw -p 192.168.42.110 -l
> > > Logging in to [iface: default, target: 
> > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, portal: 192.168.42.110,3260] 
> > > (multiple)
> > > Login to [iface: default, target: 
> > > iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, portal: 192.168.42.110,3260] 
> > > successful.
> > >
> > >  /dev/sde is mapped; when I run mkfs.xfs -f /dev/sde, an error occurs:
> > >
> > > [root@node1 tmp]# mkfs.xfs -f /dev/sde
> > > meta-data=/dev/sde   isize=512agcount=4, agsize=1966080 
> > > blks
> > >  =   sectsz=512   attr=2, projid32bit=1
> > >  =   crc=1finobt=0, sparse=0
> > > data =   bsize=4096   blocks=7864320, imaxpct=25
> > >  =   sunit=0  swidth=0 blks
> > > naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> > > log  =internal log   bsize=4096   blocks=3840, version=2
> > >  =   sectsz=512   sunit=0 blks, lazy-count=1
> > > realtime =none   extsz=4096   blocks=0, rtextents=0
> > > existing superblock read failed: Input/output error
> > > mkfs.xfs: pwrite64 failed: Input/output error
> > >
> > > message in /var/log/messages:
> > > Oct 16 14:01:44 localhost kernel: Dev sde: unable to read RDB block 0
> > > Oct 16 14:01:44 localhost kernel: sde: unable to read partition table
> > > Oct 16 14:02:17 localhost kernel: Dev sde: unable to read RDB block 0
> > > Oct 16 14:02:17 localhost kernel: sde: unable to read partition table
> > >
> > > We use Luminous Ceph.
> > > What causes this error? How can I debug it? Any suggestion is appreciated.
> >
> > Please use the associated multipath device, not the raw block device.
> >
> Hi Jason,
>   Thanks for your reply.
>   The multipath device gives the same error as the raw block device.
>

What does "multipath -ll" show?

>
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph iscsi question

2019-10-16 Thread Jason Dillaman
On Wed, Oct 16, 2019 at 2:35 AM 展荣臻(信泰)  wrote:
>
> Hi all,
>   We deploy Ceph with ceph-ansible. The OSDs, MONs and iSCSI daemons run in Docker.
>   I created an iSCSI target according to
> https://docs.ceph.com/docs/luminous/rbd/iscsi-target-cli/.
>   I discovered and logged in to the iSCSI target on another host, as shown below:
>
> [root@node1 tmp]# iscsiadm -m discovery -t sendtargets -p 192.168.42.110
> 192.168.42.110:3260,1 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> 192.168.42.111:3260,2 iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw
> [root@node1 tmp]# iscsiadm -m node -T 
> iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw -p 192.168.42.110 -l
> Logging in to [iface: default, target: 
> iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, portal: 192.168.42.110,3260] 
> (multiple)
> Login to [iface: default, target: iqn.2003-01.com.teamsun.iscsi-gw:iscsi-igw, 
> portal: 192.168.42.110,3260] successful.
>
>  /dev/sde is mapped; when I run mkfs.xfs -f /dev/sde, an error occurs:
>
> [root@node1 tmp]# mkfs.xfs -f /dev/sde
> meta-data=/dev/sde   isize=512agcount=4, agsize=1966080 blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=1finobt=0, sparse=0
> data =   bsize=4096   blocks=7864320, imaxpct=25
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> log  =internal log   bsize=4096   blocks=3840, version=2
>  =   sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> existing superblock read failed: Input/output error
> mkfs.xfs: pwrite64 failed: Input/output error
>
> message in /var/log/messages:
> Oct 16 14:01:44 localhost kernel: Dev sde: unable to read RDB block 0
> Oct 16 14:01:44 localhost kernel: sde: unable to read partition table
> Oct 16 14:02:17 localhost kernel: Dev sde: unable to read RDB block 0
> Oct 16 14:02:17 localhost kernel: sde: unable to read partition table
>
> We use Luminous Ceph.
> What causes this error? How can I debug it? Any suggestion is appreciated.

Please use the associated multipath device, not the raw block device.
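
That is, run mkfs against the dm-multipath node that multipathd creates for the
LUN rather than against /dev/sde directly (the mpath name below is only a
placeholder; "multipath -ll" shows the real one):

$ mkfs.xfs -f /dev/mapper/mpatha
$ mount /dev/mapper/mpatha /mnt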

>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Jason Dillaman
On Wed, Oct 2, 2019 at 9:50 AM Kilian Ries  wrote:
>
> Hi,
>
>
> I'm running a Ceph Mimic cluster with 4 iSCSI gateway nodes. The cluster was
> set up via ceph-ansible v3.2-stable. I just checked my nodes and saw that only
> two of the four configured iSCSI gateway nodes are working correctly. I first
> noticed via gwcli:
>
>
> ###
>
>
> $gwcli -d ls
>
> Traceback (most recent call last):
>
>   File "/usr/bin/gwcli", line 191, in 
>
> main()
>
>   File "/usr/bin/gwcli", line 103, in main
>
> root_node.refresh()
>
>   File "/usr/lib/python2.7/site-packages/gwcli/gateway.py", line 87, in 
> refresh
>
> raise GatewayError
>
> gwcli.utils.GatewayError
>
>
> ###
>
>
> I investigated and noticed that both "rbd-target-api" and "rbd-target-gw" are
> not running. I was not able to restart them via systemd. I then found that
> even tcmu-runner is not running and it exits with the following error:
>
>
>
> ###
>
>
> tcmu_rbd_check_image_size:827 rbd/production.lun1: Mismatched sizes. RBD 
> image size 5498631880704. Requested new size 5497558138880.
>
>
> ###
>
>
> Now I have the situation that two nodes are running correctly and two can't
> start tcmu-runner. I don't know where the image size mismatches are coming
> from - I haven't configured or resized any of the images.
>
>
> Is there any chance to get my two iSCSI gateway nodes working again?

It sounds like you are potentially hitting [1]. The ceph-iscsi-config
library thinks your image size is 5TiB but you actually have a 5121GiB
(~5.001TiB) RBD image. Any clue how your RBD image got to be 1GiB
larger than an even 5TiB?
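
A quick check of the two numbers from the tcmu-runner error, using plain shell
arithmetic:

$ echo $(( 5498631880704 / 1024**3 ))   # actual RBD image size in GiB   -> 5121
$ echo $(( 5497558138880 / 1024**3 ))   # size ceph-iscsi-config expects -> 5120 (= 5 TiB)

So the stored configuration is exactly 1GiB smaller than the image on disk,
which lines up with the mismatch described in [1].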

>
>
>
> The following packets are installed:
>
>
> rpm -qa |egrep "ceph|iscsi|tcmu|rst|kernel"
>
>
> libtcmu-1.4.0-106.gd17d24e.el7.x86_64
>
> ceph-iscsi-cli-2.7-2.7.el7.noarch
>
> kernel-3.10.0-957.5.1.el7.x86_64
>
> ceph-base-13.2.5-0.el7.x86_64
>
> ceph-iscsi-config-2.6-2.6.el7.noarch
>
> ceph-common-13.2.5-0.el7.x86_64
>
> ceph-selinux-13.2.5-0.el7.x86_64
>
> kernel-tools-libs-3.10.0-957.5.1.el7.x86_64
>
> python-cephfs-13.2.5-0.el7.x86_64
>
> ceph-osd-13.2.5-0.el7.x86_64
>
> kernel-headers-3.10.0-957.5.1.el7.x86_64
>
> kernel-tools-3.10.0-957.5.1.el7.x86_64
>
> kernel-3.10.0-957.1.3.el7.x86_64
>
> libcephfs2-13.2.5-0.el7.x86_64
>
> kernel-3.10.0-862.14.4.el7.x86_64
>
> tcmu-runner-1.4.0-106.gd17d24e.el7.x86_64
>
>
>
> Thanks,
>
> Greets
>
>
> Kilian
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://github.com/ceph/ceph-iscsi-config/pull/68

-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] please fix ceph-iscsi yum repo

2019-09-30 Thread Jason Dillaman
On Fri, Sep 27, 2019 at 5:18 AM Matthias Leopold
 wrote:
>
>
> Hi,
>
> I was positively surprised to see ceph-iscsi-3.3 available today.
> Unfortunately there's an error when trying to install it from yum repo:
>
> ceph-iscsi-3.3-1.el7.noarch.rp FAILED
> 100%
> [==]
>   0.0 B/s | 200 kB  --:--:-- ETA
> http://download.ceph.com/ceph-iscsi/3/rpm/el7/noarch/ceph-iscsi-3.3-1.el7.noarch.rpm:
> [Errno -1] Package does not match intended download. Suggestion: run yum
> --enablerepo=ceph-iscsi clean metadata
>
> "yum --enablerepo=ceph-iscsi clean metadata" does not fix it
>
> I know there are other ways to install it, but since I'm close to
> putting my iscsi gateway into production I want to be "clean" (and I'm a
> bit impatient, sorry...)

This should hopefully be fixed already. Let us know if you are still
having issues.

> thx
> matthias
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD Mirroring

2019-09-14 Thread Jason Dillaman
I was able to repeat this issue locally by restarting the primary OSD
for the "rbd_mirroring" object. It seems that a regression was
introduced w/ the introduction of Ceph msgr2 in that upon reconnect,
the connection type for the client switches from ANY to V2 -- but only
for the watcher session and not the status updates. I've opened a
tracker ticket for this issue [1].

Thanks.

On Fri, Sep 13, 2019 at 12:44 PM Oliver Freyermuth
 wrote:
>
> Am 13.09.19 um 18:38 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth
> >  wrote:
> >>
> >> Am 13.09.19 um 17:18 schrieb Jason Dillaman:
> >>> On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
> >>>  wrote:
> >>>>
> >>>> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> >>>>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman  
> >>>>> wrote:
> >>>>>>
> >>>>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Dear Jason,
> >>>>>>>
> >>>>>>> thanks for the very detailed explanation! This was very instructive.
> >>>>>>> Sadly, the watchers look correct - see details inline.
> >>>>>>>
> >>>>>>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>> Dear Jason,
> >>>>>>>>>
> >>>>>>>>> thanks for taking care and developing a patch so quickly!
> >>>>>>>>>
> >>>>>>>>> I have another strange observation to share. In our test setup, 
> >>>>>>>>> only a single RBD mirroring daemon is running for 51 images.
> >>>>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point 
> >>>>>>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>>>>>> -
> >>>>>>>>> # rbd mirror image status test-vm.X-disk2
> >>>>>>>>> test-vm.X-disk2:
> >>>>>>>>>   global_id:   XXX
> >>>>>>>>>   state:   down+replaying
> >>>>>>>>>   description: replaying, master_position=[object_number=14, 
> >>>>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, 
> >>>>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
> >>>>>>>>>   last_update: 2019-09-13 03:45:43
> >>>>>>>>> -
> >>>>>>>>> Running this command several times, I see entry_tid increasing at 
> >>>>>>>>> both ends, so mirroring seems to be working just fine.
> >>>>>>>>>
> >>>>>>>>> However:
> >>>>>>>>> -
> >>>>>>>>> # rbd mirror pool status
> >>>>>>>>> health: WARNING
> >>>>>>>>> images: 51 total
> >>>>>>>>> 51 unknown
> >>>>>>>>> -
> >>>>>>>>> The health warning is not visible in the dashboard (also not in the 
> >>>>>>>>> mirroring menu), the daemon still seems to be running, dropped 
> >>>>>>>>> nothing in the logs,
> >>>>>>>>> and claims to be "ok" in the dashboard - it's only that all images 
> >>>>>>>>> show up in unknown state even though all seems to be working fine.
> >>>>>>>>>
> >>>>>>>>> Any idea on how to debug this?
> >>>>>>>>> When I restart the rbd-mirror service, all images come back as 
> >>>>>>>>> green. I already encountered this twice in 3 days.
> >>>>>>>>
> >>>>>>>> The dashboard relies on the rbd-mirror daemon to provide it errors 
> >>>>>>>> and
> >>>>>>>> warn

Re: [ceph-users] Ceph RBD Mirroring

2019-09-13 Thread Jason Dillaman
On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth
 wrote:
>
> Am 13.09.19 um 17:18 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
> >  wrote:
> >>
> >> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> >>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman  
> >>> wrote:
> >>>>
> >>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >>>>  wrote:
> >>>>>
> >>>>> Dear Jason,
> >>>>>
> >>>>> thanks for the very detailed explanation! This was very instructive.
> >>>>> Sadly, the watchers look correct - see details inline.
> >>>>>
> >>>>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Dear Jason,
> >>>>>>>
> >>>>>>> thanks for taking care and developing a patch so quickly!
> >>>>>>>
> >>>>>>> I have another strange observation to share. In our test setup, only 
> >>>>>>> a single RBD mirroring daemon is running for 51 images.
> >>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point 
> >>>>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>>>> -
> >>>>>>> # rbd mirror image status test-vm.X-disk2
> >>>>>>> test-vm.X-disk2:
> >>>>>>>  global_id:   XXX
> >>>>>>>  state:   down+replaying
> >>>>>>>  description: replaying, master_position=[object_number=14, 
> >>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, 
> >>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
> >>>>>>>  last_update: 2019-09-13 03:45:43
> >>>>>>> -
> >>>>>>> Running this command several times, I see entry_tid increasing at 
> >>>>>>> both ends, so mirroring seems to be working just fine.
> >>>>>>>
> >>>>>>> However:
> >>>>>>> -
> >>>>>>> # rbd mirror pool status
> >>>>>>> health: WARNING
> >>>>>>> images: 51 total
> >>>>>>>51 unknown
> >>>>>>> -
> >>>>>>> The health warning is not visible in the dashboard (also not in the 
> >>>>>>> mirroring menu), the daemon still seems to be running, dropped 
> >>>>>>> nothing in the logs,
> >>>>>>> and claims to be "ok" in the dashboard - it's only that all images 
> >>>>>>> show up in unknown state even though all seems to be working fine.
> >>>>>>>
> >>>>>>> Any idea on how to debug this?
> >>>>>>> When I restart the rbd-mirror service, all images come back as green. 
> >>>>>>> I already encountered this twice in 3 days.
> >>>>>>
> >>>>>> The dashboard relies on the rbd-mirror daemon to provide it errors and
> >>>>>> warnings. You can see the status reported by rbd-mirror by running
> >>>>>> "ceph service status":
> >>>>>>
> >>>>>> $ ceph service status
> >>>>>> {
> >>>>>>"rbd-mirror": {
> >>>>>>"4152": {
> >>>>>>"status_stamp": "2019-09-13T08:58:41.937491-0400",
> >>>>>>"last_beacon": "2019-09-13T08:58:41.937491-0400",
> >>>>>>"status": {
> >>>>>>"json":
> >>>>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\"

Re: [ceph-users] Ceph RBD Mirroring

2019-09-13 Thread Jason Dillaman
On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth
 wrote:
>
> Am 13.09.19 um 16:30 schrieb Jason Dillaman:
> > On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman  wrote:
> >>
> >> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
> >>  wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> thanks for the very detailed explanation! This was very instructive.
> >>> Sadly, the watchers look correct - see details inline.
> >>>
> >>> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> >>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >>>>  wrote:
> >>>>>
> >>>>> Dear Jason,
> >>>>>
> >>>>> thanks for taking care and developing a patch so quickly!
> >>>>>
> >>>>> I have another strange observation to share. In our test setup, only a 
> >>>>> single RBD mirroring daemon is running for 51 images.
> >>>>> It works fine with a constant stream of 1-2 MB/s, but at some point 
> >>>>> after roughly 20 hours, _all_ images go to this interesting state:
> >>>>> -
> >>>>> # rbd mirror image status test-vm.X-disk2
> >>>>> test-vm.X-disk2:
> >>>>> global_id:   XXX
> >>>>> state:   down+replaying
> >>>>> description: replaying, master_position=[object_number=14, 
> >>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, 
> >>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0
> >>>>> last_update: 2019-09-13 03:45:43
> >>>>> -
> >>>>> Running this command several times, I see entry_tid increasing at both 
> >>>>> ends, so mirroring seems to be working just fine.
> >>>>>
> >>>>> However:
> >>>>> -
> >>>>> # rbd mirror pool status
> >>>>> health: WARNING
> >>>>> images: 51 total
> >>>>>   51 unknown
> >>>>> -
> >>>>> The health warning is not visible in the dashboard (also not in the 
> >>>>> mirroring menu), the daemon still seems to be running, dropped nothing 
> >>>>> in the logs,
> >>>>> and claims to be "ok" in the dashboard - it's only that all images show 
> >>>>> up in unknown state even though all seems to be working fine.
> >>>>>
> >>>>> Any idea on how to debug this?
> >>>>> When I restart the rbd-mirror service, all images come back as green. I 
> >>>>> already encountered this twice in 3 days.
> >>>>
> >>>> The dashboard relies on the rbd-mirror daemon to provide it errors and
> >>>> warnings. You can see the status reported by rbd-mirror by running
> >>>> "ceph service status":
> >>>>
> >>>> $ ceph service status
> >>>> {
> >>>>   "rbd-mirror": {
> >>>>   "4152": {
> >>>>   "status_stamp": "2019-09-13T08:58:41.937491-0400",
> >>>>   "last_beacon": "2019-09-13T08:58:41.937491-0400",
> >>>>   "status": {
> >>>>   "json":
> >>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> >>>>   }
> >>>>   }
> >>>>   }
> >>>> }
> >>>>
> >>>> In your case, most likely it seems like rbd-mirror thinks all is good
> >>>> with the world so it's not reporting any errors.
> >

Re: [ceph-users] Ceph RBD Mirroring

2019-09-13 Thread Jason Dillaman
On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman  wrote:
>
> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
>  wrote:
> >
> > Dear Jason,
> >
> > thanks for the very detailed explanation! This was very instructive.
> > Sadly, the watchers look correct - see details inline.
> >
> > Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> > > On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> > >  wrote:
> > >>
> > >> Dear Jason,
> > >>
> > >> thanks for taking care and developing a patch so quickly!
> > >>
> > >> I have another strange observation to share. In our test setup, only a 
> > >> single RBD mirroring daemon is running for 51 images.
> > >> It works fine with a constant stream of 1-2 MB/s, but at some point 
> > >> after roughly 20 hours, _all_ images go to this interesting state:
> > >> -
> > >> # rbd mirror image status test-vm.X-disk2
> > >> test-vm.X-disk2:
> > >>global_id:   XXX
> > >>state:   down+replaying
> > >>description: replaying, master_position=[object_number=14, tag_tid=6, 
> > >> entry_tid=6338], mirror_position=[object_number=14, tag_tid=6, 
> > >> entry_tid=6338], entries_behind_master=0
> > >>last_update: 2019-09-13 03:45:43
> > >> -
> > >> Running this command several times, I see entry_tid increasing at both 
> > >> ends, so mirroring seems to be working just fine.
> > >>
> > >> However:
> > >> -
> > >> # rbd mirror pool status
> > >> health: WARNING
> > >> images: 51 total
> > >>  51 unknown
> > >> -
> > >> The health warning is not visible in the dashboard (also not in the 
> > >> mirroring menu), the daemon still seems to be running, dropped nothing 
> > >> in the logs,
> > >> and claims to be "ok" in the dashboard - it's only that all images show 
> > >> up in unknown state even though all seems to be working fine.
> > >>
> > >> Any idea on how to debug this?
> > >> When I restart the rbd-mirror service, all images come back as green. I 
> > >> already encountered this twice in 3 days.
> > >
> > > The dashboard relies on the rbd-mirror daemon to provide it errors and
> > > warnings. You can see the status reported by rbd-mirror by running
> > > "ceph service status":
> > >
> > > $ ceph service status
> > > {
> > >  "rbd-mirror": {
> > >  "4152": {
> > >  "status_stamp": "2019-09-13T08:58:41.937491-0400",
> > >  "last_beacon": "2019-09-13T08:58:41.937491-0400",
> > >  "status": {
> > >  "json":
> > > "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> > >  }
> > >  }
> > >  }
> > > }
> > >
> > > In your case, most likely it seems like rbd-mirror thinks all is good
> > > with the world so it's not reporting any errors.
> >
> > This is indeed the case:
> >
> > # ceph service status
> > {
> >  "rbd-mirror": {
> >  "84243": {
> >  "status_stamp": "2019-09-13 15:40:01.149815",
> >  "last_beacon": "2019-09-13 15:40:26.151381",
> >  "status": {
> >  "json": 
> > "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"im

Re: [ceph-users] Ceph RBD Mirroring

2019-09-13 Thread Jason Dillaman
On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth
 wrote:
>
> Dear Jason,
>
> thanks for the very detailed explanation! This was very instructive.
> Sadly, the watchers look correct - see details inline.
>
> Am 13.09.19 um 15:02 schrieb Jason Dillaman:
> > On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
> >  wrote:
> >>
> >> Dear Jason,
> >>
> >> thanks for taking care and developing a patch so quickly!
> >>
> >> I have another strange observation to share. In our test setup, only a 
> >> single RBD mirroring daemon is running for 51 images.
> >> It works fine with a constant stream of 1-2 MB/s, but at some point after 
> >> roughly 20 hours, _all_ images go to this interesting state:
> >> -
> >> # rbd mirror image status test-vm.X-disk2
> >> test-vm.X-disk2:
> >>global_id:   XXX
> >>state:   down+replaying
> >>description: replaying, master_position=[object_number=14, tag_tid=6, 
> >> entry_tid=6338], mirror_position=[object_number=14, tag_tid=6, 
> >> entry_tid=6338], entries_behind_master=0
> >>last_update: 2019-09-13 03:45:43
> >> -
> >> Running this command several times, I see entry_tid increasing at both 
> >> ends, so mirroring seems to be working just fine.
> >>
> >> However:
> >> -
> >> # rbd mirror pool status
> >> health: WARNING
> >> images: 51 total
> >>  51 unknown
> >> -
> >> The health warning is not visible in the dashboard (also not in the 
> >> mirroring menu), the daemon still seems to be running, dropped nothing in 
> >> the logs,
> >> and claims to be "ok" in the dashboard - it's only that all images show up 
> >> in unknown state even though all seems to be working fine.
> >>
> >> Any idea on how to debug this?
> >> When I restart the rbd-mirror service, all images come back as green. I 
> >> already encountered this twice in 3 days.
> >
> > The dashboard relies on the rbd-mirror daemon to provide it errors and
> > warnings. You can see the status reported by rbd-mirror by running
> > "ceph service status":
> >
> > $ ceph service status
> > {
> >  "rbd-mirror": {
> >  "4152": {
> >  "status_stamp": "2019-09-13T08:58:41.937491-0400",
> >  "last_beacon": "2019-09-13T08:58:41.937491-0400",
> >  "status": {
> >  "json":
> > "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
> >  }
> >  }
> >  }
> > }
> >
> > In your case, most likely it seems like rbd-mirror thinks all is good
> > with the world so it's not reporting any errors.
>
> This is indeed the case:
>
> # ceph service status
> {
>  "rbd-mirror": {
>  "84243": {
>  "status_stamp": "2019-09-13 15:40:01.149815",
>  "last_beacon": "2019-09-13 15:40:26.151381",
>  "status": {
>  "json": 
> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}"
>  }
>  }
>  },
>  "rgw": {
> ...
>  }
> }
>
> > The "down" state indicates that the rbd-mirror daemon isn't correctly
> > watching the "rbd_mirroring" object in the pool. You can see who is 
> > watching that object by running the "rados" "listwatchers" command.

Re: [ceph-users] Ceph RBD Mirroring

2019-09-13 Thread Jason Dillaman
On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth
 wrote:
>
> Dear Jason,
>
> thanks for taking care and developing a patch so quickly!
>
> I have another strange observation to share. In our test setup, only a single 
> RBD mirroring daemon is running for 51 images.
> It works fine with a constant stream of 1-2 MB/s, but at some point after 
> roughly 20 hours, _all_ images go to this interesting state:
> -
> # rbd mirror image status test-vm.X-disk2
> test-vm.X-disk2:
>   global_id:   XXX
>   state:   down+replaying
>   description: replaying, master_position=[object_number=14, tag_tid=6, 
> entry_tid=6338], mirror_position=[object_number=14, tag_tid=6, 
> entry_tid=6338], entries_behind_master=0
>   last_update: 2019-09-13 03:45:43
> -
> Running this command several times, I see entry_tid increasing at both ends, 
> so mirroring seems to be working just fine.
>
> However:
> -
> # rbd mirror pool status
> health: WARNING
> images: 51 total
> 51 unknown
> -
> The health warning is not visible in the dashboard (also not in the mirroring 
> menu), the daemon still seems to be running, dropped nothing in the logs,
> and claims to be "ok" in the dashboard - it's only that all images show up in 
> unknown state even though all seems to be working fine.
>
> Any idea on how to debug this?
> When I restart the rbd-mirror service, all images come back as green. I 
> already encountered this twice in 3 days.

The dashboard relies on the rbd-mirror daemon to provide it errors and
warnings. You can see the status reported by rbd-mirror by running
"ceph service status":

$ ceph service status
{
"rbd-mirror": {
"4152": {
"status_stamp": "2019-09-13T08:58:41.937491-0400",
"last_beacon": "2019-09-13T08:58:41.937491-0400",
"status": {
"json":
"{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}"
}
}
}
}

In your case, rbd-mirror most likely thinks all is good with the
world, so it's not reporting any errors.

The "down" state indicates that the rbd-mirror daemon isn't correctly
watching the "rbd_mirroring" object in the pool. You can see who it
watching that object by running the "rados" "listwatchers" command:

$ rados -p <pool> listwatchers rbd_mirroring
watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992
watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424

In my case, the "4154" from "client.4154" is the unique global id for
my connection to the cluster; it ties back to the "ceph service
status" dump, which also reports status per daemon using that same
unique global id.
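
As a rough cross-check (just a sketch; the pool and systemd unit names
follow the ones used elsewhere in this thread), the client id reported
by "listwatchers" should correspond to a live rbd-mirror instance in
"ceph service status". If it no longer does, restarting the daemon
re-registers the watch:

$ rados -p rbd listwatchers rbd_mirroring
$ ceph service status
$ systemctl restart ceph-rbd-mirror@rbd_mirror_backup.service   # only if the watch is stale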

> Any idea on this (or how I can extract more information)?
> I fear keeping high-level debug logs active for ~24h is not feasible.
>
> Cheers,
> Oliver
>
>
> On 2019-09-11 19:14, Jason Dillaman wrote:
> > On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
> >  wrote:
> >>
> >> Dear Jason,
> >>
> >> I played a bit more with rbd mirroring and learned that deleting an image 
> >> at the source (or disabling journaling on it) immediately moves the image 
> >> to trash at the target -
> >> but setting rbd_mirroring_delete_delay helps to have some more grace time 
> >> to catch human mistakes.
> >>
> >> However, I have issues restoring such an image which has been moved to 
> >> trash by the RBD-mirror daemon as user:
> >> ---
> >> [root@mon001 ~]# rbd trash ls -la
> >> ID   NAME SOURCEDELETED_AT 
> >>   STATUS   PARENT
> >> d4fbe8f63905 test-vm-XX-disk2 MIRRORING Wed Sep 11 
> >> 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> >

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Jason Dillaman
On Thu, Sep 12, 2019 at 3:31 AM Marc Schöchlin  wrote:
>
> Hello Jason,
>
> yesterday i started rbd-nbd in forground mode to see if there are any 
> additional informations.
>
> root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id 
> nfs
> 2019-09-11 13:07:41.444534 77fe1040  0 ceph version 12.2.12 
> (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable), process 
> rbd-nbd, pid 14735
> 2019-09-11 13:07:41.444555 77fe1040  0 pidfile_write: ignore empty 
> --pid-file
> /dev/nbd0
> -
>
>
> 2019-09-11 21:31:03.126223 7fffc3fff700 -1 rbd-nbd: failed to read nbd 
> request header: (33) Numerical argument out of domain
>
> Whats that, have we seen that before? ("Numerical argument out of domain")

It's the error that rbd-nbd prints when the kernel prematurely closes
the socket ... and as we have already discussed, it's closing the
socket because the IO timeout is hit ... and the IO timeout is hit
because of a deadlock: memory pressure from rbd-nbd forces IO to be
pushed from the XFS cache back down into rbd-nbd.
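
A possible mitigation sketch (it does not fix the underlying deadlock,
and the values are purely illustrative): capping the amount of dirty
page cache limits how much writeback can pile up and then get forced
back through rbd-nbd under memory pressure:

# /etc/sysctl.d/90-rbd-nbd-writeback.conf (illustrative values only)
vm.dirty_background_bytes = 67108864   # start background writeback at 64 MiB
vm.dirty_bytes = 134217728             # block writers once 128 MiB is dirty

$ sysctl --system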

> Am 10.09.19 um 16:10 schrieb Jason Dillaman:
> > [Tue Sep 10 14:46:51 2019]  ? __schedule+0x2c5/0x850
> > [Tue Sep 10 14:46:51 2019]  kthread+0x121/0x140
> > [Tue Sep 10 14:46:51 2019]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> > [Tue Sep 10 14:46:51 2019]  ? kthread+0x121/0x140
> > [Tue Sep 10 14:46:51 2019]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> > [Tue Sep 10 14:46:51 2019]  ? kthread_park+0x90/0x90
> > [Tue Sep 10 14:46:51 2019]  ret_from_fork+0x35/0x40
> > Perhaps try it w/ ext4 instead of XFS?
>
> I can try that, but i am skeptical, i am note sure that we are searching on 
> the right place...
>
> Why?
> - we run hundreds of heavy use rbd-nbd instances in our xen dom-0 systems for 
> 1.5 years now
> - we never experienced problems like that in xen dom0 systems
> - as described these instances run 12.2.5 ceph components with kernel 4.4.0+10
> - the domU (virtual machines) are interacting heavily with that dom0 are 
> using various filesystems
>-> probably the architecture of the blktap components leads to different 
> io scenario : https://wiki.xenproject.org/wiki/Blktap

Are you running an XFS (or any) file system on top of the NBD block
device in dom0? I suspect you are just passing raw block devices to
the VMs, and therefore they cannot hit the same IO back-pressure
feedback loop.

> Nevertheless i will try EXT4 on another system.
>
> Regards
> Marc
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD Mirroring

2019-09-11 Thread Jason Dillaman
On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
 wrote:
>
> Dear Jason,
>
> I played a bit more with rbd mirroring and learned that deleting an image at 
> the source (or disabling journaling on it) immediately moves the image to 
> trash at the target -
> but setting rbd_mirroring_delete_delay helps to have some more grace time to 
> catch human mistakes.
>
> However, I have issues restoring such an image which has been moved to trash 
> by the RBD-mirror daemon as user:
> ---
> [root@mon001 ~]# rbd trash ls -la
> ID   NAME SOURCEDELETED_AT
>STATUS   PARENT
> d4fbe8f63905 test-vm-XX-disk2 MIRRORING Wed Sep 11 18:43:14 
> 2019 protected until Thu Sep 12 18:43:14 2019
> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 
> librbd::api::Trash: restore: Current trash source: mirroring does not match 
> expected: user
> (22) Invalid argument
> ---
> This is issued on the mon, which has the client.admin key, so it should not 
> be a permission issue.
> It also fails when I try that in the Dashboard.
>
> Sadly, the error message is not clear enough for me to figure out what could 
> be the problem - do you see what I did wrong?

Good catch, it looks like we accidentally broke this in Nautilus when
image live-migration support was added. I've opened a new tracker
ticket to fix this [1].

> Cheers and thanks again,
> Oliver
>
> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> > Dear Jason,
> >
> > On 2019-09-10 23:04, Jason Dillaman wrote:
> >> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >>  wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>>  wrote:
> >>>>>
> >>>>> Dear Cephalopodians,
> >>>>>
> >>>>> I have two questions about RBD mirroring.
> >>>>>
> >>>>> 1) I can not get it to work - my setup is:
> >>>>>
> >>>>>  - One cluster holding the live RBD volumes and snapshots, in pool 
> >>>>> "rbd", cluster name "ceph",
> >>>>>running latest Mimic.
> >>>>>I ran "rbd mirror pool enable rbd pool" on that cluster and 
> >>>>> created a cephx user "rbd_mirror" with (is there a better way?):
> >>>>>ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 
> >>>>> 'allow class-read object_prefix rbd_children, allow pool rbd r' -o 
> >>>>> ceph.client.rbd_mirror.keyring --cluster ceph
> >>>>>In that pool, two images have the journaling feature activated, 
> >>>>> all others have it disabled still (so I would expect these two to be 
> >>>>> mirrored).
> >>>>
> >>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> >>>> but you definitely need more than read-only permissions to the remote
> >>>> cluster since it needs to be able to create snapshots of remote images
> >>>> and update/trim the image journals.
> >>>
> >>> these profiles really make life a lot easier. I should have thought of 
> >>> them rather than "guessing" a potentially good configuration...
> >>>
> >>>>
> >>>>>  - Another (empty) cluster running latest Nautilus, cluster name 
> >>>>> "ceph", pool "rbd".
> >>>>>I've used the dashboard to activate mirroring for the RBD pool, 
> >>>>> and then added a peer with cluster name "ceph-virt", cephx-ID 
> >>>>> "rbd_mirror", filled in the mons and key created above.
> >>>>>I've then run:
> >>>>>ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' 
> >>>>> osd 'allow class-read object_prefix rbd_children, allow pool rbd rwx' 
> >>>>> -o client.rbd_mirror_backup.keyring --cluster ceph
> >>>>>and deployed that key on the rbd-mirror machine, and started the 
> >>>>> service with:
> >>>>
> >>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].

Re: [ceph-users] RBD error when run under cron

2019-09-11 Thread Jason Dillaman
On Wed, Sep 11, 2019 at 7:48 AM Mike O'Connor  wrote:
>
> Hi All
>
> I'm having a problem running rbd export from cron, rbd expects a tty which 
> cron does not provide.
> I tried the --no-progress but this did not help.
>
> Any ideas ?

I don't think that error is coming from the 'rbd' CLI:

$ (setsid /bin/bash -c 'tty ; ./bin/rbd export-diff --from-snap 1
foo@2 - > export') < /dev/null
not a tty
Exporting image: 100% complete...done.
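
To narrow down which stage of your pipeline actually wants a tty, a
sketch that runs each stage separately under the same tty-less
conditions (commands taken from your cron job; if one of them prompts
for a passphrase or similar, that would explain the error):

$ (setsid /bin/bash -c 'rbd export-diff --from-snap 1909091751 rbd/vm-100-disk-1@1909091817 - > /tmp/test.diff') < /dev/null
$ (setsid /bin/bash -c 'seccure-encrypt < /tmp/test.diff > /dev/null') < /dev/null
$ (setsid /bin/bash -c 'aws s3 cp /tmp/test.diff s3://1909091817.diff') < /dev/null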

>
> ---
> rbd export-diff --from-snap 1909091751 rbd/vm-100-disk-1@1909091817 - | 
> seccure-encrypt  | aws s3 cp  - s3://1909091817.diff
> FATAL: Cannot open tty: No such device or address.
> rbd: export-diff error: (32) Broken pipe
> Error in upload
> ---
>
> Thanks
> Mike
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
 wrote:
>
> Dear Jason,
>
> On 2019-09-10 18:50, Jason Dillaman wrote:
> > On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >  wrote:
> >>
> >> Dear Cephalopodians,
> >>
> >> I have two questions about RBD mirroring.
> >>
> >> 1) I can not get it to work - my setup is:
> >>
> >> - One cluster holding the live RBD volumes and snapshots, in pool 
> >> "rbd", cluster name "ceph",
> >>   running latest Mimic.
> >>   I ran "rbd mirror pool enable rbd pool" on that cluster and created 
> >> a cephx user "rbd_mirror" with (is there a better way?):
> >>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
> >> class-read object_prefix rbd_children, allow pool rbd r' -o 
> >> ceph.client.rbd_mirror.keyring --cluster ceph
> >>   In that pool, two images have the journaling feature activated, all 
> >> others have it disabled still (so I would expect these two to be mirrored).
> >
> > You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> > but you definitely need more than read-only permissions to the remote
> > cluster since it needs to be able to create snapshots of remote images
> > and update/trim the image journals.
>
> these profiles really make life a lot easier. I should have thought of them 
> rather than "guessing" a potentially good configuration...
>
> >
> >> - Another (empty) cluster running latest Nautilus, cluster name 
> >> "ceph", pool "rbd".
> >>   I've used the dashboard to activate mirroring for the RBD pool, and 
> >> then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
> >> filled in the mons and key created above.
> >>   I've then run:
> >>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
> >> 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
> >> client.rbd_mirror_backup.keyring --cluster ceph
> >>   and deployed that key on the rbd-mirror machine, and started the 
> >> service with:
> >
> > Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].
>
> That did the trick (in combination with the above)!
> Again a case of PEBKAC: I should have read the documentation until the end, 
> clearly my fault.
>
> It works well now, even though it seems to run a bit slow (~35 MB/s for the 
> initial sync when everything is 1 GBit/s),
> but that may also be caused by combination of some very limited hardware on 
> the receiving end (which will be scaled up in the future).
> A single host with 6 disks, replica 3 and a RAID controller which can only do 
> RAID0 and not JBOD is certainly not ideal, so commit latency may cause this 
> slow bandwidth.

You could try increasing "rbd_concurrent_management_ops" from the
default of 10 ops to something higher to attempt to account for the
latency. However, I wouldn't expect near line-speed w/ RBD mirroring.
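
For example (only a sketch; the value is an illustration, and the unit
name follows the one you used when starting the daemon), the option
can be raised in ceph.conf on the rbd-mirror host and the daemon
restarted:

[client]
    rbd_concurrent_management_ops = 20

$ systemctl restart ceph-rbd-mirror@rbd_mirror_backup.service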

> >
> >>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> >>
> >>After this, everything looks fine:
> >> # rbd mirror pool info
> >>   Mode: pool
> >>   Peers:
> >>UUID NAME  CLIENT
> >>XXX  ceph-virt client.rbd_mirror
> >>
> >>The service also seems to start fine, but logs show (debug 
> >> rbd_mirror=20):
> >>
> >>rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
> >> retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
> >>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> >>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
> >> failed pool replayer for uuid: XXX cluster: ceph-virt client: 
> >> client.rbd_mirror
> >>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
> >> XXX cluster: ceph-virt client: client.rbd_mirror
> >>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting 
> >> to remote peer uuid: XXX cluster: ceph-virt client: 
> >> client.rbd_mirror: (95) Operation not supported
> >>rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
> >> pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
> >> remote

Re: [ceph-users] Ceph RBD Mirroring

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
 wrote:
>
> Dear Cephalopodians,
>
> I have two questions about RBD mirroring.
>
> 1) I can not get it to work - my setup is:
>
> - One cluster holding the live RBD volumes and snapshots, in pool "rbd", 
> cluster name "ceph",
>   running latest Mimic.
>   I ran "rbd mirror pool enable rbd pool" on that cluster and created a 
> cephx user "rbd_mirror" with (is there a better way?):
>   ceph auth get-or-create client.rbd_mirror mon 'allow r' osd 'allow 
> class-read object_prefix rbd_children, allow pool rbd r' -o 
> ceph.client.rbd_mirror.keyring --cluster ceph
>   In that pool, two images have the journaling feature activated, all 
> others have it disabled still (so I would expect these two to be mirrored).

You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
but you definitely need more than read-only permissions to the remote
cluster since it needs to be able to create snapshots of remote images
and update/trim the image journals.

> - Another (empty) cluster running latest Nautilus, cluster name "ceph", 
> pool "rbd".
>   I've used the dashboard to activate mirroring for the RBD pool, and 
> then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror", 
> filled in the mons and key created above.
>   I've then run:
>   ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' osd 
> 'allow class-read object_prefix rbd_children, allow pool rbd rwx' -o 
> client.rbd_mirror_backup.keyring --cluster ceph
>   and deployed that key on the rbd-mirror machine, and started the 
> service with:

Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].

>   systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
>
>After this, everything looks fine:
> # rbd mirror pool info
>   Mode: pool
>   Peers:
>UUID NAME  CLIENT
>XXX  ceph-virt client.rbd_mirror
>
>The service also seems to start fine, but logs show (debug rbd_mirror=20):
>
>rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys: 
> retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXX
>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
>rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting 
> failed pool replayer for uuid: XXX cluster: ceph-virt client: 
> client.rbd_mirror
>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid: 
> XXX cluster: ceph-virt client: client.rbd_mirror
>rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting to 
> remote peer uuid: XXX cluster: ceph-virt client: client.rbd_mirror: 
> (95) Operation not supported
>rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout: 
> pool_id=2, callout_id=2, callout_level=error, text=unable to connect to 
> remote cluster

If it's still broken after fixing your caps above, perhaps increase
debugging for "rados", "monc", "auth", and "ms" to see if you can
determine the source of the op not supported error.

> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from the 
> cluster with the live images) on the rbd-mirror machine explicitly (i.e. not 
> only in mon config storage),
> and after doing that:
>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> works fine. So it's not a connectivity issue. Maybe a permission issue? Or 
> did I miss something?
>
> Any idea what "operation not supported" means?
> It's unclear to me whether things should work well using Mimic with Nautilus, 
> and enabling pool mirroring but only having journaling on for two images is a 
> supported case.

Yes and yes.
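
For the per-image part, journaling is just a normal image feature that
can be toggled per image on the primary cluster, e.g. (the image name
below is a placeholder):

$ rbd feature enable rbd/<image> journaling
$ rbd feature disable rbd/<image> journaling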

> 2) Since there is a performance drawback (about 2x) for journaling, is it 
> also possible to only mirror snapshots, and leave the live volumes alone?
> This would cover the common backup usecase before deferred mirroring is 
> implemented (or is it there already?).

This is in development right now and will hopefully land in the
Octopus release.

> Cheers and thanks in advance,
> Oliver
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon

--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Jason Dillaman
On Tue, Sep 10, 2019 at 9:46 AM Marc Schöchlin  wrote:
>
> Hello Mike,
>
> as described i set all the settings.
>
> Unfortunately it crashed also with these settings :-(
>
> Regards
> Marc
>
> [Tue Sep 10 12:25:56 2019] Btrfs loaded, crc32c=crc32c-intel
> [Tue Sep 10 12:25:57 2019] EXT4-fs (dm-0): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:25:59 2019] systemd[1]: systemd 237 running in system mode. 
> (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP 
> +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN 
> -PCRE2 default-hierarchy=hybrid)
> [Tue Sep 10 12:25:59 2019] systemd[1]: Detected virtualization xen.
> [Tue Sep 10 12:25:59 2019] systemd[1]: Detected architecture x86-64.
> [Tue Sep 10 12:25:59 2019] systemd[1]: Set hostname to .
> [Tue Sep 10 12:26:01 2019] systemd[1]: Started ntp-systemd-netif.path.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Created slice System Slice.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Listening on udev Kernel Socket.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Created slice 
> system-serial\x2dgetty.slice.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Listening on Journal Socket.
> [Tue Sep 10 12:26:01 2019] systemd[1]: Mounting POSIX Message Queue File 
> System...
> [Tue Sep 10 12:26:01 2019] RPC: Registered named UNIX socket transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered udp transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered tcp transport module.
> [Tue Sep 10 12:26:01 2019] RPC: Registered tcp NFSv4.1 backchannel transport 
> module.
> [Tue Sep 10 12:26:01 2019] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
> [Tue Sep 10 12:26:01 2019] Loading iSCSI transport class v2.0-870.
> [Tue Sep 10 12:26:01 2019] iscsi: registered transport (tcp)
> [Tue Sep 10 12:26:01 2019] systemd-journald[497]: Received request to flush 
> runtime journal from PID 1
> [Tue Sep 10 12:26:01 2019] Installing knfsd (copyright (C) 1996 
> o...@monad.swb.de).
> [Tue Sep 10 12:26:01 2019] iscsi: registered transport (iser)
> [Tue Sep 10 12:26:01 2019] systemd-journald[497]: File 
> /var/log/journal/cef15a6d1b80c9fbcb31a3a65aec21ad/system.journal corrupted or 
> uncleanly shut down, renaming and replacing.
> [Tue Sep 10 12:26:04 2019] EXT4-fs (dm-1): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:26:05 2019] EXT4-fs (xvda1): mounted filesystem with ordered 
> data mode. Opts: (null)
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.659:2): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/bin/lxc-start" pid=902 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:3): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/bin/man" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:4): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="man_filter" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.675:5): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="man_groff" pid=904 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:6): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:7): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-cgns" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:8): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-with-mounting" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.687:9): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="lxc-container-default-with-nesting" pid=900 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:10): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/lib/snapd/snap-confine" pid=905 comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] audit: type=1400 audit(156866.723:11): 
> apparmor="STATUS" operation="profile_load" profile="unconfined" 
> name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=905 
> comm="apparmor_parser"
> [Tue Sep 10 12:26:06 2019] new mount options do not match the existing 
> superblock, will be ignored
> [Tue Sep 10 12:26:09 2019] SGI XFS with ACLs, security attributes, realtime, 
> no debug enabled
> [Tue Sep 10 12:26:09 2019] XFS (nbd0): Mounting V5 Filesystem
> [Tue Sep 10 12:26:11 2019] XFS (nbd0): Starting recovery (logdev: internal)
> [Tue Sep 10 12:26:12 2019] XFS (nbd0): Ending recovery (logdev: internal)
> [Tue Sep 10 12:26:12 2019] NFSD: Using 

Re: [ceph-users] MON DNS Lookup & Version 2 Protocol

2019-08-27 Thread Jason Dillaman
On Wed, Jul 17, 2019 at 3:07 PM  wrote:
>
> All;
>
> I'm trying to firm up my understanding of how Ceph works, and ease of 
> management tools and capabilities.
>
> I stumbled upon this: 
> http://docs.ceph.com/docs/nautilus/rados/configuration/mon-lookup-dns/
>
> It got me wondering; how do you convey protocol version 2 capabilities in 
> this format?
>
> The examples all list port 6789, which is the port for protocol version 1.  
> Would I add SRV records for port 3300?  How does the client distinguish v1 
> from v2 in this case?

If you specify the default v1 port it assumes the v1 protocol and if
you specify the default v2 port it assumes the v2 protocol. If you
don't specify a port, it will try both v1 and v2 at the default port
locations. Otherwise, it again tries both protocols against the
specified custom port. [1]
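
For illustration (hostnames and TTLs are placeholders), advertising
both protocols just means publishing one SRV record per mon per port
under the default "ceph-mon" service name:

_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 3300 mon1.example.com.
_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
_ceph-mon._tcp.example.com. 3600 IN SRV 10 60 3300 mon2.example.com.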

> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://github.com/ceph/ceph/blob/master/src/mon/MonMap.cc#L398

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [question] one-way RBD mirroring doesn't work

2019-08-26 Thread Jason Dillaman
On Mon, Aug 26, 2019 at 7:54 AM V A Prabha  wrote:
>
> Dear Jason
>   I shall explain my setup first
>   The DR centre is 300 Kms apart from the site
>   Site-A   - OSD 0 - 1 TB  Mon - 10.236.248.XX /24
>   Site-B   - OSD 0  - 1 TB  Mon - 10.236.228.XX/27  - RBD-Mirror deamon 
> running
>   All ports are open and no firewall..Connectivity is there between
>
>   My initial setup I used a common L2 connectivity between both the 
> sites..The same error as now
>   I have changed the configuration to L3 still I get the same
>
> root@meghdootctr:~# rbd mirror image status volumes/meghdoot
> meghdoot:
>   global_id:   52d9e812-75fe-4a54-8e19-0897d9204af9
>   state:   up+syncing
>   description: bootstrapping, IMAGE_COPY/COPY_OBJECT 0%
>   last_update: 2019-08-26 17:00:21
> Please do specify where I do the mistake or whats wrong with my configuration

No clue what's wrong w/ your site. The best suggestion I can offer is
to enable "debug rbd_mirror=20" / "debug rbd=20" logging for
rbd-mirror and see where it's hanging.
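
For example (a sketch; the client section name is a placeholder for
whatever id your rbd-mirror daemon authenticates as), in ceph.conf on
the host running rbd-mirror, followed by a daemon restart:

[client.rbd-mirror-peer]
    debug rbd = 20
    debug rbd_mirror = 20
    log file = /var/log/ceph/$cluster-$name.log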

> Site-A Site-B
>  [global]
> fsid = 494971c1-75e7-4866-b9fb-e98cb8171473
> mon_initial_members = clouddr
> mon_host = 10.236.247.XX
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = 10.236.247.0/24
> osd pool default size = 1
> mon_allow_pool_delete= true
> rbd default features = 125[global]
> fsid = 494971c1-75e7-4866-b9fb-e98cb8171473
> mon_initial_members = meghdootctr
> mon_host = 10.236.228.XX
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = 10.236.228.64/27
> osd pool default size = 1
> mon_allow_pool_delete= true
> rbd default features = 125
>
> Regards
> V.A.Prabha
>
> On August 20, 2019 at 7:00 PM Jason Dillaman  wrote:
>
> On Tue, Aug 20, 2019 at 9:23 AM V A Prabha < prab...@cdac.in> wrote:
>
> I too face the same problem as mentioned by Sat
>   All the images created at the primary site are in the state : down+ unknown
>   Hence in the secondary site the images is 0 % up + syncing all time No 
> progress
>   The only error log that is continuously hitting is
>   2019-08-20 18:04:38.556908 7f7d4cba3700 -1 rbd::mirror::InstanceWatcher: 
> C_NotifyInstanceRequest: 0x7f7d4000f650 finish: resending after timeout
>
>
> This sounds like your rbd-mirror daemon cannot contact all OSDs. Double check 
> your network connectivity and firewall to ensure that rbd-mirror daemon can 
> connect to *both* Ceph clusters (local and remote).
>
>
>
>
>   The setup is as follows
>One OSD created in the primary site with cluster name [site-a] and one OSD 
> created in the secondary site with cluster name [site-b] both have the same 
> ceph.conf file
>RBD mirror is installed at the secondary site [ which is 300kms away from 
> the primary site]
>We are trying to merge this with our Cloud but the cinder volume fails 
> syncing everytime
>   Primary Site Output
> root@clouddr:/etc/ceph# rbd mirror pool status volumesnew --verbose
> health: WARNING
> images: 4 total
> 4 unknown
> boss123:
>  global_id:   7285ed6d-46f4-4345-b597-d24911a110f8
>  state:   down+unknown
>  description: status not found
>  last_update:
>  new123:
>  global_id:   e9f2dd7e-b0ac-4138-bce5-318b40e9119e
>  state:   down+unknown
>  description: status not found
>  last_update:
>
> root@clouddr:/etc/ceph# rbd mirror pool info volumesnew
> Mode: pool
> Peers: none
> root@clouddr:/etc/ceph# rbd mirror pool status volumesnew
> health: WARNING
> images: 4 total
> 4 unknown
>
> Secondary Site
> root@meghdootctr:~# rbd mirror image status volumesnew/boss123
> boss123:
>   global_id:   7285ed6d-46f4-4345-b597-d24911a110f8
>   state:   up+syncing
>   description: bootstrapping, IMAGE_COPY/COPY_OBJECT 0%
>   last_update: 2019-08-20 17:24:18
> Please help me to identify where do I miss something
>
> Regards
> V.A.Prabha
>
> 
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printi

Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-26 Thread Jason Dillaman
On Mon, Aug 26, 2019 at 5:01 AM Wido den Hollander  wrote:
>
>
>
> On 8/22/19 5:49 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander  wrote:
> >>
> >>
> >>
> >> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> >>> On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander  wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> In a couple of situations I have encountered that Virtual Machines
> >>>> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO)
> >>>> or sdX (Virtio-SCSI) devices while they were performing CPU intensive 
> >>>> tasks.
> >>>>
> >>>> These servers would be running a very CPU intensive application while
> >>>> *not* doing that many disk I/O.
> >>>>
> >>>> I however noticed that the I/O-wait of the disk(s) in the VM went up to
> >>>> 100%.
> >>>>
> >>>> This VM is CPU limited by Libvirt by putting that KVM process in it's
> >>>> own cgroup with a CPU limitation.
> >>>>
> >>>> Now, my theory is:
> >>>>
> >>>> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
> >>>> as a library. All threads for disk I/O are part of the same PID and thus
> >>>> part of that cgroup.
> >>>>
> >>>> If a process inside the Virtual Machine now starts to consume all CPU
> >>>> time there is nothing left for librbd which slows it down.
> >>>>
> >>>> This then causes a increased I/O-wait inside the Virtual Machine. Even
> >>>> though the VM is not performing a lot of disk I/O. The wait of the I/O
> >>>> goes up due to this.
> >>>>
> >>>>
> >>>> Is my theory sane?
> >>>
> >>> Yes, I would say that your theory is sane. Have you looked into
> >>> libvirt's cgroup controls for limiting the emulator portion vs the
> >>> vCPUs [1]? I'd hope the librbd code and threads should be running in
> >>> the emulator cgroup (in a perfect world).
> >>>
> >>
> >> I checked with 'virsh schedinfo X' and this is the output I got:
> >>
> >> Scheduler  : posix
> >> cpu_shares : 1000
> >> vcpu_period: 10
> >> vcpu_quota : -1
> >> emulator_period: 10
> >> emulator_quota : -1
> >> global_period  : 10
> >> global_quota   : -1
> >> iothread_period: 10
> >> iothread_quota : -1
> >>
> >>
> >> How can we confirm if the librbd code runs inside the Emulator part?
> >
> > You can look under the "/proc/<pid>/task/<thread-id>/" directories.
> > The "comm" file has the thread friendly name. If it's a librbd /
> > librados thread you will see things like the following (taken from an
> > 'rbd bench-write' process):
> >
> > $ cat */comm
> > rbd
> > log
> > service
> > admin_socket
> > msgr-worker-0
> > msgr-worker-1
> > msgr-worker-2
> > rbd
> > ms_dispatch
> > ms_local
> > safe_timer
> > fn_anonymous
> > safe_timer
> > safe_timer
> > fn-radosclient
> > tp_librbd
> > safe_timer
> > safe_timer
> > taskfin_librbd
> > signal_handler
> >
> > Those directories also have "cgroup" files which will indicate which
> > cgroup the thread is currently living under. For example, the
> > "tp_librbd" thread is running under the following cgroups in my
> > environment:
> >
> > 11:blkio:/
> > 10:hugetlb:/
> > 9:freezer:/
> > 8:net_cls,net_prio:/
> > 7:memory:/user.slice/user-1000.slice/user@1000.service
> > 6:cpu,cpuacct:/
> > 5:devices:/user.slice
> > 4:perf_event:/
> > 3:cpuset:/
> > 2:pids:/user.slice/user-1000.slice/user@1000.service
> > 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> > 0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> >
>
> I checked:
>
> root@n01:/proc/3668710/task# cat 3668748/comm
> tp_librbd
> root@n01:/proc/3668710/task#
>
> So that seems to be rbd right? I also checked the 'fn-radosclient' thread.
>
> root@n01:/proc/3668710/task# cat 3668748/cgroup
> 12:hugetlb:/
> 11:memory:/machine/i-1551-77-VM.libvirt-qemu
> 10:freezer:/machine/i-1551-77-VM.libvirt-qemu
> 9:pids:/system.slice/libvirt-bin.service
> 8:rdma:/

Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Jason Dillaman
On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander  wrote:
>
>
>
> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander  wrote:
> >>
> >> Hi,
> >>
> >> In a couple of situations I have encountered that Virtual Machines
> >> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO)
> >> or sdX (Virtio-SCSI) devices while they were performing CPU intensive 
> >> tasks.
> >>
> >> These servers would be running a very CPU intensive application while
> >> *not* doing that many disk I/O.
> >>
> >> I however noticed that the I/O-wait of the disk(s) in the VM went up to
> >> 100%.
> >>
> >> This VM is CPU limited by Libvirt by putting that KVM process in it's
> >> own cgroup with a CPU limitation.
> >>
> >> Now, my theory is:
> >>
> >> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
> >> as a library. All threads for disk I/O are part of the same PID and thus
> >> part of that cgroup.
> >>
> >> If a process inside the Virtual Machine now starts to consume all CPU
> >> time there is nothing left for librbd which slows it down.
> >>
> >> This then causes a increased I/O-wait inside the Virtual Machine. Even
> >> though the VM is not performing a lot of disk I/O. The wait of the I/O
> >> goes up due to this.
> >>
> >>
> >> Is my theory sane?
> >
> > Yes, I would say that your theory is sane. Have you looked into
> > libvirt's cgroup controls for limiting the emulator portion vs the
> > vCPUs [1]? I'd hope the librbd code and threads should be running in
> > the emulator cgroup (in a perfect world).
> >
>
> I checked with 'virsh schedinfo X' and this is the output I got:
>
> Scheduler  : posix
> cpu_shares : 1000
> vcpu_period: 10
> vcpu_quota : -1
> emulator_period: 10
> emulator_quota : -1
> global_period  : 10
> global_quota   : -1
> iothread_period: 10
> iothread_quota : -1
>
>
> How can we confirm if the librbd code runs inside the Emulator part?

You can look under the "/proc/<pid>/task/<thread-id>/" directories.
The "comm" file has the thread friendly name. If it's a librbd /
librados thread you will see things like the following (taken from an
'rbd bench-write' process):

$ cat */comm
rbd
log
service
admin_socket
msgr-worker-0
msgr-worker-1
msgr-worker-2
rbd
ms_dispatch
ms_local
safe_timer
fn_anonymous
safe_timer
safe_timer
fn-radosclient
tp_librbd
safe_timer
safe_timer
taskfin_librbd
signal_handler

Those directories also have "cgroup" files which will indicate which
cgroup the thread is currently living under. For example, the
"tp_librbd" thread is running under the following cgroups in my
environment:

11:blkio:/
10:hugetlb:/
9:freezer:/
8:net_cls,net_prio:/
7:memory:/user.slice/user-1000.slice/user@1000.service
6:cpu,cpuacct:/
5:devices:/user.slice
4:perf_event:/
3:cpuset:/
2:pids:/user.slice/user-1000.slice/user@1000.service
1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
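
If it helps, a quick sketch to dump each qemu thread's name next to
its cpu cgroup for one domain (the domain name is a placeholder), so
the librbd/librados threads can be compared against the vCPU threads:

DOM=<your-libvirt-domain>
PID=$(pgrep -f "qemu.*${DOM}" | head -n 1)
for t in /proc/${PID}/task/*; do
    printf '%-16s %s\n' "$(cat ${t}/comm)" "$(grep 'cpu,cpuacct' ${t}/cgroup)"
done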


> Wido
>
> >> Can somebody confirm this?
> >>
> >> Thanks,
> >>
> >> Wido
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > [1] https://libvirt.org/cgroups.html
> >



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Jason Dillaman
On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander  wrote:
>
> Hi,
>
> In a couple of situations I have encountered that Virtual Machines
> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO)
> or sdX (Virtio-SCSI) devices while they were performing CPU intensive tasks.
>
> These servers would be running a very CPU intensive application while
> *not* doing that many disk I/O.
>
> I however noticed that the I/O-wait of the disk(s) in the VM went up to
> 100%.
>
> This VM is CPU limited by Libvirt by putting that KVM process in it's
> own cgroup with a CPU limitation.
>
> Now, my theory is:
>
> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
> as a library. All threads for disk I/O are part of the same PID and thus
> part of that cgroup.
>
> If a process inside the Virtual Machine now starts to consume all CPU
> time there is nothing left for librbd which slows it down.
>
> This then causes a increased I/O-wait inside the Virtual Machine. Even
> though the VM is not performing a lot of disk I/O. The wait of the I/O
> goes up due to this.
>
>
> Is my theory sane?

Yes, I would say that your theory is sane. Have you looked into
libvirt's cgroup controls for limiting the emulator portion vs the
vCPUs [1]? I'd hope the librbd code and threads should be running in
the emulator cgroup (in a perfect world).
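
For illustration only (values are placeholders; see [1] for the full
syntax), capping the vCPUs in the domain XML while leaving the
emulator threads unthrottled should keep CPU headroom for the
librbd/librados threads, assuming they really do land in the emulator
cgroup:

<cputune>
  <period>100000</period>
  <quota>50000</quota>
  <emulator_period>100000</emulator_period>
  <emulator_quota>-1</emulator_quota>
</cputune>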

> Can somebody confirm this?
>
> Thanks,
>
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://libvirt.org/cgroups.html

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [question] one-way RBD mirroring doesn't work

2019-08-20 Thread Jason Dillaman
On Tue, Aug 20, 2019 at 9:23 AM V A Prabha  wrote:

> I too face the same problem as mentioned by Sat
>   All the images created at the primary site are in the state : down+
> unknown
>   Hence in the secondary site the images is 0 % up + syncing all time
> No progress
>   The only error log that is continuously hitting is
> *  2019-08-20 18:04:38.556908 7f7d4cba3700 -1
> rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x7f7d4000f650
> finish: resending after timeout*
>

This sounds like your rbd-mirror daemon cannot contact all OSDs. Double
check your network connectivity and firewall to ensure that the rbd-mirror
daemon can connect to *both* Ceph clusters (local and remote).
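
A quick sanity sketch from the host running rbd-mirror (this assumes
/etc/ceph/site-a.conf and /etc/ceph/site-b.conf plus matching keyrings
exist on that host; the pool name is taken from your output): both
status commands must succeed, and the "rbd ls" against the remote
cluster forces actual client-to-OSD traffic on that side:

$ ceph -s --cluster site-b
$ ceph -s --cluster site-a
$ rbd ls volumesnew --cluster site-a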


>
>
>   The setup is as follows
>One OSD created in the primary site with cluster name [site-a] and one
> OSD created in the secondary site with cluster name [site-b] both have the
> same ceph.conf file
>RBD mirror is installed at the secondary site [ which is 300kms away
> from the primary site]
>We are trying to merge this with our Cloud but the cinder volume fails
> syncing everytime
> *   Primary Site Output*
> root@clouddr:/etc/ceph# rbd mirror pool status volumesnew --verbose
> health: WARNING
> images: 4 total
> 4 unknown
> boss123:
>  global_id:   7285ed6d-46f4-4345-b597-d24911a110f8
>  state:   down+unknown
>  description: status not found
>  last_update:
>  new123:
>  global_id:   e9f2dd7e-b0ac-4138-bce5-318b40e9119e
>  state:   down+unknown
>  description: status not found
>  last_update:
>
> root@clouddr:/etc/ceph# rbd mirror pool info volumesnew
> Mode: pool
> Peers: none
> root@clouddr:/etc/ceph# rbd mirror pool status volumesnew
> health: WARNING
> images: 4 total
> 4 unknown
>
> *Secondary Site*
>
> root@meghdootctr:~# rbd mirror image status volumesnew/boss123
> boss123:
>   global_id:   7285ed6d-46f4-4345-b597-d24911a110f8
>   state:   up+syncing
>   description: bootstrapping, IMAGE_COPY/COPY_OBJECT 0%
>   last_update: 2019-08-20 17:24:18
>
> Please help me to identify where do I miss something
>
> Regards
> V.A.Prabha
> [image: 150th Anniversary Mahatma Gandhi]
> 
>
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-31 Thread Jason Dillaman
On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin  wrote:
>
> Hello Jason,
>
> it seems that there is something wrong in the rbd-nbd implementation.
> (added this information also at  https://tracker.ceph.com/issues/40822)
>
> The problem not seems to be related to kernel releases, filesystem types or 
> the ceph and network setup.
> Release 12.2.5 seems to work properly, and at least releases >= 12.2.10 seems 
> to have the described problem.

Here is the complete delta between the two releases in rbd-nbd:

$ git diff v12.2.5..v12.2.12 -- .
diff --git a/src/tools/rbd_nbd/rbd-nbd.cc b/src/tools/rbd_nbd/rbd-nbd.cc
index 098d9925ca..aefdbd36e0 100644
--- a/src/tools/rbd_nbd/rbd-nbd.cc
+++ b/src/tools/rbd_nbd/rbd-nbd.cc
@@ -595,14 +595,13 @@ static int do_map(int argc, const char *argv[],
Config *cfg)
   cerr << err << std::endl;
   return r;
 }
-
 if (forker.is_parent()) {
-  global_init_postfork_start(g_ceph_context);
   if (forker.parent_wait(err) != 0) {
 return -ENXIO;
   }
   return 0;
 }
+global_init_postfork_start(g_ceph_context);
   }

   common_init_finish(g_ceph_context);
@@ -724,8 +723,8 @@ static int do_map(int argc, const char *argv[], Config *cfg)

   if (info.size > ULONG_MAX) {
 r = -EFBIG;
-cerr << "rbd-nbd: image is too large (" << prettybyte_t(info.size)
- << ", max is " << prettybyte_t(ULONG_MAX) << ")" << std::endl;
+cerr << "rbd-nbd: image is too large (" << byte_u_t(info.size)
+ << ", max is " << byte_u_t(ULONG_MAX) << ")" << std::endl;
 goto close_nbd;
   }

@@ -761,9 +760,8 @@ static int do_map(int argc, const char *argv[], Config *cfg)
 cout << cfg->devpath << std::endl;

 if (g_conf->daemonize) {
-  forker.daemonize();
-  global_init_postfork_start(g_ceph_context);
   global_init_postfork_finish(g_ceph_context);
+  forker.daemonize();
 }

 {

It's basically just a log message tweak and some changes to how the
process is daemonized. If you could re-test w/ each release after
12.2.5 and pin-point where the issue starts occurring, we would have
something more to investigate.
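
For what it's worth, such a bisection could be scripted roughly as below. This is only a sketch: the exact Debian version strings, the pool/image name and the mount point are assumptions that would have to be adapted to the test environment.

#!/bin/bash
# Rough sketch of stepping through the luminous point releases between
# 12.2.5 and 12.2.12 and running the reproducer against each one.
# IMAGE, MNT and the version strings are placeholders.
IMAGE="rbd_ec/testimage"
MNT="/srv_ec"
for ver in 12.2.6-1xenial 12.2.7-1xenial 12.2.8-1xenial 12.2.9-1xenial \
           12.2.10-1xenial 12.2.11-1xenial 12.2.12-1xenial; do
    apt-get install -y --allow-downgrades "rbd-nbd=${ver}" "librbd1=${ver}"
    dev=$(rbd-nbd map "${IMAGE}")        # rbd-nbd prints the /dev/nbdX path
    mount "${dev}" "${MNT}"
    # ... run the gzip/gunzip reproducer here and watch dmesg for nbd errors ...
    umount "${MNT}"
    rbd-nbd unmap "${dev}"
done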

> Last night an 18-hour test run with the following procedure was successful:
> -
> #!/bin/bash
> set -x
> while true; do
>date
>find /srv_ec -type f -name "*.MYD" -print0 |head -n 50|xargs -0 -P 10 -n 2 
> gzip -v
>date
>find /srv_ec -type f -name "*.MYD.gz" -print0 |head -n 50|xargs -0 -P 10 
> -n 2 gunzip -v
> done
> -
> Previous tests crashed in a reproducible manner with "-P 1" (single io 
> gzip/gunzip) after a few minutes up to 45 minutes.
>
> Overview of my tests:
>
> - SUCCESSFUL: kernel 4.15, ceph 12.2.5, 1TB ec-volume, ext4 file system, 120s 
> device timeout
>   -> 18 hour testrun was successful, no dmesg output
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s 
> device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io 
> errors, map/mount can be re-created without reboot
>   -> parallel krbd device usage with 99% io usage worked without a problem 
> while running the test
> - FAILED: kernel 4.15, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s 
> device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io 
> errors, map/mount can be re-created
>   -> parallel krbd device usage with 99% io usage worked without a problem 
> while running the test
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, no timeout
>   -> failed after < 10 minutes
>   -> system runs in a high system load, system is almost unusable, unable to 
> shutdown the system, hard reset of vm necessary, manual exclusive lock 
> removal is necessary before remapping the device
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB 3-replica-volume, xfs file system, 
> 120s device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io 
> errors, map/mount can be re-created
>
> All device timeouts were set separately by the nbd_set_ioctl tool because 
> luminous rbd-nbd does not provide the possibility to define timeouts.
>
> What's next? Is it a good idea to do a binary search between 12.2.12 and 12.2.5?
>
> From my point of view (without in-depth knowledge of rbd-nbd/librbd) my 
> assumption is that this problem might be caused by rbd-nbd code and not by 
> librbd.
> The probability that a bug like this survives undiscovered in librbd for such a 
> long time seems low to me :-)
>
> Regards
> Marc
>
> Am 29.07.19 um 22:25 schrieb Marc Schöchlin:

Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Jason Dillaman
On Fri, Jul 26, 2019 at 9:26 AM Mykola Golub  wrote:
>
> On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote:
> > Thank you for the clarification.
> >
> > But i was trying with openstack-cinder.. when i load some data into the
> > volume around 50gb, the image sync will stop by 5 % or something within
> > 15%...  What could be the reason?
>
> I suppose you see image sync stop in mirror status output? Could you
> please provide an example? And I suppose you don't see any other
> messages in rbd-mirror log apart from what you have already posted?
> Depending on configuration rbd-mirror might log in several logs. Could
> you please try to find all its logs? `lsof |grep 'rbd-mirror.*log'`
> may be useful for this.
>
> BTW, what rbd-mirror version are you running?

From the previous thread a few days ago (not sure why a new thread was
started on this same topic), to me it sounded like one or more OSDs
isn't reachable from the secondary site:

> > Scenario 2:
> > but when i create a 50gb volume with another glance image. Volume  get 
> > created. and in the backend i could see the rbd images both in primary and 
> > secondary
> >
> > From rbd mirror image status i found secondary cluster starts copying , and 
> > syncing was struck at around 14 %... It will be in 14 % .. no progress at 
> > all. should I set any parameters for this like timeout??
> >
> > I manually checked rbd --cluster primary object-map check ..  
> > No results came for the objects and the command was in hanging.. Thats why 
> > got worried on the failed to map object key log. I couldnt even rebuild the 
> > object map.

> It sounds like one or more of your primary OSDs are not reachable from
> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
> rbd = 20", you should be able to see the last object it attempted to
> copy. From that, you could use "ceph osd map" to figure out the
> primary OSD for that object.
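
As a hedged illustration of that suggestion (the admin socket path, pool name and object name below are placeholders, not values from this thread):

# On the host running rbd-mirror for the secondary site, raise the debug level:
$ ceph daemon /var/run/ceph/ceph-client.rbd-mirror.0.asok config set debug_rbd_mirror 20
$ ceph daemon /var/run/ceph/ceph-client.rbd-mirror.0.asok config set debug_rbd 20

# Take the last object named in the rbd-mirror log and map it to its primary OSD
# on the primary cluster:
$ ceph --cluster primary osd map volumes rbd_data.1f2e3d4c5b6a.0000000000000042
# the output ends with the acting set, e.g. "acting ([5,2,9], p5)" -- osd.5 would
# be the primary OSD whose reachability from the secondary site to verify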



> --
> Mykola Golub



--
Jason


Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-24 Thread Jason Dillaman
On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin  wrote:
>
> Hi Jason,
>
> i installed kernel 4.4.0-154.181 (from ubuntu package sources) and performed 
> the crash reproduction.
> The problem also re-appeared with that kernel release.
>
> A gunzip with 10 gunzip processes threw 1600 write and 330 read IOPS 
> against the cluster/the rbd_ec volume with a transfer rate of 290MB/sec for 
> 10 minutes.
> After that the same problem re-appeared.
>
> What should we do now?
>
> Testing with a 12.2.5 librbd/rbd-nbd is currently not that easy for me, 
> because the ceph apt source does not contain that version.
> Do you know a package source?

All the upstream packages should be available here [1], including 12.2.5.

> How can i support you?

Did you pull the OSD blocked ops stats to figure out what is going on
with the OSDs?
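
In case it helps, pulling those stats could look like this (the OSD id is a placeholder; run the daemon commands on the node that hosts that OSD):

$ ceph health detail                      # shows which OSDs report blocked/slow requests
$ ceph daemon osd.17 dump_blocked_ops
$ ceph daemon osd.17 dump_ops_in_flight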

> Regards
> Marc
>
> Am 24.07.19 um 07:55 schrieb Marc Schöchlin:
> > Hi Jason,
> >
> > Am 24.07.19 um 00:40 schrieb Jason Dillaman:
> >>> Sure, which kernel do you prefer?
> >> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen 
> >> environment. Can you use a matching kernel version?
> >
> > That's true, the virtual machines of our xen environments run completely on 
> > rbd-nbd devices.
> > Every host runs dozens of rbd-nbd maps which are visible as xen disks in 
> > the virtual systems.
> > (https://github.com/vico-research-and-consulting/RBDSR)
> >
> > It seems that xenserver has a special behavior with device timings, because 
> > 1.5 years ago we had an outage of 1.5 hours of our ceph cluster which 
> > blocked all write requests
> > (overfull disks because of huge usage growth). In this situation all 
> > virtual machines continued their work without problems after the cluster was 
> > back.
> > We haven't set any timeouts using nbd_set_timeout.c on these systems.
> >
> > We never experienced problems with these rbd-nbd instances.
> >
> > [root@xen-s31 ~]# rbd nbd ls
> > pid   pool   image  
> >   snap device
> > 10405 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-72f4e61d-acb9-4679-9b1d-fe0324cb5436 -/dev/nbd3
> > 12731 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 
> > RBD-88f8889a-05dc-49ab-a7de-8b5f3961f9c9 -/dev/nbd4
> > 13123 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 
> > RBD-37243066-54b0-453a-8bf3-b958153a680d -/dev/nbd5
> > 15342 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 
> > RBD-2bee9bf7-4fed-4735-a749-2d4874181686 -/dev/nbd6
> > 15702 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 
> > RBD-5b93eb93-ebe7-4711-a16a-7893d24c1bbf -/dev/nbd7
> > 27568 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 
> > RBD-616a74b5-3f57-4123-9505-dbd4c9aa9be3 -/dev/nbd8
> > 21112 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-5c673a73-7827-44cc-802c-8d626da2f401 -/dev/nbd9
> > 15726 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-1069a275-d97f-48fd-9c52-aed1d8ac9eab -/dev/nbd10
> > 4368  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 
> > RBD-23b72184-0914-4924-8f7f-10868af7c0ab -/dev/nbd11
> > 4642  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-bf13cf77-6115-466e-85c5-aa1d69a570a0 -/dev/nbd12
> > 9438  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-a2071aa0-5f63-4425-9f67-1713851fc1ca -/dev/nbd13
> > 29191 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 
> > RBD-fd9a299f-dad9-4ab9-b6c9-2e9650cda581 -/dev/nbd14
> > 4493  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 
> > RBD-1bbb4135-e9ed-4720-a41a-a49b998faf42 -/dev/nbd15
> > 4683  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-374cadac-d969-49eb-8269-aa125cba82d8 -/dev/nbd16
> > 1736  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-478a20cc-58dd-4cd9-b8b1-6198014e21b1 -/dev/nbd17
> > 3648  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-6e28ec15-747a-43c9-998d-e9f2a600f266 -/dev/nbd18
> > 9993  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 
> > RBD-61ae5ef3-9efb-4fe6-8882-45d54558313e -/dev/nbd19
> > 10324 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 
> > RBD-f7d27673-c268-47b9-bd58-46dcd4626bbb -/dev/nbd20
> > 19330 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-23 Thread Jason Dillaman
On Tue, Jul 23, 2019 at 6:58 AM Marc Schöchlin  wrote:
>
>
> Am 23.07.19 um 07:28 schrieb Marc Schöchlin:
> >
> > Okay, I already experimented with high timeouts (i.e. 600 seconds). As far as I can 
> > remember this led to a pretty unusable system if I put high amounts of io 
> > on the ec volume.
> > This system also runs a krbd volume which saturates the system with 
> > ~30-60% iowait - this volume never had a problem.
> >
> > A comment writer in https://tracker.ceph.com/issues/40822#change-141205 
> > suggested that I reduce the rbd cache.
> > What do you think about that?
>
> Tests with a reduced rbd cache still fail, therefore I made tests with the 
> rbd cache disabled:
>
> - i disabled rbd cache with "rbd cache = false"
> - unmounted and unmapped the image
> - mapped and mounted the image
> - re-executed my test:
>    find /srv_ec -type f -name "*.sql" -exec gzip -v {} \;
>
>
> It took several hours, but at the end i have the same error situation.
>

Can you please test a consistent Ceph release w/ a known working
kernel release? It sounds like you have changed two variables, so it's
hard to know which one is broken. We need *you* to isolate what
specific Ceph or kernel release causes the break.

We really haven't made many changes to rbd-nbd, but the kernel has had
major changes to the nbd driver. As Mike pointed out on the tracker
ticket, one of those major changes effectively capped the number of
devices at 256. Can you repeat this with a single device? Can you
repeat this on Ceph rbd-nbd 12.2.11 with an older kernel?

-- 
Jason


Re: [ceph-users] Failed to get omap key when mirroring of image is enabled

2019-07-22 Thread Jason Dillaman
On Mon, Jul 22, 2019 at 3:26 PM Ajitha Robert  wrote:
>
> Thanks for your reply
>
> 1) In scenario 1, I didn't attempt to delete the cinder volume. Please find 
> the cinder volume log.
> http://paste.openstack.org/show/754731/

It might be better to ping Cinder folks about that one. It doesn't
really make sense to me from a quick glance.

>
> 2) In scenario 2. I will try with debug. But i m having a test setup with one 
> OSD in primary and one OSD in secondary. distance between two ceph clusters 
> is 300 km
>
>
> 3)I have disabled ceph authentication totally for all including rbd-mirror 
> daemon. Also i have deployed the ceph cluster using ceph-ansible. Will these 
> both  create any issue to the entire setup

Not to my knowledge.

> 4)The image which was in syncing mode, showed read only status in secondary.

Mirrored images are either primary or non-primary. It is the expected
(documented) behaviour that non-primary images are read-only.

> 5)In a presentation i found as journaling feature is causing poor performance 
> in IO operations and we can skip the journaling process for mirroring... Is 
> it possible.. By enabling mirroring to entire cinder pool as pool mode 
> instead of mirror mode of rbd mirroring.. And we can skip the 
> replication_enabled is true spec in cinder type..

Journaling is required for RBD mirroring.

>
>
>
> On Mon, Jul 22, 2019 at 11:13 PM Jason Dillaman  wrote:
>>
>> On Mon, Jul 22, 2019 at 10:49 AM Ajitha Robert  
>> wrote:
>> >
>> > No error log in rbd-mirroring except some connection timeout came once,
>> > Scenario 1:
>> >   when I create a bootable volume of 100 GB with a glance image.Image get 
>> > downloaded and from cinder, volume log throws with "volume is busy 
>> > deleting volume that has snapshot" . Image was enabled with exclusive 
>> > lock, journaling, layering, object-map, fast-diff and deep-flatten
>> > Cinder volume is in error state but the rbd image is created in primary 
>> > but not in secondary.
>>
>> Any chance you know where in Cinder that error is being thrown? A
>> quick grep of the code doesn't reveal that error message. If the image
>> is being synced to the secondary site when you attempt to delete it,
>> it's possible you could hit this issue. Providing debug log messages
>> from librbd on the Cinder controller might also be helpful for this.
>>
>> > Scenario 2:
>> > but when i create a 50gb volume with another glance image. Volume  get 
>> > created. and in the backend i could see the rbd images both in primary and 
>> > secondary
>> >
>> > From rbd mirror image status i found secondary cluster starts copying , 
>> > and syncing was struck at around 14 %... It will be in 14 % .. no progress 
>> > at all. should I set any parameters for this like timeout??
>> >
>> > I manually checked rbd --cluster primary object-map check ..  
>> > No results came for the objects and the command was in hanging.. Thats why 
>> > got worried on the failed to map object key log. I couldnt even rebuild 
>> > the object map.
>>
>> It sounds like one or more of your primary OSDs are not reachable from
>> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
>> rbd = 20", you should be able to see the last object it attempted to
>> copy. From that, you could use "ceph osd map" to figure out the
>> primary OSD for that object.
>>
>> > the image which was in syncing mode, showed read only status in secondary.
>> >
>> >
>> >
>> > On Mon, 22 Jul 2019, 17:36 Jason Dillaman,  wrote:
>> >>
>> >> On Sun, Jul 21, 2019 at 8:25 PM Ajitha Robert  
>> >> wrote:
>> >> >
>> >> >  I have a rbd mirroring setup with primary and secondary clusters as 
>> >> > peers and I have a pool enabled image mode.., In this i created a rbd 
>> >> > image , enabled with journaling.
>> >> >
>> >> > But whenever i enable mirroring on the image,  I m getting error in 
>> >> > osd.log. I couldnt trace it out. please guide me to solve this error.
>> >> >
>> >> > I think initially it worked fine. but after ceph process restart. these 
>> >> > error coming
>> >> >
>> >> >
>> >> > Secondary.osd.0.log
>> >> >
>> >> > 2019-07-22 05:36:17.371771 7ffbaa0e9700  0  
>> >> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get 
>> >>

Re: [ceph-users] Failed to get omap key when mirroring of image is enabled

2019-07-22 Thread Jason Dillaman
On Mon, Jul 22, 2019 at 10:49 AM Ajitha Robert  wrote:
>
> No error log in rbd-mirroring except some connection timeout came once,
> Scenario 1:
>   when I create a bootable volume of 100 GB with a glance image.Image get 
> downloaded and from cinder, volume log throws with "volume is busy deleting 
> volume that has snapshot" . Image was enabled with exclusive lock, 
> journaling, layering, object-map, fast-diff and deep-flatten
> Cinder volume is in error state but the rbd image is created in primary but 
> not in secondary.

Any chance you know where in Cinder that error is being thrown? A
quick grep of the code doesn't reveal that error message. If the image
is being synced to the secondary site when you attempt to delete it,
it's possible you could hit this issue. Providing debug log messages
from librbd on the Cinder controller might also be helpful for this.

> Scenario 2:
> but when i create a 50gb volume with another glance image. Volume  get 
> created. and in the backend i could see the rbd images both in primary and 
> secondary
>
> From rbd mirror image status i found secondary cluster starts copying , and 
> syncing was struck at around 14 %... It will be in 14 % .. no progress at 
> all. should I set any parameters for this like timeout??
>
> I manually checked rbd --cluster primary object-map check ..  No 
> results came for the objects and the command was in hanging.. Thats why got 
> worried on the failed to map object key log. I couldnt even rebuild the 
> object map.

It sounds like one or more of your primary OSDs are not reachable from
the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
rbd = 20", you should be able to see the last object it attempted to
copy. From that, you could use "ceph osd map" to figure out the
primary OSD for that object.

> the image which was in syncing mode, showed read only status in secondary.
>
>
>
> On Mon, 22 Jul 2019, 17:36 Jason Dillaman,  wrote:
>>
>> On Sun, Jul 21, 2019 at 8:25 PM Ajitha Robert  
>> wrote:
>> >
>> >  I have a rbd mirroring setup with primary and secondary clusters as peers 
>> > and I have a pool enabled image mode.., In this i created a rbd image , 
>> > enabled with journaling.
>> >
>> > But whenever i enable mirroring on the image,  I m getting error in 
>> > osd.log. I couldnt trace it out. please guide me to solve this error.
>> >
>> > I think initially it worked fine. but after ceph process restart. these 
>> > error coming
>> >
>> >
>> > Secondary.osd.0.log
>> >
>> > 2019-07-22 05:36:17.371771 7ffbaa0e9700  0  
>> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap 
>> > key: client_a5c76849-ba16-480a-a96b-ebfdb7f6ac65
>> > 2019-07-22 05:36:17.388552 7ffbaa0e9700  0  
>> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
>> > earlier than minimum: 0 < 1
>> > 2019-07-22 05:36:17.413102 7ffbaa0e9700  0  
>> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap 
>> > key: order
>> > 2019-07-22 05:36:23.341490 7ffbab8ec700  0  
>> > /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id 
>> > for global id '9e36b9f8-238e-4a54-a055-19b19447855e': (2) No such file or 
>> > directory
>> >
>> >
>> > primary-osd.0.log
>> >
>> > 2019-07-22 05:16:49.287769 7fae12db1700  0 log_channel(cluster) log [DBG] 
>> > : 1.b deep-scrub ok
>> > 2019-07-22 05:16:54.078698 7fae125b0700  0 log_channel(cluster) log [DBG] 
>> > : 1.1b scrub starts
>> > 2019-07-22 05:16:54.293839 7fae125b0700  0 log_channel(cluster) log [DBG] 
>> > : 1.1b scrub ok
>> > 2019-07-22 05:17:04.055277 7fae12db1700  0  
>> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
>> > earlier than minimum: 0 < 1
>> >
>> > 2019-07-22 05:33:21.540986 7fae135b2700  0  
>> > /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
>> > earlier than minimum: 0 < 1
>> > 2019-07-22 05:35:27.447820 7fae12db1700  0  
>> > /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id 
>> > for global id '8a61f694-f650-4ba1-b768-c5e7629ad2e0': (2) No such file or 
>> > directory
>>
>> Those don't look like errors, but the log level should probably be
>> reduced for those OSD cls methods. If you look at your rbd-mirror
>> daemon log, do you see any errors? That would be the important place
>> to look.
>>
>> >
>> > --
>> > Regards,
>> > Ajitha R
>>
>>
>>
>> --
>> Jason



-- 
Jason


Re: [ceph-users] Failed to get omap key when mirroring of image is enabled

2019-07-22 Thread Jason Dillaman
On Sun, Jul 21, 2019 at 8:25 PM Ajitha Robert  wrote:
>
>  I have an rbd mirroring setup with primary and secondary clusters as peers, 
> and I have a pool enabled in image mode. In this pool I created an rbd image 
> with journaling enabled.
>
> But whenever I enable mirroring on the image, I get errors in osd.log. 
> I couldn't trace them down. Please guide me to solve this error.
>
> I think initially it worked fine, but after a ceph process restart these 
> errors started appearing.
>
>
> Secondary.osd.0.log
>
> 2019-07-22 05:36:17.371771 7ffbaa0e9700  0  
> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap 
> key: client_a5c76849-ba16-480a-a96b-ebfdb7f6ac65
> 2019-07-22 05:36:17.388552 7ffbaa0e9700  0  
> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
> earlier than minimum: 0 < 1
> 2019-07-22 05:36:17.413102 7ffbaa0e9700  0  
> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap 
> key: order
> 2019-07-22 05:36:23.341490 7ffbab8ec700  0  
> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id 
> for global id '9e36b9f8-238e-4a54-a055-19b19447855e': (2) No such file or 
> directory
>
>
> primary-osd.0.log
>
> 2019-07-22 05:16:49.287769 7fae12db1700  0 log_channel(cluster) log [DBG] : 
> 1.b deep-scrub ok
> 2019-07-22 05:16:54.078698 7fae125b0700  0 log_channel(cluster) log [DBG] : 
> 1.1b scrub starts
> 2019-07-22 05:16:54.293839 7fae125b0700  0 log_channel(cluster) log [DBG] : 
> 1.1b scrub ok
> 2019-07-22 05:17:04.055277 7fae12db1700  0  
> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
> earlier than minimum: 0 < 1
>
> 2019-07-22 05:33:21.540986 7fae135b2700  0  
> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set 
> earlier than minimum: 0 < 1
> 2019-07-22 05:35:27.447820 7fae12db1700  0  
> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id 
> for global id '8a61f694-f650-4ba1-b768-c5e7629ad2e0': (2) No such file or 
> directory

Those don't look like errors, but the log level should probably be
reduced for those OSD cls methods. If you look at your rbd-mirror
daemon log, do you see any errors? That would be the important place
to look.

>
> --
> Regards,
> Ajitha R



-- 
Jason


Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-18 Thread Jason Dillaman
On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> rbd-nbd crashes in a reproducible way here.

I don't see a crash report in the log below. Is it really crashing or
is it shutting down? If it is crashing and it's reproducible, can you
install the debuginfo packages, attach gdb, and get a full backtrace
of the crash?

It seems like your cluster cannot keep up w/ the load and the nbd
kernel driver is timing out the IO and shutting down. There is a
"--timeout" option on "rbd-nbd" that you can use to increase the
kernel IO timeout for nbd.
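
For example, something along these lines (the timeout value and image spec are illustrative only, and assume the installed rbd-nbd build has the --timeout option):

$ rbd-nbd map --timeout 120 rbd_ec/archive-image   # 120s kernel IO timeout instead of the default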

> I created the following bug report: https://tracker.ceph.com/issues/40822
>
> Do you also experience this problem?
> Do you have suggestions for in depth debug data collection?
>
> I invoke the following command on a freshly mapped rbd and rbd_rbd crashes:
>
> # find . -type f -name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;
> gzip: ./deprecated_data/data_archive.done/entry_search_201232.sql.gz already 
> exists; do you wish to overwrite (y or n)? y
> ./deprecated_data/data_archive.done/entry_search_201232.sql: 84.1% -- 
> replaced with ./deprecated_data/data_archive.done/entry_search_201232.sql.gz
> ./deprecated_data/data_archive.done/entry_search_201233.sql:
> gzip: ./deprecated_data/data_archive.done/entry_search_201233.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201234.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201235.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201236.sql: 
> Input/output error
> 
>
> dmesg output:
>
> [579763.020890] block nbd0: Connection timed out
> [579763.020926] block nbd0: shutting down sockets
> [579763.020943] print_req_error: I/O error, dev nbd0, sector 3221296950
> [579763.020946] block nbd0: Receive data failed (result -32)
> [579763.020952] print_req_error: I/O error, dev nbd0, sector 4523172248
> [579763.021001] XFS (nbd0): metadata I/O error: block 0xc0011736 
> ("xlog_iodone") error 5 numblks 512
> [579763.021031] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 
> of file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return 
> address = 0x918af758
> [579763.021046] print_req_error: I/O error, dev nbd0, sector 4523172248
> [579763.021161] XFS (nbd0): Log I/O Error Detected.  Shutting down filesystem
> [579763.021176] XFS (nbd0): Please umount the filesystem and rectify the 
> problem(s)
> [579763.176834] print_req_error: I/O error, dev nbd0, sector 3221296969
> [579763.176856] print_req_error: I/O error, dev nbd0, sector 2195113096
> [579763.176869] XFS (nbd0): metadata I/O error: block 0xc0011749 
> ("xlog_iodone") error 5 numblks 512
> [579763.176884] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 
> of file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return 
> address = 0x918af758
> [579763.252836] print_req_error: I/O error, dev nbd0, sector 2195113352
> [579763.252859] print_req_error: I/O error, dev nbd0, sector 2195113608
> [579763.252869] print_req_error: I/O error, dev nbd0, sector 2195113864
> [579763.356841] print_req_error: I/O error, dev nbd0, sector 2195114120
> [579763.356885] print_req_error: I/O error, dev nbd0, sector 2195114376
> [579763.358040] XFS (nbd0): writeback error on sector 2195119688
> [579763.916813] block nbd0: Connection timed out
> [579768.140839] block nbd0: Connection timed out
> [579768.140859] print_req_error: 21 callbacks suppressed
> [579768.140860] print_req_error: I/O error, dev nbd0, sector 2195112840
> [579768.141101] XFS (nbd0): writeback error on sector 2195115592
>
> /var/log/ceph/ceph-client.archiv.log
>
> 2019-07-18 14:52:55.387815 7fffcf7fe700  1 -- 10.23.27.200:0/3920476044 --> 
> 10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1853 34.132 
> 34:4cb446f4:::rbd_header.6c73776b8b4567:head [watch unwatch cookie 
> 140736414969824] snapc 0=[] ondisk+write+known_if_redirected e256219) v8 -- 
> 0x7fffc803a340 con 0
> 2019-07-18 14:52:55.388656 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
> osd.17 10.23.27.151:6806/2322641 90  watch-notify(notify (1) cookie 
> 140736414969824 notify 1100452225614816 ret 0) v3  68+0+0 (1852866777 0 
> 0) 0x7fffe187b4c0 con 0x7fffc00054d0
> 2019-07-18 14:52:55.388738 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
> osd.17 10.23.27.151:6806/2322641 91  osd_op_reply(1852 
> rbd_header.6c73776b8b4567 [notify cookie 140736550101040] v0'0 uv2102967 
> ondisk = 0) v8  169+0+8 (3077247585 0 3199212159) 0x7fffe0002ef0 con 
> 0x7fffc00054d0
> 2019-07-18 14:52:55.388815 7fffc700  5 librbd::Watcher: 0x7fffc0001010 
> notifications_blocked: blocked=1
> 2019-07-18 14:52:55.388904 7fffc700  1 -- 10.23.27.200:0/3920476044 --> 
> 10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1854 34.132 
> 34:4cb446f4:::rbd_header.6c73776b8b4567:head [notify-ack cookie 0] snapc 0=[] 
> ondisk+read+known_if_redirected 

Re: [ceph-users] Natlius, RBD-Mirroring & Cluster Names

2019-07-15 Thread Jason Dillaman
On Mon, Jul 15, 2019 at 4:50 PM Michel Raabe  wrote:
>
> Hi,
>
>
> On 15.07.19 22:42, dhils...@performair.com wrote:
> > Paul;
> >
> > If I understand you correctly:
> > I will have 2 clusters, each named "ceph" (internally).
> >   As such, each will have a configuration file at: /etc/ceph/ceph.conf
> > I would copy the other clusters configuration file to something like: 
> > /etc/ceph/remote.conf
> > Then the commands (run on the local mirror) would look like this:
> > rbd mirror pool peer add image-pool [client-name]@ceph (uses default 
> > cluster name to reference local cluster)
> > rbd --cluster remote mirror pool add image-pool [client-name]@remote
>
> yes...and the same for the keyring - remote.client.admin.keyring or
> remote.rbd-mirror.keyring

With Ceph Nautlius, you actually do not need to copy / modify any
ceph.conf (like) files. You can store the mon address and CephX key
for the remote cluster within the local cluster's config-key store
(see --remote-mon-host and --remote-key-file rbd CLI options) or use
the Ceph Dashboard to provide the credentials.
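
A hedged sketch of that Nautilus workflow (the pool name, peer user, monitor addresses and key-file path are placeholders):

# Store the remote cluster's mon addresses and CephX key in the local
# cluster's config-key store while adding the peer:
$ rbd mirror pool peer add image-pool client.rbd-mirror-peer@remote \
    --remote-mon-host 192.0.2.11,192.0.2.12,192.0.2.13 \
    --remote-key-file /etc/ceph/remote.client.rbd-mirror-peer.key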

> Regards,
> Michel



-- 
Jason


Re: [ceph-users] rbd - volume multi attach support

2019-07-08 Thread Jason Dillaman
On Mon, Jul 8, 2019 at 10:07 AM M Ranga Swami Reddy
 wrote:
>
> Thanks Jason.
> Btw, we use Ceph with OpenStack Cinder, and Cinder releases Queens and above 
> support multi-attach. Can we use OpenStack Cinder with the Queens release with 
> Ceph rbd for multi-attach functionality?

I can't speak to the OpenStack release since I don't know, but if you
have this commit [1], it should work.

> Thanks
> Swami

[1] https://review.opendev.org/#/c/595827/

-- 
Jason


Re: [ceph-users] rbd - volume multi attach support

2019-07-08 Thread Jason Dillaman
On Mon, Jul 8, 2019 at 8:33 AM M Ranga Swami Reddy  wrote:
>
> Hello - Does ceph rbd support multi-attach volumes (with the ceph luminous 
> version)?

Yes, you just need to ensure the exclusive-lock and dependent features
are disabled on the image. When creating a new image, you can use the
"--image-shared" optional handle this for you.

> Thanks
> Swami



-- 
Jason


Re: [ceph-users] Client admin socket for RBD

2019-06-25 Thread Jason Dillaman
>  Here you go:
>
>  WHOMASK LEVELOPTION  VALUE
>RO
>  client  advanced admin_socket
>   /var/run/ceph/$name.$pid.asok *
>  global  advanced cluster_network *10.0.42.0/23*
>  <http://10.0.42.0/23>  *
>  global  advanced debug_asok  0/0
>  global  advanced debug_auth  0/0
>  global  advanced debug_bdev  0/0
>  global  advanced debug_bluefs0/0
>  global  advanced debug_bluestore 0/0
>  global  advanced debug_buffer0/0
>  global  advanced debug_civetweb  0/0
>  global  advanced debug_client0/0
>  global  advanced debug_compressor0/0
>  global  advanced debug_context   0/0
>  global  advanced debug_crush 0/0
>  global  advanced debug_crypto0/0
>  global  advanced debug_dpdk  0/0
>  global  advanced debug_eventtrace0/0
>  global  advanced debug_filer 0/0
>  global  advanced debug_filestore 0/0
>  global  advanced debug_finisher  0/0
>  global  advanced debug_fuse  0/0
>  global  advanced debug_heartbeatmap  0/0
>  global  advanced debug_javaclient0/0
>  global  advanced debug_journal   0/0
>  global  advanced debug_journaler 0/0
>  global  advanced debug_kinetic   0/0
>  global  advanced debug_kstore0/0
>  global  advanced debug_leveldb   0/0
>  global  advanced debug_lockdep   0/0
>  global  advanced debug_mds   0/0
>  global  advanced debug_mds_balancer  0/0
>  global  advanced debug_mds_locker0/0
>  global  advanced debug_mds_log   0/0
>  global  advanced debug_mds_log_expire0/0
>  global  advanced debug_mds_migrator  0/0
>  global  advanced debug_memdb 0/0
>  global  advanced debug_mgr   0/0
>  global  advanced debug_mgrc  0/0
>  global  advanced debug_mon   0/0
>  global  advanced debug_monc  0/00
>  global  advanced debug_ms0/0
>  global  advanced debug_none  0/0
>  global  advanced debug_objclass  0/0
>  global  advanced debug_objectcacher  0/0
>  global  advanced debug_objecter  0/0
>  global  advanced debug_optracker 0/0
>  global  advanced debug_osd   0/0
>  global  advanced debug_paxos 0/0
>  global  advanced debug_perfcounter   0/0
>  global  advanced debug_rados 0/0
>  global  advanced debug_rbd   0/0
>  global  advanced debug_rbd_mirror0/0
>  global  advanced debug_rbd_replay0/0
>  global  advanced debug_refs  0/0
>  global  basiclog_file/dev/null
>*
>      global  advanced mon_cluster_log_file/dev/null
>*
>  global  advanced osd_pool_default_crush_rule -1
>  global  advanced osd_scrub_begin_hour19
>  global  advanced osd_scrub_end_hour  4
>  global  advanced osd_scrub_load_threshold0.01
>  global  advanced osd_scrub_sleep 0.10
>  global  advanced perftrue
>  global  advanced public_network  *10.0.40.0/23*
>  <http://10.0.40.0/23>  *
>  global  advanced rocksdb_perftrue
>
>  On 6/24/2019 11:50 AM, Jason Dillaman wrote:
>  > On Sun, Jun 23, 2019 at 4:27 PM Alex Litvak
>  > <*alexander.v.lit...@gmail.com* >
>  wrote:
>  >>
>  >> Hello everyone,
>  >>
>  >> I encounter this in nautilus client and not with mimic.
>  Removing admin socket ent

Re: [ceph-users] Client admin socket for RBD

2019-06-25 Thread Jason Dillaman
On Mon, Jun 24, 2019 at 4:30 PM Alex Litvak
 wrote:
>
> Jason,
>
> What  are you suggesting to do ? Removing this line from the config database 
> and keeping in config files instead?

I think it's a hole right now in the MON config store that should be
addressed. I've opened a tracker ticket [1] to support re-opening the
admin socket after the MON configs are received (if not overridden in
the local conf).
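
Until that is addressed, one possible workaround (my assumption, not something confirmed in this thread) is to drop the option from the MON config store and keep it only in the local ceph.conf, which is read before the admin socket is created:

$ ceph config rm client admin_socket        # remove the override from the MON store

# and keep it in /etc/ceph/ceph.conf on the client instead:
# [client]
# admin_socket = /var/run/ceph/$name.$pid.asok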

> On 6/24/2019 1:12 PM, Jason Dillaman wrote:
> > On Mon, Jun 24, 2019 at 2:05 PM Alex Litvak
> >  wrote:
> >>
> >> Jason,
> >>
> >> Here you go:
> >>
> >> WHOMASK LEVELOPTION  VALUE 
> >> RO
> >> client  advanced admin_socket
> >> /var/run/ceph/$name.$pid.asok *
> >
> > This is the offending config option that is causing your warnings.
> > Since the mon configs are read after the admin socket has been
> > initialized, it is ignored (w/ the warning saying setting this
> > property has no effect).
> >
> >> global  advanced cluster_network 10.0.42.0/23  
> >> *
> >> global  advanced debug_asok  0/0
> >> global  advanced debug_auth  0/0
> >> global  advanced debug_bdev  0/0
> >> global  advanced debug_bluefs0/0
> >> global  advanced debug_bluestore 0/0
> >> global  advanced debug_buffer0/0
> >> global  advanced debug_civetweb  0/0
> >> global  advanced debug_client0/0
> >> global  advanced debug_compressor0/0
> >> global  advanced debug_context   0/0
> >> global  advanced debug_crush 0/0
> >> global  advanced debug_crypto0/0
> >> global  advanced debug_dpdk  0/0
> >> global  advanced debug_eventtrace0/0
> >> global  advanced debug_filer 0/0
> >> global  advanced debug_filestore 0/0
> >> global  advanced debug_finisher  0/0
> >> global  advanced debug_fuse  0/0
> >> global  advanced debug_heartbeatmap  0/0
> >> global  advanced debug_javaclient0/0
> >> global  advanced debug_journal   0/0
> >> global  advanced debug_journaler 0/0
> >> global  advanced debug_kinetic   0/0
> >> global  advanced debug_kstore0/0
> >> global  advanced debug_leveldb   0/0
> >> global  advanced debug_lockdep   0/0
> >> global  advanced debug_mds   0/0
> >> global  advanced debug_mds_balancer  0/0
> >> global  advanced debug_mds_locker0/0
> >> global  advanced debug_mds_log   0/0
> >> global  advanced debug_mds_log_expire0/0
> >> global  advanced debug_mds_migrator  0/0
> >> global  advanced debug_memdb 0/0
> >> global  advanced debug_mgr   0/0
> >> global  advanced debug_mgrc  0/0
> >> global  advanced debug_mon   0/0
> >> global  advanced debug_monc  0/00
> >> global  advanced debug_ms0/0
> >> global  advanced debug_none  0/0
> >> global  advanced debug_objclass  0/0
> >> global  advanced debug_objectcacher  0/0
> >> global  advanced debug_objecter  0/0
> >> global  advanced debug_optracker 0/0
> >> global  advanced debug_osd   0/0
> >> global  advanced debug_paxos 0/0
> >> global  advanced debug_perfcounter   0/0
> >> global  advanced debug_rados 0/0
> >> global  advanced debug_rbd   0/0
> >> global  advanced debug_rbd_mirror0/0
> >> global  advanced debug_rbd_replay0/0
> >> global  advanced debug_refs  0/0
> >> global  basiclog_file/dev/null 
> >>     *
> >> global  advanced mon_cluster_log_file/dev/null 
> >> *
> >> global  advanced osd_pool_default_crush_rule -1
> >> global  advanced osd_scrub_be

Re: [ceph-users] Is rbd caching safe to use in the current ceph-iscsi 3.0 implementation

2019-06-24 Thread Jason Dillaman
On Mon, Jun 24, 2019 at 4:05 PM Paul Emmerich  wrote:
>
> No.
>
> tcmu-runner disables the cache automatically overriding your ceph.conf 
> setting.

Correct. For safety purposes, we don't want to support a writeback
cache when failover between different gateways is possible.

>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Mon, Jun 24, 2019 at 9:43 PM Wesley Dillingham  
> wrote:
>>
>> Is it safe to have RBD cache enabled on all the gateways in the latest ceph 
>> 14.2+ and ceph-iscsi 3.0 setup? Assuming client are using multipath as 
>> outlined here: http://docs.ceph.com/docs/nautilus/rbd/iscsi-initiators/  
>> Thanks.
>>
>> Respectfully,
>>
>> Wes Dillingham
>> wdilling...@godaddy.com
>> Site Reliability Engineer IV - Platform Storage
>>
>



-- 
Jason


Re: [ceph-users] Client admin socket for RBD

2019-06-24 Thread Jason Dillaman
On Mon, Jun 24, 2019 at 2:05 PM Alex Litvak
 wrote:
>
> Jason,
>
> Here you go:
>
> WHOMASK LEVELOPTION  VALUE
>  RO
> client  advanced admin_socket
> /var/run/ceph/$name.$pid.asok *

This is the offending config option that is causing your warnings.
Since the mon configs are read after the admin socket has been
initialized, it is ignored (w/ the warning saying setting this
property has no effect).

> global  advanced cluster_network 10.0.42.0/23 
>  *
> global  advanced debug_asok  0/0
> global  advanced debug_auth  0/0
> global  advanced debug_bdev  0/0
> global  advanced debug_bluefs0/0
> global  advanced debug_bluestore 0/0
> global  advanced debug_buffer0/0
> global  advanced debug_civetweb  0/0
> global  advanced debug_client0/0
> global  advanced debug_compressor0/0
> global  advanced debug_context   0/0
> global  advanced debug_crush 0/0
> global  advanced debug_crypto0/0
> global  advanced debug_dpdk  0/0
> global  advanced debug_eventtrace0/0
> global  advanced debug_filer 0/0
> global  advanced debug_filestore 0/0
> global  advanced debug_finisher  0/0
> global  advanced debug_fuse  0/0
> global  advanced debug_heartbeatmap  0/0
> global  advanced debug_javaclient0/0
> global  advanced debug_journal   0/0
> global  advanced debug_journaler 0/0
> global  advanced debug_kinetic   0/0
> global  advanced debug_kstore0/0
> global  advanced debug_leveldb   0/0
> global  advanced debug_lockdep   0/0
> global  advanced debug_mds   0/0
> global  advanced debug_mds_balancer  0/0
> global  advanced debug_mds_locker0/0
> global  advanced debug_mds_log   0/0
> global  advanced debug_mds_log_expire0/0
> global  advanced debug_mds_migrator  0/0
> global  advanced debug_memdb 0/0
> global  advanced debug_mgr   0/0
> global  advanced debug_mgrc  0/0
> global  advanced debug_mon   0/0
> global  advanced debug_monc  0/00
> global  advanced debug_ms0/0
> global  advanced debug_none  0/0
> global  advanced debug_objclass  0/0
> global  advanced debug_objectcacher  0/0
> global  advanced debug_objecter  0/0
> global  advanced debug_optracker 0/0
> global  advanced debug_osd   0/0
> global  advanced debug_paxos 0/0
> global  advanced debug_perfcounter   0/0
> global  advanced debug_rados 0/0
> global  advanced debug_rbd   0/0
> global  advanced debug_rbd_mirror0/0
> global  advanced debug_rbd_replay0/0
> global  advanced debug_refs  0/0
> global  basiclog_file/dev/null
>  *
> global  advanced mon_cluster_log_file/dev/null
>  *
> global  advanced osd_pool_default_crush_rule -1
> global  advanced osd_scrub_begin_hour19
> global  advanced osd_scrub_end_hour  4
> global  advanced osd_scrub_load_threshold0.01
> global  advanced osd_scrub_sleep 0.10
> global  advanced perftrue
> global  advanced public_network  10.0.40.0/23 
>  *
> global  advanced rocksdb_perftrue
>
> On 6/24/2019 11:50 AM, Jason Dillaman wrote:
> > On Sun, Jun 23, 2019 at 4:27 PM Alex Litvak
> >  wrote:
> >>
> >> Hello everyone,
> >>
> >> I encounter this in nautilus client and not with mimic.  Removing admin 
> >> socket entry from config on client makes no difference
> >>
> >> Error:
> >>
> >> rbd ls -p one
> >> 2019-06-23 12:58:29.344 7ff2710b0700 -1 set_mon_vals failed to set 
> >> admin_socket = /var/run/ceph/$name.$pid.asok: Configuration option 
> >> 'admin_socket' may not be modified at runtime
> >> 2019-06-23 12:58:29.348 7ff2708af700 -1 set_mon_vals failed to set 
> >> admin_socket 

Re: [ceph-users] Client admin socket for RBD

2019-06-24 Thread Jason Dillaman
On Sun, Jun 23, 2019 at 4:27 PM Alex Litvak
 wrote:
>
> Hello everyone,
>
> I encounter this in nautilus client and not with mimic.  Removing admin 
> socket entry from config on client makes no difference
>
> Error:
>
> rbd ls -p one
> 2019-06-23 12:58:29.344 7ff2710b0700 -1 set_mon_vals failed to set 
> admin_socket = /var/run/ceph/$name.$pid.asok: Configuration option 
> 'admin_socket' may not be modified at runtime
> 2019-06-23 12:58:29.348 7ff2708af700 -1 set_mon_vals failed to set 
> admin_socket = /var/run/ceph/$name.$pid.asok: Configuration option 
> 'admin_socket' may not be modified at runtime
>
> I have no issues running other ceph clients (no messages on the screen with 
> ceph -s or ceph iostat from the same box.)
> I connected to a few other client nodes and as root I can do the same string
> rbd ls -p one
>
>
> On all the nodes with user libvirt I have seen the admin_socket messages
>
> oneadmin@virt3n1-la:~$  rbd ls -p one --id libvirt
> 2019-06-23 13:16:41.626 7f9ea0ff9700 -1 set_mon_vals failed to set 
> admin_socket = /var/run/ceph/$name.$pid.asok: Configuration option 
> 'admin_socket' may not be modified at runtime
> 2019-06-23 13:16:41.626 7f9e8bfff700 -1 set_mon_vals failed to set 
> admin_socket = /var/run/ceph/$name.$pid.asok: Configuration option 
> 'admin_socket' may not be modified at runtime
>
> I can execute all rbd operations on the cluster from the client otherwise.  
> Commenting out the client section in the config file makes no difference.
>
> This is an optimised config distributed across the clients; it is almost the 
> same as on the servers (no libvirt on servers)
>
> [client]
> admin_socket = /var/run/ceph/$name.$pid.asok
>
> [client.libvirt]
> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be 
> writable by QEMU and allowed by SELinux or AppArmor
> log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and 
> allowed by SELinux or AppArmor
>
> # Please do not change this file directly since it is managed by Ansible and 
> will be overwritten
> [global]
> cluster network = 10.0.42.0/23
> fsid = 3947ba2d-1b01-4909-8e3a-f9714f427483
> log file = /dev/null
> mon cluster log file = /dev/null
> mon host = 
> [v2:10.0.40.121:3300,v1:10.0.40.121:6789],[v2:10.0.40.122:3300,v1:10.0.40.122:6789],[v2:10.0.40.123:3300,v1:10.0.40.123:6789]
> perf = True
> public network = 10.0.40.0/23
> rocksdb_perf = True
>
>
> Here is config from mon
>
> NAMEVALUE 
> 
> SOURCE   OVERRIDES  IGNORES
> cluster_network 10.0.42.0/23  
> 
> file (mon[10.0.42.0/23])
> daemonize   false 
> 
> override
> debug_asok  0/0   
> 
> mon
> debug_auth  0/0   
> 
> mon
> debug_bdev  0/0   
> 
> mon
> debug_bluefs0/0   
> 
> mon
> debug_bluestore 0/0   
> 
> mon
> debug_buffer0/0   
> 
> mon
> debug_civetweb  0/0   
> 
> mon
> debug_client0/0   
> 
> mon
> debug_compressor0/0   
> 
> mon
> debug_context   0/0   
> 
> mon
> debug_crush 0/0   
> 
> mon
> debug_crypto0/0   
> 
> mon
> debug_dpdk   

Re: [ceph-users] Possible to move RBD volumes between pools?

2019-06-19 Thread Jason Dillaman
On Wed, Jun 19, 2019 at 6:25 PM Brett Chancellor
 wrote:
>
> Background: We have a few ceph clusters, each serves multiple Openstack 
> cluster. Each cluster has it's own set of pools.
>
> I'd like to move ~50TB of volumes from an old cluster (we'll call the pool 
> cluster01-volumes) to an existing pool (cluster02-volumes) to later be 
> imported by a different Openstack cluster. I could run something like this...
> rbd export cluster01-volumes/volume-12345 | rbd import 
> cluster02-volumes/volume-12345 .

I'm getting a little confused by the dual use of "cluster" for both
Ceph and OpenStack. Are both pools in the same Ceph cluster? If so,
could you just clone the image to the new pool? The Nautilus release
also includes a simple image live migration tool where it creates a
clone, copies the data and all snapshots to the clone, and then
deletes the original image.
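
If both pools are in the same Ceph cluster, that live-migration flow might look roughly like this (a sketch only; the image names are taken from the question above):

$ rbd migration prepare cluster01-volumes/volume-12345 cluster02-volumes/volume-12345
$ rbd migration execute cluster02-volumes/volume-12345   # copies data and snapshots in the background
$ rbd migration commit  cluster02-volumes/volume-12345   # finalizes and removes the source image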

> But that would be slow and duplicate the data which I'd rather not do. Are 
> there any better ways to it?
>
> Thanks,
>
> -Brett



-- 
Jason


Re: [ceph-users] Error when I compare hashes of export-diff / import-diff

2019-06-12 Thread Jason Dillaman
On Wed, Jun 12, 2019 at 9:50 AM Rafael Diaz Maurin
 wrote:
>
> Hello Jason,
>
> Le 11/06/2019 à 15:31, Jason Dillaman a écrit :
> >> 4- I export the snapshot from the source pool and I import the snapshot
> >> towards the destination pool (in the pipe)
> >> rbd export-diff --from-snap ${LAST-SNAP}
> >> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} - | rbd -c ${BACKUP-CLUSTER}
> >> import-diff - ${POOL-DESTINATION}/${KVM-IMAGE}
> > What's the actual difference between the "rbd diff" outputs? There is
> > a known "issue" where object-map will flag an object as dirty if you
> > had run an fstrim/discard on the image, but it doesn't affect the
> > actual validity of the data.
>
>
> The feature discard=on is activated, and qemu-guest-agent is running on
> the guest.
>
> Here are the only differences between the 2 outputs of the "rbd diff
> --format plain" (image source and image destination) :
> Offset  Length  Type
> 121c121
> < 14103347200 2097152 data
> ---
>  > 14103339008 2105344 data
> 153c153
> < 14371782656 2097152 data
> ---
>  > 14369685504 4194304 data
> 216c216
> < 14640218112 2097152 data
> ---
>  > 14638120960 4194304 data
> 444c444
> < 15739125760 2097152 data
> ---
>  > 15738519552 2703360 data
>
> And the hashes of the exports are identical (between source and
> destination) :
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => ee7012e14870b36e7b9695e52c417c06
>
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => ee7012e14870b36e7b9695e52c417c06
>
>
> So do you think this can be caused by the fstrim/discard feature?

You said you weren't using fstrim/discard. If you export both images
and compare those image extents where the diffs are different, are they
just filled w/ zeroes?
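
One rough way to check that (offset and length taken from the diff output above; GNU dd with byte-granular skip/count is assumed, and source.raw/backup.raw stand for full exports of the two images):

# hash just the differing extent in both full exports
$ for img in source.raw backup.raw; do
    dd if="$img" iflag=skip_bytes,count_bytes skip=14103339008 count=2105344 status=none | md5sum
  done

# check whether that extent is entirely zero-filled (prints 0 if so)
$ dd if=source.raw iflag=skip_bytes,count_bytes skip=14103339008 count=2105344 status=none \
    | tr -d '\0' | wc -c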

> In fact, when fstrim/discard is activated, is the only way to validate the
> export-diff to compare the hashes of full exports?
>
>
> Thank you.
>
>
> Best regards,
> Rafael
>
> --
> Rafael Diaz Maurin
> DSI de l'Université de Rennes 1
> Pôle Infrastructures, équipe Systèmes
> 02 23 23 71 57
>
>


-- 
Jason


Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 10:24 AM Wesley Dillingham
 wrote:
>
> Thanks Jason for the info! A few questions:
>
> "The current rbd-target-api doesn't really support single path LUNs."
>
> In our testing, using single path LUNs, listing the "owner" of a given LUN 
> and then connecting directly to that gateway yielded stable and 
> well-performing results, obviously, there was a SPOF, however for this use 
> case, that is acceptable (not a root fs of a vm, etc) If a SPOF is acceptable 
> is there a particular reason that single path would not be agreeable?

I should clarify: rbd-target-api will configure multiple paths to each
LUN regardless. If you only use the single active path, I guess that's
OK.

> "It currently doesn't have any RBAC style security so I would be weary
> about exposing the current REST API to arbitrary users since you would
> give them full access to do anything"
>
> This is also somewhat of a concern but this is a cluster for a single client 
> who already has full ability to manipulate storage on the legacy system and 
> have been okay. Was planning on network segregating the API so only the given 
> client could communicate with it and also having the gateways run a cephx 
> with permissions only to a particular pool (rbd) and implementing a backup 
> system to offload daily snapshots to a different pool or cluster client does 
> not have capabilities on.
>
> The dashboard feature looks very promising however client would need to 
> interact programmatically, I do intend on experimenting with giving them 
> iscsi role in the nautilus dashboard. I poked at that a bit and am having 
> some trouble getting the dashboard working with iscsi, wondering if the issue 
> is obvious to you:

Fair enough, that would be just using yet another REST API on top of
the other REST API.

> (running 14.2.0 and ceph-iscsi-3.0-57.g4ae)
>
> and configuring the dash as follows:
>
> ceph dashboard set-iscsi-api-ssl-verification false
> ceph dashboard iscsi-gateway-add http://admin:admin@${MY_HOSTNAME}:5000
> systemctl restart ceph-mgr@${MY_HOSTNAME_SHORT}.service
>
> in the dash block/iscsi/target shows:
>
> Unsupported `ceph-iscsi` config version. Expected 8 but found 9.
>

You will need this PR [1] to bump the version support in the
dashboard. It should have been backported to Nautilus as part of
v14.2.2.

> Thanks again.
>
>
>
>
>
>
>
> 
> From: Jason Dillaman 
> Sent: Tuesday, June 11, 2019 9:37 AM
> To: Wesley Dillingham
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] limitations to using iscsi rbd-target-api directly 
> in lieu of gwcli
>
> Notice: This email is from an external sender.
>
>
>
> On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
>  wrote:
> >
> > Hello,
> >
> > I am hoping to expose a REST API to a remote client group who would like to 
> > do things like:
> >
> >
> > Create, List, and Delete RBDs and map them to gateway (make a LUN)
> > Create snapshots, list, delete, and rollback
> > Determine Owner / Active gateway of a given lun
>
> It currently doesn't have any RBAC style security so I would be wary
> about exposing the current REST API to arbitrary users since you would
> give them full access to do anything. The Ceph dashboard in Nautilus
> (and also improved in the master branch) has lots of hooks to
> configure LUNs via the rbd-target-api REST API as another alternative
> to look at.
>
> > I would run 2-4 nodes running rbd-target-gw and rbd-target-api however 
> > client wishes to not use multi-path, wants to connect directly and only to 
> > active gateway for that lun
>
> The current rbd-target-api doesn't really support single path LUNs.
>
> > In order to prevent re-inventing the wheel I was hoping to simply expose 
> > the rbd-target-api directly to client but am wondering if this is 
> > appropriate.
> >
> > My concern is that I am taking gwcli out off the picture by using 
> > rbd-target-api directly and am wondering if the rbd-target-api on its own 
> > is able to propagate changes in the config up to the RADOS configuration 
> > object and thus keep all the gateways in sync.
>
> gwcli just uses rbd-target-api to do the work, and rbd-target-api is
> responsible for keeping the gateways in-sync with each other.
>
> > My other thought was to build a simple and limited in scope api which on 
> > the backend runs gwcli commands.
> >
> > Thank you for clarification on the functionality and appropriate use.
> >
> > Respectfully,
> >
> > Wes Dillingham
> > wdilling...@godaddy.com
> > Site Reliability Engineer IV - Platform Storage / Ceph
> >
>
>
>
> --
> Jason

[1] https://github.com/ceph/ceph/pull/27448

-- 
Jason


Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
 wrote:
>
> Hello,
>
> I am hoping to expose a REST API to a remote client group who would like to 
> do things like:
>
>
> Create, List, and Delete RBDs and map them to gateway (make a LUN)
> Create snapshots, list, delete, and rollback
> Determine Owner / Active gateway of a given lun

It currently doesn't have any RBAC style security so I would be wary
about exposing the current REST API to arbitrary users since you would
give them full access to do anything. The Ceph dashboard in Nautilus
(and also improved in the master branch) has lots of hooks to
configure LUNs via the rbd-target-api REST API as another alternative
to look at.

> I would run 2-4 nodes running rbd-target-gw and rbd-target-api however client 
> wishes to not use multi-path, wants to connect directly and only to active 
> gateway for that lun

The current rbd-target-api doesn't really support single path LUNs.

> In order to prevent re-inventing the wheel I was hoping to simply expose the 
> rbd-target-api directly to client but am wondering if this is appropriate.
>
> My concern is that I am taking gwcli out off the picture by using 
> rbd-target-api directly and am wondering if the rbd-target-api on its own is 
> able to propagate changes in the config up to the RADOS configuration object 
> and thus keep all the gateways in sync.

gwcli just uses rbd-target-api to do the work, and rbd-target-api is
responsible for keeping the gateways in-sync with each other.

> My other thought was to build a simple and limited in scope api which on the 
> backend runs gwcli commands.
>
> Thank you for clarification on the functionality and appropriate use.
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>



-- 
Jason


Re: [ceph-users] Error when I compare hashes of export-diff / import-diff

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 9:25 AM Rafael Diaz Maurin
 wrote:
>
> Hello,
>
> I have a problem when I want to validate (using md5 hashes) rbd
> export/import diff from a rbd source-pool (the production pool) towards
> another rbd destination-pool (the backup pool).
>
> Here is the algorithm:
> 1- First of all, I validate that the two hashes from the last snapshots
> (source and destination) are the same:
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${LAST-SNAP} - | md5sum
> => 3f54626da234730eefc27ef2a3b6ca83
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${LAST-SNAP} - | md5sum
> => 3f54626da234730eefc27ef2a3b6ca83
>
>
> 2- If it does not exist, I create an empty image in the destination pool
> rbd -c ${BACKUP-CLUSTER} create ${POOL-DESTINATION}/${KVM-IMAGE} -s 1
>
> 3- I create a snapshot inside the source pool
> rbd snap create ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP}
>
> 4- I export the snapshot from the source pool and I import the snapshot
> towards the destination pool (in the pipe)
> rbd export-diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} - | rbd -c ${BACKUP-CLUSTER}
> import-diff - ${POOL-DESTINATION}/${KVM-IMAGE}

What's the actual difference between the "rbd diff" outputs? There is
a known "issue" where object-map will flag an object as dirty if you
had run an fstrim/discard on the image, but it doesn't affect the
actual validity of the data.
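
For example, diffing the two listings directly (instead of hashing them)
should show exactly which extents disagree -- untested sketch, using the
same variables as your script:

diff <(rbd diff --from-snap ${LAST-SNAP} ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} --format json) \
     <(rbd -c ${BACKUP-CLUSTER} diff --from-snap ${LAST-SNAP} ${POOL-DESTINATION}/${KVM-IMAGE}@${TODAY-SNAP} --format json)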

> The problem occurs when I want to validate only the diff between the 2
> snapshots (in order to be more efficient). I note that those hashes are
> different.
>
> Here is how I calculate the hashes:
> Source-hash : rbd diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} --format json | md5sum | cut
> -d ' ' -f 1
> => bc56663b8ff01ec388598037a20861cf
> Destination-hash : rbd -c ${BACKUP-CLUSTER} diff --from-snap
> ${LAST-SNAP} ${POOL-DESTINATION}/${KVM-IMAGE}@${TODAY-SNAP} --format
> json | md5sum | cut -d ' ' -f 1
> => 3aa35362471419abe0a41f222c113096
>
> On the other hand, if I compare the hashes of the export (between source
> and destination), they are the same :
>
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
>
>
> Does someone have an idea of what's happening?
>
> Does someone have a way to successfully compare the export-diff / import-diff?
>
>
>
>
> Thank you,
> Rafael
>
> --
> Rafael Diaz Maurin
> DSI de l'Université de Rennes 1
> Pôle Infrastructures, équipe Systèmes
> 02 23 23 71 57
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] krbd namespace missing in /dev

2019-06-10 Thread Jason Dillaman
On Mon, Jun 10, 2019 at 1:50 PM Jonas Jelten  wrote:
>
> When I run:
>
>   rbd map --name client.lol poolname/somenamespace/imagename
>
> The image is mapped to /dev/rbd0 and
>
>   /dev/rbd/poolname/imagename
>
> I would expect the rbd to be mapped to (the rbdmap tool tries this name):
>
>   /dev/rbd/poolname/somenamespace/imagename
>
> The current map point would not allow same-named images in different 
> namespaces, and the automatic mount of rbdmap fails
> because of this.
>
>
> Are there plans to fix this?

I opened a tracker ticket for this issue [1].

>
> Cheers
> -- Jonas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] http://tracker.ceph.com/issues/40247

-- 
Jason


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-07 Thread Jason Dillaman
On Fri, Jun 7, 2019 at 7:22 AM Sakirnth Nagarasa
 wrote:
>
> On 6/6/19 5:09 PM, Jason Dillaman wrote:
> > On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa
> >  wrote:
> >>
> >> On 6/6/19 3:46 PM, Jason Dillaman wrote:
> >>> Can you run "rbd trash ls --all --long" and see if your image
> >>> is listed?
> >>
> >> No, it is not listed.
> >>
> >> I did run:
> >> rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}
> >>
> >> Cheers,
> >> Sakirnth
> >
> > Is it listed under "rbd ls ${POOLNAME_FROM_IMAGE}"?
>
> Yes, that's the point: the image is still listed under "rbd ls
> ${POOLNAME_FROM_IMAGE}". But we can't do any operations with it, like
> showing info or deleting it. The error message is in the first mail.

Can you run "rbd rm --log-to-stderr=true --debug-rbd=20
${POOLNAME}/${IMAGE}" and provide the logs via pastebin.com?

> Cheers,
> Sakirnth



-- 
Jason


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa
 wrote:
>
> On 6/6/19 3:46 PM, Jason Dillaman wrote:
> > Can you run "rbd trash ls --all --long" and see if your image
> > is listed?
>
> No, it is not listed.
>
> I did run:
> rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}
>
> Cheers,
> Sakirnth

Is it listed under "rbd ls ${POOLNAME_FROM_IMAGE}"?

-- 
Jason


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 5:07 AM Sakirnth Nagarasa
 wrote:
>
> Hello,
>
> Our ceph version is ceph nautilus (14.2.1).
> We periodically create snapshots of an rbd image (50 TB). In order to
> restore some data, we cloned a snapshot.
> To delete the snapshot we ran: rbd rm ${POOLNAME}/${IMAGE}
>
> But it took very long to delete the image; after half an hour it had only
> 1% progress. We thought that couldn't be right, because the creation of the
> clone was pretty fast.
> So we interrupted (SIGINT) the delete command. After doing some research
> we found out it's the normal deletion behavior.
>
> The problem is that ceph does not recognize the image anymore. Even
> though it is listed in rbd list we can't remove it.
>
> rbd rm ${POOLNAME}/${IMAGE}
> rbd: error opening image ${IMAGE}: (2) No such file or directory
>
> Now, how can we get rid of the image correctly?

Starting in Nautilus, we now first temporarily move an image to the
RBD trash when it's requested to be deleted. Interrupting that
operation should leave it in the trash, but "rbd rm" should have still
worked. Can you run "rbd trash ls --all --long" and see if your image
is listed?
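
If it does show up there, something along these lines should let you
finish the removal (untested; the image ID comes from the trash listing):

rbd trash ls --all --long ${POOLNAME}
rbd trash rm ${POOLNAME}/<image-id>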

> Thanks
> Sakirnth Nagarasa
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 12:59 PM Jordan Share  wrote:
>
> One thing to keep in mind when pipelining rbd export/import is that the
> default is just a raw image dump.
>
> So if you have a large, but not very full, RBD, you will dump all those
> zeroes into the pipeline.
>
> In our case, it was actually faster to write to a (sparse) temp file and
> read it in again afterwards than to pipeline.
>
> However, we are not using --export-format 2, which I now suspect would
> mitigate this.

It's supposed to help since it's only using diffs -- never the full
image export.

> Jordan
>
>
> On 6/5/2019 8:30 AM, CUZA Frédéric wrote:
> > Hi,
> >
> > Thank you all for you quick answer.
> > I think that will solve our problem.
> >
> > This is what we came up with this :
> > rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> > export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> > /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
> >
> > This rbd image is a test with only 5Gb of datas inside of it.
> >
> > Unfortunately the command seems to be stuck and nothing happens, both ports 
> > 7800 / 6789 / 22.
> >
> > We can't find no logs on any monitors.
> >
> > Thanks !
> >
> > -Message d'origine-
> > De : ceph-users  De la part de Jason 
> > Dillaman
> > Envoyé : 04 June 2019 14:11
> > À : Burkhard Linke 
> > Cc : ceph-users 
> > Objet : Re: [ceph-users] Multiple rbd images from different clusters
> >
> > On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
> >>
> >> On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>>
> >>>
> >>> We want to migrate datas from one cluster (Hammer) to a new one (Mimic). 
> >>> We do not wish to upgrade the actual cluster as all the hardware is EOS 
> >>> and we upgrade the configuration of the servers.
> >>>
> >>> We can’t find a “proper” way to mount two rbd images from two different 
> >>> cluster on the same host.
> >>>
> >>> Does anyone know what is the “good” procedure to achieve this ?
> >>
> >> Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> >> clusters to a single machine (preferably running a Mimic "rbd" client)
> >> under "/etc/ceph/.conf" and
> >> "/etc/ceph/.client..keyring".
> >>
> >> You can then use "rbd -c  export --export-format 2
> >>  - | rbd -c  import --export-format=2 -
> >> ". The "--export-format=2" option will also copy all
> >> associated snapshots with the images. If you don't want/need the
> >> snapshots, just drop that optional.
> >
> > That "-c" should be "--cluster" if specifying by name, otherwise with "-c" 
> > it's the full path to the two different conf files.
> >
> >>>
> >>> Just my 2 ct:
> >>>
> >>> the 'rbd' commands allows specifying a configuration file (-c). You need 
> >>> to setup two configuration files, one for each cluster. You can also use 
> >>> two different cluster names (--cluster option). AFAIK the name is only 
> >>> used to locate the configuration file. I'm not sure how well the kernel 
> >>> works with mapping RBDs from two different cluster.
> >>>
> >>>
> >>> If you only want to transfer RBDs from one cluster to another, you do not 
> >>> need to map and mount them; the 'rbd' command has the sub commands 
> >>> 'export' and 'import'. You can pipe them to avoid writing data to a local 
> >>> disk. This should be the fastest way to transfer the RBDs.
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Burkhard
> >>>
> >>> --
> >>> Dr. rer. nat. Burkhard Linke
> >>> Bioinformatics and Systems Biology
> >>> Justus-Liebig-University Giessen
> >>> 35392 Giessen, Germany
> >>> Phone: (+49) (0)641 9935810
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >>
> >> --
> >> Jason
> >
> >
> >
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 11:31 AM CUZA Frédéric  wrote:
>
> Hi,
>
> Thank you all for your quick answer.
> I think that will solve our problem.
>
> This is what we came up with:
> rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
>
> This rbd image is a test with only 5 GB of data inside it.
>
> Unfortunately the command seems to be stuck and nothing happens on ports
> 7800 / 6789 / 22.
>
> We can't find any logs on any of the monitors.

Try running "rbd -c /path/to/conf --keyring /path/to/keyring ls" or
"ceph -c /path/to/conf --keyring /path/to/keyring health" just to test
connectivity first.
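
With the file names from your command that would be, for example:

rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring ls
ceph -c /etc/ceph/Nceph.conf --keyring /etc/ceph/Nceph.client.admin.keyring health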

> Thanks !
>
> -Message d'origine-
> De : ceph-users  De la part de Jason 
> Dillaman
> Envoyé : 04 June 2019 14:11
> À : Burkhard Linke 
> Cc : ceph-users 
> Objet : Re: [ceph-users] Multiple rbd images from different clusters
>
> On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
> >
> > On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
> >  wrote:
> > >
> > > Hi,
> > >
> > > On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> > >
> > > Hi everyone,
> > >
> > >
> > >
> > > We want to migrate datas from one cluster (Hammer) to a new one (Mimic). 
> > > We do not wish to upgrade the actual cluster as all the hardware is EOS 
> > > and we upgrade the configuration of the servers.
> > >
> > > We can’t find a “proper” way to mount two rbd images from two different 
> > > cluster on the same host.
> > >
> > > Does anyone know what is the “good” procedure to achieve this ?
> >
> > Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> > clusters to a single machine (preferably running a Mimic "rbd" client)
> > under "/etc/ceph/.conf" and
> > "/etc/ceph/.client..keyring".
> >
> > You can then use "rbd -c  export --export-format 2
> >  - | rbd -c  import --export-format=2 -
> > ". The "--export-format=2" option will also copy all
> > associated snapshots with the images. If you don't want/need the
> > snapshots, just drop that optional.
>
> That "-c" should be "--cluster" if specifying by name, otherwise with "-c" 
> it's the full path to the two different conf files.
>
> > >
> > > Just my 2 ct:
> > >
> > > the 'rbd' commands allows specifying a configuration file (-c). You need 
> > > to setup two configuration files, one for each cluster. You can also use 
> > > two different cluster names (--cluster option). AFAIK the name is only 
> > > used to locate the configuration file. I'm not sure how well the kernel 
> > > works with mapping RBDs from two different cluster.
> > >
> > >
> > > If you only want to transfer RBDs from one cluster to another, you do not 
> > > need to map and mount them; the 'rbd' command has the sub commands 
> > > 'export' and 'import'. You can pipe them to avoid writing data to a local 
> > > disk. This should be the fastest way to transfer the RBDs.
> > >
> > >
> > > Regards,
> > >
> > > Burkhard
> > >
> > > --
> > > Dr. rer. nat. Burkhard Linke
> > > Bioinformatics and Systems Biology
> > > Justus-Liebig-University Giessen
> > > 35392 Giessen, Germany
> > > Phone: (+49) (0)641 9935810
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > --
> > Jason
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 11:26 AM CUZA Frédéric  wrote:
>
> Thank you all for your quick answer.
> I think that will solve our problem.

You might have hijacked another thread?

> This is what we came up with:
> rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
>
> This rbd image is a test with only 5 GB of data inside it.
>
> Unfortunately the command seems to be stuck and nothing happens on ports
> 7800 / 6789 / 22.
>
> We can't find any logs on any of the monitors.
>
> Thanks !
>
> -Message d'origine-----
> De : ceph-users  De la part de Jason 
> Dillaman
> Envoyé : 04 June 2019 14:14
> À : 解决 
> Cc : ceph-users 
> Objet : Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]
>
> On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
> >
> > Hi all,
> > We use ceph(luminous) + openstack(queens) in my test
> > environment。The virtual machine does not start properly after the
> > disaster test and the image of virtual machine can not create snap.The
> > procedure is as follows:
> > #!/usr/bin/env python
> >
> > import rados
> > import rbd
> > with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
> > with cluster.open_ioctx('vms') as ioctx:
> > rbd_inst = rbd.RBD()
> > print "start open rbd image"
> > with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') 
> > as image:
> > print "start create snapshot"
> > image.create_snap('myimage_snap1')
> >
> > when i run it ,it show readonlyimage,as follows:
> >
> > start open rbd image
> > start create snapshot
> > Traceback (most recent call last):
> >   File "testpool.py", line 17, in 
> > image.create_snap('myimage_snap1')
> >   File "rbd.pyx", line 1790, in rbd.Image.create_snap
> > (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15
> > 682)
> > rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1
> > from 10df4634-4401-45ca-9c57-f349b78da475_disk
> >
> > but i run it with admin instead of nova,it is ok.
> >
> > "ceph auth list"  as follow
> >
> > installed auth entries:
> >
> > osd.1
> > key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.10
> > key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.11
> > key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.2
> > key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.4
> > key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.5
> > key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.7
> > key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.8
> > key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > client.admin
> > key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
> > auid: 0
> > caps: [mds] allow
> > caps: [mgr] allow *
> > caps: [mon] allow *
> > caps: [osd] allow *
> > client.cinder
> > key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow
> > rwx pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
> > client.cinder-backup
> > key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=backups, allow rwx pool=backups-cache client.glance
> > key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=images, allow rwx pool=images-cache client.nova
> > key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=

Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
>
> Hi all,
> We use Ceph (Luminous) + OpenStack (Queens) in my test environment. The 
> virtual machine does not start properly after the disaster test, and the image 
> of the virtual machine cannot create a snap. The procedure is as follows:
> #!/usr/bin/env python
>
> import rados
> import rbd
> with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
> with cluster.open_ioctx('vms') as ioctx:
> rbd_inst = rbd.RBD()
> print "start open rbd image"
> with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') as 
> image:
> print "start create snapshot"
> image.create_snap('myimage_snap1')
>
> When I run it, it shows ReadOnlyImage, as follows:
>
> start open rbd image
> start create snapshot
> Traceback (most recent call last):
>   File "testpool.py", line 17, in 
> image.create_snap('myimage_snap1')
>   File "rbd.pyx", line 1790, in rbd.Image.create_snap 
> (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15682)
> rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 from 
> 10df4634-4401-45ca-9c57-f349b78da475_disk
>
> But when I run it with admin instead of nova, it is OK.
>
> "ceph auth list"  as follow
>
> installed auth entries:
>
> osd.1
> key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.10
> key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.11
> key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.2
> key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.4
> key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.5
> key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.7
> key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.8
> key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> client.admin
> key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
> auid: 0
> caps: [mds] allow
> caps: [mgr] allow *
> caps: [mon] allow *
> caps: [osd] allow *
> client.cinder
> key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
> pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
> client.cinder-backup
> key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=backups, allow rwx pool=backups-cache
> client.glance
> key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=images, allow rwx pool=images-cache
> client.nova
> key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
> pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache
> client.radosgw.gateway
> key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
> auid: 0
> caps: [mon] allow rwx
> caps: [osd] allow rwx
> mgr.172.30.126.26
> key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.27
> key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.28
> key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
>
>
> Can someone explain it to me?

Your clients don't have the correct caps. See [1] or [2].
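
For example, switching client.nova to the rbd profiles would look
something like this (untested; trim the pool list to what nova actually
needs):

ceph auth caps client.nova \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=volumes-cache, profile rbd pool=vms, profile rbd pool=vms-cache, profile rbd pool=images, profile rbd pool=images-cache'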


> thanks!!
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] 
http://docs.ceph.com/docs/mimic/releases/luminous/#upgrade-from-jewel-or-kraken
[2] 
http://docs.ceph.com/docs/luminous/rbd/rados-rbd-cmds/#create-a-block-device-user

-- 
Jason


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
>
> On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
>  wrote:
> >
> > Hi,
> >
> > On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> >
> > Hi everyone,
> >
> >
> >
> > We want to migrate datas from one cluster (Hammer) to a new one (Mimic). We 
> > do not wish to upgrade the actual cluster as all the hardware is EOS and we 
> > upgrade the configuration of the servers.
> >
> > We can’t find a “proper” way to mount two rbd images from two different 
> > cluster on the same host.
> >
> > Does anyone know what is the “good” procedure to achieve this ?
>
> Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> clusters to a single machine (preferably running a Mimic "rbd" client)
> under "/etc/ceph/.conf" and
> "/etc/ceph/.client..keyring".
>
> You can then use "rbd -c  export --export-format 2
>  - | rbd -c  import --export-format=2 -
> ". The "--export-format=2" option will also copy all
> associated snapshots with the images. If you don't want/need the
> snapshots, just drop that optional.

That "-c" should be "--cluster" if specifying by name, otherwise with
"-c" it's the full path to the two different conf files.

> >
> > Just my 2 ct:
> >
> > the 'rbd' commands allows specifying a configuration file (-c). You need to 
> > setup two configuration files, one for each cluster. You can also use two 
> > different cluster names (--cluster option). AFAIK the name is only used to 
> > locate the configuration file. I'm not sure how well the kernel works with 
> > mapping RBDs from two different cluster.
> >
> >
> > If you only want to transfer RBDs from one cluster to another, you do not 
> > need to map and mount them; the 'rbd' command has the sub commands 'export' 
> > and 'import'. You can pipe them to avoid writing data to a local disk. This 
> > should be the fastest way to transfer the RBDs.
> >
> >
> > Regards,
> >
> > Burkhard
> >
> > --
> > Dr. rer. nat. Burkhard Linke
> > Bioinformatics and Systems Biology
> > Justus-Liebig-University Giessen
> > 35392 Giessen, Germany
> > Phone: (+49) (0)641 9935810
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason



-- 
Jason


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
 wrote:
>
> Hi,
>
> On 6/4/19 10:12 AM, CUZA Frédéric wrote:
>
> Hi everyone,
>
>
>
> We want to migrate data from one cluster (Hammer) to a new one (Mimic). We
> do not wish to upgrade the existing cluster, as all the hardware is EOS and we
> are upgrading the configuration of the servers.
>
> We can’t find a “proper” way to mount two rbd images from two different
> clusters on the same host.
>
> Does anyone know what the “good” procedure is to achieve this?

Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
clusters to a single machine (preferably running a Mimic "rbd" client)
under "/etc/ceph/.conf" and
"/etc/ceph/.client..keyring".

You can then use "rbd -c  export --export-format 2
 - | rbd -c  import --export-format=2 -
". The "--export-format=2" option will also copy all
associated snapshots with the images. If you don't want/need the
snapshots, just drop that optional.

>
> Just my 2 ct:
>
> the 'rbd' command allows specifying a configuration file (-c). You need to 
> setup two configuration files, one for each cluster. You can also use two 
> different cluster names (--cluster option). AFAIK the name is only used to 
> locate the configuration file. I'm not sure how well the kernel works with 
> mapping RBDs from two different clusters.
>
>
> If you only want to transfer RBDs from one cluster to another, you do not 
> need to map and mount them; the 'rbd' command has the sub commands 'export' 
> and 'import'. You can pipe them to avoid writing data to a local disk. This 
> should be the fastest way to transfer the RBDs.
>
>
> Regards,
>
> Burkhard
>
> --
> Dr. rer. nat. Burkhard Linke
> Bioinformatics and Systems Biology
> Justus-Liebig-University Giessen
> 35392 Giessen, Germany
> Phone: (+49) (0)641 9935810
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] "allow profile rbd" or "profile rbd"

2019-05-24 Thread Jason Dillaman
On Fri, May 24, 2019 at 6:09 PM Marc Roos  wrote:
>
>
> I have still some account listing either "allow" or not. What should
> this be? Should this not be kept uniform?

What if the profile in the future adds denials? What does "allow
profile XYZ" (or "deny profile rbd") mean when it has other embedded
logic? That was at least the thought about how to address a grouped
ACL (i.e. the "allow" prefix doesn't make much sense).

>
>
> [client.xxx.xx]
>  key = xxx
>  caps mon = "allow profile rbd"
>  caps osd = "profile rbd pool=rbd,profile rbd pool=rbd.ssd"
>
>
>
> [client.xxx]
>  key = 
>  caps mon = "profile rbd"
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 12:03 PM Charles Alva  wrote:
>
> Hi Jason,
>
> Should we disable fstrim services inside VM which runs on top of RBD?

It has the potential to be a thundering herd issue if you have lots of
VMs all issuing discards all at the same time and your RBD images do
not have object-map enabled. With object-map enabled, the discards
will just get ignored if the backing objects do not already exist. You
could hit a similar issue if you have hundreds or thousands of VMs all
running a scheduled IO heavy task all at the same time (e.g. a yum/apt
update every week at midnight), so it's not really tied to discard (or
even Ceph/RBD) but more of a peak IOPS capacity issue.
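
If you aren't sure whether object-map is enabled on your existing images,
you can check and enable it after the fact -- rough sketch, the image spec
is just a placeholder (object-map needs exclusive-lock enabled first):

rbd info <pool>/<image> | grep features
rbd feature enable <pool>/<image> object-map fast-diff
rbd object-map rebuild <pool>/<image>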

> I recall Ubuntu OS has weekly fstrim cronjob enabled by default, while we 
> have to enable fstrim service manually on Debian and CentOS.
>
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
> On Tue, May 21, 2019, 4:49 AM Jason Dillaman  wrote:
>>
>> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>> >
>> > Hello cephers,
>> >
>> > we have a few systems which utilize a rbd-bd map/mount to get access to a 
>> > rbd volume.
>> > (This problem seems to be related to "[ceph-users] Slow requests from 
>> > bluestore osds" (the original thread))
>> >
>> > Unfortunately the rbd-nbd device of a system crashes three mondays in 
>> > series at ~00:00 when the systemd fstrim timer executes "fstrim -av".
>> > (which runs in parallel to deep scrub operations)
>>
>> That's probably not a good practice if you have lots of VMs doing this
>> at the same time *and* you are not using object-map. The reason is
>> that "fstrim" could discard huge extents that result around a thousand
>> concurrent remove/truncate/zero ops per image being thrown at your
>> cluster.
>>
>> > After that the device constantly reports io errors every time a access to 
>> > the filesystem happens.
>> > Unmounting, remapping and mounting helped to get the filesystem/device 
>> > back into business :-)
>>
>> If the cluster was being DDoSed by the fstrims, the VM OSes' might
>> have timed out thinking a controller failure.
>>
>> > Manual 30 minute stresstests using the following fio command, did not 
>> > produce any problems on client side
>> > (Ceph storage reported some slow requests while testing).
>> >
>> > fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 
>> > --name=test --filename=test --bs=4k --iodepth=64 --size=4G 
>> > --readwrite=randrw --rwmixread=50 --numjobs=50 --loops=10
>> >
>> > It seems that others also experienced this problem: 
>> > https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
>> > The change for setting device timeouts by not seems to be merged to 
>> > luminous.
>> > Experiments setting the timeout manually after mapping using 
>> > https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c 
>> > haven't change the situation.
>> >
>> > Do you have suggestions how to analyze/solve the situation?
>> >
>> > Regards
>> > Marc
>> > 
>> >
>> >
>> >
>> > The client kernel throws messages like this:
>> >
>> > May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v 
>> > debian-sa1 > /dev/null && debian-sa1 60 2)
>> > May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: 
>> > Connection timed out
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting 
>> > down sockets
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O 
>> > error, dev nbd0, sector 84082280
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: 
>> > Connection timed out
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O 
>> > error, dev nbd0, sector 92470887
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: 
>> > Connection timed out
>> >
>> > Ceph throws messages like this:
>> >
>> > 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173572 : cluster [INF] overall HEALTH_OK
>> > 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173586 : cluster [WRN] Health check failed: 644 slow requests are blocked 
>> &

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin  wrote:
>
> Hello Jason,
>
> Am 20.05.19 um 23:49 schrieb Jason Dillaman:
>
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd 
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from 
> bluestore osds" (the original thread))
>
> Unfortunately the rbd-nbd device of a system crashes three mondays in series 
> at ~00:00 when the systemd fstrim timer executes "fstrim -av".
> (which runs in parallel to deep scrub operations)
>
> That's probably not a good practice if you have lots of VMs doing this
> at the same time *and* you are not using object-map. The reason is
> that "fstrim" could discard huge extents that result around a thousand
> concurrent remove/truncate/zero ops per image being thrown at your
> cluster.
>
> Sure, currently we do not have lots of VMs which are capable of running fstrim on
> rbd volumes.
> But the RBD images already involved are multiple-TB images with a high
> write/deletion rate.
> Therefore I am already working on distributing the fstrims by adding random
> delays.
>
> After that the device constantly reports io errors every time a access to the 
> filesystem happens.
> Unmounting, remapping and mounting helped to get the filesystem/device back 
> into business :-)
>
> If the cluster was being DDoSed by the fstrims, the VM OSes' might
> have timed out thinking a controller failure.
>
>
> Yes and no :-) Probably my problem is related to the kernel release, kernel 
> setting or the operating system release.
> Why?
>
> we run ~800 RBD images on that ceph cluster with rbd-nbd 12.2.5 in our xen 
> cluster as dom0-storage repository device without any timeout problems
> (kernel 4.4.0+10, centos 7)
> we run some 35 TB kRBD images with multiples of the write/read/deletion load
> of the crashed rbd-nbd without any timeout problems
> the timeout problem appears on two vms (ubuntu 18.04, ubuntu 16.04) which 
> utilize the described settings
>
> From my point of view, the error behavior is currently reproducible with a 
> good probability.
> Do you have suggestions how to find the root cause of this problem?

Can you provide any logs/backtraces/core dumps from the rbd-nbd process?

> Regards
> Marc


-- 
Jason


Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-20 Thread Jason Dillaman
On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd 
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from 
> bluestore osds" (the original thread))
>
> Unfortunately the rbd-nbd device of a system crashes three mondays in series 
> at ~00:00 when the systemd fstrim timer executes "fstrim -av".
> (which runs in parallel to deep scrub operations)

That's probably not a good practice if you have lots of VMs doing this
at the same time *and* you are not using object-map. The reason is
that "fstrim" could discard huge extents that result around a thousand
concurrent remove/truncate/zero ops per image being thrown at your
cluster.
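
If you stick with periodic fstrim, at least stagger it per machine.
Assuming the stock systemd fstrim.timer, a per-VM randomized delay via a
drop-in is one way to do that (sketch):

# systemctl edit fstrim.timer, then add:
[Timer]
RandomizedDelaySec=6h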

> After that the device constantly reports io errors every time a access to the 
> filesystem happens.
> Unmounting, remapping and mounting helped to get the filesystem/device back 
> into business :-)

If the cluster was being DDoSed by the fstrims, the VM OSes' might
have timed out thinking a controller failure.

> Manual 30 minute stresstests using the following fio command, did not produce 
> any problems on client side
> (Ceph storage reported some slow requests while testing).
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test 
> --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw 
> --rwmixread=50 --numjobs=50 --loops=10
>
> It seems that others also experienced this problem: 
> https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
> The change for setting device timeouts by not seems to be merged to luminous.
> Experiments setting the timeout manually after mapping using 
> https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c haven't 
> change the situation.
>
> Do you have suggestions how to analyze/solve the situation?
>
> Regards
> Marc
> 
>
>
>
> The client kernel throws messages like this:
>
> May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v debian-sa1 > 
> /dev/null && debian-sa1 60 2)
> May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: Connection 
> timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting 
> down sockets
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O 
> error, dev nbd0, sector 84082280
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: Connection 
> timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O 
> error, dev nbd0, sector 92470887
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: Connection 
> timed out
>
> Ceph throws messages like this:
>
> 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173572 
> : cluster [INF] overall HEALTH_OK
> 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173586 
> : cluster [WRN] Health check failed: 644 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173587 
> : cluster [WRN] Health check update: 594 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173591 
> : cluster [WRN] Health check update: 505 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173592 
> : cluster [WRN] Health check update: 497 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173593 
> : cluster [WRN] Health check update: 509 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:28.721891 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173594 
> : cluster [WRN] Health check update: 501 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:34.909842 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173596 
> : cluster [WRN] Health check update: 494 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:44.770330 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173597 
> : cluster [WRN] Health check update: 500 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:49.770625 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173599 
> : cluster [WRN] Health check update: 608 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:55.073734 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173600 
> : cluster [WRN] Health check update: 593 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:04.771432 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173607 
> : cluster [WRN] Health 

Re: [ceph-users] ceph nautilus namespaces for rbd and rbd image access problem

2019-05-20 Thread Jason Dillaman
On Mon, May 20, 2019 at 11:14 AM Rainer Krienke  wrote:
>
> Am 20.05.19 um 09:06 schrieb Jason Dillaman:
>
> >> $ rbd --namespace=testnamespace map rbd/rbdtestns --name client.rainer
> >> --keyring=/etc/ceph/ceph.keyring
> >> rbd: sysfs write failed
> >> rbd: error opening image rbdtestns: (1) Operation not permitted
> >> In some cases useful info is found in syslog - try "dmesg | tail".
> >> 2019-05-20 08:18:29.187 7f42ab7fe700 -1 librbd::image::RefreshRequest:
> >> failed to retrieve pool metadata: (1) Operation not permitted
> >> 2019-05-20 08:18:29.187 7f42aaffd700 -1 librbd::image::OpenRequest:
> >> failed to refresh image: (1) Operation not permitted
> >> 2019-05-20 08:18:29.187 7f42aaffd700 -1 librbd::ImageState:
> >> 0x561792408860 failed to open image: (1) Operation not permitted
> >> rbd: map failed: (22) Invalid argument
> >
> > Hmm, it looks like we overlooked updating the 'rbd' profile when PR
> > 27423 [1] was merged into v14.2.1. We'll get that fixed, but in the
> > meantime, you can add a "class rbd metadata_list" cap on the base pool
> > (w/o the namespace restriction) [2].
> >
>
> Thanks for your answer. Well I still have Kernel 4.15 so namespaces
> won't work for me at the moment.
>
> Could you please explain what the magic behind "class rbd metadata_list"
> is? Is it thought to "simply" allow access to the basepool (rbd in my
> case), so I authorize access to the pool instead of a namespaces? And if
> this is true then I do not understand the difference of your class cap
> compared to a cap like  osd 'allow rw pool=rbd'?

It allows access to invoke a single OSD object class method named
rbd.metadata_list, which is a read-only operation. Therefore, you are
giving access to read pool-level configuration overrides but not
access to read/write/execute any other things in the base pool. You
could further restrict it to the "rbd_info" object when combined w/
the "object_prefix rbd_info" matcher.

> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html



-- 
Jason


Re: [ceph-users] ceph nautilus namespaces for rbd and rbd image access problem

2019-05-20 Thread Jason Dillaman
On Mon, May 20, 2019 at 9:08 AM Rainer Krienke  wrote:
>
> Hello,
>
> just saw this message on the client when trying and failing to map the
> rbd image:
>
> May 20 08:59:42 client kernel: libceph: bad option at
> '_pool_ns=testnamespace'

You will need kernel v4.19 (or later) I believe to utilize RBD
namespaces via krbd [1].

> Rainer
>
> Am 20.05.19 um 08:56 schrieb Rainer Krienke:
> > Hello,
> >
> > on a ceph Nautilus cluster (14.2.1) running on Ubuntu 18.04 I try to set
> > up rbd images with namespaces in order to allow different clients to
> > access only their "own" rbd images in different namespaces in just one
> > pool. The rbd image data are in an erasure encoded pool named "ecpool"
> > and the metadata in the default "rbd" pool.
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] 
https://github.com/torvalds/linux/commit/b26c047b940003295d3896b7f633a66aab95bebd

-- 
Jason


Re: [ceph-users] ceph nautilus namespaces for rbd and rbd image access problem

2019-05-20 Thread Jason Dillaman
On Mon, May 20, 2019 at 8:56 AM Rainer Krienke  wrote:
>
> Hello,
>
> on a ceph Nautilus cluster (14.2.1) running on Ubuntu 18.04 I try to set
> up rbd images with namespaces in order to allow different clients to
> access only their "own" rbd images in different namespaces in just one
> pool. The rbd image data are in an erasure encoded pool named "ecpool"
> and the metadata in the default "rbd" pool.
>
> With this setup I am experiencing trouble when I try to access a rbd
> image in a namespace from a (OpenSuSE Leap 15.0 with Ceph 14.2.1) client
> and I do not understand what I am doing wrong. Hope someone can see the
> problem and give me a hint:
>
> # On one of the the ceph servers
>
> $ rbd namespace create --namespace testnamespace
> $ rbd namespace ls
> NAME
> testnamespace
>
> $ ceph auth caps client.rainer mon 'profile rbd' osd 'profile rbd
> pool=rbd namespace=testnamespace'
>
> $ ceph auth get client.rainer
> [client.rainer]
> key = AQCcVt5cHC+WJhBBoRPKhErEYzxGuU8U/GA0xA++
> caps mon = "profile rbd"
> caps osd = "profile rbd pool=rbd namespace=testnamespace"
>
> $ rbd create rbd/rbdtestns --namespace testnamespace --size 50G
> --data-pool=rbd-ecpool
>
> $ rbd --namespace testnamespace ls -l
> NAME  SIZE   PARENT FMT PROT LOCK
> rbdtestns 50 GiB  2
>
> On the openSuSE Client:
>
> $ rbd --namespace=testnamespace map rbd/rbdtestns --name client.rainer
> --keyring=/etc/ceph/ceph.keyring
> rbd: sysfs write failed
> rbd: error opening image rbdtestns: (1) Operation not permitted
> In some cases useful info is found in syslog - try "dmesg | tail".
> 2019-05-20 08:18:29.187 7f42ab7fe700 -1 librbd::image::RefreshRequest:
> failed to retrieve pool metadata: (1) Operation not permitted
> 2019-05-20 08:18:29.187 7f42aaffd700 -1 librbd::image::OpenRequest:
> failed to refresh image: (1) Operation not permitted
> 2019-05-20 08:18:29.187 7f42aaffd700 -1 librbd::ImageState:
> 0x561792408860 failed to open image: (1) Operation not permitted
> rbd: map failed: (22) Invalid argument

Hmm, it looks like we overlooked updating the 'rbd' profile when PR
27423 [1] was merged into v14.2.1. We'll get that fixed, but in the
meantime, you can add a "class rbd metadata_list" cap on the base pool
(w/o the namespace restriction) [2].

> Thanks for your help
> Rainer
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[1] https://github.com/ceph/ceph/pull/27423
[2] 
http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities

-- 
Jason


Re: [ceph-users] Stalls on new RBD images.

2019-05-08 Thread Jason Dillaman
On Wed, May 8, 2019 at 7:26 AM  wrote:
>
> Hi.
>
> I'm fishing a bit here.
>
> What we see is that with new VM/RBD/SSD-backed images, performance can be
> lousy until they have been "fully written" the first time. It is as if they
> are thin-provisioned and the subsequent growing of the images in Ceph
> delivers a performance hit.

Do you have object-map enabled? On a very fast flash-based Ceph
cluster, the object-map becomes a bottleneck on empty RBD images since
the OSDs are only capable of performing ~2-3K object map updates /
second. Since the object-map is only updated when a backing object is
first written, that could account for initial performance hit.
However, once the object-map is updated, it is no longer in the IO
path so you can achieve 10s of thousands of writes per second.
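
A quick way to check (the image spec is just an example; the second line
is what the Luminous defaults typically look like):

$ rbd info rbd/vm-image | grep features
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten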

> Does anyone else have something similar in their setup - how do you deal
> with it?
>
> KVM based virtualization, Ceph Luminous.
>
> Any suggestions/hints welcome.
>
> Jesper
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] rbd ssd pool for (windows) vms

2019-05-01 Thread Jason Dillaman
On Wed, May 1, 2019 at 5:00 PM Marc Roos  wrote:
>
>
> Do you need to tell the VMs that they are on an SSD rbd pool? Or do
> ceph and the libvirt drivers do this automatically for you?

Like discard, any advanced QEMU options would need to be manually specified.

> When testing a nutanix acropolis virtual install, I had to 'cheat' it by
> adding this
>  
> To make the installer think there was an SSD drive.
>
> I only have 'Thin provisioned drive' mentioned, regardless of whether the VM
> is on an HDD rbd pool or an SSD rbd pool.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] msgr2 and cephfs

2019-04-24 Thread Jason Dillaman
AFAIK, the kernel clients for CephFS and RBD do not support msgr2 yet.

On Wed, Apr 24, 2019 at 4:19 PM Aaron Bassett
 wrote:
>
> Hi,
> I'm standing up a new cluster on nautilus to play with some of the new 
> features, and I've somehow got my monitors only listening on msgrv2 port 
> (3300) and not the legacy port (6789). I'm running kernel 4.15 on my clients. 
> Can I mount cephfs via port 3300 or do I have to figure out how to get my 
> mons listening to both?
>
> Thanks,
> Aaron
> CONFIDENTIALITY NOTICE
> This e-mail message and any attachments are only for the use of the intended 
> recipient and may contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient, any disclosure, distribution or other use of this e-mail message 
> or attachments is prohibited. If you have received this e-mail message in 
> error, please delete and notify the sender immediately. Thank you.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] iSCSI LUN and target Maximums in ceph-iscsi-3.0+

2019-04-19 Thread Jason Dillaman
On Thu, Apr 18, 2019 at 3:47 PM Wesley Dillingham
 wrote:
>
> I am trying to determine some sizing limitations for a potential iSCSI 
> deployment and wondering what's still the current lay of the land:
>
> Are the following still accurate as of the ceph-iscsi-3.0 implementation 
> assuming CentOS 7.6+ and the latest python-rtslib etc from shaman:
>
> Limit of 4 gateways per cluster (source: 
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/using_an_iscsi_gateway#requirements)

Yes -- at least that's what's tested. I don't know of any immediate
code-level restrictions, however. You can technically isolate a
cluster of iSCSI gateways by configuring them to access their
configuration from different pools. Of course, then things like the
Ceph Dashboard won't work correctly.

> Limit of 256 LUNS per target (source: 
> https://github.com/ceph/ceph-iscsi-cli/issues/84#issuecomment-373359179 ) 
> there is mention of this being updated in this comment: 
> https://github.com/ceph/ceph-iscsi-cli/issues/84#issuecomment-373449362 per 
> an update to rtslib but I still see the limit as 256 here: 
> https://github.com/ceph/ceph-iscsi/blob/master/ceph_iscsi_config/lun.py#L984 
> wondering if this is just an outdated limit or there is still valid reason to 
> limit the number of LUNs per target

Still a limit although it could possibly be removed. Until recently,
it was painfully slow to add hundreds of LUNs, so assuming that has
been addressed, perhaps this limit could be removed -- it just makes
testing harder.

> Limit of 1 target per cluster: 
> https://github.com/ceph/ceph-iscsi-cli/issues/104#issuecomment-396224922

SUSE added support for multiple targets per cluster.

>
> Thanks in advance.
>
>
>
>
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Explicitly picking active iSCSI gateway at RBD/LUN export time.

2019-04-19 Thread Jason Dillaman
On Wed, Apr 17, 2019 at 10:48 AM Wesley Dillingham
 wrote:
>
> The man page for gwcli indicates:
>
> "Disks exported through the gateways use ALUA attributes to provide 
> ActiveOptimised and ActiveNonOptimised  access  to the rbd images. Each disk 
> is assigned a primary owner at creation/import time"
>
> I am trying to determine whether I can explicitly set which gateway will be 
> the "owner" at the creation import time. Further is it possible to change 
> after the initial assignment which gateway is the "owner" through the gwcli.

That's not currently possible via the "gwcli". The owner is
auto-selected based on the gateway with the fewest active LUNs. If you
hand-modified the "gateway.conf" object in the "rbd" pool, you could
force update the owner, but you would need to restart your gateways to
pick up the change.
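
i.e. something along these lines (untested, and keep a backup -- you are
hand-editing the gateways' internal state):

rados -p rbd get gateway.conf gateway.conf.json
# edit the "owner" entry for the disk in question, then:
rados -p rbd put gateway.conf gateway.conf.json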

> Thanks.
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Remove RBD mirror?

2019-04-12 Thread Jason Dillaman
On Fri, Apr 12, 2019 at 10:48 AM Magnus Grönlund  wrote:
>
>
>
> Den fre 12 apr. 2019 kl 16:37 skrev Jason Dillaman :
>>
>> On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund  wrote:
>> >
>> > Hi Jason,
>> >
>> > Tried to follow the instructions and setting the debug level to 15 worked 
>> > OK, but the daemon appeared to silently ignore the restart command 
>> > (nothing indicating a restart seen in the log).
>> > So I set the log level to 15 in the config file and restarted the rbd 
>> > mirror daemon. The output surprised me though, my previous perception of 
>> > the issue might be completely wrong...
>> > Lots of "image_replayer::BootstrapRequest: failed to create local 
>> > image: (2) No such file or directory" and ":ImageReplayer:   replay 
>> > encountered an error: (42) No message of desired type"
>>
>> What is the result from "rbd mirror pool status --verbose nova"
>> against your DR cluster now? Are they in up+error now? The ENOENT
>> errors most likely related to a parent image that hasn't been
>> mirrored. The ENOMSG error seems to indicate that there might be some
>> corruption in a journal and it's missing expected records (like a
>> production client crashed), but it should be able to recover from
>> that
>
>
> # rbd mirror pool status --verbose nova
> health: WARNING
> images: 2479 total
> 2479 unknown

Odd, so those log messages were probably related to the two images in
the glance pool.  Unfortunately, v12.2.x will actually require "debug
rbd_mirror = 20" to see the progression in the state machines, which
will result in a huge log. Any chance you are willing to collect that
data for a few minutes at that high log level and upload the
compressed log somewhere? You can use "ceph-post-file" if needed.
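
ceph-post-file usage is simply, for example:

gzip /var/log/ceph/<your rbd-mirror log>
ceph-post-file /var/log/ceph/<your rbd-mirror log>.gz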

> 002344ab-c324-4c01-97ff-de32868fa712_disk:
>   global_id:   c02e0202-df8f-46ce-a4b6-1a50a9692804
>   state:   down+unknown
>   description: status not found
>   last_update:
>
> 002a8fde-3a63-4e32-9c18-b0bf64393d0f_disk:
>   global_id:   d412abc4-b37e-44a2-8aba-107f352dec60
>   state:   down+unknown
>   description: status not found
>   last_update:
>
> 
>
>
>>
>> > https://pastebin.com/1bTETNGs
>> >
>> > Best regards
>> > /Magnus
>> >
>> > Den tis 9 apr. 2019 kl 18:35 skrev Jason Dillaman :
>> >>
>> >> Can you pastebin the results from running the following on your backup
>> >> site rbd-mirror daemon node?
>> >>
>> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
>> >> ceph --admin-socket /path/to/asok rbd mirror restart nova
>> >>  wait a minute to let some logs accumulate ...
>> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
>> >>
>> >> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
>> >> lots of "rbd::mirror"-like log entries).
>> >>
>> >>
>> >> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund  
>> >> wrote:
>> >> >
>> >> >
>> >> >
>> >> > Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman :
>> >> >>
>> >> >> Any chance your rbd-mirror daemon has the admin sockets available
>> >> >> (defaults to /var/run/ceph/cephdr-clientasok)? If
>> >> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
>> >> >
>> >> >
>> >> > {
>> >> > "pool_replayers": [
>> >> > {
>> >> > "pool": "glance",
>> >> > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 
>> >> > cluster: production client: client.productionbackup",
>> >> > "instance_id": "869081",
>> >> > "leader_instance_id": "869081",
>> >> > "leader": true,
>> >> > "instances": [],
>> >> > "local_cluster_admin_socket": 
>> >> > "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
>> >> > "remote_cluster_admin_socket": 
>> >> > "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
>> >> > "sync_throttler": {

Re: [ceph-users] Remove RBD mirror?

2019-04-12 Thread Jason Dillaman
On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund  wrote:
>
> Hi Jason,
>
> Tried to follow the instructions and setting the debug level to 15 worked OK, 
> but the daemon appeared to silently ignore the restart command (nothing 
> indicating a restart seen in the log).
> So I set the log level to 15 in the config file and restarted the rbd mirror 
> daemon. The output surprised me though, my previous perception of the issue 
> might be completely wrong...
> Lots of "image_replayer::BootstrapRequest: failed to create local image: 
> (2) No such file or directory" and ":ImageReplayer:   replay encountered 
> an error: (42) No message of desired type"

What is the result from "rbd mirror pool status --verbose nova"
against your DR cluster now? Are they in up+error now? The ENOENT
errors are most likely related to a parent image that hasn't been
mirrored. The ENOMSG error seems to indicate that there might be some
corruption in a journal and it's missing expected records (like a
production client crashed), but it should be able to recover from
that.

> https://pastebin.com/1bTETNGs
>
> Best regards
> /Magnus
>
> Den tis 9 apr. 2019 kl 18:35 skrev Jason Dillaman :
>>
>> Can you pastebin the results from running the following on your backup
>> site rbd-mirror daemon node?
>>
>> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
>> ceph --admin-socket /path/to/asok rbd mirror restart nova
>>  wait a minute to let some logs accumulate ...
>> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
>>
>> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
>> lots of "rbd::mirror"-like log entries).
>>
>>
>> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund  wrote:
>> >
>> >
>> >
>> > Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman :
>> >>
>> >> Any chance your rbd-mirror daemon has the admin sockets available
>> >> (defaults to /var/run/ceph/cephdr-clientasok)? If
>> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
>> >
>> >
>> > {
>> > "pool_replayers": [
>> > {
>> > "pool": "glance",
>> > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: 
>> > production client: client.productionbackup",
>> > "instance_id": "869081",
>> > "leader_instance_id": "869081",
>> > "leader": true,
>> > "instances": [],
>> > "local_cluster_admin_socket": 
>> > "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
>> > "remote_cluster_admin_socket": 
>> > "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
>> > "sync_throttler": {
>> > "max_parallel_syncs": 5,
>> > "running_syncs": 0,
>> > "waiting_syncs": 0
>> > },
>> > "image_replayers": [
>> > {
>> > "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
>> > "state": "Replaying"
>> > },
>> > {
>> > "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
>> > "state": "Replaying"
>> > },
>> > ---cut--
>> > {
>> > "name": 
>> > "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
>> > "state": "Replaying"
>> > }
>> > ]
>> > },
>> >  {
>> > "pool": "nova",
>> > "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: 
>> > production client: client.productionbackup",
>> > "instance_id": "889074",
>> > "leader_instance_id": "889074",
>> >     "leader": true,
>> > "instances": [],
>> > "local_cluster_admin_socket": 
>>

Re: [ceph-users] Glance client and RBD export checksum mismatch

2019-04-11 Thread Jason Dillaman
On Thu, Apr 11, 2019 at 8:49 AM Erik McCormick
 wrote:
>
>
>
> On Thu, Apr 11, 2019, 8:39 AM Erik McCormick  
> wrote:
>>
>>
>>
>> On Thu, Apr 11, 2019, 12:07 AM Brayan Perera  wrote:
>>>
>>> Dear Jason,
>>>
>>>
>>> Thanks for the reply.
>>>
>>> We are using python 2.7.5
>>>
>>> Yes. script is based on openstack code.
>>>
>>> As suggested, we have tried chunk_size 32 and 64, and both giving same
>>> incorrect checksum value.
>>
>>
>> The value of rbd_store_chunk_size in glance is expressed in MB and then 
>> converted to mb. I think the default is 8, so you would want 8192 if you're 
>> trying to match what the image was uploaded with.
>
>
> Sorry, that should have been "...converted to KB."

Wouldn't it be converted to bytes since all rbd API methods are in bytes? [1]

>>
>>>
>>> We tried to copy same image in different pool and resulted same
>>> incorrect checksum.
>>>
>>>
>>> Thanks & Regards,
>>> Brayan
>>>
>>> On Wed, Apr 10, 2019 at 6:21 PM Jason Dillaman  wrote:
>>> >
>>> > On Wed, Apr 10, 2019 at 1:46 AM Brayan Perera  
>>> > wrote:
>>> > >
>>> > > Dear All,
>>> > >
>>> > > Ceph Version : 12.2.5-2.ge988fb6.el7
>>> > >
>>> > > We are facing an issue on glance which have backend set to ceph, when
>>> > > we try to create an instance or volume out of an image, it throws
>>> > > checksum error.
>>> > > When we use rbd export and use md5sum, value is matching with glance 
>>> > > checksum.
>>> > >
>>> > > When we use following script, it provides same error checksum as glance.
>>> >
>>> > What version of Python are you using?
>>> >
>>> > > We have used below images for testing.
>>> > > 1. Failing image (checksum mismatch): 
>>> > > ffed4088-74e1-4f22-86cb-35e7e97c377c
>>> > > 2. Passing image (checksum identical): 
>>> > > c048f0f9-973d-4285-9397-939251c80a84
>>> > >
>>> > > Output from storage node:
>>> > >
>>> > > 1. Failing image: ffed4088-74e1-4f22-86cb-35e7e97c377c
>>> > > checksum from glance database: 34da2198ec7941174349712c6d2096d8
>>> > > [root@storage01moc ~]# python test_rbd_format.py
>>> > > ffed4088-74e1-4f22-86cb-35e7e97c377c admin
>>> > > Image size: 681181184
>>> > > checksum from ceph: b82d85ae5160a7b74f52be6b5871f596
>>> > > Remarks: checksum is different
>>> > >
>>> > > 2. Passing image: c048f0f9-973d-4285-9397-939251c80a84
>>> > > checksum from glance database: 4f977f748c9ac2989cff32732ef740ed
>>> > > [root@storage01moc ~]# python test_rbd_format.py
>>> > > c048f0f9-973d-4285-9397-939251c80a84 admin
>>> > > Image size: 1411121152
>>> > > checksum from ceph: 4f977f748c9ac2989cff32732ef740ed
>>> > > Remarks: checksum is identical
>>> > >
>>> > > Wondering whether this issue is from ceph python libs or from ceph 
>>> > > itself.
>>> > >
>>> > > Please note that we do not have ceph pool tiering configured.
>>> > >
>>> > > Please let us know whether anyone faced similar issue and any fixes for 
>>> > > this.
>>> > >
>>> > > test_rbd_format.py
>>> > > ===
>>> > > import rados, sys, rbd
>>> > >
>>> > > image_id = sys.argv[1]
>>> > > try:
>>> > > rados_id = sys.argv[2]
>>> > > except:
>>> > > rados_id = 'openstack'
>>> > >
>>> > >
>>> > > class ImageIterator(object):
>>> > > """
>>> > > Reads data from an RBD image, one chunk at a time.
>>> > > """
>>> > >
>>> > > def __init__(self, conn, pool, name, snapshot, store, 
>>> > > chunk_size='8'):
>>> >
>>> > Am I correct in assuming this was adapted from OpenStack code? That
>>> > 8-byte "chunk" is going to be terribly inefficient to compute a CRC.
>>> > Not that it should matter, but does it still fail if you increase this to 32KiB or 64KiB?

Re: [ceph-users] Glance client and RBD export checksum mismatch

2019-04-11 Thread Jason Dillaman
On Thu, Apr 11, 2019 at 12:07 AM Brayan Perera  wrote:
>
> Dear Jason,
>
>
> Thanks for the reply.
>
> We are using python 2.7.5
>
> Yes. script is based on openstack code.
>
> As suggested, we have tried chunk_size 32 and 64, and both giving same
> incorrect checksum value.
>
> We tried to copy same image in different pool and resulted same
> incorrect checksum.

My best guess is that there is some odd encoding issue between the raw
byte stream and Python strings. Can you tweak your Python code to
generate an md5sum for each chunk (let's say 4MiB per chunk to match
the object size) and compare that to the 4MiB-chunked "md5sum" CLI
results from the associated "rbd export" data file (split -b 4194304
--filter=md5sum)? That will allow you to isolate the issue down to a
specific section of the image.
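
A rough sketch of that per-chunk pass, reusing the "images" pool and "snap"
snapshot from the original test_rbd_format.py (the image id below is the
failing one from this thread):

  import hashlib
  import rados, rbd

  CHUNK = 4 * 1024 * 1024  # 4MiB, matching the default RBD object size

  conn = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='admin')
  conn.connect()
  with conn.open_ioctx('images') as ioctx:
      with rbd.Image(ioctx, 'ffed4088-74e1-4f22-86cb-35e7e97c377c',
                     snapshot='snap') as image:
          size = image.stat()['size']
          offset = 0
          while offset < size:
              # read one 4MiB chunk and print its offset and md5
              data = image.read(offset, min(CHUNK, size - offset))
              print("%012d %s" % (offset, hashlib.md5(data).hexdigest()))
              offset += len(data)
  conn.shutdown()

On the CLI side, "rbd export images/ffed4088-74e1-4f22-86cb-35e7e97c377c@snap - |
split -b 4194304 --filter=md5sum" produces the matching per-chunk sums; the
first chunk whose md5 differs points at the region to look at.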

>
> Thanks & Regards,
> Brayan
>
> On Wed, Apr 10, 2019 at 6:21 PM Jason Dillaman  wrote:
> >
> > On Wed, Apr 10, 2019 at 1:46 AM Brayan Perera  
> > wrote:
> > >
> > > Dear All,
> > >
> > > Ceph Version : 12.2.5-2.ge988fb6.el7
> > >
> > > We are facing an issue on glance which have backend set to ceph, when
> > > we try to create an instance or volume out of an image, it throws
> > > checksum error.
> > > When we use rbd export and use md5sum, value is matching with glance 
> > > checksum.
> > >
> > > When we use following script, it provides same error checksum as glance.
> >
> > What version of Python are you using?
> >
> > > We have used below images for testing.
> > > 1. Failing image (checksum mismatch): ffed4088-74e1-4f22-86cb-35e7e97c377c
> > > 2. Passing image (checksum identical): 
> > > c048f0f9-973d-4285-9397-939251c80a84
> > >
> > > Output from storage node:
> > >
> > > 1. Failing image: ffed4088-74e1-4f22-86cb-35e7e97c377c
> > > checksum from glance database: 34da2198ec7941174349712c6d2096d8
> > > [root@storage01moc ~]# python test_rbd_format.py
> > > ffed4088-74e1-4f22-86cb-35e7e97c377c admin
> > > Image size: 681181184
> > > checksum from ceph: b82d85ae5160a7b74f52be6b5871f596
> > > Remarks: checksum is different
> > >
> > > 2. Passing image: c048f0f9-973d-4285-9397-939251c80a84
> > > checksum from glance database: 4f977f748c9ac2989cff32732ef740ed
> > > [root@storage01moc ~]# python test_rbd_format.py
> > > c048f0f9-973d-4285-9397-939251c80a84 admin
> > > Image size: 1411121152
> > > checksum from ceph: 4f977f748c9ac2989cff32732ef740ed
> > > Remarks: checksum is identical
> > >
> > > Wondering whether this issue is from ceph python libs or from ceph itself.
> > >
> > > Please note that we do not have ceph pool tiering configured.
> > >
> > > Please let us know whether anyone faced similar issue and any fixes for 
> > > this.
> > >
> > > test_rbd_format.py
> > > ===
> > > import rados, sys, rbd
> > >
> > > image_id = sys.argv[1]
> > > try:
> > > rados_id = sys.argv[2]
> > > except:
> > > rados_id = 'openstack'
> > >
> > >
> > > class ImageIterator(object):
> > > """
> > > Reads data from an RBD image, one chunk at a time.
> > > """
> > >
> > > def __init__(self, conn, pool, name, snapshot, store, chunk_size='8'):
> >
> > Am I correct in assuming this was adapted from OpenStack code? That
> > 8-byte "chunk" is going to be terribly inefficient to compute a CRC.
> > Not that it should matter, but does it still fail if you increase this
> > to 32KiB or 64KiB?
> >
> > > self.pool = pool
> > > self.conn = conn
> > > self.name = name
> > > self.snapshot = snapshot
> > > self.chunk_size = chunk_size
> > > self.store = store
> > >
> > > def __iter__(self):
> > > try:
> > > with conn.open_ioctx(self.pool) as ioctx:
> > > with rbd.Image(ioctx, self.name,
> > >snapshot=self.snapshot) as image:
> > > img_info = image.stat()
> > > size = img_info['size']
> > > bytes_left = size
> > > while bytes_left > 0:
> > > length = min(self.c

Re: [ceph-users] Glance client and RBD export checksum mismatch

2019-04-10 Thread Jason Dillaman
On Wed, Apr 10, 2019 at 1:46 AM Brayan Perera  wrote:
>
> Dear All,
>
> Ceph Version : 12.2.5-2.ge988fb6.el7
>
> We are facing an issue on glance which have backend set to ceph, when
> we try to create an instance or volume out of an image, it throws
> checksum error.
> When we use rbd export and use md5sum, value is matching with glance checksum.
>
> When we use following script, it provides same error checksum as glance.

What version of Python are you using?

> We have used below images for testing.
> 1. Failing image (checksum mismatch): ffed4088-74e1-4f22-86cb-35e7e97c377c
> 2. Passing image (checksum identical): c048f0f9-973d-4285-9397-939251c80a84
>
> Output from storage node:
>
> 1. Failing image: ffed4088-74e1-4f22-86cb-35e7e97c377c
> checksum from glance database: 34da2198ec7941174349712c6d2096d8
> [root@storage01moc ~]# python test_rbd_format.py
> ffed4088-74e1-4f22-86cb-35e7e97c377c admin
> Image size: 681181184
> checksum from ceph: b82d85ae5160a7b74f52be6b5871f596
> Remarks: checksum is different
>
> 2. Passing image: c048f0f9-973d-4285-9397-939251c80a84
> checksum from glance database: 4f977f748c9ac2989cff32732ef740ed
> [root@storage01moc ~]# python test_rbd_format.py
> c048f0f9-973d-4285-9397-939251c80a84 admin
> Image size: 1411121152
> checksum from ceph: 4f977f748c9ac2989cff32732ef740ed
> Remarks: checksum is identical
>
> Wondering whether this issue is from ceph python libs or from ceph itself.
>
> Please note that we do not have ceph pool tiering configured.
>
> Please let us know whether anyone faced similar issue and any fixes for this.
>
> test_rbd_format.py
> ===
> import rados, sys, rbd
>
> image_id = sys.argv[1]
> try:
> rados_id = sys.argv[2]
> except:
> rados_id = 'openstack'
>
>
> class ImageIterator(object):
> """
> Reads data from an RBD image, one chunk at a time.
> """
>
> def __init__(self, conn, pool, name, snapshot, store, chunk_size='8'):

Am I correct in assuming this was adapted from OpenStack code? That
8-byte "chunk" is going to be terribly inefficient to compute a CRC.
Not that it should matter, but does it still fail if you increase this
to 32KiB or 64KiB?

> self.pool = pool
> self.conn = conn
> self.name = name
> self.snapshot = snapshot
> self.chunk_size = chunk_size
> self.store = store
>
> def __iter__(self):
> try:
> with conn.open_ioctx(self.pool) as ioctx:
> with rbd.Image(ioctx, self.name,
>snapshot=self.snapshot) as image:
> img_info = image.stat()
> size = img_info['size']
> bytes_left = size
> while bytes_left > 0:
> length = min(self.chunk_size, bytes_left)
> data = image.read(size - bytes_left, length)
> bytes_left -= len(data)
> yield data
> raise StopIteration()
> except rbd.ImageNotFound:
> raise exceptions.NotFound(
> _('RBD image %s does not exist') % self.name)
>
> conn = rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id=rados_id)
> conn.connect()
>
>
> with conn.open_ioctx('images') as ioctx:
> try:
> with rbd.Image(ioctx, image_id,
>snapshot='snap') as image:
> img_info = image.stat()
> print "Image size: %s " % img_info['size']
> iter, size = (ImageIterator(conn, 'images', image_id,
> 'snap', 'rbd'), img_info['size'])
> import six, hashlib
> md5sum = hashlib.md5()
> for chunk in iter:
> if isinstance(chunk, six.string_types):
> chunk = six.b(chunk)
> md5sum.update(chunk)
> md5sum = md5sum.hexdigest()
> print "checksum from ceph: " + md5sum
> except:
> raise
> ===
>
>
> Thank You !
>
> --
> Best Regards,
> Brayan Perera
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Jason Dillaman
Can you pastebin the results from running the following on your backup
site rbd-mirror daemon node?

ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
ceph --admin-socket /path/to/asok rbd mirror restart nova
 wait a minute to let some logs accumulate ...
ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5

... and collect the rbd-mirror log from /var/log/ceph/ (should have
lots of "rbd::mirror"-like log entries.


On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund  wrote:
>
>
>
> Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman :
>>
>> Any chance your rbd-mirror daemon has the admin sockets available
>> (defaults to /var/run/ceph/cephdr-clientasok)? If
>> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
>
>
> {
> "pool_replayers": [
> {
> "pool": "glance",
> "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: 
> production client: client.productionbackup",
> "instance_id": "869081",
> "leader_instance_id": "869081",
> "leader": true,
> "instances": [],
> "local_cluster_admin_socket": 
> "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> "remote_cluster_admin_socket": 
> "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
> "sync_throttler": {
> "max_parallel_syncs": 5,
> "running_syncs": 0,
> "waiting_syncs": 0
> },
> "image_replayers": [
> {
> "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> "state": "Replaying"
> },
> {
> "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> "state": "Replaying"
> },
> ---cut--
> {
> "name": 
> "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
> "state": "Replaying"
> }
> ]
> },
>  {
> "pool": "nova",
> "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: 
> production client: client.productionbackup",
> "instance_id": "889074",
> "leader_instance_id": "889074",
> "leader": true,
> "instances": [],
> "local_cluster_admin_socket": 
> "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
> "remote_cluster_admin_socket": 
> "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
> "sync_throttler": {
> "max_parallel_syncs": 5,
> "running_syncs": 0,
> "waiting_syncs": 0
> },
> "image_replayers": []
> }
> ],
> "image_deleter": {
> "image_deleter_status": {
> "delete_images_queue": [
> {
> "local_pool_id": 3,
> "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45"
> }
> ],
> "failed_deletes_queue": []
> }
>>
>>
>> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund  wrote:
>> >
>> >
>> >
>> > Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman :
>> >>
>> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund  
>> >> wrote:
>> >> >
>> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund  
>> >> > >wrote:
>> >> > >>
>> >> > >> Hi,
>> >> > >> We have configured one-way replication of pools between a production 
>> >> > >> cluster and a backup cluster. But unfortunately the rbd-mirror or 
>> >> > >> the backup cluster is unable to keep up with the production cluster 
>> >> > >> so the replication fails to reach replaying state.
>> >> > >
>

Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Jason Dillaman
Any chance your rbd-mirror daemon has the admin sockets available
(defaults to /var/run/ceph/cephdr-clientasok)? If
so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".

On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund  wrote:
>
>
>
> Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman :
>>
>> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund  wrote:
>> >
>> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund  
>> > >wrote:
>> > >>
>> > >> Hi,
>> > >> We have configured one-way replication of pools between a production 
>> > >> cluster and a backup cluster. But unfortunately the rbd-mirror or the 
>> > >> backup cluster is unable to keep up with the production cluster so the 
>> > >> replication fails to reach replaying state.
>> > >
>> > >Hmm, it's odd that they don't at least reach the replaying state. Are
>> > >they still performing the initial sync?
>> >
>> > There are three pools we try to mirror, (glance, cinder, and nova, no 
>> > points for guessing what the cluster is used for :) ),
>> > the glance and cinder pools are smaller and sees limited write activity, 
>> > and the mirroring works, the nova pool which is the largest and has 90% of 
>> > the write activity never leaves the "unknown" state.
>> >
>> > # rbd mirror pool status cinder
>> > health: OK
>> > images: 892 total
>> > 890 replaying
>> > 2 stopped
>> > #
>> > # rbd mirror pool status nova
>> > health: WARNING
>> > images: 2479 total
>> > 2479 unknown
>> > #
>> > The production cluster has 5k writes/s on average and the backup cluster 
>> > has 1-2k writes/s on average. The production cluster is bigger and has 
>> > better specs. I thought that the backup cluster would be able to keep up 
>> > but it looks like I was wrong.
>>
>> The fact that they are in the unknown state just means that the remote
>> "rbd-mirror" daemon hasn't started any journal replayers against the
>> images. If it couldn't keep up, it would still report a status of
>> "up+replaying". What Ceph release are you running on your backup
>> cluster?
>>
> The backup cluster is running Luminous 12.2.11 (the production cluster 
> 12.2.10)
>
>>
>> > >> And the journals on the rbd volumes keep growing...
>> > >>
>> > >> Is it enough to simply disable the mirroring of the pool  (rbd mirror 
>> > >> pool disable ) and that will remove the lagging reader from the 
>> > >> journals and shrink them, or is there anything else that has to be done?
>> > >
>> > >You can either disable the journaling feature on the image(s) since
>> > >there is no point to leave it on if you aren't using mirroring, or run
>> > >"rbd mirror pool disable " to purge the journals.
>> >
>> > Thanks for the confirmation.
>> > I will stop the mirror of the nova pool and try to figure out if there is 
>> > anything we can do to get the backup cluster to keep up.
>> >
>> > >> Best regards
>> > >> /Magnus
>> > >> ___
>> > >> ceph-users mailing list
>> > >> ceph-users@lists.ceph.com
>> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >
>> > >--
>> > >Jason
>>
>>
>>
>> --
>> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Jason Dillaman
On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund  wrote:
>
> >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund  wrote:
> >>
> >> Hi,
> >> We have configured one-way replication of pools between a production 
> >> cluster and a backup cluster. But unfortunately the rbd-mirror or the 
> >> backup cluster is unable to keep up with the production cluster so the 
> >> replication fails to reach replaying state.
> >
> >Hmm, it's odd that they don't at least reach the replaying state. Are
> >they still performing the initial sync?
>
> There are three pools we try to mirror, (glance, cinder, and nova, no points 
> for guessing what the cluster is used for :) ),
> the glance and cinder pools are smaller and sees limited write activity, and 
> the mirroring works, the nova pool which is the largest and has 90% of the 
> write activity never leaves the "unknown" state.
>
> # rbd mirror pool status cinder
> health: OK
> images: 892 total
> 890 replaying
> 2 stopped
> #
> # rbd mirror pool status nova
> health: WARNING
> images: 2479 total
> 2479 unknown
> #
> The production cluster has 5k writes/s on average and the backup cluster has 
> 1-2k writes/s on average. The production cluster is bigger and has better 
> specs. I thought that the backup cluster would be able to keep up but it 
> looks like I was wrong.

The fact that they are in the unknown state just means that the remote
"rbd-mirror" daemon hasn't started any journal replayers against the
images. If it couldn't keep up, it would still report a status of
"up+replaying". What Ceph release are you running on your backup
cluster?

> >> And the journals on the rbd volumes keep growing...
> >>
> >> Is it enough to simply disable the mirroring of the pool  (rbd mirror pool 
> >> disable ) and that will remove the lagging reader from the journals 
> >> and shrink them, or is there anything else that has to be done?
> >
> >You can either disable the journaling feature on the image(s) since
> >there is no point to leave it on if you aren't using mirroring, or run
> >"rbd mirror pool disable " to purge the journals.
>
> Thanks for the confirmation.
> I will stop the mirror of the nova pool and try to figure out if there is 
> anything we can do to get the backup cluster to keep up.
>
> >> Best regards
> >> /Magnus
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >--
> >Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed up replication

2019-04-09 Thread Jason Dillaman
On Thu, Apr 4, 2019 at 6:27 AM huxia...@horebdata.cn
 wrote:
>
> thanks a lot, Jason.
>
> how much performance loss should i expect by enabling rbd mirroring? I really 
> need to minimize any performance impact while using this disaster recovery 
> feature. Will a dedicated journal on Intel Optane NVMe help? If so, how big 
> the size should be?

The worst-case impact is effectively double the write latency and
bandwidth (since the librbd client needs to journal the IO first
before committing the actual changes to the image). I would definitely
recommend using a separate fast pool for the journal to minimize the
initial journal write latency hit. The librbd in-memory cache in
writeback mode can also help absorb the additional latency, since the
write IO can be (effectively) immediately ACKed if you have enough
space in the cache.
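
If you go that route, the client-side knobs would look roughly like this (the
journal pool name is just an example and has to exist before journaling is
enabled on the images):

  [client]
  rbd journal pool = rbd-journal-nvme   # fast replicated pool to hold the journal objects
  rbd cache = true                      # librbd cache; switches to writeback after the first flush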

> cheers,
>
> Samuel
>
> 
> huxia...@horebdata.cn
>
>
> From: Jason Dillaman
> Date: 2019-04-03 23:03
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed 
> up replication
> For better or worse, out of the box, librbd and rbd-mirror are
> configured to conserve memory at the expense of performance to support
> the potential case of thousands of images being mirrored and only a
> single "rbd-mirror" daemon attempting to handle the load.
>
> You can optimize writes by adding "rbd_journal_max_payload_bytes =
> 8388608" to the "[client]" section on the librbd client nodes.
> Normally, writes larger than 16KiB are broken into multiple journal
> entries to allow the remote "rbd-mirror" daemon to make forward
> progress w/o using too much memory, so this will ensure large IOs only
> require a single journal entry.
>
> You can also add "rbd_mirror_journal_max_fetch_bytes = 33554432" to
> the "[client]" section on the "rbd-mirror" daemon nodes and restart
> the daemon for the change to take effect. Normally, the daemon tries
> to nibble the per-image journal events to prevent excessive memory use
> in the case where potentially thousands of images are being mirrored.
>
> On Wed, Apr 3, 2019 at 4:34 PM huxia...@horebdata.cn
>  wrote:
> >
> > Hello, folks,
> >
> > I am setting up two ceph clusters to test async replication via RBD 
> > mirroring. The two clusters are very close, just in two buildings about 20m 
> > away, and the networking is very good as well, 10Gb Fiber connection. In 
> > this case, how should i tune the relevant RBD mirroring parameters to 
> > accelerate the replication?
> >
> > thanks in advance,
> >
> > Samuel
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Jason Dillaman
On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund  wrote:
>
> Hi,
> We have configured one-way replication of pools between a production cluster 
> and a backup cluster. But unfortunately the rbd-mirror or the backup cluster 
> is unable to keep up with the production cluster so the replication fails to 
> reach replaying state.

Hmm, it's odd that they don't at least reach the replaying state. Are
they still performing the initial sync?

> And the journals on the rbd volumes keep growing...
>
> Is it enough to simply disable the mirroring of the pool  (rbd mirror pool 
> disable ) and that will remove the lagging reader from the journals and 
> shrink them, or is there anything else that has to be done?

You can either disable the journaling feature on the image(s) since
there is no point to leave it on if you aren't using mirroring, or run
"rbd mirror pool disable " to purge the journals.

> Best regards
> /Magnus
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
On Mon, Apr 8, 2019 at 9:47 AM Vikas Rana  wrote:
>
> Hi Jason,
>
> On Prod side, we have cluster ceph and on DR side we renamed to cephdr
>
> Accordingly, we renamed the ceph.conf to cephdr.conf on DR side.
>
> This setup used to work and one day we tried to promote the DR to verify the 
> replication and since then it's been a nightmare.
> The resync didn’t work and then we eventually gave up and deleted the pool on 
> DR side to start afresh.
>
> We deleted and recreated the peer relationship also.
>
> Is there any debugging we can do on Prod or DR side to see where its stopping 
> or waiting while "send_open_image"?

You need to add "debug rbd = 20" to both your ceph.conf and
cephdr.conf (if you haven't already) and you would need to provide the
log associated w/ the production cluster connection (see below). Also,
please use pastebin or similar service to avoid mailing the logs to
the list.
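
i.e. something like this in the [client] section of both conf files (and on the
node running rbd-mirror) before reproducing the problem:

  [client]
  debug rbd = 20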

> Rbd-mirror is running as "rbd-mirror --cluster=cephdr"
>
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Monday, April 8, 2019 9:30 AM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> The log appears to be missing all the librbd log messages. The process seems 
> to stop at attempting to open the image from the remote cluster:
>
> 2019-04-05 12:07:29.992323 7f0f3bfff700 20
> rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image
>
> Assuming you are using the default log file naming settings, the log should 
> be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking 
> at your cluster naming makes me think that since your primary cluster is 
> named "ceph" on the DR-site side, have you changed your "/etc/default/ceph" 
> file to rename the local cluster from "ceph"
> to "cephdr" so that the "rbd-mirror" daemon connects to the correct local 
> cluster?
>
>
> On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
> >
> > Hi Jason,
> >
> > 12.2.11 is the version.
> >
> > Attached is the complete log file.
> >
> > We removed the pool to make sure there's no image left on DR site and 
> > recreated an empty pool.
> >
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Friday, April 5, 2019 2:24 PM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] Ceph Replication not working
> >
> > What is the version of rbd-mirror daemon and your OSDs? It looks like it found 
> > two replicated images and got stuck on the "wait_for_deletion"
> > step. Since I suspect those images haven't been deleted, it should have 
> > immediately proceeded to the next step of the image replay state machine. 
> > Are there any additional log messages after 2019-04-05 12:07:29.981203?
> >
> > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> > >
> > > Hi there,
> > >
> > > We are trying to setup a rbd-mirror replication and after the setup, 
> > > everything looks good but images are not replicating.
> > >
> > >
> > >
> > > Can some please please help?
> > >
> > >
> > >
> > > Thanks,
> > >
> > > -Vikas
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME CLIENT
> > >
> > >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd  mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME   CLIENT
> > >
> > >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> > >
> > >
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd info nfs/test01
> > >
> > > rbd image 'test01':
> > >
> > > size 102400 kB in 25 objects
> > >
> > > order 22 (4096 kB objects)
> > >
> > > block_name_prefix: rbd_data.11cd3c238e1f29
> > >
> > > format: 2
> > >
> > > features: layering, exclusive-lock, object-map, fast-diff,
> > > deep-flatten, journaling
> > >
> > > flag

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
The log appears to be missing all the librbd log messages. The process
seems to stop at attempting to open the image from the remote cluster:

2019-04-05 12:07:29.992323 7f0f3bfff700 20
rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20
send_open_image

Assuming you are using the default log file naming settings, the log
should be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of
course, looking at your cluster naming makes me think that since your
primary cluster is named "ceph" on the DR-site side, have you changed
your "/etc/default/ceph" file to rename the local cluster from "ceph"
to "cephdr" so that the "rbd-mirror" daemon connects to the correct
local cluster?
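
On a Debian/Ubuntu box that is a one-line change (a sketch -- double-check the
variable name against the comments shipped in that file):

  # /etc/default/ceph on the DR-site rbd-mirror host
  CLUSTER=cephdr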


On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
>
> Hi Jason,
>
> 12.2.11 is the version.
>
> Attached is the complete log file.
>
> We removed the pool to make sure there's no image left on DR site and 
> recreated an empty pool.
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Friday, April 5, 2019 2:24 PM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> What is the version of rbd-mirror daemon and your OSDs? It looks like it found two 
> replicated images and got stuck on the "wait_for_deletion"
> step. Since I suspect those images haven't been deleted, it should have 
> immediately proceeded to the next step of the image replay state machine. Are 
> there any additional log messages after 2019-04-05 12:07:29.981203?
>
> On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > We are trying to setup a rbd-mirror replication and after the setup, 
> > everything looks good but images are not replicating.
> >
> >
> >
> > Can some please please help?
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME CLIENT
> >
> >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> >
> >
> >
> > root@local:/etc/ceph# rbd  mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME   CLIENT
> >
> >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> >
> >
> >
> >
> >
> > root@local:/etc/ceph# rbd info nfs/test01
> >
> > rbd image 'test01':
> >
> > size 102400 kB in 25 objects
> >
> > order 22 (4096 kB objects)
> >
> > block_name_prefix: rbd_data.11cd3c238e1f29
> >
> > format: 2
> >
> > features: layering, exclusive-lock, object-map, fast-diff,
> > deep-flatten, journaling
> >
> > flags:
> >
> > journal: 11cd3c238e1f29
> >
> > mirroring state: enabled
> >
> > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> >
> > mirroring primary: true
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs
> > --verbose
> >
> > health: OK
> >
> > images: 0 total
> >
> >
> >
> > root@remote:/var/log/ceph# rbd info nfs/test01
> >
> > rbd: error opening image test01: (2) No such file or directory
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> >
> >   cluster:
> >
> > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> >
> > health: HEALTH_WARN
> >
> > noout,nodeep-scrub flag(s) set
> >
> >
> >
> >   services:
> >
> > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
> >
> > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
> >
> > osd:12 osds: 12 up, 12 in
> >
> > flags noout,nodeep-scrub
> >
> > rbd-mirror: 1 daemon active
> >
> >
> >
> >   data:
> >
> > pools:   5 pools, 640 pgs
> >
> > objects: 1.32M objects, 5.03TiB
> >
> > usage:   10.1TiB used, 262TiB / 272TiB avail
> >
> > pgs: 640 active+clean
> >
> >
> >
> >   io:
> >
> > client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
> >
> >
> >
> >
> >
> > 2019-04-05 

Re: [ceph-users] Ceph Replication not working

2019-04-05 Thread Jason Dillaman
What is the version of rbd-mirror daemon and your OSDs? It looks like it
found two replicated images and got stuck on the "wait_for_deletion"
step. Since I suspect those images haven't been deleted, it should
have immediately proceeded to the next step of the image replay state
machine. Are there any additional log messages after 2019-04-05
12:07:29.981203?

On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
>
> Hi there,
>
> We are trying to setup a rbd-mirror replication and after the setup, 
> everything looks good but images are not replicating.
>
>
>
> Can some please please help?
>
>
>
> Thanks,
>
> -Vikas
>
>
>
> root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
>
> Mode: pool
>
> Peers:
>
>   UUID NAME CLIENT
>
>   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
>
>
>
> root@local:/etc/ceph# rbd  mirror pool info nfs
>
> Mode: pool
>
> Peers:
>
>   UUID NAME   CLIENT
>
>   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
>
>
>
>
>
> root@local:/etc/ceph# rbd info nfs/test01
>
> rbd image 'test01':
>
> size 102400 kB in 25 objects
>
> order 22 (4096 kB objects)
>
> block_name_prefix: rbd_data.11cd3c238e1f29
>
> format: 2
>
> features: layering, exclusive-lock, object-map, fast-diff, 
> deep-flatten, journaling
>
> flags:
>
> journal: 11cd3c238e1f29
>
> mirroring state: enabled
>
> mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
>
> mirroring primary: true
>
>
>
>
>
> root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs 
> --verbose
>
> health: OK
>
> images: 0 total
>
>
>
> root@remote:/var/log/ceph# rbd info nfs/test01
>
> rbd: error opening image test01: (2) No such file or directory
>
>
>
>
>
> root@remote:/var/log/ceph# ceph -s --cluster cephdr
>
>   cluster:
>
> id: ade49174-1f84-4c3c-a93c-b293c3655c93
>
> health: HEALTH_WARN
>
> noout,nodeep-scrub flag(s) set
>
>
>
>   services:
>
> mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
>
> mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
>
> osd:12 osds: 12 up, 12 in
>
> flags noout,nodeep-scrub
>
> rbd-mirror: 1 daemon active
>
>
>
>   data:
>
> pools:   5 pools, 640 pgs
>
> objects: 1.32M objects, 5.03TiB
>
> usage:   10.1TiB used, 262TiB / 272TiB avail
>
> pgs: 640 active+clean
>
>
>
>   io:
>
> client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
>
>
>
>
>
> 2019-04-05 12:07:29.720742 7f0fa5e284c0  0 ceph version 12.2.11 
> (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process 
> rbd-mirror, pid 3921391
>
> 2019-04-05 12:07:29.721752 7f0fa5e284c0  0 pidfile_write: ignore empty 
> --pid-file
>
> 2019-04-05 12:07:29.726580 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 ServiceDaemon:
>
> 2019-04-05 12:07:29.732654 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 init:
>
> 2019-04-05 12:07:29.734920 7f0fa5e284c0  1 mgrc service_daemon_register 
> rbd-mirror.admin metadata {arch=x86_64,ceph_version=ceph version 12.2.11 
> (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable),cpu=Intel(R) 
> Xeon(R) CPU E5-2690 v2 @ 3.00GHz,distro=ubuntu,distro_description=Ubuntu 
> 14.04.5 
> LTS,distro_version=14.04,hostname=nidcdvtier3a,instance_id=464360,kernel_description=#93
>  SMP Sat Jun 17 04:01:23 EDT 
> 2017,kernel_version=3.19.0-85-vtier,mem_swap_kb=67105788,mem_total_kb=131999112,os=Linux}
>
> 2019-04-05 12:07:29.735779 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 run: enter
>
> 2019-04-05 12:07:29.735793 7f0fa5e284c0 20 
> rbd::mirror::ClusterWatcher:0x560200dcd930 refresh_pools: enter
>
> 2019-04-05 12:07:29.735809 7f0f77fff700 20 rbd::mirror::ImageDeleter: 
> 0x560200dcd9c0 run: enter
>
> 2019-04-05 12:07:29.735819 7f0f77fff700 20 rbd::mirror::ImageDeleter: 
> 0x560200dcd9c0 run: waiting for delete requests
>
> 2019-04-05 12:07:29.739019 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool docnfs
>
> 2019-04-05 12:07:29.741090 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool doccifs
>
> 2019-04-05 12:07:29.742620 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool fcp-dr
>
> 2019-04-05 12:07:29.76 7f0fa5e284c0 10 
> rbd::mirror::ClusterWatcher:0x560200dcd930 read_pool_peers: mirroring is 
> disabled for pool cifs
>
> 2019-04-05 12:07:29.746958 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> 0x560200d29bb0 add_pool: pool_id=8, pool_name=nfs
>
> 2019-04-05 12:07:29.748181 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 update_pool_replayers: enter
>
> 2019-04-05 12:07:29.748212 7f0fa5e284c0 20 rbd::mirror::Mirror: 
> 0x560200d27f90 update_pool_replayers: starting pool replayer for uuid: 

Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed up replication

2019-04-03 Thread Jason Dillaman
For better or worse, out of the box, librbd and rbd-mirror are
configured to conserve memory at the expense of performance to support
the potential case of thousands of images being mirrored and only a
single "rbd-mirror" daemon attempting to handle the load.

You can optimize writes by adding "rbd_journal_max_payload_bytes =
8388608" to the "[client]" section on the librbd client nodes.
Normally, writes larger than 16KiB are broken into multiple journal
entries to allow the remote "rbd-mirror" daemon to make forward
progress w/o using too much memory, so this will ensure large IOs only
require a single journal entry.

You can also add "rbd_mirror_journal_max_fetch_bytes = 33554432" to
the "[client]" section on the "rbd-mirror" daemon nodes and restart
the daemon for the change to take effect. Normally, the daemon tries
to nibble the per-image journal events to prevent excessive memory use
in the case where potentially thousands of images are being mirrored.
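
Putting that into ceph.conf form (same values as above):

  # on the librbd client nodes
  [client]
  rbd journal max payload bytes = 8388608

  # on the rbd-mirror daemon nodes (restart rbd-mirror afterwards)
  [client]
  rbd mirror journal max fetch bytes = 33554432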

On Wed, Apr 3, 2019 at 4:34 PM huxia...@horebdata.cn
 wrote:
>
> Hello, folks,
>
> I am setting up two ceph clusters to test async replication via RBD 
> mirroring. The two clusters are very close, just in two buildings about 20m 
> away, and the networking is very good as well, 10Gb Fiber connection. In this 
> case, how should i tune the relevant RBD mirroring parameters to accelerate 
> the replication?
>
> thanks in advance,
>
> Samuel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Jason Dillaman
On Tue, Apr 2, 2019 at 8:42 AM Eugen Block  wrote:
>
> Hi,
>
> > If you run "rbd snap ls --all", you should see a snapshot in
> > the "trash" namespace.
>
> I just tried the command "rbd snap ls --all" on a lab cluster
> (nautilus) and get this error:
>
> ceph-2:~ # rbd snap ls --all
> rbd: image name was not specified

Sorry -- you need the "<image-spec>" as part of that command.

> Are there any requirements I haven't noticed? This lab cluster was
> upgraded from Mimic a couple of weeks ago.
>
> ceph-2:~ # ceph version
> ceph version 14.1.0-559-gf1a72cff25
> (f1a72cff2522833d16ff057ed43eeaddfc17ea8a) nautilus (dev)
>
> Regards,
> Eugen
>
>
> Zitat von Jason Dillaman :
>
> > On Tue, Apr 2, 2019 at 4:19 AM Nikola Ciprich
> >  wrote:
> >>
> >> Hi,
> >>
> >> on one of my clusters, I'm getting error message which is getting
> >> me a bit nervous.. while listing contents of a pool I'm getting
> >> error for one of images:
> >>
> >> [root@node1 ~]# rbd ls -l nvme > /dev/null
> >> rbd: error processing image  xxx: (2) No such file or directory
> >>
> >> [root@node1 ~]# rbd info nvme/xxx
> >> rbd image 'xxx':
> >> size 60 GiB in 15360 objects
> >> order 22 (4 MiB objects)
> >> id: 132773d6deb56
> >> block_name_prefix: rbd_data.132773d6deb56
> >> format: 2
> >> features: layering, operations
> >> op_features: snap-trash
> >> flags:
> >> create_timestamp: Wed Aug 29 12:25:13 2018
> >>
> >> volume contains production data and seems to be working correctly (it's 
> >> used
> >> by VM)
> >>
> >> is this something to worry about? What is snap-trash feature?
> >> wasn't able to google
> >> much about it..
> >
> > This implies that you are (or were) using transparent image clones and
> > that you deleted a snapshot that had one or more child images attached
> > to it. If you run "rbd snap ls --all", you should see a snapshot in
> > the "trash" namespace. You can also list its child images by running
> > "rbd children --snap-id  ".
> >
> > There definitely is an issue w/ the "rbd ls --long" command in that
> > when it attempts to list all snapshots in the image, it is incorrectly
> > using the snapshot's name instead of it's ID. I've opened a tracker
> > ticket to get the bug fixed [1]. It was fixed in Nautilus but it
> > wasn't flagged for backport to Mimic.
> >
> >> I'm running ceph 13.2.4 on centos 7.
> >>
> >> I'd be gratefull any help
> >>
> >> BR
> >>
> >> nik
> >>
> >>
> >> --
> >> -
> >> Ing. Nikola CIPRICH
> >> LinuxBox.cz, s.r.o.
> >> 28.rijna 168, 709 00 Ostrava
> >>
> >> tel.:   +420 591 166 214
> >> fax:+420 596 621 273
> >> mobil:  +420 777 093 799
> >> www.linuxbox.cz
> >>
> >> mobil servis: +420 737 238 656
> >> email servis: ser...@linuxbox.cz
> >> -
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > [1] http://tracker.ceph.com/issues/39081
> >
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Jason Dillaman
On Tue, Apr 2, 2019 at 4:19 AM Nikola Ciprich
 wrote:
>
> Hi,
>
> on one of my clusters, I'm getting error message which is getting
> me a bit nervous.. while listing contents of a pool I'm getting
> error for one of images:
>
> [root@node1 ~]# rbd ls -l nvme > /dev/null
> rbd: error processing image  xxx: (2) No such file or directory
>
> [root@node1 ~]# rbd info nvme/xxx
> rbd image 'xxx':
> size 60 GiB in 15360 objects
> order 22 (4 MiB objects)
> id: 132773d6deb56
> block_name_prefix: rbd_data.132773d6deb56
> format: 2
> features: layering, operations
> op_features: snap-trash
> flags:
> create_timestamp: Wed Aug 29 12:25:13 2018
>
> volume contains production data and seems to be working correctly (it's used
> by VM)
>
> is this something to worry about? What is snap-trash feature? wasn't able to 
> google
> much about it..

This implies that you are (or were) using transparent image clones and
that you deleted a snapshot that had one or more child images attached
to it. If you run "rbd snap ls --all", you should see a snapshot in
the "trash" namespace. You can also list its child images by running
"rbd children --snap-id  ".

There definitely is an issue w/ the "rbd ls --long" command in that
when it attempts to list all snapshots in the image, it is incorrectly
using the snapshot's name instead of it's ID. I've opened a tracker
ticket to get the bug fixed [1]. It was fixed in Nautilus but it
wasn't flagged for backport to Mimic.

> I'm running ceph 13.2.4 on centos 7.
>
> I'd be gratefull any help
>
> BR
>
> nik
>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[1] http://tracker.ceph.com/issues/39081

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-iscsi: (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object

2019-04-01 Thread Jason Dillaman
What happens when you run "rados -p rbd lock list gateway.conf"?

On Fri, Mar 29, 2019 at 12:19 PM Matthias Leopold
 wrote:
>
> Hi,
>
> I upgraded my test Ceph iSCSI gateways to
> ceph-iscsi-3.0-6.g433bbaa.el7.noarch.
> I'm trying to use the new parameter "cluster_client_name", which - to me
> - sounds like I don't have to access the ceph cluster as "client.admin"
> anymore. I created a "client.iscsi" user and watched what happened. The
> gateways can obviously read the config (which I created when I was still
> client.admin), but when I try to change anything (like create a new disk
> in pool "iscsi") I get the following error:
>
> (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object
>
> I suspect this is related to the privileges of "client.iscsi", but I
> couldn't find the correct settings yet. The last thing I tried was:
>
> caps: [mon] allow r, allow command "osd blacklist"
> caps: [osd] allow * pool=rbd, profile rbd pool=iscsi
>
> Can anybody tell me how to solve this?
> My Ceph version is 12.2.10 on CentOS 7.
>
> thx
> Matthias
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resizing a cache tier rbd

2019-03-27 Thread Jason Dillaman
For upstream, "deprecated" might be too strong of a word; however, its
use is strongly cautioned against [1]. There is ongoing work to
replace cache tiering with a new implementation that hopefully works
better and avoids lots of the internal edge cases that the cache
tiering v1 design required.


[1] 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#a-word-of-caution

On Wed, Mar 27, 2019 at 7:07 AM Sergey Malinin  wrote:
>
> March 27, 2019 1:09 PM, "Fyodor Ustinov"  wrote:
>
> Tiering - deprecated? Where can I read more about this?
>
>
> Looks like it was deprecated in Red Hat Ceph Storage in 2016:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/thread.html#13867
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2.0/html/release_notes/deprecated_functionality



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resizing a cache tier rbd

2019-03-26 Thread Jason Dillaman
When using cache pools (which are essentially deprecated functionality
BTW), you should always reference the base tier pool. The fact that a
cache tier sits in front of a slower base tier is transparently
handled.
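
In other words, just run the resize against the base pool image spec, e.g.
(hypothetical names; size takes the usual M/G/T suffixes):

  rbd resize slow-pool/my-image --size 20T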

On Tue, Mar 26, 2019 at 5:41 PM Götz Reinicke
 wrote:
>
> Hi,
>
> I have a rbd in a cache tier setup which I need to extend.  The question is, 
> do I resize it trough the cache pool or directly on the slow/storage pool? Or 
> dosen t that matter at all?
>
>
> Thanks for feedback and regards . Götz
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror Image Resync

2019-03-26 Thread Jason Dillaman
On Fri, Mar 22, 2019 at 8:38 AM Vikas Rana  wrote:
>
> Hi Jason,
>
> Thanks you for your help and support.
>
>
> One last question, after the demotion and promotion and when you do a resync 
> again, does it copies the whole image again or sends just the changes since 
> the last journal update?

Right now, it will copy the entire image. There is still a long(er)-term
plan to get support from the OSDs to deeply delete a backing object,
which would be needed in the case where a snapshot exists on the image
and you need to resync the non-HEAD revision. Once that support is in
place, we can tweak the resync logic to only copy the deltas by
comparing hashes of the objects.
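
For tracking how far along the full copy is, polling the image status on the
resyncing side should show the bootstrap/image-copy progress in its description
field, e.g.:

  rbd --cluster <cluster-with-the-stale-copy> mirror image resync nfs/dir_research
  rbd --cluster <cluster-with-the-stale-copy> mirror image status nfs/dir_research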

> I'm trying to estimate how long will it take to get a 200TB image in sync.
>
> Thanks,
> -Vikas
>
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Wednesday, March 13, 2019 4:49 PM
> To: Vikas Rana 
> Subject: Re: [ceph-users] RBD Mirror Image Resync
>
> On Wed, Mar 13, 2019 at 4:42 PM Vikas Rana  wrote:
> >
> > Thanks Jason for your response.
> >
> > From the documents, I believe the resync has to be run where rbd-mirror 
> > daemon is running.
> > Rbd-mirror is running on the DR site and that’s where we issued the resync.
>
> You would need rbd-mirror daemon configured and running against both 
> clusters. The "resync" request just adds a flag to the specified image which 
> the local "rbd-mirror" daemon discovers and then starts to pull the image 
> down from the remote cluster. So again, the correct procedure is to initiate 
> the resync against the out-of-sync image you want to delete/recreate, wait 
> for it to complete, then demote the current primary image, and promote the 
> newly resynced image to primary.
>
> > Should we do it on Prod site?
> > Here's the Prod status
> > :~# rbd info nfs/dir_research
> > rbd image 'dir_research':
> > size 200 TB in 52428800 objects
> > order 22 (4096 kB objects)
> > block_name_prefix: rbd_data.edd65238e1f29
> > format: 2
> > features: layering, exclusive-lock, journaling
> > flags:
> > journal: edd65238e1f29
> > mirroring state: enabled
> > mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > mirroring primary: true
>
> Are you sure this is the prod site? The image id is different from the dump 
> below.
>
> >
> >
> > What does "starting_replay" means?
>
> Given that the state is "down+unknown", I think it's just an odd, left-over 
> status message. The "down" indicates that you do not have a 
> running/functional "rbd-mirror" daemon running against cluster "cephdr". If 
> it is running, I would check its log messages to see if any errors are being 
> spit out.
>
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Wednesday, March 13, 2019 3:44 PM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] RBD Mirror Image Resync
> >
> > On Tue, Mar 12, 2019 at 11:09 PM Vikas Rana  wrote:
> > >
> > > Hi there,
> > >
> > >
> > >
> > > We are replicating a RBD image from Primary to DR site using RBD 
> > > mirroring.
> > >
> > > On Primary, we were using 10.2.10.
> >
> > Just a note that Jewel is end-of-life upstream.
> >
> > > DR site is luminous and we promoted the DR copy to test the failure. 
> > > Everything checked out good.
> > >
> > >
> > >
> > > Now we are trying to restart the replication and we did the demote
> > > and then resync the image but it stuck in “starting_replay” state
> > > for last
> > > 3 days. It’s a 200TB RBD image
> >
> > You would need to run "rbd --cluster  mirror image resync 
> > nfs/dir_research" and wait for that to complete *before* demoting the 
> > primary image on cluster "cephdr". Without a primary image, there is 
> > nothing to resync against.
> >
> > >
> > >
> > > :~# rbd --cluster cephdr mirror pool status nfs --verbose
> > >
> > > health: WARNING
> > >
> > > images: 1 total
> > >
> > > 1 starting_replay
> > >
> > >
> > >
> > > dir_research:
> > >
> > >   global_id:   3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >
> > >   state:   down+unknown
> > >
> > >   description: status not found

Re: [ceph-users] Reply: CEPH ISCSI LIO multipath change delay

2019-03-21 Thread Jason Dillaman
It's just the design of the iSCSI protocol. Sure, you can lower the
timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
false-positive failovers.

[1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/
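
The knob lives in the "LIO-ORG" device section of /etc/multipath.conf on
the initiator; the block below roughly mirrors the settings from the
linked documentation, so treat the exact numbers as a starting point
rather than a recommendation:

    devices {
        device {
            vendor                 "LIO-ORG"
            hardware_handler       "1 alua"
            path_grouping_policy   "failover"
            path_selector          "queue-length 0"
            failback               60
            path_checker           tur
            prio                   alua
            prio_args              exclusive_pref_bit
            fast_io_fail_tmo       25
            no_path_retry          queue
        }
    }

$ systemctl restart multipathd
$ multipath -ll        # verify the paths picked up the new settings

Lowering fast_io_fail_tmo shortens the window before I/O is failed over
to the surviving path, at the cost of more spurious failovers on a flaky
network.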

On Thu, Mar 21, 2019 at 10:46 AM li jerry  wrote:
>
> Hi Maged
>
> thank you for your reply.
>
> To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I
> cleared the current LIO configuration, redeployed two CentOS 7 nodes (not in
> any Ceph role), and deployed rbd-target-api, rbd-target-gw, and tcmu-runner
> on them.
>
> Then I ran the following test:
> 1. CentOS 7 client mounts the iSCSI LUN
> 2. Write data to the iSCSI LUN with dd
> 3. Shut down the active target node (forced power off)
>
> [18:33:48] active target node powered off
> [18:33:57] CentOS 7 client detected the iSCSI target interruption
> [18:34:23] CentOS 7 client failed over to the other target node
>
>
> The whole process took 35 seconds, and Ceph stayed healthy throughout the
> test.
>
> This failover time is too long for production use. Is there anything else I
> can optimize?
>
>
> Below is the CentOS 7 client log (messages)
> 
>
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 
> secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 
> 4409496160
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error 
> (1022)
> Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 
> 4:0 error (1022 - Invalid or unknown error code) state (3)
> Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed 
> out after 25 secs
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: 
> hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 
> 00 00 23 fd 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev 
> sda, sector 2358528
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline 
