Re: [ceph-users] Kraken rgw lifecycle processing nightly crash

2017-07-24 Thread Ben Hines
Looks like Wei found and fixed this in
https://github.com/ceph/ceph/pull/16495

Thanks Wei!

This has been causing crashes for us since May. Guess it shows that not
many folks use Kraken with lifecycles yet, but more certainly will with
Luminous.

-Ben

On Fri, Jul 21, 2017 at 7:19 AM, Daniel Gryniewicz  wrote:

> On 07/20/2017 04:48 PM, Ben Hines wrote:
>
>> Still having this RGWLC crash once a day or so. I do plan to update to
>> Luminous as soon as that is final, but it's possible this issue will still
>> occur, so i was hoping one of the devs could take a look at it.
>>
>> My original suspicion was that it happens when lifecycle processing at
>> the same time that the morning log rotation occurs, but i am not certain
>> about that, so perhaps the bug title should be updated to remove that
>> conclusion. (i can't edit it)
>>
>> http://tracker.ceph.com/issues/19956 - no activity for 2 months.
>>
>> Stack with symbols:
>>
>> #0 0x7f6a6cb1723b in raise () from /lib64/libpthread.so.0
>> #1  0x7f6a778b9e95 in
>> reraise_fatal (signum=11) at /usr/src/debug/ceph-11.2.0/src
>> /global/signal_handler.cc:72
>> #2  handle_fatal_signal (signum=11) at
>> /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:134
>> #3  
>> #4  RGWGC::add_chain (this=this@entry=0x0,
>> op=..., chain=..., tag="default.68996150.61684839") at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:58
>> #5  0x7f6a77801e3f in
>> RGWGC::send_chain (this=0x0, chain=..., tag="default.68996150.61684839",
>> sync=sync@entry=false)
>>
>
> Here, this (the RGWGC, or store->gc) is NULL, so that's the problem.  I
> have no idea how the store isn't initialized, though.
>
> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:64
>> #6  0x7f6a776c0a29 in
>> RGWRados::Object::complete_atomic_modification (this=0x7f69cc8578d0) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:7870
>> #7  0x7f6a777102a0 in
>> RGWRados::Object::Delete::delete_obj (this=this@entry=0x7f69cc857840) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8295
>> #8  0x7f6a77710ce8 in
>> RGWRados::delete_obj (this=, obj_ctx=..., bucket_info=...,
>> obj=..., versioning_status=0, bilog_flags=,
>> expiration_time=...) at /usr/src/debug/ceph-11.2.0/src
>> /rgw/rgw_rados.cc:8330
>> #9  0x7f6a77607ced in
>> rgw_remove_object (store=0x7f6a810fe000, bucket_info=..., bucket=...,
>> key=...) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_bucket.cc:519
>> #10  0x7f6a7780c971 in
>> RGWLC::bucket_lc_process (this=this@entry=0x7f6a81959c00,
>> shard_id=":globalcache307:default.42048218.11")
>> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:283
>> #11  0x7f6a7780d928 in
>> RGWLC::process (this=this@entry=0x7f6a81959c00, index=,
>> max_lock_secs=max_lock_secs@entry=60)
>> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:482
>> #12  0x7f6a7780ddc1 in
>> RGWLC::process (this=0x7f6a81959c00) at /usr/src/debug/ceph-11.2.0/src
>> /rgw/rgw_lc.cc:412
>> #13  0x7f6a7780e033 in
>> RGWLC::LCWorker::entry (this=0x7f6a81a820d0) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:51
>> #14  0x7f6a6cb0fdc5 in
>> start_thread () from /lib64/libpthread.so.0
>> #15  0x7f6a6b37073d in clone ()
>> from /lib64/libc.so.6
>>
>>
> Daniel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous radosgw hangs after a few hours

2017-07-24 Thread Martin Emrich
I created an issue: http://tracker.ceph.com/issues/20763

Regards,

Martin

From: Vasu Kulkarni 
Date: Monday, 24 July 2017, 19:26
To: Vaibhav Bhembre 
Cc: Martin Emrich , "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Luminous radosgw hangs after a few hours

Please raise a tracker for rgw and also provide some additional journalctl logs 
and info(ceph version, os version etc): http://tracker.ceph.com/projects/rgw

On Mon, Jul 24, 2017 at 9:03 AM, Vaibhav Bhembre  wrote:
I am seeing the same issue on upgrade to Luminous v12.1.0 from Jewel.
I am not using Keystone or OpenStack either and my radosgw daemon
hangs as well. I have to restart it to resume processing.

2017-07-24 00:23:33.057401 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:38:33.057524 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:53:33.057648 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:08:33.057749 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:23:33.057878 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:38:33.057964 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:53:33.058098 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 02:08:33.058225 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22

The following are my keystone config options:

"rgw_keystone_url": ""
"rgw_keystone_admin_token": ""
"rgw_keystone_admin_user": ""
"rgw_keystone_admin_password": ""
"rgw_keystone_admin_tenant": ""
"rgw_keystone_admin_project": ""
"rgw_keystone_admin_domain": ""
"rgw_keystone_barbican_user": ""
"rgw_keystone_barbican_password": ""
"rgw_keystone_barbican_tenant": ""
"rgw_keystone_barbican_project": ""
"rgw_keystone_barbican_domain": ""
"rgw_keystone_api_version": "2"
"rgw_keystone_accepted_roles": "Member
"rgw_keystone_accepted_admin_roles": ""
"rgw_keystone_token_cache_size": "1"
"rgw_keystone_revocation_interval": "900"
"rgw_keystone_verify_ssl": "true"
"rgw_keystone_implicit_tenants": "false"
"rgw_s3_auth_use_keystone": "false"

Is this fixed in RC2 by any chance?

On Thu, Jun 29, 2017 at 3:11 AM, Martin Emrich
 wrote:
> Since upgrading to 12.1, our Object Gateways hang after a few hours, I only
> see these messages in the log file:
>
>
>
> 2017-06-29 07:52:20.877587 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:20.877761 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:29.994979 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:22:20.877911 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:27:30.086119 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:37:20.878108 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:37:30.187696 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:52:20.878283 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:57:30.280881 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 09:07:20.878451 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
>
>
> FYI: we do not use Keystone or Openstack.
>
>
>
> This started after upgrading from jewel (via kraken) to luminous.
>
>
>
> What could I do to fix this?
>
> Is there some “fsck” like consistency check + repair for the radosgw
> buckets?
>
>
>
> Thanks,
>
>
>
> Martin
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] oVirt/RHEV and Ceph

2017-07-24 Thread Brady Deetz
Thanks for pointing to some documentation. I'd seen that and it is
certainly an option. From my understanding, with a Cinder deployment, you'd
have the same failure domains and similar performance characteristics to an
oVirt + NFS + RBD deployment. This is acceptable. But, the dream I have in
my head is where the RBD images are mounted and controlled on each
hypervisor instead of a central storage authority like Cinder. Does that
exist for anything or is this a fundamentally flawed idea?

On Mon, Jul 24, 2017 at 9:41 PM, Jason Dillaman  wrote:

> oVirt 3.6 added Cinder/RBD integration [1] and it looks like they are
> currently working on integrating Cinder within a container to simplify
> the integration [2].
>
> [1] http://www.ovirt.org/develop/release-management/features/
> storage/cinder-integration/
> [2] http://www.ovirt.org/develop/release-management/features/
> cinderglance-docker-integration/
>
> On Mon, Jul 24, 2017 at 10:27 PM, Brady Deetz  wrote:
> > Funny enough, I just had a call with Redhat where the OpenStack engineer
> was
> > voicing his frustration that there wasn't any movement on RBD for oVirt.
> > This is important to me because I'm building out a user-facing private
> cloud
> > that just isn't going to be big enough to justify OpenStack and its
> > administrative overhead. But, I already have 1.75PB (soon to be 2PB) of
> > CephFS in production. So, it puts me in a really difficult design
> position.
> >
> > On Mon, Jul 24, 2017 at 9:09 PM, Dino Yancey  wrote:
> >>
> >> I was as much as told by Redhat in a sales call that they push Gluster
> >> for oVirt/RHEV and Ceph for OpenStack, and don't have any plans to
> >> change that in the short term. (note this was about a year ago, i
> >> think - so this isn't super current information).
> >>
> >> I seem to recall the hangup was that oVirt had no orchestration
> >> capability for RBD comparable to OpenStack, and that CephFS wasn't
> >> (yet?) viable for use as a "POSIX filesystem" oVirt storage domain.
> >> Personally, I feel like Redhat is worried about competing with
> >> themselves with GlusterFS versus CephFS and is choosing to focus on
> >> Gluster as a filesystem, and Ceph as everything minus the filesystem.
> >>
> >> Which is a shame, as I'm a fan of both Ceph and oVirt and would love
> >> to use my existing RHEV infrastructure to bring Ceph into my
> >> environment.
> >>
> >>
> >> On Mon, Jul 24, 2017 at 8:39 PM, Brady Deetz  wrote:
> >> > I haven't seen much talk about direct integration with oVirt.
> Obviously
> >> > it
> >> > kind of comes down to oVirt being interested in participating. But, is
> >> > the
> >> > only hold-up getting development time toward an integration or is
> there
> >> > some
> >> > kind of friction between the dev teams?
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >>
> >>
> >>
> >> --
> >> __
> >> Dino Yancey
> >> 2GNT.com Admin
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] oVirt/RHEV and Ceph

2017-07-24 Thread Jason Dillaman
oVirt 3.6 added Cinder/RBD integration [1] and it looks like they are
currently working on integrating Cinder within a container to simplify
the integration [2].

[1] 
http://www.ovirt.org/develop/release-management/features/storage/cinder-integration/
[2] 
http://www.ovirt.org/develop/release-management/features/cinderglance-docker-integration/

On Mon, Jul 24, 2017 at 10:27 PM, Brady Deetz  wrote:
> Funny enough, I just had a call with Redhat where the OpenStack engineer was
> voicing his frustration that there wasn't any movement on RBD for oVirt.
> This is important to me because I'm building out a user-facing private cloud
> that just isn't going to be big enough to justify OpenStack and its
> administrative overhead. But, I already have 1.75PB (soon to be 2PB) of
> CephFS in production. So, it puts me in a really difficult design position.
>
> On Mon, Jul 24, 2017 at 9:09 PM, Dino Yancey  wrote:
>>
>> I was as much as told by Redhat in a sales call that they push Gluster
>> for oVirt/RHEV and Ceph for OpenStack, and don't have any plans to
>> change that in the short term. (note this was about a year ago, i
>> think - so this isn't super current information).
>>
>> I seem to recall the hangup was that oVirt had no orchestration
>> capability for RBD comparable to OpenStack, and that CephFS wasn't
>> (yet?) viable for use as a "POSIX filesystem" oVirt storage domain.
>> Personally, I feel like Redhat is worried about competing with
>> themselves with GlusterFS versus CephFS and is choosing to focus on
>> Gluster as a filesystem, and Ceph as everything minus the filesystem.
>>
>> Which is a shame, as I'm a fan of both Ceph and oVirt and would love
>> to use my existing RHEV infrastructure to bring Ceph into my
>> environment.
>>
>>
>> On Mon, Jul 24, 2017 at 8:39 PM, Brady Deetz  wrote:
>> > I haven't seen much talk about direct integration with oVirt. Obviously
>> > it
>> > kind of comes down to oVirt being interested in participating. But, is
>> > the
>> > only hold-up getting development time toward an integration or is there
>> > some
>> > kind of friction between the dev teams?
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> __
>> Dino Yancey
>> 2GNT.com Admin
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] oVirt/RHEV and Ceph

2017-07-24 Thread Brady Deetz
Funny enough, I just had a call with Redhat where the OpenStack engineer
was voicing his frustration that there wasn't any movement on RBD for
oVirt. This is important to me because I'm building out a user-facing
private cloud that just isn't going to be big enough to justify OpenStack
and its administrative overhead. But, I already have 1.75PB (soon to be
2PB) of CephFS in production. So, it puts me in a really difficult design
position.

On Mon, Jul 24, 2017 at 9:09 PM, Dino Yancey  wrote:

> I was as much as told by Redhat in a sales call that they push Gluster
> for oVirt/RHEV and Ceph for OpenStack, and don't have any plans to
> change that in the short term. (note this was about a year ago, i
> think - so this isn't super current information).
>
> I seem to recall the hangup was that oVirt had no orchestration
> capability for RBD comparable to OpenStack, and that CephFS wasn't
> (yet?) viable for use as a "POSIX filesystem" oVirt storage domain.
> Personally, I feel like Redhat is worried about competing with
> themselves with GlusterFS versus CephFS and is choosing to focus on
> Gluster as a filesystem, and Ceph as everything minus the filesystem.
>
> Which is a shame, as I'm a fan of both Ceph and oVirt and would love
> to use my existing RHEV infrastructure to bring Ceph into my
> environment.
>
>
> On Mon, Jul 24, 2017 at 8:39 PM, Brady Deetz  wrote:
> > I haven't seen much talk about direct integration with oVirt. Obviously
> it
> > kind of comes down to oVirt being interested in participating. But, is
> the
> > only hold-up getting development time toward an integration or is there
> some
> > kind of friction between the dev teams?
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> __
> Dino Yancey
> 2GNT.com Admin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Z Will
I think if you want to delete through GC, increase this:
OPTION(rgw_gc_processor_max_time, OPT_INT, 3600)  // total run time for a single gc processor work
and decrease this:
OPTION(rgw_gc_processor_period, OPT_INT, 3600)  // gc processor cycle time

Or, there may be some option to bypass the GC.
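
A hedged sketch of where those knobs would live in ceph.conf on the gateway host (the section name and the values are illustrative assumptions, not tested recommendations, and radosgw likely needs a restart to pick them up):

[client.rgw.gateway1]                 # assumption: use whatever section your rgw instance actually reads
rgw gc max objs = 1024
rgw gc obj min wait = 300
rgw gc processor max time = 3600
rgw gc processor period = 600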


On Tue, Jul 25, 2017 at 5:05 AM, Bryan Stillwell  wrote:
> Wouldn't doing it that way cause problems since references to the objects 
> wouldn't be getting removed from .rgw.buckets.index?
>
> Bryan
>
> From: Roger Brown 
> Date: Monday, July 24, 2017 at 2:43 PM
> To: Bryan Stillwell , "ceph-users@lists.ceph.com" 
> 
> Subject: Re: [ceph-users] Speeding up garbage collection in RGW
>
> I hope someone else can answer your question better, but in my case I found 
> something like this helpful to delete objects faster than I could through the 
> gateway:
>
> rados -p default.rgw.buckets.data ls | grep 'replace this with pattern 
> matching files you want to delete' | xargs -d '\n' -n 200 rados -p 
> default.rgw.buckets.data rm
>
>
> On Mon, Jul 24, 2017 at 2:02 PM Bryan Stillwell  
> wrote:
> I'm in the process of cleaning up a test that an internal customer did on our 
> production cluster that produced over a billion objects spread across 6000 
> buckets.  So far I've been removing the buckets like this:
>
> printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket rm 
> --bucket={} --purge-objects
>
> However, the disk usage doesn't seem to be getting reduced at the same rate 
> the objects are being removed.  From what I can tell a large number of the 
> objects are waiting for garbage collection.
>
> When I first read the docs it sounded like the garbage collector would only 
> remove 32 objects every hour, but after looking through the logs I'm seeing 
> about 55,000 objects removed every hour.  That's about 1.3 million a day, so 
> at this rate it'll take a couple years to clean up the rest!  For comparison, 
> the purge-objects command above is removing (but not GC'ing) about 30 million 
> objects a day, so a much more manageable 33 days to finish.
>
> I've done some digging and it appears like I should be changing these 
> configuration options:
>
> rgw gc max objs (default: 32)
> rgw gc obj min wait (default: 7200)
> rgw gc processor max time (default: 3600)
> rgw gc processor period (default: 3600)
>
> A few questions I have though are:
>
> Should 'rgw gc processor max time' and 'rgw gc processor period' always be 
> set to the same value?
>
> Which would be better, increasing 'rgw gc max objs' to something like 1024, 
> or reducing the 'rgw gc processor' times to something like 60 seconds?
>
> Any other guidance on the best way to adjust these values?
>
> Thanks,
> Bryan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] oVirt/RHEV and Ceph

2017-07-24 Thread Dino Yancey
I was as much as told by Redhat in a sales call that they push Gluster
for oVirt/RHEV and Ceph for OpenStack, and don't have any plans to
change that in the short term. (note this was about a year ago, i
think - so this isn't super current information).

I seem to recall the hangup was that oVirt had no orchestration
capability for RBD comparable to OpenStack, and that CephFS wasn't
(yet?) viable for use as a "POSIX filesystem" oVirt storage domain.
Personally, I feel like Redhat is worried about competing with
themselves with GlusterFS versus CephFS and is choosing to focus on
Gluster as a filesystem, and Ceph as everything minus the filesystem.

Which is a shame, as I'm a fan of both Ceph and oVirt and would love
to use my existing RHEV infrastructure to bring Ceph into my
environment.


On Mon, Jul 24, 2017 at 8:39 PM, Brady Deetz  wrote:
> I haven't seen much talk about direct integration with oVirt. Obviously it
> kind of comes down to oVirt being interested in participating. But, is the
> only hold-up getting development time toward an integration or is there some
> kind of friction between the dev teams?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
__
Dino Yancey
2GNT.com Admin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] oVirt/RHEV and Ceph

2017-07-24 Thread Brady Deetz
I haven't seen much talk about direct integration with oVirt. Obviously it
kind of comes down to oVirt being interested in participating. But, is the
only hold-up getting development time toward an integration or is there
some kind of friction between the dev teams?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Exclusive-lock Ceph

2017-07-24 Thread Jason Dillaman
On Mon, Jul 24, 2017 at 2:15 PM,   wrote:
> 2 questions,
>
>
>
> 1. At the moment I use kernel 4.10; exclusive-lock does not work fine in
> kernel versions older than 4.12, right?

Exclusive lock should work just fine under 4.10 -- but you are trying
to use the new "exclusive" map option that is only available starting
with kernel 4.12.

> 2. The command with exclusive would be this?
>
> rbd map --exclusive test-xlock3

Yes, that should be it.
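
As a hedged illustration of what that confirmed command is expected to do on a >= 4.12 kernel (image name taken from this thread; the exact error text on the second client is an assumption):

# on machine1 (kernel >= 4.12)
rbd map --exclusive test-xlock3     # acquires the exclusive lock and holds on to it
rbd lock list test-xlock3           # should show machine1 as the lock holder

# on machine2, a second exclusive map attempt should now be refused rather than
# silently taking the lock over (expect a "busy"-style error)
rbd map --exclusive test-xlock3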

> Thanks a Lot,
>
> Marcelo
>
>
> On 24/07/2017, Jason Dillaman  wrote:
>> You will need to pass the "exclusive" option when running "rbd map"
>> (and be running kernel >= 4.12).
>>
>> On Mon, Jul 24, 2017 at 8:42 AM,   wrote:
>> > I'm testing ceph in my enviroment, but the feature exclusive lock don't
>> > works fine for me or maybe i'm doing something wrong.
>> >
>> > I testing in two machines create one image with exclusive-lock enable,
>> > if I
>> > understood correctly, with this feature, one machine only can mount and
>> > write in image at time.
>> >
>> > But When I'm testing, i saw the lock always is move to machine that try
>> > mount the volume lastly
>> >
>> > Example if i try mount the image in machine1 i see ip the machine1 and i
>> > mount the volume in machine1 :
>> > #rbd lock list test-xlock3
>> > There is 1 exclusive lock on this image.
> > Locker      ID    Address
>> > client.4390 auto  192.168.0.1:0/2940167630
>> >
>> > But if now i running rbd map and try mount image in machine2, the lock
>> > is
>> > change to machine2, and i believe this is one error, because if lock
>> > already
>> > in machine one and i write in image, the machine2 don't should can mount
>> > the
>> > same image in the same time.
>> > If i running in machine2 now, i see :
>> >
>> > #rbd lock list test-xlock3
>> > There is 1 exclusive lock on this image.
> > Locker      ID    Address
>> > client.4491 auto XX 192.168.0.2:0/1260424031
>> >
>> >
>> >
>> > Exclusive-lock enable in my image :
>> >
>> > rbd info  test-xlock3 | grep features
>> > features: exclusive-lock
>> >
>> >
>> > i'm doing some wrong ? Existing some conf, to add in ceph.conf, to fix
>> > this,
>> > if one machine mount the volume, the machine2 don't can in the same
>> > time, i
>> > read about command rbd
>> > lock, but this command seem deprecated.
>> >
>> >
>> >
>> > Thanks, a lot.
>> > Marcelo
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to map rbd using rbd-nbd on boot?

2017-07-24 Thread Jason Dillaman
Your google-fu hasn't failed -- that is a missing feature. I've opened
a new feature-request tracker ticket to get support for that.

[1] http://tracker.ceph.com/issues/20762
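
Until that lands, a stopgap is to map the image yourself from a local boot hook. A minimal sketch, assuming an image named rbd/myimage and that the mapping ends up on /dev/nbd0 (none of this is an official mechanism):

#!/bin/sh
# hypothetical boot hook, e.g. called from rc.local or a oneshot systemd unit
/usr/bin/rbd-nbd map rbd/myimage      # prints the /dev/nbdX device it attached
/bin/mount /dev/nbd0 /mnt/myimage     # assumes this was the first nbd mapping on the host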

On Fri, Jul 21, 2017 at 5:04 PM, Daniel K  wrote:
> Once again my google-fu has failed me and I can't find the 'correct' way to
> map an rbd using rbd-nbd on boot. Everything takes me to rbdmap, which isn't
> using rbd-nbd.
>
> If someone could just point me in the right direction I'd appreciated it.
>
>
> Thanks!
>
> Dan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread Deepak Naidu
For a permanent fix, you need a patched kernel, or an upgrade to kernel 4.9 or
higher (which contains the fix): http://tracker.ceph.com/issues/17191

Using [mds] allow r gives the user "read" permission on "/", i.e. any
directory/file under "/" (for example "/dir1", "/dir2" or "/MTY") can be read using
the key and user (client.mtyadm). If this is not a concern to you, then I guess
you are fine; otherwise consider upgrading the kernel or getting your current kernel
patched with this CephFS kernel client fix.

caps: [mds] allow r,allow rw path=/MTY
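
For reference, the extra read cap shown above can be applied to an existing user without recreating it; a hedged one-liner sketch using the caps from this thread:

ceph auth caps client.mtyadm \
  mds 'allow r, allow rw path=/MTY' \
  mon 'allow r' \
  osd 'allow rw pool=hdb-backup, allow rw pool=hdb-backup_metadata'

(ceph auth caps replaces all caps for the entity, so the mon and osd caps are repeated as well.)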

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
c.mo...@web.de
Sent: Monday, July 24, 2017 7:00 AM
To: Дмитрий Глушенок
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 
13 = Permission denied

THX.
Mount is working now.

The auth list for user mtyadm is now:
client.mtyadm
key: AQAlyXVZEfsYNRAAM4jHuV1Br7lpRx1qaINO+A==
caps: [mds] allow r,allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata




On 24 July 2017 at 13:25, "Дмитрий Глушенок"  wrote:
Check your kernel version, prior to 4.9 it was needed to allow read on root 
path: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html
On 24 July 2017, at 12:36, c.mo...@web.de wrote:
Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Dmitry Glushenok
Jet Infosystems


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Bryan Stillwell
Wouldn't doing it that way cause problems since references to the objects 
wouldn't be getting removed from .rgw.buckets.index?

Bryan

From: Roger Brown 
Date: Monday, July 24, 2017 at 2:43 PM
To: Bryan Stillwell , "ceph-users@lists.ceph.com" 

Subject: Re: [ceph-users] Speeding up garbage collection in RGW

I hope someone else can answer your question better, but in my case I found 
something like this helpful to delete objects faster than I could through the 
gateway: 

rados -p default.rgw.buckets.data ls | grep 'replace this with pattern matching 
files you want to delete' | xargs -d '\n' -n 200 rados -p 
default.rgw.buckets.data rm


On Mon, Jul 24, 2017 at 2:02 PM Bryan Stillwell  wrote:
I'm in the process of cleaning up a test that an internal customer did on our 
production cluster that produced over a billion objects spread across 6000 
buckets.  So far I've been removing the buckets like this:

printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket rm 
--bucket={} --purge-objects

However, the disk usage doesn't seem to be getting reduced at the same rate the 
objects are being removed.  From what I can tell a large number of the objects 
are waiting for garbage collection.

When I first read the docs it sounded like the garbage collector would only 
remove 32 objects every hour, but after looking through the logs I'm seeing 
about 55,000 objects removed every hour.  That's about 1.3 million a day, so at 
this rate it'll take a couple years to clean up the rest!  For comparison, the 
purge-objects command above is removing (but not GC'ing) about 30 million 
objects a day, so a much more manageable 33 days to finish.

I've done some digging and it appears like I should be changing these 
configuration options:

rgw gc max objs (default: 32)
rgw gc obj min wait (default: 7200)
rgw gc processor max time (default: 3600)
rgw gc processor period (default: 3600)

A few questions I have though are:

Should 'rgw gc processor max time' and 'rgw gc processor period' always be set 
to the same value?

Which would be better, increasing 'rgw gc max objs' to something like 1024, or 
reducing the 'rgw gc processor' times to something like 60 seconds?

Any other guidance on the best way to adjust these values?

Thanks,
Bryan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random CephFS freeze, osd bad authorize reply

2017-07-24 Thread topro

Hi Ilya, hi Gregory,

All hosts/clients run proper NTP. Still, it could be that the hwclock of those machines has a significant drift, so that after client boot-up in the morning the time is quite far off until NTP gets the clock resynced. Maybe that offset until the NTP resync is causing the issue. I'll have a look into those machines' log files to see if they might have a clock skew after boot-up.

How much of an offset would be enough to trigger such issues so that CephFS freezes indefinitely? (To make that clear: it doesn't freeze for a couple of seconds, it freezes indefinitely, or for hours at least.)

>The ceph messenger equivalent for this error is "failed verifying
>authorize reply". If you search for that, most of the reports are
>indeed clock skews.

Ilya, where am I supposed to find the "ceph messenger equivalent" which shows me what kind of error causes my auth issues, i.e. prove that it's clock skew related? I couldn't find anything useful in the OSDs' logs.

Anything else I could do to find the root cause of this?

Thanks,

Tobi
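
A minimal checking sketch for the clock-skew questions above, assuming standard Linux tooling and default packaged log locations (adjust paths and units to your setup):

# on an affected client, right after boot
timedatectl status        # is NTP active and currently synchronized?
ntpq -p                   # peer offsets in ms, if ntpd is in use
hwclock --show; date      # compare hardware clock vs system clock for a boot-time skew window

# on the OSD hosts, around the time of a freeze
grep "failed verifying authorize reply" /var/log/ceph/ceph-osd.*.log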

 

 

Sent: Monday, 24 July 2017, 19:31
From: "Ilya Dryomov" 
To: to...@gmx.de
Cc: ceph-users 
Subject: Re: [ceph-users] Random CephFS freeze, osd bad authorize reply

On Mon, Jul 24, 2017 at 6:35 PM,  wrote:
> Hi,
>
> I'm running a Ceph cluster which I started back in bobtail age and kept it
> running/upgrading over the years. It has three nodes, each running one MON,
> 10 OSDs and one MDS. The cluster has one MDS active and two standby.
> Machines are 8-core Opterons with 32GB of ECC RAM each. I'm using it to host
> our clients (about 25) /home using CephFS and as a RBD Backend for a couple
> of libvirt VMs (about 5).
>
> Currently I'm running 11.2.0 (kraken) and a couple of month ago I started
> experiencing some strange behaviour. Exactly 2 of my ~25 CephFS Clients
> (always the same two) keep freezing their /home about 1 or two hours after
> first boot in the morning. At the moment of freeze, syslog starts reporting
> loads of:
>
> _hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply
>
> On one of the clients I replaced every single piece of hardware with new
> hardware, so that machine is completely replaced now including NIC, Switch,
> Network-Cabling and did a complete OS reinstall. But the user is still
> getting that behaviour. As far as I could get, it seems that key
> renegotiation is failing and client tries to keep connecting with old cephx
> key. But I cannot find a reason for why this is happening and how to fix it.
>
> Biggest problem, the second affected machine is the one of our CEO and if we
> won't fix it I will have a hard time explaining that Ceph is the way to go.
>
> The two affected machines do not share any common piece of network segment
> other than TOR-Switch in Ceph Rack, while there are other clients that do
> share network segment with affected machines but arent affected at all.
>
> Google won't help me either on this one, seems no one else is experiencing
> something similar.
>
> Client setup on all clients is Debian Jessie with 4.9 Backports kernel,
> using kernel client for mounting CephFS. I think the whole thing started
> with a kernel upgrade from one 4.X series to another, but cannout
> reconstruct.

This check was merged into 4.10 and backported to various stable
series, including 4.9 (4.9.2, I think). That explains why you started
seeing it.

The ceph messenger equivalent for this error is "failed verifying
authorize reply". If you search for that, most of the reports are
indeed clock skews.

Thanks,

Ilya




 

 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Roger Brown
I hope someone else can answer your question better, but in my case I found
something like this helpful to delete objects faster than I could through
the gateway:

rados -p default.rgw.buckets.data ls | grep 'replace this with pattern
matching files you want to delete' | xargs -d '\n' -n 200 rados -p
default.rgw.buckets.data rm


On Mon, Jul 24, 2017 at 2:02 PM Bryan Stillwell 
wrote:

> I'm in the process of cleaning up a test that an internal customer did on
> our production cluster that produced over a billion objects spread across
> 6000 buckets.  So far I've been removing the buckets like this:
>
> printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket
> rm --bucket={} --purge-objects
>
> However, the disk usage doesn't seem to be getting reduced at the same
> rate the objects are being removed.  From what I can tell a large number of
> the objects are waiting for garbage collection.
>
> When I first read the docs it sounded like the garbage collector would
> only remove 32 objects every hour, but after looking through the logs I'm
> seeing about 55,000 objects removed every hour.  That's about 1.3 million a
> day, so at this rate it'll take a couple years to clean up the rest!  For
> comparison, the purge-objects command above is removing (but not GC'ing)
> about 30 million objects a day, so a much more manageable 33 days to finish.
>
> I've done some digging and it appears like I should be changing these
> configuration options:
>
> rgw gc max objs (default: 32)
> rgw gc obj min wait (default: 7200)
> rgw gc processor max time (default: 3600)
> rgw gc processor period (default: 3600)
>
> A few questions I have though are:
>
> Should 'rgw gc processor max time' and 'rgw gc processor period' always be
> set to the same value?
>
> Which would be better, increasing 'rgw gc max objs' to something like
> 1024, or reducing the 'rgw gc processor' times to something like 60 seconds?
>
> Any other guidance on the best way to adjust these values?
>
> Thanks,
> Bryan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Bryan Stillwell
I'm in the process of cleaning up a test that an internal customer did on our 
production cluster that produced over a billion objects spread across 6000 
buckets.  So far I've been removing the buckets like this:

printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket rm 
--bucket={} --purge-objects

However, the disk usage doesn't seem to be getting reduced at the same rate the 
objects are being removed.  From what I can tell a large number of the objects 
are waiting for garbage collection.

When I first read the docs it sounded like the garbage collector would only 
remove 32 objects every hour, but after looking through the logs I'm seeing 
about 55,000 objects removed every hour.  That's about 1.3 million a day, so at 
this rate it'll take a couple years to clean up the rest!  For comparison, the 
purge-objects command above is removing (but not GC'ing) about 30 million 
objects a day, so a much more manageable 33 days to finish.

I've done some digging and it appears like I should be changing these 
configuration options:

rgw gc max objs (default: 32)
rgw gc obj min wait (default: 7200)
rgw gc processor max time (default: 3600)
rgw gc processor period (default: 3600)

A few questions I have though are:

Should 'rgw gc processor max time' and 'rgw gc processor period' always be set 
to the same value?

Which would be better, increasing 'rgw gc max objs' to something like 1024, or 
reducing the 'rgw gc processor' times to something like 60 seconds?

Any other guidance on the best way to adjust these values?

Thanks,
Bryan
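
As a hedged aside to the questions above: pending GC entries can also be inspected, and a collection pass triggered manually, with radosgw-admin (the flags shown are the commonly documented ones):

radosgw-admin gc list --include-all | head   # inspect pending garbage-collection entries
radosgw-admin gc process                     # run a garbage-collection pass on demand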


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph and Fscache : can you kindly share your experiences?

2017-07-24 Thread Anish Gupta
Hello,

Can you kindly share your experience with the built-in FSCache support in Ceph?
Interested in knowing the following:
- Are you using FSCache in a production environment?
- How large is your Ceph deployment?
- If with CephFS, how many Ceph clients are using FSCache?
- Which version of Ceph and Linux kernel?

Thank you.
Anish


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Exclusive-lock Ceph

2017-07-24 Thread lista

2 questions:

1. At the moment I use kernel 4.10; exclusive-lock does not work fine in kernel versions older than 4.12, right?
2. The command with exclusive would be this?

rbd map --exclusive test-xlock3

Thanks a Lot,
Marcelo

On 24/07/2017, Jason Dillaman  wrote:
> You will need to pass the "exclusive" option when running "rbd map" 
> (and be running kernel >= 4.12). 
> 
> On Mon, Jul 24, 2017 at 8:42 AM,   wrote: 
> > I'm testing ceph in my enviroment, but the feature exclusive lock don't 
> > works fine for me or maybe i'm doing something wrong. 
> > 
> > I testing in two machines create one image with exclusive-lock enable, if I 
> > understood correctly, with this feature, one machine only can mount and 
> > write in image at time. 
> > 
> > But When I'm testing, i saw the lock always is move to machine that try 
> > mount the volume lastly 
> > 
> > Example if i try mount the image in machine1 i see ip the machine1 and i 
> > mount the volume in machine1 : 
> > #rbd lock list test-xlock3 
> > There is 1 exclusive lock on this image. 
> > Locker      ID                        Address 
> > client.4390 auto  192.168.0.1:0/2940167630 
> > 
> > But if now i running rbd map and try mount image in machine2, the lock is 
> > change to machine2, and i believe this is one error, because if lock already 
> > in machine one and i write in image, the machine2 don't should can mount the 
> > same image in the same time. 
> > If i running in machine2 now, i see : 
> > 
> > #rbd lock list test-xlock3 
> > There is 1 exclusive lock on this image. 
> > Locker      ID                        Address 
> > client.4491 auto XX 192.168.0.2:0/1260424031 
> > 
> > 
> > 
> > Exclusive-lock enable in my image : 
> > 
> > rbd info  test-xlock3 | grep features 
> > features: exclusive-lock 
> > 
> > 
> > i'm doing some wrong ? Existing some conf, to add in ceph.conf, to fix this, 
> > if one machine mount the volume, the machine2 don't can in the same time, i 
> > read about command rbd 
> > lock, but this command seem deprecated. 
> > 
> > 
> > 
> > Thanks, a lot. 
> > Marcelo 
> > 
> > 
> > ___ 
> > ceph-users mailing list 
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> 
> 
> 
> -- 
> Jason___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting pool, but where are the files?

2017-07-24 Thread David Turner
You might be able to read these objects using s3fs if you're using a
RadosGW.  But like John mentioned, you cannot write them as objects into
the pool and read them as files from the filesystem.
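
To make the files-vs-objects distinction in the quoted reply below concrete, a small hedged sketch (pool and mount paths taken from the thread; the copied file is a placeholder):

# writing through the CephFS mount creates a real file (CephFS manages its backing objects)
cp /backup/somefile /mnt/cephfs/MTY/somefile

# putting an object straight into the data pool bypasses CephFS metadata entirely,
# so it will never appear as a file under /mnt/cephfs
rados -p hdb-backup put myobject /backup/somefile

# layouts are queried via virtual xattrs these days (the old cephfs tool is deprecated)
getfattr -n ceph.dir.layout /mnt/cephfs/MTY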

On Mon, Jul 24, 2017, 12:07 PM John Spray  wrote:

> On Mon, Jul 24, 2017 at 4:52 PM,   wrote:
> > Hello!
> >
> > I created CephFS according to documentation:
> > $ ceph osd pool create hdb-backup 
> > $ ceph osd pool create hdb-backup_metadata 
> > $ ceph fs new   
> >
> > I can mount this pool with user admin:
> > ld4257:/etc/ceph # mount -t ceph 10.96.5.37,10.96.5.38,10.96.5.38:/
> /mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
>
> Need to untangle the terminology a bit.
>
> What you're mounting is a filesystem, the filesystem is storing it's
> data in pools.  Pools are a lower-level concept than filesystems.
>
> > ld4257:/etc/ceph # mount | grep ceph
> > 10.96.5.37,10.96.5.38,10.96.5.38:/ on /mnt/cephfs type ceph
> (rw,relatime,name=admin,secret=,acl)
> >
> > To verify which pool is mounted, I checked this:
> > ld4257:/etc/ceph # ceph osd lspools
> > 0 rbd,1 templates,3 hdb-backup,4 hdb-backup_metadata,
> >
> > ld4257:/etc/ceph # cephfs /mnt/cephfs/ show_layout
> > WARNING: This tool is deprecated.  Use the layout.* xattrs to query and
> modify layouts.
> > layout.data_pool: 3
> > layout.object_size:   4194304
> > layout.stripe_unit:   4194304
> > layout.stripe_count:  1
> >
> > So, I guess the correct pool "hdb-backup" is now mounted to /mnt/cephfs.
> >
> > Then I pushed some files in this pool.
>
> I think you mean that you put some objects into your pool.  So at this
> stage you have not created any files, cephfs doesn't know anything
> about these objects.  You would need to really create files (i.e.
> write to your mount) to have files that exist in cephfs.
>
> > I can display the relevant objects now:
> > ld4257:/etc/ceph # rados -p hdb-backup ls
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7269
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:6357
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:772
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14039
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1803
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5549
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:15797
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:20624
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7322
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5208
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:17479
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14361
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:16963
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:4694
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1391
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1199
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11359
> > MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11995
> > [...]
> >
> > (This is just an extract, there are many more object.)
> >
> > Now, the question is:
> > Can I display these files with CephFS?
>
> Unfortunately not -- you would need to write your data in as files
> (via a cephfs mount) to read it back as files.
>
> John
>
> >
> > When I check the content of /mnt/cephfs, there's only one directory
> "MTY" that I have created; this directory is not related to the output of
> rados at all:
> > ld4257:/etc/ceph # ll /mnt/cephfs/
> > total 0
> > drwxr-xr-x 1 root root 0 Jul 24 15:57 MTY
> >
> > THX
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what is the correct way to update ceph.conf on a running cluster

2017-07-24 Thread Roger Brown
The method I have used is to 1) edit ceph.conf, 2) use ceph-deploy config
push, 3) restart monitors

Example:
roger@desktop:~/ceph-cluster$ vi ceph.conf   # make ceph.conf change
roger@desktop:~/ceph-cluster$ ceph-deploy --overwrite-conf config push
nuc{1..3}
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/roger/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.38): /usr/bin/ceph-deploy
--overwrite-conf config push nuc1 nuc2 nuc3
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: True
[ceph_deploy.cli][INFO  ]  subcommand: push
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  client: ['nuc1', 'nuc2',
'nuc3']
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.config][DEBUG ] Pushing config to nuc1
[nuc1][DEBUG ] connection detected need for sudo
[nuc1][DEBUG ] connected to host: nuc1
[nuc1][DEBUG ] detect platform information from remote host
[nuc1][DEBUG ] detect machine type
[nuc1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to nuc2
[nuc2][DEBUG ] connection detected need for sudo
[nuc2][DEBUG ] connected to host: nuc2
[nuc2][DEBUG ] detect platform information from remote host
[nuc2][DEBUG ] detect machine type
[nuc2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to nuc3
[nuc3][DEBUG ] connection detected need for sudo
[nuc3][DEBUG ] connected to host: nuc3
[nuc3][DEBUG ] detect platform information from remote host
[nuc3][DEBUG ] detect machine type
[nuc3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
roger@desktop:~/ceph-cluster$ ssh nuc1
roger@nuc1:~$ sudo systemctl restart ceph-mon.target
roger@nuc1:~$
...etc.


On Mon, Jul 24, 2017 at 11:34 AM moftah moftah  wrote:

> Hi
>
> I am having hard time finding documentation on what is the correct way to
> upgrade ceph.conf in running cluster.
>
> The change i want to introduce is this
> osd crush update on start = false
>
> i tried to do it through the tell utility like this
> ceph tell osd.82 injectargs --no-osd-crush-update-on-start
>
> the answer was
> osd_crush_update_on_start = 'false' (unchangeable)
>
> Now it seems i need to reboot somthing to get this new config alive
> I find it strange that i have to reboot all OSDs processes int he system
> just to update the config
>
> is there a procedure for this
> and can i just reboot the mon process on the mon nodes ?
>
> Thanks
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what is the correct way to update ceph.conf on a running cluster

2017-07-24 Thread Gregory Farnum
On Mon, Jul 24, 2017 at 10:33 AM, moftah moftah  wrote:
> Hi
>
> I am having hard time finding documentation on what is the correct way to
> upgrade ceph.conf in running cluster.
>
> The change i want to introduce is this
> osd crush update on start = false
>
> i tried to do it through the tell utility like this
> ceph tell osd.82 injectargs --no-osd-crush-update-on-start
>
> the answer was
> osd_crush_update_on_start = 'false' (unchangeable)
>
> Now it seems i need to reboot somthing to get this new config alive
> I find it strange that i have to reboot all OSDs processes int he system
> just to update the config
>
> is there a procedure for this
> and can i just reboot the mon process on the mon nodes ?

The "osd crush update on start" config option is only ever used when
the OSDs boot; you'll need to update it on the ceph.conf for every
OSD. There's no need to do anything like rebooting though because the
live value doesn't matter.
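
For illustration, the change then boils down to a ceph.conf edit pushed to every OSD host; a minimal sketch, assuming the setting lives in the [osd] section:

[osd]
osd crush update on start = false

It simply takes effect the next time each OSD process starts, as described above.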

And as a note, the "unchangeable" warning on config is unfortunately
not entirely reliable. The reasons are somewhat baroque, but if you
see it that doesn't always mean the command is ineffective. :/
-Greg

>
> Thanks
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] what is the correct way to update ceph.conf on a running cluster

2017-07-24 Thread moftah moftah
Hi

I am having hard time finding documentation on what is the correct way to
upgrade ceph.conf in running cluster.

The change i want to introduce is this
osd crush update on start = false

i tried to do it through the tell utility like this
ceph tell osd.82 injectargs --no-osd-crush-update-on-start

the answer was
osd_crush_update_on_start = 'false' (unchangeable)

Now it seems I need to reboot something to get this new config alive.
I find it strange that I have to reboot all OSD processes in the system
just to update the config.

is there a procedure for this
and can i just reboot the mon process on the mon nodes ?

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't start bluestore OSDs after sucessfully moving them 12.1.1 ** ERROR: osd init failed: (2) No such file or directory

2017-07-24 Thread Daniel K
List --

I have a 4-node cluster running on baremetal and have a need to use the
kernel client on 2 nodes. As I read you should not run the kernel client on
a node that runs an OSD daemon, I decided to move the OSD daemons into a VM
on the same device.

Original host is stor-vm2 (bare metal), new host is stor-vm2a (virtual)

All went well -- I did these steps (for each OSD, 5 total per host); a rough shell sketch of them follows the list:

- setup the VM
- install the OS
- installed ceph(using ceph-deploy)
- set noout
- stopped ceph osd on bare metal host
- unmount /dev/sdb1 from /var/lib/ceph/osd/ceph-0
- add /dev/sdb to the VM
- ceph detected the osd and started automatically.
- moved VM host to the same bucket as physical host in crushmap
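
A rough shell sketch of those steps for one OSD (IDs, device names and the crush location are illustrative; the disk-attach step depends on your hypervisor tooling):

ceph osd set noout
systemctl stop ceph-osd@0                     # on the bare-metal host stor-vm2
umount /var/lib/ceph/osd/ceph-0
# attach /dev/sdb to the stor-vm2a VM via the hypervisor; inside the VM, udev/ceph-disk
# activation is what brings the OSD up automatically once the disk appears
ceph osd crush move stor-vm2a root=default    # place the VM host bucket where the physical host was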

I did this for each OSD, and despite some recovery IO because of the
updated crushmap, all OSDs were up.

I rebooted the physical host, which rebooted the VM, and now the OSDs are
refusing to start.

I've tried moving them back to the bare metal host with the same results.

Any ideas?

Here are what seem to be the relevant osd log lines:

2017-07-24 13:21:53.561265 7faf1752fc80  0 osd.10 8854 crush map has
features 2200130813952, adjusting msgr requires for clients
2017-07-24 13:21:53.561284 7faf1752fc80  0 osd.10 8854 crush map has
features 2200130813952 was 8705, adjusting msgr requires for mons
2017-07-24 13:21:53.561298 7faf1752fc80  0 osd.10 8854 crush map has
features 720578140510109696, adjusting msgr requires for osds
2017-07-24 13:21:55.626834 7faf1752fc80  0 osd.10 8854 load_pgs
2017-07-24 13:22:20.970222 7faf1752fc80  0 osd.10 8854 load_pgs opened 536
pgs
2017-07-24 13:22:20.972659 7faf1752fc80  0 osd.10 8854 using
weightedpriority op queue with priority op cut off at 64.
2017-07-24 13:22:20.976861 7faf1752fc80 -1 osd.10 8854 log_to_monitors
{default=true}
2017-07-24 13:22:20.998233 7faf1752fc80 -1 osd.10 8854
mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such
file or directory
2017-07-24 13:22:20.999165 7faf1752fc80  1
bluestore(/var/lib/ceph/osd/ceph-10) umount
2017-07-24 13:22:21.016146 7faf1752fc80  1 freelist shutdown
2017-07-24 13:22:21.016243 7faf1752fc80  4 rocksdb:
[/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all
background work
2017-07-24 13:22:21.020440 7faf1752fc80  4 rocksdb:
[/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2017-07-24 13:22:21.274481 7faf1752fc80  1 bluefs umount
2017-07-24 13:22:21.275822 7faf1752fc80  1 bdev(0x558bb1f82d80
/var/lib/ceph/osd/ceph-10/block) close
2017-07-24 13:22:21.485226 7faf1752fc80  1 bdev(0x558bb1f82b40
/var/lib/ceph/osd/ceph-10/block) close
2017-07-24 13:22:21.551009 7faf1752fc80 -1  ** ERROR: osd init failed: (2)
No such file or directory
2017-07-24 13:22:21.563567 7faf1752fc80 -1
/build/ceph-12.1.1/src/common/HeartbeatMap.cc: In function
'ceph::HeartbeatMap::~HeartbeatMap()' thread 7faf1752fc80 time 2017-07-24
13:22:21.558275
/build/ceph-12.1.1/src/common/HeartbeatMap.cc: 39: FAILED
assert(m_workers.empty())

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous
(rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x558ba6ba6b72]
 2: (()+0xb81cf1) [0x558ba6cc0cf1]
 3: (CephContext::~CephContext()+0x4d9) [0x558ba6ca77b9]
 4: (CephContext::put()+0xe6) [0x558ba6ca7ab6]
 5: (main()+0x563) [0x558ba650df73]
 6: (__libc_start_main()+0xf0) [0x7faf14999830]
 7: (_start()+0x29) [0x558ba6597cf9]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random CephFS freeze, osd bad authorize reply

2017-07-24 Thread Ilya Dryomov
On Mon, Jul 24, 2017 at 6:35 PM,   wrote:
> Hi,
>
> I'm running a Ceph cluster which I started back in bobtail age and kept it
> running/upgrading over the years. It has three nodes, each running one MON,
> 10 OSDs and one MDS. The cluster has one MDS active and two standby.
> Machines are 8-core Opterons with 32GB of ECC RAM each. I'm using it to host
> our clients (about 25) /home using CephFS and as a RBD Backend for a couple
> of libvirt VMs (about 5).
>
> Currently I'm running 11.2.0 (kraken) and a couple of month ago I started
> experiencing some strange behaviour. Exactly 2 of my ~25 CephFS Clients
> (always the same two) keep freezing their /home about 1 or two hours after
> first boot in the morning. At the moment of freeze, syslog starts reporting
> loads of:
>
> _hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply
>
> On one of the clients I replaced every single piece of hardware with new
> hardware, so that machine is completely replaced now including NIC, Switch,
> Network-Cabling and did a complete OS reinstall. But the user is still
> getting that behaviour. As far as I could get, it seems that key
> renegotiation is failing and client tries to keep connecting with old cephx
> key. But I cannot find a reason for why this is happening and how to fix it.
>
> Biggest problem, the second affected machine is the one of our CEO and if we
> won't fix it I will have a hard time explaining that Ceph is the way to go.
>
> The two affected machines do not share any common piece of network segment
> other than TOR-Switch in Ceph Rack, while there are other clients that do
> share network segment with affected machines but arent affected at all.
>
> Google won't help me either on this one, seems no one else is experiencing
> something similar.
>
> Client setup on all clients is Debian Jessie with 4.9 Backports kernel,
> using kernel client for mounting CephFS. I think the whole thing started
> with a kernel upgrade from one 4.X series to another, but cannout
> reconstruct.

This check was merged into 4.10 and backported to various stable
series, including 4.9 (4.9.2, I think).  That explains why you started
seeing it.

The ceph messenger equivalent for this error is "failed verifying
authorize reply".  If you search for that, most of the reports are
indeed clock skews.
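
A quick way to rule out clock skew is to compare the client's clock against
the monitors. A minimal sketch (adjust for ntpd vs. chrony and your own host
names):

  $ date -u            # run on the client and on a mon host, compare
  $ ntpq -pn           # if using ntpd: check the "offset" column (ms)
  $ chronyc tracking   # if using chrony: check the "System time" offset

cephx tickets are time-limited, so an offset of more than a few minutes
between a client and the mons can break re-authentication.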

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous radosgw hangs after a few hours

2017-07-24 Thread Vasu Kulkarni
Please raise a tracker issue for rgw and also provide some additional journalctl
logs and info (ceph version, OS version, etc.):
http://tracker.ceph.com/projects/rgw
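
Something along these lines is usually enough to collect (the systemd
unit/instance name is a guess here -- it is typically
ceph-radosgw@rgw.<hostname>, adjust to your setup):

  $ ceph --version
  $ radosgw --version
  $ lsb_release -a
  $ systemctl list-units 'ceph-radosgw@*'     # find the exact unit name
  $ journalctl -u ceph-radosgw@rgw.$(hostname -s) --since "2 hours ago" > rgw-journal.log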

On Mon, Jul 24, 2017 at 9:03 AM, Vaibhav Bhembre 
wrote:

> I am seeing the same issue on upgrade to Luminous v12.1.0 from Jewel.
> I am not using Keystone or OpenStack either and my radosgw daemon
> hangs as well. I have to restart it to resume processing.
>
> 2017-07-24 00:23:33.057401 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 00:38:33.057524 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 00:53:33.057648 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 01:08:33.057749 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 01:23:33.057878 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 01:38:33.057964 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 01:53:33.058098 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
> 2017-07-24 02:08:33.058225 7f196096a700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> The following are my keystone config options:
>
> "rgw_keystone_url": ""
> "rgw_keystone_admin_token": ""
> "rgw_keystone_admin_user": ""
> "rgw_keystone_admin_password": ""
> "rgw_keystone_admin_tenant": ""
> "rgw_keystone_admin_project": ""
> "rgw_keystone_admin_domain": ""
> "rgw_keystone_barbican_user": ""
> "rgw_keystone_barbican_password": ""
> "rgw_keystone_barbican_tenant": ""
> "rgw_keystone_barbican_project": ""
> "rgw_keystone_barbican_domain": ""
> "rgw_keystone_api_version": "2"
> "rgw_keystone_accepted_roles": "Member
> "rgw_keystone_accepted_admin_roles": ""
> "rgw_keystone_token_cache_size": "1"
> "rgw_keystone_revocation_interval": "900"
> "rgw_keystone_verify_ssl": "true"
> "rgw_keystone_implicit_tenants": "false"
> "rgw_s3_auth_use_keystone": "false"
>
> Is this fixed in RC2 by any chance?
>
> On Thu, Jun 29, 2017 at 3:11 AM, Martin Emrich
>  wrote:
> > Since upgrading to 12.1, our Object Gateways hang after a few hours, I
> only
> > see these messages in the log file:
> >
> >
> >
> > 2017-06-29 07:52:20.877587 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> > 2017-06-29 08:07:20.877761 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> > 2017-06-29 08:07:29.994979 7fa8e11e7700  0 process_single_logshard:
> Error in
> > get_bucket_info: (2) No such file or directory
> >
> > 2017-06-29 08:22:20.877911 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> > 2017-06-29 08:27:30.086119 7fa8e11e7700  0 process_single_logshard:
> Error in
> > get_bucket_info: (2) No such file or directory
> >
> > 2017-06-29 08:37:20.878108 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> > 2017-06-29 08:37:30.187696 7fa8e11e7700  0 process_single_logshard:
> Error in
> > get_bucket_info: (2) No such file or directory
> >
> > 2017-06-29 08:52:20.878283 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> > 2017-06-29 08:57:30.280881 7fa8e11e7700  0 process_single_logshard:
> Error in
> > get_bucket_info: (2) No such file or directory
> >
> > 2017-06-29 09:07:20.878451 7fa8e01e5700  0 ERROR: keystone revocation
> > processing returned error r=-22
> >
> >
> >
> > FYI: we do not use Keystone or Openstack.
> >
> >
> >
> > This started after upgrading from jewel (via kraken) to luminous.
> >
> >
> >
> > What could I do to fix this?
> >
> > Is there some “fsck” like consistency check + repair for the radosgw
> > buckets?
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Martin
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Restore RBD image

2017-07-24 Thread Martin Wittwer
Hi there,

Thanks for the answer!

I thought that there was something strange during the resize operation
because it took too long, but normally it's instant. The logs don't
contain anything about the bug.

A few hours later I tried to set the size to 100G again but all files
were lost.

I had to restore everything from an old backup and my brain.

Best,
Martin


Am 24.07.2017 um 16:25 schrieb Jason Dillaman:
> Increasing the size of an image only issues a single write to update
> the image size metadata in the image header. That operation is atomic
> and really shouldn't be able to do what you are saying.  Regardless,
> since this is a grow operation, just re-run the resize to update the
> metadata again.
>
> On Mon, Jul 24, 2017 at 8:31 AM, Marc Roos  wrote:
>>
>> I would recommend logging into the host and running your commands from a
>> screen session, so they keep running.
>>
>>
>> -Original Message-
>> From: Martin Wittwer [mailto:martin.witt...@datonus.ch]
>> Sent: zondag 23 juli 2017 15:20
>> To: ceph-us...@ceph.com
>> Subject: [ceph-users] Restore RBD image
>>
>> Hi list
>>
>> I have a big problem:
>>
>> I had to resize a RBD image from 100G to 150G. So I used  rbd resize
>> --size 150G volume01 to resize.
>>
>> Because of a bad internet connection I was kicked from the server a few
>> seconds after the start of the resize.
>>
>> Now the image has a size of only 205M!
>>
>>
>> I now need to restore the RBD image or at least the files which were on
>> it. Is there a way to restore them?
>>
>> Best,
>> Martin
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Anybody worked with collectd and Luminous build? help please

2017-07-24 Thread Yang X
The perf counter dump added an "avgtime" field which the collectd-5.7.2
ceph plugin does not understand, so it puts out a warning and exits:

ceph plugin: ds %s was not properly initialized.",

Does anybody know of a patch to collectd that might help?

Thanks,

Yang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Degraded objects while OSD is being added/filled

2017-07-24 Thread Gregory Farnum
Yeah, the objects being degraded here are a consequence of stuff being
written while backfill is happening; it doesn't last long because it's only
a certain range of them.
I didn't think that should escalate to the PG being marked degraded, but I may
be misinformed. I'm still planning to dig through that but haven't gotten to it
yet. :)
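
In case it helps to follow along, a rough way to see which PGs are carrying
the degraded objects while the backfill runs (Jewel-era commands; the pgid
below is just a placeholder):

  $ ceph health detail | grep -i degraded   # degraded PGs and object counts
  $ ceph pg dump_stuck degraded             # if your version rejects this, health detail has the same info
  $ ceph pg 3.1f query                      # replace 3.1f with a PG from the output above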

On Thu, Jul 20, 2017 at 8:13 AM Andras Pataki 
wrote:

> Hi Greg,
>
> I have just now added a single drive/osd to a clean cluster, and can see
> the degradation immediately.  We are on ceph 10.2.9 everywhere.
>
> Here is how the cluster looked before the OSD got added:
>
> cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>  health HEALTH_WARN
> noout flag(s) set
>  monmap e31: 3 mons at {cephmon00=
> 10.128.128.100:6789/0,cephmon01=10.128.128.101:6789/0,cephmon02=10.128.128.102:6789/0
> }
> election epoch 46092, quorum 0,1,2
> cephmon00,cephmon01,cephmon02
>   fsmap e26638: 1/1/1 up {0=cephmon01=up:active}, 2 up:standby
>  osdmap e681227: 1270 osds: 1270 up, 1270 in
> flags noout,sortbitwise,require_jewel_osds
>   pgmap v54583934: 42496 pgs, 6 pools, 1488 TB data, 437 Mobjects
> 4471 TB used, 3416 TB / 7887 TB avail
>42491 active+clean
>5 active+clean+scrubbing+deep
>   client io 2193 kB/s rd, 27240 kB/s wr, 85 op/s rd, 47 op/s wr
>
>
> And this is shortly after it was added (after all the peering was done):
>
> cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>  health HEALTH_WARN
> 141 pgs backfill_wait
> 117 pgs backfilling
> 20 pgs degraded
> 20 pgs recovery_wait
> 56 pgs stuck unclean
> recovery 130/1376744346 objects degraded (0.000%)
> recovery 3827502/1376744346 objects misplaced (0.278%)
> noout flag(s) set
>  monmap e31: 3 mons at {cephmon00=
> 10.128.128.100:6789/0,cephmon01=10.128.128.101:6789/0,cephmon02=10.128.128.102:6789/0
> }
> election epoch 46092, quorum 0,1,2
> cephmon00,cephmon01,cephmon02
>   fsmap e26638: 1/1/1 up {0=cephmon01=up:active}, 2 up:standby
>  osdmap e681238: 1271 osds: 1271 up, 1271 in; 258 remapped pgs
> flags noout,sortbitwise,require_jewel_osds
>   pgmap v54585141: 42496 pgs, 6 pools, 1488 TB data, 437 Mobjects
> 4471 TB used, 3423 TB / 7895 TB avail
> *130/1376744346 objects degraded (0.000%)*
> 3827502/1376744346 objects misplaced (0.278%)
>42210 active+clean
>  141 active+remapped+wait_backfill
>  117 active+remapped+backfilling
> *  20 active+recovery_wait+degraded*
>7 active+clean+scrubbing+deep
>1 active+clean+scrubbing
> recovery io 17375 MB/s, 5069 objects/s
>   client io 12210 kB/s rd, 29887 kB/s wr, 4 op/s rd, 140 op/s wr
>
>
> Even though there was no failure, we have 20 degraded PGs, and 130
> degraded objects.  My expectation was for some data to move around, start
> filling the added drive, but I would not expect to see degraded objects or
> PGs.
>
> Also, as time passes, the number of degraded objects increases steadily,
> here is a snapshot a little later:
>
> cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>  health HEALTH_WARN
> 63 pgs backfill_wait
> 4 pgs backfilling
> 67 pgs stuck unclean
> recovery 706/1377244134 objects degraded (0.000%)
> recovery 843267/1377244134 objects misplaced (0.061%)
> noout flag(s) set
>  monmap e31: 3 mons at {cephmon00=
> 10.128.128.100:6789/0,cephmon01=10.128.128.101:6789/0,cephmon02=10.128.128.102:6789/0
> }
> election epoch 46092, quorum 0,1,2
> cephmon00,cephmon01,cephmon02
>   fsmap e26640: 1/1/1 up {0=cephmon01=up:active}, 2 up:standby
>  osdmap e681569: 1271 osds: 1271 up, 1271 in; 67 remapped pgs
> flags noout,sortbitwise,require_jewel_osds
>   pgmap v54588554: 42496 pgs, 6 pools, 1488 TB data, 437 Mobjects
> 4471 TB used, 3423 TB / 7895 TB avail
> *706/1377244134 objects degraded (0.000%)*
> 843267/1377244134 objects misplaced (0.061%)
>42422 active+clean
>   63 active+remapped+wait_backfill
>5 active+clean+scrubbing+deep
>4 active+remapped+backfilling
>2 active+clean+scrubbing
> recovery io 779 MB/s, 229 objects/s
>   client io 306 MB/s rd, 344 MB/s wr, 138 op/s rd, 226 op/s wr
>
> From past experience, the degraded object count keeps going up for most of
> the time the disk is being filled.  Towards the end it decreases.  Is
> writing to a pool that is waiting for backfilling causing degraded objects
> to appear perhaps?
>
> I took a 'pg dump' before and after the change, as well as an 'osd tree'
> before and after.  All these are available at
> http://voms.simonsf

Re: [ceph-users] ceph recovery incomplete PGs on Luminous RC

2017-07-24 Thread Daniel K
I was able to export the PGs using ceph-objectstore-tool and import
them into the new OSDs.
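
For the archive, the export/import was roughly along these lines (a sketch
from memory -- the paths, pgid and target OSD id are examples, and both OSDs
must be stopped while the tool runs):

  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.2f --op export --file /tmp/1.2f.export
  # copy the export file to the host with the new OSD, then
  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --op import --file /tmp/1.2f.export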

I moved some other OSDs from the bare metal on a node into a virtual
machine on the same node and was surprised at how easy it was. Install ceph
in the VM (using ceph-deploy), stop the OSD and unmount the OSD drive from the
physical machine, mount it in the VM, and the OSD is auto-detected; the
ceph-osd process started automatically and was up within a few seconds.

I'm having a different problem now that I will make a separate message
about.

Thanks!


On Mon, Jul 24, 2017 at 12:52 PM, Gregory Farnum  wrote:

>
> On Fri, Jul 21, 2017 at 10:23 PM Daniel K  wrote:
>
>> Luminous 12.1.0(RC)
>>
>> I replaced two OSD drives(old ones were still good, just too small),
>> using:
>>
>> ceph osd out osd.12
>> ceph osd crush remove osd.12
>> ceph auth del osd.12
>> systemctl stop ceph-osd@osd.12
>> ceph osd rm osd.12
>>
>> I later found that I also should have unmounted it from
>> /var/lib/ceph/osd-12
>>
>> (remove old disk, insert new disk)
>>
>> I added the new disk/osd with ceph-deploy osd prepare stor-vm3:sdg
>> --bluestore
>>
>> This automatically activated the osd (not sure why, I thought it needed a
>> ceph-deploy osd activate as well)
>>
>>
>> Then, working on an unrelated issue, I upgraded one (out of 4 total)
>> nodes to 12.1.1 using apt and rebooted.
>>
>> The mon daemon would not form a quorum with the others on 12.1.0, so,
>> instead of troubleshooting that, I just went ahead and upgraded the other 3
>> nodes and rebooted.
>>
>> Lots of recovery IO went on afterwards, but now things have stopped at:
>>
>> pools:   10 pools, 6804 pgs
>> objects: 1784k objects, 7132 GB
>> usage:   11915 GB used, 19754 GB / 31669 GB avail
>> pgs: 0.353% pgs not active
>>  70894/2988573 objects degraded (2.372%)
>>  422090/2988573 objects misplaced (14.123%)
>>  6626 active+clean
>>  129  active+remapped+backfill_wait
>>  23   incomplete
>>  14   active+undersized+degraded+remapped+backfill_wait
>>  4active+undersized+degraded+remapped+backfilling
>>  4active+remapped+backfilling
>>  2active+clean+scrubbing+deep
>>  1peering
>>  1active+recovery_wait+degraded+remapped
>>
>>
>> when I run ceph pg query on the incompletes, they all list at least one
>> of the two removed OSDs(12,17) in "down_osds_we_would_probe"
>>
>> most pools are size:2 min_size 1(trusting bluestore to tell me which one
>> is valid). One pool is size:1 min size:1 and I'm okay with losing it,
>> except I had it mounted in a directory on cephfs, I rm'd the directory but
>> I can't delete the pool because it's "in use by CephFS"
>>
>>
>> I still have the old drives, can I stick them into another host and
>> re-add them somehow?
>>
>
> Yes, that'll probably be your easiest solution. You may have some trouble
> because you already deleted them, but I'm not sure.
>
> Alternatively, you ought to be able to remove the pool from CephFS using
> some of the monitor commands and then delete it.
>
>
>> This data isn't super important, but I'd like to learn a bit on how to
>> recover when bad things happen as we are planning a production deployment
>> in a couple of weeks.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random CephFS freeze, osd bad authorize reply

2017-07-24 Thread Gregory Farnum
Are the clocks dramatically out of sync? Basically any bug in signing could
cause that kind of log message, but I think simple time skew, so that they're
using different keys, is the most common cause.
On Mon, Jul 24, 2017 at 9:36 AM  wrote:

> Hi,
>
> I'm running a Ceph cluster which I started back in bobtail age and kept it
> running/upgrading over the years. It has three nodes, each running one MON,
> 10 OSDs and one MDS. The cluster has one MDS active and two standby.
> Machines are 8-core Opterons with 32GB of ECC RAM each. I'm using it to
> host our clients (about 25) /home using CephFS and as a RBD Backend for a
> couple of libvirt VMs (about 5).
>
> Currently I'm running 11.2.0 (kraken), and a couple of months ago I started
> experiencing some strange behaviour. Exactly 2 of my ~25 CephFS clients
> (always the same two) keep freezing their /home about one or two hours after
> first boot in the morning. At the moment of the freeze, syslog starts reporting
> loads of:
>
> _hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply
>
> On one of the clients I replaced every single piece of hardware with new
> hardware, so that machine is completely replaced now including NIC, Switch,
> Network-Cabling and did a complete OS reinstall. But the user is still
> getting that behaviour. As far as I could get, it seems that key
> renegotiation is failing and client tries to keep connecting with old cephx
> key. But I cannot find a reason for why this is happening and how to fix it.
>
> Biggest problem, the second affected machine is the one of our CEO and if
> we won't fix it I will have a hard time explaining that Ceph is the way to
> go.
>
> The two affected machines do not share any common piece of network segment
> other than TOR-Switch in Ceph Rack, while there are other clients that do
> share a network segment with the affected machines but aren't affected at all.
>
> Google won't help me either on this one, seems no one else is experiencing
> something similar.
>
> Client setup on all clients is Debian Jessie with 4.9 Backports kernel,
> using the kernel client for mounting CephFS. I think the whole thing started
> with a kernel upgrade from one 4.X series to another, but I cannot
> reconstruct exactly which one.
>
> Any help greatly appreciated.
>
> Best regards,
> Tobi
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph recovery incomplete PGs on Luminous RC

2017-07-24 Thread Gregory Farnum
On Fri, Jul 21, 2017 at 10:23 PM Daniel K  wrote:

> Luminous 12.1.0(RC)
>
> I replaced two OSD drives(old ones were still good, just too small), using:
>
> ceph osd out osd.12
> ceph osd crush remove osd.12
> ceph auth del osd.12
> systemctl stop ceph-osd@osd.12
> ceph osd rm osd.12
>
> I later found that I also should have unmounted it from
> /var/lib/ceph/osd-12
>
> (remove old disk, insert new disk)
>
> I added the new disk/osd with ceph-deploy osd prepare stor-vm3:sdg
> --bluestore
>
> This automatically activated the osd (not sure why, I thought it needed a
> ceph-deploy osd activate as well)
>
>
> Then, working on an unrelated issue, I upgraded one (out of 4 total) nodes
> to 12.1.1 using apt and rebooted.
>
> The mon daemon would not form a quorum with the others on 12.1.0, so,
> instead of troubleshooting that, I just went ahead and upgraded the other 3
> nodes and rebooted.
>
> Lots of recovery IO went on afterwards, but now things have stopped at:
>
> pools:   10 pools, 6804 pgs
> objects: 1784k objects, 7132 GB
> usage:   11915 GB used, 19754 GB / 31669 GB avail
> pgs: 0.353% pgs not active
>  70894/2988573 objects degraded (2.372%)
>  422090/2988573 objects misplaced (14.123%)
>  6626 active+clean
>  129  active+remapped+backfill_wait
>  23   incomplete
>  14   active+undersized+degraded+remapped+backfill_wait
>  4active+undersized+degraded+remapped+backfilling
>  4active+remapped+backfilling
>  2active+clean+scrubbing+deep
>  1peering
>  1active+recovery_wait+degraded+remapped
>
>
> when I run ceph pg query on the incompletes, they all list at least one of
> the two removed OSDs(12,17) in "down_osds_we_would_probe"
>
> most pools are size:2 min_size 1(trusting bluestore to tell me which one
> is valid). One pool is size:1 min size:1 and I'm okay with losing it,
> except I had it mounted in a directory on cephfs, I rm'd the directory but
> I can't delete the pool because it's "in use by CephFS"
>
>
> I still have the old drives, can I stick them into another host and re-add
> them somehow?
>

Yes, that'll probably be your easiest solution. You may have some trouble
because you already deleted them, but I'm not sure.

Alternatively, you ought to be able to remove the pool from CephFS using
some of the monitor commands and then delete it.
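
Something like the following should do it (untested sketch; the pool can only
be detached this way if it is not the filesystem's primary data pool, and on
Luminous the delete also requires mon_allow_pool_delete=true):

  $ ceph fs ls                                  # confirm the fs name and its data pools
  $ ceph fs rm_data_pool <fs_name> <pool_name>  # detach the extra data pool from CephFS
  $ ceph osd pool delete <pool_name> <pool_name> --yes-i-really-really-mean-it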


> This data isn't super important, but I'd like to learn a bit on how to
> recover when bad things happen as we are planning a production deployment
> in a couple of weeks.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Random CephFS freeze, osd bad authorize reply

2017-07-24 Thread topro


Hi,

 

I'm running a Ceph cluster which I started back in the bobtail age and have kept running/upgrading over the years. It has three nodes, each running one MON, 10 OSDs and one MDS. The cluster has one MDS active and two standby. Machines are 8-core Opterons with 32GB of ECC RAM each. I'm using it to host our clients' (about 25) /home directories using CephFS and as an RBD backend for a couple of libvirt VMs (about 5).

 

Currently I'm running 11.2.0 (kraken), and a couple of months ago I started experiencing some strange behaviour. Exactly 2 of my ~25 CephFS clients (always the same two) keep freezing their /home about one or two hours after first boot in the morning. At the moment of the freeze, syslog starts reporting loads of:

 

_hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply

 

On one of the clients I replaced every single piece of hardware with new hardware, so that machine is completely replaced now, including NIC, switch and network cabling, and I did a complete OS reinstall. But the user is still getting that behaviour. As far as I can tell, it seems that key renegotiation is failing and the client keeps trying to connect with the old cephx key. But I cannot find a reason why this is happening or how to fix it.

 

Biggest problem: the second affected machine is our CEO's, and if we can't fix it I will have a hard time explaining that Ceph is the way to go.

 

The two affected machines do not share any common piece of network segment other than the TOR switch in the Ceph rack, while there are other clients that do share a network segment with the affected machines but aren't affected at all.

 

Google won't help me on this one either; it seems no one else is experiencing something similar.

 

Client setup on all clients is Debian Jessie with the 4.9 backports kernel, using the kernel client for mounting CephFS. I think the whole thing started with a kernel upgrade from one 4.X series to another, but I cannot reconstruct exactly which one.

 

Any help greatly appreciated.

 

Best regards,

Tobi



 

 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting pool, but where are the files?

2017-07-24 Thread John Spray
On Mon, Jul 24, 2017 at 4:52 PM,   wrote:
> Hello!
>
> I created CephFS according to documentation:
> $ ceph osd pool create hdb-backup 
> $ ceph osd pool create hdb-backup_metadata 
> $ ceph fs new   
>
> I can mount this pool with user admin:
> ld4257:/etc/ceph # mount -t ceph 10.96.5.37,10.96.5.38,10.96.5.38:/ 
> /mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key

Need to untangle the terminology a bit.

What you're mounting is a filesystem; the filesystem stores its
data in pools.  Pools are a lower-level concept than filesystems.

> ld4257:/etc/ceph # mount | grep ceph
> 10.96.5.37,10.96.5.38,10.96.5.38:/ on /mnt/cephfs type ceph 
> (rw,relatime,name=admin,secret=,acl)
>
> To verify which pool is mounted, I checked this:
> ld4257:/etc/ceph # ceph osd lspools
> 0 rbd,1 templates,3 hdb-backup,4 hdb-backup_metadata,
>
> ld4257:/etc/ceph # cephfs /mnt/cephfs/ show_layout
> WARNING: This tool is deprecated.  Use the layout.* xattrs to query and 
> modify layouts.
> layout.data_pool: 3
> layout.object_size:   4194304
> layout.stripe_unit:   4194304
> layout.stripe_count:  1
>
> So, I guess the correct pool "hdb-backup" is now mounted to /mnt/cephfs.
>
> Then I pushed some files in this pool.

I think you mean that you put some objects into your pool.  So at this
stage you have not created any files, cephfs doesn't know anything
about these objects.  You would need to really create files (i.e.
write to your mount) to have files that exist in cephfs.

> I can display the relevant objects now:
> ld4257:/etc/ceph # rados -p hdb-backup ls
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7269
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:6357
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:772
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14039
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1803
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5549
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:15797
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:20624
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7322
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5208
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:17479
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14361
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:16963
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:4694
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1391
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1199
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11359
> MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11995
> [...]
>
> (This is just an extract; there are many more objects.)
>
> Now, the question is:
> Can I display these files with CephFS?

Unfortunately not -- you would need to write your data in as files
(via a cephfs mount) to read it back as files.
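
To illustrate the difference (file and object names made up):

  $ rados -p hdb-backup put myobject /tmp/somefile   # bare object in the pool; CephFS never sees it
  $ cp /tmp/somefile /mnt/cephfs/MTY/somefile        # real CephFS file; its data objects land in hdb-backup
  $ ls -l /mnt/cephfs/MTY/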

John

>
> When I check the content of /mnt/cephfs, there's only one directory "MTY" 
> that I have created; this directory is not related to the output of rados at 
> all:
> ld4257:/etc/ceph # ll /mnt/cephfs/
> total 0
> drwxr-xr-x 1 root root 0 Jul 24 15:57 MTY
>
> THX
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous radosgw hangs after a few hours

2017-07-24 Thread Vaibhav Bhembre
I am seeing the same issue on upgrade to Luminous v12.1.0 from Jewel.
I am not using Keystone or OpenStack either and my radosgw daemon
hangs as well. I have to restart it to resume processing.

2017-07-24 00:23:33.057401 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:38:33.057524 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:53:33.057648 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:08:33.057749 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:23:33.057878 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:38:33.057964 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:53:33.058098 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 02:08:33.058225 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22

The following are my keystone config options:

"rgw_keystone_url": ""
"rgw_keystone_admin_token": ""
"rgw_keystone_admin_user": ""
"rgw_keystone_admin_password": ""
"rgw_keystone_admin_tenant": ""
"rgw_keystone_admin_project": ""
"rgw_keystone_admin_domain": ""
"rgw_keystone_barbican_user": ""
"rgw_keystone_barbican_password": ""
"rgw_keystone_barbican_tenant": ""
"rgw_keystone_barbican_project": ""
"rgw_keystone_barbican_domain": ""
"rgw_keystone_api_version": "2"
"rgw_keystone_accepted_roles": "Member
"rgw_keystone_accepted_admin_roles": ""
"rgw_keystone_token_cache_size": "1"
"rgw_keystone_revocation_interval": "900"
"rgw_keystone_verify_ssl": "true"
"rgw_keystone_implicit_tenants": "false"
"rgw_s3_auth_use_keystone": "false"

Is this fixed in RC2 by any chance?

On Thu, Jun 29, 2017 at 3:11 AM, Martin Emrich
 wrote:
> Since upgrading to 12.1, our Object Gateways hang after a few hours, I only
> see these messages in the log file:
>
>
>
> 2017-06-29 07:52:20.877587 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:20.877761 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:29.994979 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:22:20.877911 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:27:30.086119 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:37:20.878108 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:37:30.187696 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:52:20.878283 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:57:30.280881 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 09:07:20.878451 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
>
>
> FYI: we do not use Keystone or Openstack.
>
>
>
> This started after upgrading from jewel (via kraken) to luminous.
>
>
>
> What could I do to fix this?
>
> Is there some “fsck” like consistency check + repair for the radosgw
> buckets?
>
>
>
> Thanks,
>
>
>
> Martin
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mounting pool, but where are the files?

2017-07-24 Thread c . monty
Hello!

I created CephFS according to documentation:
$ ceph osd pool create hdb-backup 
$ ceph osd pool create hdb-backup_metadata 
$ ceph fs new   

I can mount this pool with user admin:
ld4257:/etc/ceph # mount -t ceph 10.96.5.37,10.96.5.38,10.96.5.38:/ /mnt/cephfs 
-o name=admin,secretfile=/etc/ceph/ceph.client.admin.key

ld4257:/etc/ceph # mount | grep ceph
10.96.5.37,10.96.5.38,10.96.5.38:/ on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

To verify which pool is mounted, I checked this:
ld4257:/etc/ceph # ceph osd lspools
0 rbd,1 templates,3 hdb-backup,4 hdb-backup_metadata,

ld4257:/etc/ceph # cephfs /mnt/cephfs/ show_layout
WARNING: This tool is deprecated.  Use the layout.* xattrs to query and modify 
layouts.
layout.data_pool: 3
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

So, I guess the correct pool "hdb-backup" is now mounted to /mnt/cephfs.

Then I pushed some files in this pool.
I can display the relevant objects now:
ld4257:/etc/ceph # rados -p hdb-backup ls
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7269
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:6357
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:772
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14039
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1803
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5549
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:15797
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:20624
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:7322
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:5208
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:17479
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:14361
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:16963
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:4694
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1391
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:1199
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11359
MTY:file:8669fdbb88fda698afbac6374d826cba133a8d11:11995
[...]

(This is just an extract; there are many more objects.)

Now, the question is:
Can I display these files with CephFS?

When I check the content of /mnt/cephfs, there's only one directory "MTY" that 
I have created; this directory is not related to the output of rados at all:
ld4257:/etc/ceph # ll /mnt/cephfs/
total 0
drwxr-xr-x 1 root root 0 Jul 24 15:57 MTY

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Restore RBD image

2017-07-24 Thread Jason Dillaman
Increasing the size of an image only issues a single write to update
the image size metadata in the image header. That operation is atomic
and really shouldn't be able to do what you are saying.  Regardless,
since this is a grow operation, just re-run the resize to update the
metadata again.
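
i.e. something like this (image name as in your original mail; add -p <pool>
if the image is not in the default rbd pool):

  $ rbd resize --size 150G volume01    # rewrites the size field in the header
  $ rbd info volume01 | grep size      # should report 150 GB again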

On Mon, Jul 24, 2017 at 8:31 AM, Marc Roos  wrote:
>
>
> I would recommend logging into the host and running your commands from a
> screen session, so they keep running.
>
>
> -Original Message-
> From: Martin Wittwer [mailto:martin.witt...@datonus.ch]
> Sent: zondag 23 juli 2017 15:20
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Restore RBD image
>
> Hi list
>
> I have a big problem:
>
> I had to resize a RBD image from 100G to 150G. So I used  rbd resize
> --size 150G volume01 to resize.
>
> Because of a bad internet connection I was kicked from the server a few
> seconds after the start of the resize.
>
> Now the image has a size of only 205M!
>
>
> I now need to restore the RBD image or at least the files which were on
> it. Is there a way to restore them?
>
> Best,
> Martin
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread c . monty
THX.
Mount is working now.

The auth list for user mtyadm is now:
client.mtyadm
 key: AQAlyXVZEfsYNRAAM4jHuV1Br7lpRx1qaINO+A==
 caps: [mds] allow r,allow rw path=/MTY
 caps: [mon] allow r
 caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata
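
For the archive: an existing client's caps can be changed in place with
"ceph auth caps" (sketch below, matching the caps shown above), so the key
does not have to be deleted and re-created:

  $ ceph auth caps client.mtyadm \
        mon 'allow r' \
        mds 'allow r, allow rw path=/MTY' \
        osd 'allow rw pool=hdb-backup, allow rw pool=hdb-backup_metadata'
  $ ceph auth get client.mtyadm     # verify the new caps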
On 24 July 2017 at 13:25, "Дмитрий Глушенок" wrote:
Check your kernel version; prior to 4.9 it was necessary to allow read on the root
path:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html

On 24 July 2017, at 12:36, c.mo...@web.de wrote:

Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Dmitry Glushenok
Jet Infosystems
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Exclusive-lock Ceph

2017-07-24 Thread Jason Dillaman
You will need to pass the "exclusive" option when running "rbd map"
(and be running kernel >= 4.12).
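
A minimal sketch, using the image name from the original mail (the exclusive
map option exists only in kernels >= 4.12):

  $ rbd map -o exclusive test-xlock3   # krbd takes the exclusive lock at map time
  $ rbd lock list test-xlock3

With -o exclusive the lock is acquired up front and is not transparently
handed over to other clients, which is the behaviour you were expecting.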

On Mon, Jul 24, 2017 at 8:42 AM,   wrote:
> I'm testing ceph in my environment, but the exclusive-lock feature doesn't
> work as I expect, or maybe I'm doing something wrong.
>
> I'm testing on two machines and created one image with exclusive-lock enabled. If I
> understood correctly, with this feature only one machine can mount and
> write to the image at a time.
>
> But when I'm testing, I see the lock always moves to the machine that tried to
> mount the volume last.
>
> For example, if I try to mount the image on machine1, I see the IP of machine1 and I
> mount the volume on machine1:
> #rbd lock list test-xlock3
> There is 1 exclusive lock on this image.
> Locker  IDAddress
> client.4390 auto  192.168.0.1:0/2940167630
>
> But if I now run rbd map and try to mount the image on machine2, the lock is
> moved to machine2, and I believe this is an error, because if the lock is already
> on machine1 and I am writing to the image, machine2 should not be able to mount the
> same image at the same time.
> If i running in machine2 now, i see :
>
> #rbd lock list test-xlock3
> There is 1 exclusive lock on this image.
> Locker      ID       Address
> client.4491 auto XX 192.168.0.2:0/1260424031
>
>
>
> Exclusive-lock is enabled on my image:
>
> rbd info  test-xlock3 | grep features
> features: exclusive-lock
>
>
> Am I doing something wrong? Is there some configuration to add to ceph.conf to fix this,
> so that if one machine mounts the volume, machine2 can't do so at the same time? I
> read about the rbd lock
> command, but it seems deprecated.
>
>
>
> Thanks, a lot.
> Marcelo
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Exclusive-lock Ceph

2017-07-24 Thread lista

I'm testing ceph in my environment, but the exclusive-lock feature doesn't work
as I expect, or maybe I'm doing something wrong.

I'm testing on two machines and created one image with exclusive-lock enabled. If I
understood correctly, with this feature only one machine can mount and write to the
image at a time.

But when I'm testing, I see the lock always moves to the machine that tried to mount
the volume last.

For example, if I try to mount the image on machine1, I see the IP of machine1 and I mount
the volume on machine1:
#rbd lock list test-xlock3
There is 1 exclusive lock on this image.
Locker      ID    Address
client.4390 auto  192.168.0.1:0/2940167630

But if I now run rbd map and try to mount the image on machine2, the lock is
moved to machine2, and I believe this is an error, because if the lock is already on
machine1 and I am writing to the image, machine2 should not be able to mount the same
image at the same time.
If I run it on machine2 now, I see:

#rbd lock list test-xlock3
There is 1 exclusive lock on this image.
Locker      ID       Address
client.4491 auto XX 192.168.0.2:0/1260424031
 
Exclusive-lock is enabled on my image:
rbd info test-xlock3 | grep features
features: exclusive-lock

Am I doing something wrong? Is there some configuration to add to ceph.conf to fix this, so that if
one machine mounts the volume, machine2 can't do so at the same time? I read
about the rbd lock
command, but it seems deprecated.
 
Thanks a lot.
Marcelo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Restore RBD image

2017-07-24 Thread Marc Roos
 

I would recommend logging into the host and running your commands from a 
screen session, so they keep running.
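
For example (tmux works just as well; host and session names are made up):

  $ ssh storage-host
  $ screen -S rbdwork        # start a named screen session
  $ rbd resize --size 150G volume01
  # Ctrl-A d detaches; after reconnecting, resume with:
  $ screen -r rbdwork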


-Original Message-
From: Martin Wittwer [mailto:martin.witt...@datonus.ch] 
Sent: zondag 23 juli 2017 15:20
To: ceph-us...@ceph.com
Subject: [ceph-users] Restore RBD image

Hi list

I have a big problem:

I had to resize a RBD image from 100G to 150G. So I used  rbd resize 
--size 150G volume01 to resize.

Because of a bad internet connection I was kicked from the server a few 
seconds after the start of the resize.

Now the image has a size of only 205M!


I now need to restore the RBD image or at least the files which were on 
it. Is there a way to restore them?

Best,
Martin


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread Дмитрий Глушенок
Check your kernel version; prior to 4.9 it was necessary to allow read on the root
path:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html

> On 24 July 2017, at 12:36, c.mo...@web.de wrote:
> 
> Hello!
> 
> I want to mount CephFS with a dedicated user in order to avoid putting the 
> admin key on every client host.
> Therefore I created a user account
> ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
> osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
> /etc/ceph/ceph.client.mtyadm.keyring
> and wrote out the keyring
> ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
> ceph.client.mtyadm.key
> 
> This user is now displayed in auth list:
> client.mtyadm
>key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
>caps: [mds] allow rw path=/MTY
>caps: [mon] allow r
>caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata
> 
> When I try to mount directory /MTY on the client host I get this error:
> ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
> /mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
> mount error 13 = Permission denied
> 
> The mount works using admin though:
> ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
> /mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
> ld2398:/etc/ceph # mount | grep cephfs
> 10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
> (rw,relatime,name=admin,secret=,acl)
> 
> What is causing this mount error?
> 
> THX
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Dmitry Glushenok
Jet Infosystems

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread Jaime Ibar

Hi,

I think there is a perm missing for the mds.

Try adding allow r to mds permissions.

Something like

ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow r, 
allow rw path=/MTY' osd 'allow rw pool=hdb-backup,allow rw 
pool=hdb-backup_metadata' -o /etc/ceph/ceph.client.mtyadm.keyring


Jaime


On 24/07/17 10:36, c.mo...@web.de wrote:

Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
 key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
 caps: [mds] allow rw path=/MTY
 caps: [mon] allow r
 caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread c . monty
Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com