Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Brad Hubbard
On Fri, Nov 25, 2016 at 11:55 AM, Craig Chi wrote: > Hi Brad, > > Thank you for your investigation. > > Here are the reasons why we thought the abnormal Ceph behavior was > caused by memory exhaustion. The following link redirects to the dmesg > output on a toughly

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Craig Chi
Hi Brad, Thank you for your investigation. Here are the reasons why we thought the abnormal Ceph behavior was caused by memory exhaustion. The following link redirects to the dmesg output on a Ceph node that barely survived: http://pastebin.com/Aa1FDd4K However I cannot ensure that this is

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Craig Chi
Hi Nick, I have seen the report before. If I understand correctly, the osd_map_cache_size generally introduces a fixed amount of memory usage. We are using the default value of 200, and a single osd map I got from our cluster is 404KB. That totals 404KB * 200 * 90 (OSDs) = about 7GB on
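The arithmetic quoted above can be double-checked with a quick sketch; the 404KB map size, 200-entry default cache and 90 OSDs are the figures reported in this message, not measured values:

```python
# Rough worst-case memory consumed by OSD map caches across the cluster,
# using the figures quoted in this thread.
map_size_bytes = 404 * 1024        # one osdmap epoch (~404KB)
osd_map_cache_size = 200           # default cache entries per OSD at the time
num_osds = 90                      # OSDs in the cluster

total_bytes = map_size_bytes * osd_map_cache_size * num_osds
print(f"{total_bytes / 1024**3:.1f} GiB")  # prints 6.9 GiB, i.e. "about 7GB"
```

Note this is an upper bound: cached epochs are shared structures within one OSD process, so real usage per node depends on how many OSDs it hosts.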

Re: [ceph-users] metrics.ceph.com

2016-11-24 Thread Brad Hubbard
Patrick, I remember hearing you talk about this site recently. Do you know who can help with this query? On Fri, Nov 25, 2016 at 2:13 AM, Nick Fisk wrote: > Who is responsible for the metrics.ceph.com site? I noticed that the mailing > list stats are still trying to retrieve

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Brad Hubbard
Two of these appear to be hung task timeouts and the other is an invalid opcode. There is no evidence here of memory exhaustion (although it remains to be seen whether this is a factor, I'd expect to see evidence of shrinker activity in the stacks) and I would speculate the increased memory

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Nick Fisk
There are a couple of things you can do to reduce memory usage by limiting the number of OSD maps each OSD stores, but you will still be pushing up against the limits of the RAM you have available. There is a CERN 30PB test (should be on Google) which gives some details on some of the settings,
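For reference, the map-related options being alluded to would be set in ceph.conf along these lines. These option names apply to pre-Luminous releases such as the Jewel/Firefly versions discussed in this thread, and the values are purely illustrative, not recommendations:

```
[osd]
# Number of cached osdmap epochs each OSD keeps (default 200 in this era)
osd map cache size = 150
# How far ahead of the oldest cached map an OSD will advance / share epochs
osd map max advance = 75
osd map share max epochs = 75
osd pg epoch persisted max stale = 75
```

Shrinking these trades memory for more frequent map fetches during recovery, so they should be lowered cautiously on a busy cluster.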

[ceph-users] Can't download some files from RGW

2016-11-24 Thread Martin Bureau
Hello, I have some files that have been uploaded to a Ceph Jewel (10.2.2) cluster but can't be downloaded afterwards. HEAD on the file is successful but GET returns 404. Here is the output from object stat for one of these files : # radosgw-admin object stat --bucket=sam-storage-mtl-8m-00
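A common way to chase this kind of 404 is to stat the object through radosgw-admin and then check whether the RADOS tail objects named in its manifest actually exist. A sketch; OBJECT_NAME and RADOS_OBJECT_NAME are placeholders, and the data pool may be `.rgw.buckets` or `default.rgw.buckets.data` depending on how the Jewel zone was created:

```
# Inspect the RGW metadata and manifest for the object
radosgw-admin object stat --bucket=sam-storage-mtl-8m-00 --object=OBJECT_NAME

# Verify that the underlying RADOS objects from the manifest exist
rados -p .rgw.buckets stat RADOS_OBJECT_NAME
```

If the head object stats fine but a tail object is missing, the GET will fail even though HEAD succeeds, which matches the symptom described.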

[ceph-users] Fwd: RadosGW not responding if ceph cluster in state health_error

2016-11-24 Thread Thomas
Sorry to bring this up again - any ideas? Or should I try the IRC channel? Cheers, Thomas Original Message Subject:RadosGW not responding if ceph cluster in state health_error Date: Mon, 21 Nov 2016 17:22:20 +1300 From: Thomas To:

[ceph-users] Rados GW + CDN

2016-11-24 Thread Daniel Picolli Biazus
Hey Guys, We have a small/medium cluster (~30 TB / 30 OSDs / 5 monitors) mainly used as object storage through 4 RadosGW S3 API instances. We'd like to add one or two more RadosGWs without S3 authentication in order to deliver these objects using a simple HTTP CDN configuration. Is it possible to

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-24 Thread Daznis
I will try it, but I wanna see if it stays stable for a few days. Not sure if I should report this bug or not. On Thu, Nov 24, 2016 at 6:05 PM, Nick Fisk wrote: > Can you add them with different ID's, it won't look pretty but might get you > out of this situation? > >>

[ceph-users] Inconsistent PG, is safe pg repair? or manual fix?

2016-11-24 Thread Ana Aviles
Hello, We have had a cluster in HEALTH_ERR state for a while now. We are trying to figure out how to solve it without having to remove the affected RBD image. ceph -s cluster e94277ae-3d38-4547-8add-2cf3306f3efd health HEALTH_ERR 1 pgs inconsistent 5 scrub
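On Jewel, the usual first steps for an inconsistent PG are to locate it and enumerate the inconsistencies before deciding between `ceph pg repair` and a manual fix. The PG id below is a placeholder:

```
ceph health detail                                      # shows which PG(s) are inconsistent
rados list-inconsistent-obj <pgid> --format=json-pretty # per-object error detail
ceph pg repair <pgid>
```

The standard caveat applies: `pg repair` historically favours the primary's copy, so it is worth confirming from the inconsistency listing (and the OSD logs from the failed scrub) that the primary actually holds a good replica before running it.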

[ceph-users] metrics.ceph.com

2016-11-24 Thread Nick Fisk
Who is responsible for the metrics.ceph.com site? I noticed that the mailing list stats are still trying to retrieve data from the gmane archives which are no longer active. Nick

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-24 Thread Nick Fisk
Can you add them with different ID's, it won't look pretty but might get you out of this situation? > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Daznis > Sent: 24 November 2016 15:43 > To: Nick Fisk > Cc: ceph-users

[ceph-users] PG calculate for cluster with a huge small objects

2016-11-24 Thread Mike
Hello. We have a cluster: 32 OSDs, 80TB of space, Firefly 0.80.9 release. We use this cluster for RBD and RadosGW object storage with our OpenStack. The data pools (volumes, compute, images) are fine, but the .rgw pool uses 101MB of space and has _462446_ objects. The average size of an object in this

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-24 Thread Daznis
Yes, unfortunately, it is. And the story still continues. I have noticed that only 4 OSDs are doing this, and zapping and re-adding them does not solve the issue. Removing them completely from the cluster solves it, but I can't reuse their IDs. If I add another one with the same ID it starts

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi, Just checked permissions: > # ceph auth get client.cinder > exported keyring for client.cinder > [client.cinder] > key = REDACTED > caps mon = "allow r" > caps osd = "allow class-read object_prefix rbd_children, allow rwx > pool=cinder-volumes, allow rwx pool=cinder-vms, allow rx
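If the caps above turn out to be the culprit, the cache pool has to be granted explicitly, since clients talk to it directly once the overlay is in place. A sketch, assuming the cache pool from this thread is simply named `cache`; only the pools visible in the quoted (truncated) keyring are listed, so any further caps from the original would need to be carried over too:

```
ceph auth caps client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=cinder-volumes, allow rwx pool=cinder-vms, allow rwx pool=cache'
```

Note that `ceph auth caps` replaces the full cap set rather than appending to it, which is why the existing caps must be repeated in the command.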

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Nick, Oh... In retrospect it makes sense in a way, but it also does not. ;-) To clarify: it makes sense since the cache is "just a pool", but it does not since "it is an overlay and just a cache in between". Anyway, something that should be well documented and warned about, if you ask me.

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kees > Meijs > Sent: 24 November 2016 14:20 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Stalling IO with cache tier > > Hi Nick, > > All Ceph pools have very restrictive

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Nick, All Ceph pools have very restrictive permissions for each OpenStack service, indeed. Besides creating the cache pool and enabling it, no additional parameters or configuration were set. Do I understand correctly that access parameters (e.g. cephx keys) are needed for a cache tier? If yes, it

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Burkhard, A testing pool makes absolute sense, thank you. About the complete setup, the documentation states: > The cache tiering agent can flush or evict objects based upon the > total number of bytes *or* the total number of objects. To specify a > maximum number of bytes, execute the
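The documentation passage quoted above continues with `ceph osd pool set` commands; for a cache pool named `cache` as in this thread, the byte/object targets and the flush/evict ratios look like this (the values are illustrative only):

```
ceph osd pool set cache target_max_bytes 1099511627776   # 1 TiB ceiling
ceph osd pool set cache target_max_objects 1000000
ceph osd pool set cache cache_target_dirty_ratio 0.4     # start flushing at 40% of target
ceph osd pool set cache cache_target_full_ratio 0.8      # start evicting at 80% of target
```

If neither `target_max_bytes` nor `target_max_objects` is set, the tiering agent has no reference point and will never flush or evict, which is exactly the "complete setup" pitfall being discussed.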

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Burkhard Linke > Sent: 24 November 2016 14:06 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Stalling IO with cache tier > > Hi, > > > *snipsnap* > > > >> # ceph osd tier

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Burkhard Linke
Hi, *snipsnap* # ceph osd tier add cinder-volumes cache pool 'cache' is now (or already was) a tier of 'cinder-volumes' # ceph osd tier cache-mode cache writeback set cache-mode for pool 'cache' to writeback # ceph osd tier set-overlay cinder-volumes cache overlay for 'cinder-volumes' is now

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi, In addition, some log output was generated by KVM processes: > qemu: terminating on signal 15 from pid 2827 > osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' > thread 7f265a77da80 time 2016-11-23 17:26:24.237542 > osdc/ObjectCacher.cc: 551: FAILED assert(i->empty()) > ceph

[ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi list, Our current Ceph production cluster is struggling with performance issues, so we decided to add a fully flash-based cache tier (the cluster currently runs with spinners and journals on separate SSDs). We ordered SSDs (Intel), disk trays and read

Re: [ceph-users] Release schedule and notes.

2016-11-24 Thread Stephen Harker
On 24/11/16 11:23, John Spray wrote: On Thu, Nov 24, 2016 at 11:09 AM, Stephen Harker wrote: Hi All, This morning I went looking for information on the Ceph release timelines and so on and was directed to this page by Google:

Re: [ceph-users] Release schedule and notes.

2016-11-24 Thread Eneko Lacunza
Hi, On 24/11/16 at 12:09, Stephen Harker wrote: Hi All, This morning I went looking for information on the Ceph release timelines and so on and was directed to this page by Google: http://docs.ceph.com/docs/jewel/releases/ but this doesn't seem to have been updated for a long time.

Re: [ceph-users] Release schedule and notes.

2016-11-24 Thread John Spray
On Thu, Nov 24, 2016 at 11:09 AM, Stephen Harker wrote: > Hi All, > > This morning I went looking for information on the Ceph release timelines > and so on and was directed to this page by Google: > > http://docs.ceph.com/docs/jewel/releases/ Replace jewel with

[ceph-users] Release schedule and notes.

2016-11-24 Thread Stephen Harker
Hi All, This morning I went looking for information on the Ceph release timelines and so on and was directed to this page by Google: http://docs.ceph.com/docs/jewel/releases/ but this doesn't seem to have been updated for a long time. Is there somewhere else I should be looking?

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Craig Chi
Hi Nick, Thank you for your helpful information. I knew that Ceph recommends 1GB of RAM per 1TB of storage, but we are not going to change the hardware architecture now. Is there any way to limit the resources a single OSD can consume? As for your question, we currently set the system configuration as:
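Ceph of this era has no direct per-OSD memory target, but on a systemd-based distribution a hard cap can be sketched with a unit drop-in for ceph-osd@.service. The 4G figure is only an example; be aware that an OSD hitting a cgroup memory limit is throttled or killed by the kernel rather than gracefully trimming its caches:

```
# /etc/systemd/system/ceph-osd@.service.d/memory.conf
[Service]
MemoryLimit=4G
```

After creating the drop-in, run `systemctl daemon-reload` and restart the OSD services for it to take effect.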

Re: [ceph-users] new mon can't join new cluster, probe_timeout / probing

2016-11-24 Thread grin
Replying to myself. On Wed, 23 Nov 2016 18:50:02 +0100 grin wrote: > This is possibly some network issue, but I cannot see the indicator > about what to see. mon0 usually stands in quorum alone, and other mons > cannot join. They get the monmap, they intend to join, but it just >

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Nick Fisk
Hi Craig, From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Craig Chi Sent: 24 November 2016 08:34 To: ceph-users@lists.ceph.com Subject: [ceph-users] Ceph OSDs cause kernel unresponsive Hi Cephers, We have encountered kernel hanging issue on our Ceph cluster. Just

Re: [ceph-users] OpenStack Keystone with RadosGW

2016-11-24 Thread Orit Wasserman
radosgw supports Keystone v3 in Jewel. Can you give more details about the error? What is the exact command you are trying? A radosgw log with debug_rgw=20 and debug_ms=5 will be most helpful. On Tue, Nov 22, 2016 at 10:24 AM, 한승진 wrote: > I've figured out the main reason is. >
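The requested debug levels can be enabled for the gateway either persistently in ceph.conf or at runtime via the admin socket. The socket path below is the default location and the gateway instance name is a placeholder:

```
# Persistently, in ceph.conf under the gateway's client section:
#   [client.rgw.<name>]
#   debug rgw = 20
#   debug ms = 5

# Or at runtime, without a restart:
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 20
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_ms 5
```

Remember to turn the levels back down afterwards; debug_ms=5 in particular is very chatty on a loaded gateway.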

[ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Craig Chi
Hi Cephers, We have encountered a kernel hanging issue on our Ceph cluster. Just like http://imgur.com/a/U2Flz, http://imgur.com/a/lyEko or http://imgur.com/a/IGXdu. We believe it was caused by running out of memory, because we observed that when OSDs went crazy, the available memory of each node was

Re: [ceph-users] Introducing DeepSea: A tool for deploying Ceph using Salt

2016-11-24 Thread Tim Serong
On 11/12/2016 05:30 AM, Bill Sanders wrote: > I'm curious what the relationship is with python_ceph_cfg[0] and > DeepSea, which have some overlap in contributors and functionality (and > supporting organizations?). DeepSea and python-ceph-cfg look at ceph deployment from two different