[ceph-users] v12.0.1 Luminous (dev) released

2017-03-28 Thread Abhishek Lekshmanan

This is the second development checkpoint release of Luminous, the next
long term stable release.

Major changes from 12.0.0
-------------------------
* The original librados rados_objects_list_open (C) and objects_begin
  (C++) object listing API, deprecated in Hammer, has finally been
  removed.  Users of this interface must update their software to use
  either the rados_nobjects_list_open (C) and nobjects_begin (C++) API or
  the new rados_object_list_begin (C) and object_list_begin (C++) API
  before updating the client-side librados library to Luminous.

  Object enumeration (via any API) with the latest librados version
  and pre-Hammer OSDs is no longer supported.  Note that no in-tree
  Ceph services rely on object enumeration via the deprecated APIs, so
  only external librados users might be affected.

  The newest (and recommended) rados_object_list_begin (C) and
  object_list_begin (C++) API is only usable on clusters with the
  SORTBITWISE flag enabled (Jewel and later).  (Note that this flag is
  required to be set before upgrading beyond Jewel.)
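
  For example, before relying on the new API you can confirm the flag on an
  existing cluster with something along these lines (and set it, on Jewel or
  later, if it is missing):

    ceph osd dump | grep flags      # should include 'sortbitwise'
    ceph osd set sortbitwise        # enable it if it is not yet set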

* CephFS clients without the 'p' flag in their authentication capability
  string will no longer be able to set quotas or any layout fields.  This
  flag previously only restricted modification of the pool and namespace
  fields in layouts.
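
  For example, a client that should be allowed to modify quotas and layouts
  could be granted a capability string along these lines (the client name and
  data pool below are placeholders):

    ceph auth caps client.foo \
        mds 'allow rwp' \
        mon 'allow r' \
        osd 'allow rw pool=cephfs_data'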

* The rados copy-get-classic operation has been removed since it has not been
  used by the OSD since before hammer.  It is unlikely any librados user is
  using this operation explicitly since there is also the more modern
  copy-get.


* The RGW API for getting an object's torrent has changed its parameter from
  'get_torrent' to 'torrent' so that it is compatible with Amazon S3. The
  request for an object torrent now looks like 'GET /ObjectName?torrent'.
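
  For example, a request for a torrent of a publicly readable object would
  look roughly like the following (the endpoint, bucket and object names are
  placeholders, and S3 authentication headers are omitted for brevity):

    curl -o ObjectName.torrent "http://rgw.example.com/mybucket/ObjectName?torrent"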

See http://ceph.com/releases/v12-0-1-luminous-dev-released/ for a more
detailed changelog on this release, and thank you everyone for
contributing.

While we're fixing a few issues in the build system, the arm64 packages
for centos7 are, unfortunately, not available for this dev release.

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.0.1.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see http://docs.ceph.com/docs/master/install/install-ceph-deploy

* Release sha1: 5456408827a1a31690514342624a4ff9b66be1d5

--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, 
HRB 21284 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-28 Thread Dongsheng Yang

Jason, sorry for the typo of your email address in my last mail...

On 29/03/2017, 00:36, Jason Dillaman wrote:

While certainly that could be a feature that could be added to "rbd
info", it will take a while for this feature to reach full use since
it would rely on new versions of librbd / krbd.

Additionally, access and modified timestamps would require sending out
an update notification so that other clients notice the change. You
would also want to highly throttle any updates to the modification
timestamp -- rendering it a rough approximation of the true last
modification time. Finally, a client might not have access to update
an image when it opens it read-only --- rendering the last access
time, again, as a rough approximation.

IMHO, I think there are a lot of other, higher-priority backlog items
for RBD (and supporting services) [1] -- but I've added it to the
bottom of backlog.

[1] https://trello.com/b/ugTc2QFH/ceph-backlog


Yes, agree, let's focus on the other higher-priority backlog items now.

Thanx


On Fri, Mar 24, 2017 at 3:27 AM, Dongsheng Yang
 wrote:

Hi jason,

 do you think this is a good feature for rbd?
maybe we can implement a "rbd stat" command
to show atime, mtime and ctime of an image.

Yang


On 03/23/2017 08:36 PM, Christoph Adomeit wrote:

Hi,

no i did not enable the journalling feature since we do not use mirroring.


On Thu, Mar 23, 2017 at 08:10:05PM +0800, Dongsheng Yang wrote:

Did you enable the journaling feature?

On 03/23/2017 07:44 PM, Christoph Adomeit wrote:

Hi Yang,

I mean "any write" to this image.

I am sure we have a lot of not-used-anymore rbd images in our pool and I
am trying to identify them.

The mtime would be a good hint to show which images might be unused.

Christoph

On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:

Hi Christoph,

On 03/23/2017 07:16 PM, Christoph Adomeit wrote:

Hello List,

i am wondering if there is meanwhile an easy method in ceph to find
more information about rbd-images.

For example I am interested in the modification time of an rbd image.

Do you mean some metadata changing? such as resize?

Or any write to this image?

Thanx
Yang

I found some posts from 2015 that say we have to go over all the
objects of an rbd image and find the newest mtime but this is not a
preferred solution for me. It takes too much time and too many system
resources.
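
(For reference, a rough sketch of that per-object scan from the command line,
with placeholder pool and image names; it assumes a format 2 image and, as
noted, is slow and heavy on large images:)

    POOL=rbd; IMAGE=myimage
    PREFIX=$(rbd info $POOL/$IMAGE | awk '/block_name_prefix/ {print $2}')
    # stat every data object of the image and keep the newest mtime
    rados -p $POOL ls | grep "^$PREFIX" | \
        xargs -n1 rados -p $POOL stat | sort -k3,4 | tail -1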

Any Ideas ?

Thanks
Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread John Spray
On Tue, Mar 28, 2017 at 8:44 PM, Brady Deetz  wrote:
> Thanks John. Since we're on 10.2.5, the mds package has a dependency on
> 10.2.6
>
> Do you feel it is safe to perform a cluster upgrade to 10.2.6 in this state?

Yes, shouldn't be an issue to upgrade the whole system to 10.2.6 while
you're at it.  Just make a mental note that the "10.2.6-1.gdf5ca2d" is
a different 10.2.6 than the official release.

I forget how picky the dependencies are, if they demand the *exact*
same version (including the trailing -1.gdf5ca2d) then I would just
use the candidate fix version for all the packages on the node where
you're running the MDS.
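
(i.e., something along the lines of the following, assuming all the matching
RPMs from that build have been downloaded to the current directory:)

    rpm -Uvh *-10.2.6-1.gdf5ca2d.el7.*.rpm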

John

> [root@mds0 ceph-admin]# rpm -Uvh ceph-mds-10.2.6-1.gdf5ca2d.el7.x86_64.rpm
> error: Failed dependencies:
> ceph-base = 1:10.2.6-1.gdf5ca2d.el7 is needed by
> ceph-mds-1:10.2.6-1.gdf5ca2d.el7.x86_64
> ceph-mds = 1:10.2.5-0.el7 is needed by (installed)
> ceph-1:10.2.5-0.el7.x86_64
>
>
>
> On Tue, Mar 28, 2017 at 2:37 PM, John Spray  wrote:
>>
>> On Tue, Mar 28, 2017 at 7:12 PM, Brady Deetz  wrote:
>> > Thank you very much. I've located the directory that's layout is against
>> > that pool. I've dug around to attempt to create a pool with the same ID
>> > as
>> > the deleted one, but for fairly obvious reasons, that doesn't seem to
>> > exist.
>>
>> So there's a candidate fix on a branch called wip-19401-jewel, you can
>> see builds here:
>>
>> https://shaman.ceph.com/repos/ceph/wip-19401-jewel/df5ca2d8e3f930ddae5708c50c6495c03b3dc078/
>> -- click through to one of those and do "repo url" to get to some
>> built artifacts.
>>
>> Hopefully you're running one of centos 7, ubuntu xenial or ubuntu
>> trusty, and therefore one of those builds will work for you (use the
>> "default" variants rather than the "notcmalloc" variants) -- you
>> should only need to pick out the ceph-mds package rather than
>> upgrading everything.
>>
>> Cheers,
>> John
>>
>>
>> > On Tue, Mar 28, 2017 at 1:08 PM, John Spray  wrote:
>> >>
>> >> On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz  wrote:
>> >> > If I follow the recommendations of this doc, do you suspect we will
>> >> > recover?
>> >> >
>> >> > http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>> >>
>> >> You might, but it's overkill and introduces its own risks -- your
>> >> metadata isn't really corrupt, you're just hitting a bug in the
>> >> running code where it's overreacting.  I'm writing a patch now.
>> >>
>> >> John
>> >>
>> >>
>> >>
>> >>
>> >> > On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz 
>> >> > wrote:
>> >> >>
>> >> >> I did do that. We were experimenting with an ec backed pool on the
>> >> >> fs.
>> >> >> It
>> >> >> was stuck in an incomplete+creating state over night for only 128
>> >> >> pgs
>> >> >> so I
>> >> >> deleted the pool this morning. At the time of deletion, the only
>> >> >> issue
>> >> >> was
>> >> >> the stuck 128 pgs.
>> >> >>
>> >> >> On Tue, Mar 28, 2017 at 12:29 PM, John Spray 
>> >> >> wrote:
>> >> >>>
>> >> >>> Did you at some point add a new data pool to the filesystem, and
>> >> >>> then
>> >> >>> remove the pool?  With a little investigation I've found that the
>> >> >>> MDS
>> >> >>> currently doesn't handle that properly:
>> >> >>> http://tracker.ceph.com/issues/19401
>> >> >>>
>> >> >>> John
>> >> >>>
>> >> >>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray 
>> >> >>> wrote:
>> >> >>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz 
>> >> >>> > wrote:
>> >> >>> >> Running Jewel 10.2.5 on my production cephfs cluster and came
>> >> >>> >> into
>> >> >>> >> this ceph
>> >> >>> >> status
>> >> >>> >>
>> >> >>> >> [ceph-admin@mds1 brady]$ ceph status
>> >> >>> >> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>> >> >>> >>  health HEALTH_WARN
>> >> >>> >> mds0: Behind on trimming (2718/30)
>> >> >>> >> mds0: MDS in read-only mode
>> >> >>> >>  monmap e17: 5 mons at
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
>> >> >>> >> election epoch 378, quorum 0,1,2,3,4
>> >> >>> >> mon0,mon1,mon2,osd2,osd3
>> >> >>> >>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>> >> >>> >>  osdmap e172126: 235 osds: 235 up, 235 in
>> >> >>> >> flags sortbitwise,require_jewel_osds
>> >> >>> >>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112
>> >> >>> >> Mobjects
>> >> >>> >> 874 TB used, 407 TB / 1282 TB avail
>> >> >>> >> 5670 active+clean
>> >> >>> >>   13 active+clean+scrubbing+deep
>> >> >>> >>   13 active+clean+scrubbing
>> >> >>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>> >> >>> >>
>> >> >>> >> I've tried rebooting both mds servers. I've started a rolling

Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread Brady Deetz
Thanks John. Since we're on 10.2.5, the mds package has a dependency on
10.2.6

Do you feel it is safe to perform a cluster upgrade to 10.2.6 in this state?

[root@mds0 ceph-admin]# rpm -Uvh ceph-mds-10.2.6-1.gdf5ca2d.el7.x86_64.rpm
error: Failed dependencies:
    ceph-base = 1:10.2.6-1.gdf5ca2d.el7 is needed by ceph-mds-1:10.2.6-1.gdf5ca2d.el7.x86_64
    ceph-mds = 1:10.2.5-0.el7 is needed by (installed) ceph-1:10.2.5-0.el7.x86_64


On Tue, Mar 28, 2017 at 2:37 PM, John Spray  wrote:

> On Tue, Mar 28, 2017 at 7:12 PM, Brady Deetz  wrote:
> > Thank you very much. I've located the directory that's layout is against
> > that pool. I've dug around to attempt to create a pool with the same ID
> as
> > the deleted one, but for fairly obvious reasons, that doesn't seem to
> exist.
>
> So there's a candidate fix on a branch called wip-19401-jewel, you can
> see builds here:
> https://shaman.ceph.com/repos/ceph/wip-19401-jewel/
> df5ca2d8e3f930ddae5708c50c6495c03b3dc078/
> -- click through to one of those and do "repo url" to get to some
> built artifacts.
>
> Hopefully you're running one of centos 7, ubuntu xenial or ubuntu
> trusty, and therefore one of those builds will work for you (use the
> "default" variants rather than the "notcmalloc" variants) -- you
> should only need to pick out the ceph-mds package rather than
> upgrading everything.
>
> Cheers,
> John
>
>
> > On Tue, Mar 28, 2017 at 1:08 PM, John Spray  wrote:
> >>
> >> On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz  wrote:
> >> > If I follow the recommendations of this doc, do you suspect we will
> >> > recover?
> >> >
> >> > http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
> >>
> >> You might, but it's overkill and introduces its own risks -- your
> >> metadata isn't really corrupt, you're just hitting a bug in the
> >> running code where it's overreacting.  I'm writing a patch now.
> >>
> >> John
> >>
> >>
> >>
> >>
> >> > On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz 
> wrote:
> >> >>
> >> >> I did do that. We were experimenting with an ec backed pool on the
> fs.
> >> >> It
> >> >> was stuck in an incomplete+creating state over night for only 128 pgs
> >> >> so I
> >> >> deleted the pool this morning. At the time of deletion, the only
> issue
> >> >> was
> >> >> the stuck 128 pgs.
> >> >>
> >> >> On Tue, Mar 28, 2017 at 12:29 PM, John Spray 
> wrote:
> >> >>>
> >> >>> Did you at some point add a new data pool to the filesystem, and
> then
> >> >>> remove the pool?  With a little investigation I've found that the
> MDS
> >> >>> currently doesn't handle that properly:
> >> >>> http://tracker.ceph.com/issues/19401
> >> >>>
> >> >>> John
> >> >>>
> >> >>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray 
> wrote:
> >> >>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz 
> >> >>> > wrote:
> >> >>> >> Running Jewel 10.2.5 on my production cephfs cluster and came
> into
> >> >>> >> this ceph
> >> >>> >> status
> >> >>> >>
> >> >>> >> [ceph-admin@mds1 brady]$ ceph status
> >> >>> >> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
> >> >>> >>  health HEALTH_WARN
> >> >>> >> mds0: Behind on trimming (2718/30)
> >> >>> >> mds0: MDS in read-only mode
> >> >>> >>  monmap e17: 5 mons at
> >> >>> >>
> >> >>> >>
> >> >>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,
> mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,
> osd3=10.124.103.73:6789/0}
> >> >>> >> election epoch 378, quorum 0,1,2,3,4
> >> >>> >> mon0,mon1,mon2,osd2,osd3
> >> >>> >>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
> >> >>> >>  osdmap e172126: 235 osds: 235 up, 235 in
> >> >>> >> flags sortbitwise,require_jewel_osds
> >> >>> >>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112
> Mobjects
> >> >>> >> 874 TB used, 407 TB / 1282 TB avail
> >> >>> >> 5670 active+clean
> >> >>> >>   13 active+clean+scrubbing+deep
> >> >>> >>   13 active+clean+scrubbing
> >> >>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
> >> >>> >>
> >> >>> >> I've tried rebooting both mds servers. I've started a rolling
> >> >>> >> reboot
> >> >>> >> across
> >> >>> >> all of my osd nodes, but each node takes about 10 minutes fully
> >> >>> >> rejoin. so
> >> >>> >> it's going to take a while. Any recommendations other than
> reboot?
> >> >>> >
> >> >>> > As it says in the log, your MDSs are going read only because of
> >> >>> > errors
> >> >>> > writing to the OSDs:
> >> >>> > 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster)
> log
> >> >>> > [ERR] : failed to store backtrace on ino 10003a398a6 object, pool
> >> >>> > 20,
> >> >>> > errno -2
> >> >>> >
> >> >>> > These messages are also scary and indicates that something has
> gone
> >> >>> > seriously wrong, either with the storage of the 

Re: [ceph-users] New hardware for OSDs

2017-03-28 Thread Nick Fisk
Hi Christian,

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer
> Sent: 28 March 2017 00:59
> To: ceph-users@lists.ceph.com
> Cc: Nick Fisk 
> Subject: Re: [ceph-users] New hardware for OSDs
> 
> 
> Hello,
> 
> On Mon, 27 Mar 2017 16:09:09 +0100 Nick Fisk wrote:
> 
> > > -Original Message-
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > > Behalf Of Wido den Hollander
> > > Sent: 27 March 2017 12:35
> > > To: ceph-users@lists.ceph.com; Christian Balzer 
> > > Subject: Re: [ceph-users] New hardware for OSDs
> > >
> > >
> > > > Op 27 maart 2017 om 13:22 schreef Christian Balzer :
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
> > > >
> > > > > Hello all,
> > > > > we are currently in the process of buying new hardware to expand
> > > > > an existing Ceph cluster that already has 1200 osds.
> > > >
> > > > That's quite sizable, is the expansion driven by the need for more
> > > > space (big data?) or to increase IOPS (or both)?
> > > >
> > > > > We are currently using 24 * 4 TB SAS drives per osd with an SSD
> > > > > journal shared among 4 osds. For the upcoming expansion we were
> > > > > thinking of switching to either 6 or 8 TB hard drives (9 or 12
> > > > > per
> > > > > host) in order to drive down space and cost requirements.
> > > > >
> > > > > Has anyone any experience in mid-sized/large-sized deployment
> > > > > using such hard drives? Our main concern is the rebalance time
> > > > > but we might be overlooking some other aspects.
> > > > >
> > > >
> > > > If you researched the ML archives, you should already know to stay
> > > > well away from SMR HDDs.
> > > >
> > >
> > > Amen! Just don't. Stay away from SMR with Ceph.
> > >
> > > > Both HGST and Seagate have large Enterprise HDDs that have
> > > > journals/caches (MediaCache in HGST speak IIRC) that drastically
> > > > improve write IOPS compared to plain HDDs.
> > > > Even with SSD journals you will want to consider those, as these
> > > > new HDDs will see at least twice the action than your current ones.
> > > >
> >
> > I've got a mixture of WD Red Pro 6TB and HGST He8 8TB drives. Recovery
> > for ~70% full disks takes around 3-4 hours, this is for a cluster
> > containing 60 OSD's. I'm usually seeing recovery speeds up around 1GB/s
> or more.
> >
> Good data point.
> 
> How busy is your cluster at those times, client I/O impact?

Its normally around 20-30% busy through most parts of the day. No real
impact to client IO. Its backup data, so buffered IO coming in via wan
circuit.

> 
> > Depends on your workload, mine is for archiving/backups so big disks
> > are a must. I wouldn't recommend using them for more active workloads
> > unless you are planning a beefy cache tier or some other sort of caching
> solution.
> >
> > The He8 (and He10) drives also use a fair bit less power due to less
> > friction, but I think this only applies to the sata model. My 12x3.5
> > 8TB node with CPU...etc uses ~140W at idle. Hoping to get this down
> > further with a new Xeon-D design on next expansion phase.
> >
> > The only thing I will say about big disks is beware of cold FS
> > inodes/dentry's and PG splitting. The former isn't a problem if you
> > will only be actively accessing a small portion of your data, but I
> > see increases in latency if I access cold data even with VFS cache
pressure
> set to 1.
> > Currently investigating using bcache under the OSD to try and cache
this.
> >
> 
> I've seen this kind of behavior on my (non-Ceph) mailbox servers.
> As in, the maximum SLAB space may not be large enough to hold all inodes
> or the pagecache will eat into it over time when not constantly
referenced,
> despite cache pressure settings.
> 
> > PG splitting becomes a problem when the disks start to fill up,
> > playing with the split/merge thresholds may help, but you have to be
> > careful you don't end up with massive splits when they do finally
> > happen, as otherwise OSD's start timing out.
> >
> Getting this right (and predictable) is one of the darker arts with Ceph.
> OTOH it will go away with Bluestore (just to be replaced by other oddities
no
> doubt).
> 
> > >
> > > I also have good experiences with bcache on NVM-E device in Ceph
> clusters.
> > > A single Intel P3600/P3700 which is the caching device for bcache.
> > >
> > > > Rebalance time is a concern of course, especially if your cluster
> > > > like most HDD based ones has these things throttled down to not
> > > > impede actual client I/O.
> > > >
> > > > To get a rough idea, take a look at:
> > > > https://www.memset.com/tools/raid-calculator/
> > > >
> > > > For Ceph with replication 3 and the typical PG distribution,
> > > > assume
> > > > 100 disks and the RAID6 with hotspares numbers are relevant.
> > > > For rebuild speed, consult your experience, you must have had 

Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread John Spray
On Tue, Mar 28, 2017 at 7:12 PM, Brady Deetz  wrote:
> Thank you very much. I've located the directory that's layout is against
> that pool. I've dug around to attempt to create a pool with the same ID as
> the deleted one, but for fairly obvious reasons, that doesn't seem to exist.

So there's a candidate fix on a branch called wip-19401-jewel, you can
see builds here:
https://shaman.ceph.com/repos/ceph/wip-19401-jewel/df5ca2d8e3f930ddae5708c50c6495c03b3dc078/
-- click through to one of those and do "repo url" to get to some
built artifacts.

Hopefully you're running one of centos 7, ubuntu xenial or ubuntu
trusty, and therefore one of those builds will work for you (use the
"default" variants rather than the "notcmalloc" variants) -- you
should only need to pick out the ceph-mds package rather than
upgrading everything.

Cheers,
John


> On Tue, Mar 28, 2017 at 1:08 PM, John Spray  wrote:
>>
>> On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz  wrote:
>> > If I follow the recommendations of this doc, do you suspect we will
>> > recover?
>> >
>> > http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>>
>> You might, but it's overkill and introduces its own risks -- your
>> metadata isn't really corrupt, you're just hitting a bug in the
>> running code where it's overreacting.  I'm writing a patch now.
>>
>> John
>>
>>
>>
>>
>> > On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz  wrote:
>> >>
>> >> I did do that. We were experimenting with an ec backed pool on the fs.
>> >> It
>> >> was stuck in an incomplete+creating state over night for only 128 pgs
>> >> so I
>> >> deleted the pool this morning. At the time of deletion, the only issue
>> >> was
>> >> the stuck 128 pgs.
>> >>
>> >> On Tue, Mar 28, 2017 at 12:29 PM, John Spray  wrote:
>> >>>
>> >>> Did you at some point add a new data pool to the filesystem, and then
>> >>> remove the pool?  With a little investigation I've found that the MDS
>> >>> currently doesn't handle that properly:
>> >>> http://tracker.ceph.com/issues/19401
>> >>>
>> >>> John
>> >>>
>> >>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray  wrote:
>> >>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz 
>> >>> > wrote:
>> >>> >> Running Jewel 10.2.5 on my production cephfs cluster and came into
>> >>> >> this ceph
>> >>> >> status
>> >>> >>
>> >>> >> [ceph-admin@mds1 brady]$ ceph status
>> >>> >> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>> >>> >>  health HEALTH_WARN
>> >>> >> mds0: Behind on trimming (2718/30)
>> >>> >> mds0: MDS in read-only mode
>> >>> >>  monmap e17: 5 mons at
>> >>> >>
>> >>> >>
>> >>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
>> >>> >> election epoch 378, quorum 0,1,2,3,4
>> >>> >> mon0,mon1,mon2,osd2,osd3
>> >>> >>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>> >>> >>  osdmap e172126: 235 osds: 235 up, 235 in
>> >>> >> flags sortbitwise,require_jewel_osds
>> >>> >>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
>> >>> >> 874 TB used, 407 TB / 1282 TB avail
>> >>> >> 5670 active+clean
>> >>> >>   13 active+clean+scrubbing+deep
>> >>> >>   13 active+clean+scrubbing
>> >>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>> >>> >>
>> >>> >> I've tried rebooting both mds servers. I've started a rolling
>> >>> >> reboot
>> >>> >> across
>> >>> >> all of my osd nodes, but each node takes about 10 minutes fully
>> >>> >> rejoin. so
>> >>> >> it's going to take a while. Any recommendations other than reboot?
>> >>> >
>> >>> > As it says in the log, your MDSs are going read only because of
>> >>> > errors
>> >>> > writing to the OSDs:
>> >>> > 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
>> >>> > [ERR] : failed to store backtrace on ino 10003a398a6 object, pool
>> >>> > 20,
>> >>> > errno -2
>> >>> >
>> >>> > These messages are also scary and indicates that something has gone
>> >>> > seriously wrong, either with the storage of the metadata or
>> >>> > internally
>> >>> > with the MDS:
>> >>> > 2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
>> >>> > [ERR] : bad/negative dir size on 608 f(v9 m2017-03-28
>> >>> > 07:56:45.803267
>> >>> > -223=-221+-2)
>> >>> > 2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
>> >>> > [ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
>> >>> > 07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
>> >>> > 07:56:45.803267)
>> >>> >
>> >>> > The case that I know of that causes ENOENT on object writes is when
>> >>> > the pool no longer exists.  You can set "debug objecter = 10" on the
>> >>> > MDS and look for a message like "check_op_pool_dne tid 
>> >>> > concluding pool  dne".
>> >>> >

Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread Brady Deetz
Thank you very much. I've located the directory that's layout is against
that pool. I've dug around to attempt to create a pool with the same ID as
the deleted one, but for fairly obvious reasons, that doesn't seem to exist.

On Tue, Mar 28, 2017 at 1:08 PM, John Spray  wrote:

> On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz  wrote:
> > If I follow the recommendations of this doc, do you suspect we will
> recover?
> >
> > http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>
> You might, but it's overkill and introduces its own risks -- your
> metadata isn't really corrupt, you're just hitting a bug in the
> running code where it's overreacting.  I'm writing a patch now.
>
> John
>
>
>
>
> > On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz  wrote:
> >>
> >> I did do that. We were experimenting with an ec backed pool on the fs.
> It
> >> was stuck in an incomplete+creating state over night for only 128 pgs
> so I
> >> deleted the pool this morning. At the time of deletion, the only issue
> was
> >> the stuck 128 pgs.
> >>
> >> On Tue, Mar 28, 2017 at 12:29 PM, John Spray  wrote:
> >>>
> >>> Did you at some point add a new data pool to the filesystem, and then
> >>> remove the pool?  With a little investigation I've found that the MDS
> >>> currently doesn't handle that properly:
> >>> http://tracker.ceph.com/issues/19401
> >>>
> >>> John
> >>>
> >>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray  wrote:
> >>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz 
> wrote:
> >>> >> Running Jewel 10.2.5 on my production cephfs cluster and came into
> >>> >> this ceph
> >>> >> status
> >>> >>
> >>> >> [ceph-admin@mds1 brady]$ ceph status
> >>> >> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
> >>> >>  health HEALTH_WARN
> >>> >> mds0: Behind on trimming (2718/30)
> >>> >> mds0: MDS in read-only mode
> >>> >>  monmap e17: 5 mons at
> >>> >>
> >>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,
> mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,
> osd3=10.124.103.73:6789/0}
> >>> >> election epoch 378, quorum 0,1,2,3,4
> >>> >> mon0,mon1,mon2,osd2,osd3
> >>> >>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
> >>> >>  osdmap e172126: 235 osds: 235 up, 235 in
> >>> >> flags sortbitwise,require_jewel_osds
> >>> >>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
> >>> >> 874 TB used, 407 TB / 1282 TB avail
> >>> >> 5670 active+clean
> >>> >>   13 active+clean+scrubbing+deep
> >>> >>   13 active+clean+scrubbing
> >>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
> >>> >>
> >>> >> I've tried rebooting both mds servers. I've started a rolling reboot
> >>> >> across
> >>> >> all of my osd nodes, but each node takes about 10 minutes fully
> >>> >> rejoin. so
> >>> >> it's going to take a while. Any recommendations other than reboot?
> >>> >
> >>> > As it says in the log, your MDSs are going read only because of
> errors
> >>> > writing to the OSDs:
> >>> > 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
> >>> > [ERR] : failed to store backtrace on ino 10003a398a6 object, pool 20,
> >>> > errno -2
> >>> >
> >>> > These messages are also scary and indicates that something has gone
> >>> > seriously wrong, either with the storage of the metadata or
> internally
> >>> > with the MDS:
> >>> > 2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
> >>> > [ERR] : bad/negative dir size on 608 f(v9 m2017-03-28 07:56:45.803267
> >>> > -223=-221+-2)
> >>> > 2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
> >>> > [ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
> >>> > 07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
> >>> > 07:56:45.803267)
> >>> >
> >>> > The case that I know of that causes ENOENT on object writes is when
> >>> > the pool no longer exists.  You can set "debug objecter = 10" on the
> >>> > MDS and look for a message like "check_op_pool_dne tid 
> >>> > concluding pool  dne".
> >>> >
> >>> > Otherwise, go look at the OSD logs from the timestamp where the
> failed
> >>> > write is happening to see if there's anything there.
> >>> >
> >>> > John
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >> Attached are my mds logs during the failure.
> >>> >>
> >>> >> Any ideas?
> >>> >>
> >>> >> ___
> >>> >> ceph-users mailing list
> >>> >> ceph-users@lists.ceph.com
> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>> >>
> >>
> >>
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread John Spray
On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz  wrote:
> If I follow the recommendations of this doc, do you suspect we will recover?
>
> http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/

You might, but it's overkill and introduces its own risks -- your
metadata isn't really corrupt, you're just hitting a bug in the
running code where it's overreacting.  I'm writing a patch now.

John




> On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz  wrote:
>>
>> I did do that. We were experimenting with an ec backed pool on the fs. It
>> was stuck in an incomplete+creating state over night for only 128 pgs so I
>> deleted the pool this morning. At the time of deletion, the only issue was
>> the stuck 128 pgs.
>>
>> On Tue, Mar 28, 2017 at 12:29 PM, John Spray  wrote:
>>>
>>> Did you at some point add a new data pool to the filesystem, and then
>>> remove the pool?  With a little investigation I've found that the MDS
>>> currently doesn't handle that properly:
>>> http://tracker.ceph.com/issues/19401
>>>
>>> John
>>>
>>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray  wrote:
>>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz  wrote:
>>> >> Running Jewel 10.2.5 on my production cephfs cluster and came into
>>> >> this ceph
>>> >> status
>>> >>
>>> >> [ceph-admin@mds1 brady]$ ceph status
>>> >> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>>> >>  health HEALTH_WARN
>>> >> mds0: Behind on trimming (2718/30)
>>> >> mds0: MDS in read-only mode
>>> >>  monmap e17: 5 mons at
>>> >>
>>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
>>> >> election epoch 378, quorum 0,1,2,3,4
>>> >> mon0,mon1,mon2,osd2,osd3
>>> >>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>>> >>  osdmap e172126: 235 osds: 235 up, 235 in
>>> >> flags sortbitwise,require_jewel_osds
>>> >>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
>>> >> 874 TB used, 407 TB / 1282 TB avail
>>> >> 5670 active+clean
>>> >>   13 active+clean+scrubbing+deep
>>> >>   13 active+clean+scrubbing
>>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>>> >>
>>> >> I've tried rebooting both mds servers. I've started a rolling reboot
>>> >> across
>>> >> all of my osd nodes, but each node takes about 10 minutes fully
>>> >> rejoin. so
>>> >> it's going to take a while. Any recommendations other than reboot?
>>> >
>>> > As it says in the log, your MDSs are going read only because of errors
>>> > writing to the OSDs:
>>> > 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
>>> > [ERR] : failed to store backtrace on ino 10003a398a6 object, pool 20,
>>> > errno -2
>>> >
>>> > These messages are also scary and indicates that something has gone
>>> > seriously wrong, either with the storage of the metadata or internally
>>> > with the MDS:
>>> > 2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
>>> > [ERR] : bad/negative dir size on 608 f(v9 m2017-03-28 07:56:45.803267
>>> > -223=-221+-2)
>>> > 2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
>>> > [ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
>>> > 07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
>>> > 07:56:45.803267)
>>> >
>>> > The case that I know of that causes ENOENT on object writes is when
>>> > the pool no longer exists.  You can set "debug objecter = 10" on the
>>> > MDS and look for a message like "check_op_pool_dne tid 
>>> > concluding pool  dne".
>>> >
>>> > Otherwise, go look at the OSD logs from the timestamp where the failed
>>> > write is happening to see if there's anything there.
>>> >
>>> > John
>>> >
>>> >
>>> >
>>> >>
>>> >> Attached are my mds logs during the failure.
>>> >>
>>> >> Any ideas?
>>> >>
>>> >> ___
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread John Spray
Did you at some point add a new data pool to the filesystem, and then
remove the pool?  With a little investigation I've found that the MDS
currently doesn't handle that properly:
http://tracker.ceph.com/issues/19401

John

On Tue, Mar 28, 2017 at 6:11 PM, John Spray  wrote:
> On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz  wrote:
>> Running Jewel 10.2.5 on my production cephfs cluster and came into this ceph
>> status
>>
>> [ceph-admin@mds1 brady]$ ceph status
>> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>>  health HEALTH_WARN
>> mds0: Behind on trimming (2718/30)
>> mds0: MDS in read-only mode
>>  monmap e17: 5 mons at
>> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
>> election epoch 378, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
>>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>>  osdmap e172126: 235 osds: 235 up, 235 in
>> flags sortbitwise,require_jewel_osds
>>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
>> 874 TB used, 407 TB / 1282 TB avail
>> 5670 active+clean
>>   13 active+clean+scrubbing+deep
>>   13 active+clean+scrubbing
>>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>>
>> I've tried rebooting both mds servers. I've started a rolling reboot across
>> all of my osd nodes, but each node takes about 10 minutes fully rejoin. so
>> it's going to take a while. Any recommendations other than reboot?
>
> As it says in the log, your MDSs are going read only because of errors
> writing to the OSDs:
> 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
> [ERR] : failed to store backtrace on ino 10003a398a6 object, pool 20,
> errno -2
>
> These messages are also scary and indicates that something has gone
> seriously wrong, either with the storage of the metadata or internally
> with the MDS:
> 2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
> [ERR] : bad/negative dir size on 608 f(v9 m2017-03-28 07:56:45.803267
> -223=-221+-2)
> 2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
> [ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
> 07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
> 07:56:45.803267)
>
> The case that I know of that causes ENOENT on object writes is when
> the pool no longer exists.  You can set "debug objecter = 10" on the
> MDS and look for a message like "check_op_pool_dne tid 
> concluding pool  dne".
>
> Otherwise, go look at the OSD logs from the timestamp where the failed
> write is happening to see if there's anything there.
>
> John
>
>
>
>>
>> Attached are my mds logs during the failure.
>>
>> Any ideas?
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread John Spray
On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz  wrote:
> Running Jewel 10.2.5 on my production cephfs cluster and came into this ceph
> status
>
> [ceph-admin@mds1 brady]$ ceph status
> cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>  health HEALTH_WARN
> mds0: Behind on trimming (2718/30)
> mds0: MDS in read-only mode
>  monmap e17: 5 mons at
> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
> election epoch 378, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
>   fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>  osdmap e172126: 235 osds: 235 up, 235 in
> flags sortbitwise,require_jewel_osds
>   pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
> 874 TB used, 407 TB / 1282 TB avail
> 5670 active+clean
>   13 active+clean+scrubbing+deep
>   13 active+clean+scrubbing
>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>
> I've tried rebooting both mds servers. I've started a rolling reboot across
> all of my osd nodes, but each node takes about 10 minutes fully rejoin. so
> it's going to take a while. Any recommendations other than reboot?

As it says in the log, your MDSs are going read only because of errors
writing to the OSDs:
2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
[ERR] : failed to store backtrace on ino 10003a398a6 object, pool 20,
errno -2

These messages are also scary and indicates that something has gone
seriously wrong, either with the storage of the metadata or internally
with the MDS:
2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
[ERR] : bad/negative dir size on 608 f(v9 m2017-03-28 07:56:45.803267
-223=-221+-2)
2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
[ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
07:56:45.803267)

The case that I know of that causes ENOENT on object writes is when
the pool no longer exists.  You can set "debug objecter = 10" on the
MDS and look for a message like "check_op_pool_dne tid 
concluding pool  dne".
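
(For reference, that debug level can be raised at runtime from the MDS host
with something like the following; the daemon name is a placeholder:)

    ceph daemon mds.mds0 config set debug_objecter 10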

Otherwise, go look at the OSD logs from the timestamp where the failed
write is happening to see if there's anything there.

John



>
> Attached are my mds logs during the failure.
>
> Any ideas?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS Read-Only state in production CephFS

2017-03-28 Thread Brady Deetz
Running Jewel 10.2.5 on my production cephfs cluster and came into this
ceph status

[ceph-admin@mds1 brady]$ ceph status
cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
 health HEALTH_WARN
mds0: Behind on trimming (2718/30)
mds0: MDS in read-only mode
 monmap e17: 5 mons at {mon0=
10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0
}
election epoch 378, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
  fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
 osdmap e172126: 235 osds: 235 up, 235 in
flags sortbitwise,require_jewel_osds
  pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
874 TB used, 407 TB / 1282 TB avail
5670 active+clean
  13 active+clean+scrubbing+deep
  13 active+clean+scrubbing
  client io 760 B/s rd, 0 op/s rd, 0 op/s wr

I've tried rebooting both mds servers. I've started a rolling reboot across
all of my osd nodes, but each node takes about 10 minutes to fully rejoin, so
it's going to take a while. Any recommendations other than reboot?

Attached are my mds logs during the failure.

Any ideas?


mds0
Description: Binary data


mds1
Description: Binary data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-28 Thread Jason Dillaman
While certainly that could be a feature that could be added to "rbd
info", it will take a while for this feature to reach full use since
it would rely on new versions of librbd / krbd.

Additionally, access and modified timestamps would require sending out
an update notification so that other clients notice the change. You
would also want to highly throttle any updates to the modification
timestamp -- rendering it a rough approximation of the true last
modification time. Finally, a client might not have access to update
an image when it opens it read-only --- rendering the last access
time, again, as a rough approximation.

IMHO, I think there are a lot of other, higher-priority backlog items
for RBD (and supporting services) [1] -- but I've added it to the
bottom of backlog.

[1] https://trello.com/b/ugTc2QFH/ceph-backlog

On Fri, Mar 24, 2017 at 3:27 AM, Dongsheng Yang
 wrote:
> Hi jason,
>
> do you think this is a good feature for rbd?
> maybe we can implement a "rbd stat" command
> to show atime, mtime and ctime of an image.
>
> Yang
>
>
> On 03/23/2017 08:36 PM, Christoph Adomeit wrote:
>>
>> Hi,
>>
>> no i did not enable the journalling feature since we do not use mirroring.
>>
>>
>> On Thu, Mar 23, 2017 at 08:10:05PM +0800, Dongsheng Yang wrote:
>>>
>>> Did you enable the journaling feature?
>>>
>>> On 03/23/2017 07:44 PM, Christoph Adomeit wrote:

 Hi Yang,

 I mean "any write" to this image.

 I am sure we have a lot of not-used-anymore rbd images in our pool and I
 am trying to identify them.

 The mtime would be a good hint to show which images might be unused.

 Christoph

 On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:
>
> Hi Christoph,
>
> On 03/23/2017 07:16 PM, Christoph Adomeit wrote:
>>
>> Hello List,
>>
>> i am wondering if there is meanwhile an easy method in ceph to find
>> more information about rbd-images.
>>
>> For example I am interested in the modification time of an rbd image.
>
> Do you mean some metadata changing? such as resize?
>
> Or any write to this image?
>
> Thanx
> Yang
>>
>> I found some posts from 2015 that say we have to go over all the
>> objects of an rbd image and find the newest mtime but this is not a
>> preferred solution for me. It takes too much time and too many system
>> resources.
>>
>> Any Ideas ?
>>
>> Thanks
>>Christoph
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions on rbd-mirror

2017-03-28 Thread Jason Dillaman
This is something we have talked about in the past -- and I think it
would be a good addition. The caveat is that if we would now have
config-level, per-pool level, per-image level (via rbd image-meta),
and command-line/environment variable configuration overrides.

I think there needs to be clear tooling in the rbd CLI to dump RBD
configuration overrides and the source of those overrides -- and it
needs to be well documented.
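
As a rough illustration of the per-image level that already exists (pool,
image and option names below are placeholders):

    rbd image-meta set mypool/myimage conf_rbd_cache false
    rbd image-meta list mypool/myimage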

On Mon, Mar 27, 2017 at 9:31 AM, Dongsheng Yang
 wrote:
> Jason,
> do you think it's good idea to introduce a rbd_config object to
> record some configurations of per-pool, such as default_features.
>
> That means, we can set some configurations differently in different
> pool. In this way, we can also handle the per-pool setting in rbd-mirror.
>
> Thanx
> Yang
>
>
> On 27/03/2017, 21:20, Jason Dillaman wrote:
>>
>> On Mon, Mar 27, 2017 at 4:00 AM, Dongsheng Yang
>>  wrote:
>>>
>>> Hi Fulvio,
>>>
>>> On 03/24/2017 07:19 PM, Fulvio Galeazzi wrote:
>>>
>>> Hallo, apologies for my (silly) questions, I did try to find some doc on
>>> rbd-mirror but was unable to, apart from a number of pages explaining how
>>> to
>>> install it.
>>>
>>> My environment is CenOS7 and Ceph 10.2.5.
>>>
>>> Can anyone help me understand a few minor things:
>>>
>>>   - is there a cleaner way to configure the user which will be used for
>>> rbd-mirror, other than editing the ExecStart in file
>>> /usr/lib/systemd/system/ceph-rbd-mirror@.service ?
>>> For example some line in ceph.conf... looks like the username
>>> defaults to the cluster name, am I right?
>>>
>>>
>>> It should just be "ceph", no matter what the cluster name is, if I read
>>> the
>>> code correctly.
>>
>> The user id is passed in via the systemd instance name. For example,
>> if you wanted to use the "mirror" user id to connect to the local
>> cluster, you would run "systemctl enable ceph-rbd-mirror@mirror".
>>
>>>   - is it possible to throttle mirroring? Sure, it's a crazy thing to do
>>> for "cinder" pools, but may make sense for slowly changing ones, like
>>> a "glance" pool.
>>>
>>>
>>> The rbd core team is working on this. Jason, right?
>>
>> This is in our backlog of desired items for the rbd-mirror daemon.
>> Having different settings for different pools was not in our original
>> plan, but this is something that also came up during the Vault
>> conference last week. I've added an additional backlog item to cover
>> per-pool settings.
>>
>>>   - is it possible to set per-pool default features? I read about
>>>  "rbd default features = ###"
>>> but this is a global setting. (Ok, I can still restrict pools to be
>>> mirrored with "ceph auth" for the user doing mirroring)
>>>
>>>
>>> "per-pool default features" sounds like a reasonable feature request.
>>>
>>> About the "ceph auth" for mirroring, I am working on a rbd acl design,
>>> will consider pool-level, namespace-level and image-level. Then I think
>>> we can do a permission check on this.
>>
>> Right now, the best way to achieve that is by using different configs
>> / user ids for different services. For example, if OpenStack glance
>> used "glance" and cinder user "cinder", the ceph.conf's
>> "[client.glance]" section could have different default features as
>> compared to a "[client.cinder]" section.
>>
>>> Thanx
>>> Yang
>>>
>>>
>>>
>>>Thanks!
>>>
>>>  Fulvio
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Brian Andrus
Just adding some anecdotal input. It likely won't be ultimately helpful
other than a +1..

Seemingly, we also have the same issue since enabling exclusive-lock on
images. We experienced these messages at a large scale when making a CRUSH
map change a few weeks ago that resulted in many many VMs experiencing the
blocked task kernel messages, requiring reboots.

We've since disabled on all images we can, but there are still jewel-era
instances that cannot have the feature disabled. Since disabling the
feature, I have not observed any cases of blocked tasks, but so far given
the limited timeframe I'd consider that anecdotal.
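
(For reference, disabling the feature per image looks roughly like this --
pool/image names are placeholders, and dependent features such as object-map
and fast-diff may need to be disabled first:)

    rbd feature disable mypool/myimage exclusive-lock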


On Mon, Mar 27, 2017 at 12:31 PM, Hall, Eric 
wrote:

> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
> ceph hosts, we occasionally see hung processes (usually during boot, but
> otherwise as well), with errors reported in the instance logs as shown
> below.  Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in
> terms of memory or network bandwidth, none of the 67 osds are over 80%
> full, nor do any of them appear to be overwhelmed in terms of IO.  Compute
> hosts and ceph cluster are connected via a relatively quiet 1Gb network,
> with an IBoE net between the ceph nodes.  Neither network appears
> overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even
> with 20/20 logging from various components (rbd, rados, client,
> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
> (currently 64k... overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity issues
> or constraining bottlenecks.
>
> Any suggestions or insights into this situation are appreciated.  Thank
> you for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more
> than 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226
> 2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  []
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ?
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] jbd2_journal_commit_
> transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ?
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ?
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] At what point are objects removed?

2017-03-28 Thread Wido den Hollander

> Op 28 maart 2017 om 16:52 schreef Gregory Farnum :
> 
> 
>  CephFS files are deleted asynchronously by the mds once there are no more
> client references to the file (NOT when the file is unlinked -- that's not
> how posix works). If the number of objects isn't going down after a while,
> restarting your samba instance will probably do the trick.

In this case he was talking about RBD I see, not CephFS.

A TRIM/Discard will need to be run on the Samba server to tell the block layer 
that the free space can be given back to Ceph.
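
For example, if the block device on the Samba host supports discard/unmap,
something along these lines should return the freed space to Ceph (the mount
point is a placeholder):

    fstrim -v /mnt/rbd-share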

Wido

> -Greg
> On Tue, Mar 28, 2017 at 6:59 AM Götz Reinicke - IT Koordinator <
> goetz.reini...@filmakademie.de> wrote:
> 
> > Hi, may be I got something wrong or did not understend it yet in total.
> >
> > I have some pools and created some test rbd images which are mounted to
> > a samba server.
> >
> > After the test I deleted all files from the on the samba server.
> >
> > But "ceph df detail" and "ceph -s" show still used space.
> >
> > The OSDs on the ceph osd nodes are also filled with data.
> >
> > My question: At what point will the still existing but not needed/used
> > objects be removed?
> >
> >
> >  Thanks for feedback and suggestions and Kowtow . Götz
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Jason Dillaman
Eric,

If you already have debug level 20 logs captured from one of these
events, I would love to be able to take a look at them to see what's
going on. Depending on the size, you could either attach the log to a
new RBD tracker ticket [1] or use the ceph-post-file helper to upload
a large file.

Thanks,
Jason

[1] http://tracker.ceph.com/projects/rbd/issues

On Mon, Mar 27, 2017 at 3:31 PM, Hall, Eric  wrote:
> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), 
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph 
> hosts, we occasionally see hung processes (usually during boot, but otherwise 
> as well), with errors reported in the instance logs as shown below.  
> Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
> of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
> any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
> cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
> between the ceph nodes.  Neither network appears overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even 
> with 20/20 logging from various components (rbd, rados, client, objectcacher, 
> etc.)  I’ve increased the qemu file descriptor limit (currently 64k... 
> overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity issues or 
> constraining bottlenecks.
>
> Any suggestions or insights into this situation are appreciated.  Thank you 
> for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 
> 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226 
>  2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180 
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] 
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] 
> jbd2_journal_commit_transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ? 
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] At what point are objects removed?

2017-03-28 Thread Gregory Farnum
 CephFS files are deleted asynchronously by the mds once there are no more
client references to the file (NOT when the file is unlinked -- that's not
how posix works). If the number of objects isn't going down after a while,
restarting your samba instance will probably do the trick.
-Greg
On Tue, Mar 28, 2017 at 6:59 AM Götz Reinicke - IT Koordinator <
goetz.reini...@filmakademie.de> wrote:

> Hi, may be I got something wrong or did not understend it yet in total.
>
> I have some pools and created some test rbd images which are mounted to
> a samba server.
>
> After the test I deleted all the files on the samba server.
>
> But "ceph df detail" and "ceph -s" show still used space.
>
> The OSDs on the ceph osd nodes are also filled with data.
>
> My question: At what point will the still existing but not needed/used
> objects be removed?
>
>
>  Thanks for feedback and suggestions and Kowtow . Götz
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-28 Thread Jaime Ibar

Nope, all osds are running 0.94.9


On 28/03/17 14:53, Brian Andrus wrote:
Well, you said you were running v0.94.9, but are there any OSDs 
running pre-v0.94.4 as the error states?


On Tue, Mar 28, 2017 at 6:51 AM, Jaime Ibar wrote:




On 28/03/17 14:41, Brian Andrus wrote:

What does
# ceph tell osd.* version

ceph tell osd.21 version
Error ENXIO: problem getting command descriptions from osd.21


reveal? Any pre-v0.94.4 hammer OSDs running as the error states?

Yes, as this is the first one I tried to upgrade.
The other ones are running hammer

Thanks




On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar wrote:

Hi,

I did change the ownership to user ceph. In fact, OSD
processes are running

ps aux | grep ceph
ceph2199  0.0  2.7 1729044 918792 ?   Ssl  Mar27   0:21 /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
ceph2200  0.0  2.7 1721212 911084 ?   Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
ceph2212  0.0  2.8 1732532 926580 ?   Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
ceph2215  0.0  2.8 1743552 935296 ?   Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
ceph2341  0.0  2.7 1715548 908312 ?   Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
ceph2383  0.0  2.7 1694944 893768 ?   Ssl  Mar27   0:20 /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
[...]

If I run one of the osd increasing debug

ceph-osd --debug_osd 5 -i 31

this is what I get in logs

[...]

0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We
are healthy, booting
2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016
osdmap indicates one or more pre-v0.94.4 hammer OSDs is running
[...]

It seems the osd is running but ceph is not aware of it
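
One way to see the daemon's own view is to query it over its admin
socket (a sketch only; it assumes the default socket path):

  ceph daemon osd.31 status
  # or equivalently:
  ceph --admin-daemon /var/run/ceph/ceph-osd.31.asok status

This should report the OSD's state (e.g. booting vs active) and the
newest osdmap epoch it has seen.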

Thanks
Jaime




On 27/03/17 21:56, George Mihaiescu wrote:

Make sure the OSD processes on the Jewel node are
running. If you didn't change the ownership to user ceph,
they won't start.


On Mar 27, 2017, at 11:53, Jaime Ibar wrote:

Hi all,

I'm upgrading ceph cluster from Hammer 0.94.9 to
jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds
each) and another 6 servers with
12 osds each.
The mons and mds have been successfully upgraded
to the latest jewel release; however,
after upgrading the first osd server (12 osds), ceph is
not aware of them and they
are marked as down

ceph -s

cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
 health HEALTH_WARN
[...]
12/72 in osds are down
noout flag(s) set
 osdmap e14010: 72 osds: 60 up, 72 in; 14641
remapped pgs
flags noout
[...]

ceph osd tree

3   3.64000 osd.3 down  1.0 1.0
8   3.64000 osd.8 down  1.0 1.0
14   3.64000 osd.14  down  1.0 1.0
18   3.64000 osd.18  down  1.0 
1.0
21   3.64000 osd.21  down  1.0 
1.0
28   3.64000 osd.28  down  1.0 
1.0
31   3.64000 osd.31  down  1.0 
1.0
37   3.64000 osd.37  down  1.0 
1.0
42   3.64000 osd.42  down  1.0 
1.0
47   3.64000 osd.47  down  1.0 
1.0
51   3.64000 osd.51  down  1.0 
1.0
56   3.64000 osd.56  down  1.0 
1.0


If I run this command with one of the down osd
ceph osd in 14
osd.14 is already in.
however ceph doesn't mark it as up and the cluster
health remains

[ceph-users] At what point are objects removed?

2017-03-28 Thread Götz Reinicke - IT Koordinator

Hi, maybe I got something wrong or have not fully understood it yet.

I have some pools and created some test rbd images which are mounted to 
a samba server.


After the test I deleted all the files on the samba server.

But "ceph df detail" and "ceph -s" show still used space.

The OSDs on the ceph osd nodes are also filled with data.

My question: At what point will the still existing but not needed/used 
objects be removed?



Thanks for feedback and suggestions and Kowtow . Götz





smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-28 Thread Brian Andrus
Well, you said you were running v0.94.9, but are there any OSDs running
pre-v0.94.4 as the error states?
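
If the wildcard form hangs on OSDs that are down, a per-OSD loop gives the same
information; this is just a sketch and assumes OSD ids 0-71 as in this cluster:

  for i in $(seq 0 71); do
printf 'osd.%s: ' "$i"
timeout 10 ceph tell osd.$i version 2>/dev/null || echo 'no response (down?)'
  done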

On Tue, Mar 28, 2017 at 6:51 AM, Jaime Ibar  wrote:

>
>
> On 28/03/17 14:41, Brian Andrus wrote:
>
> What does
> # ceph tell osd.* version
>
> ceph tell osd.21 version
> Error ENXIO: problem getting command descriptions from osd.21
>
>
> reveal? Any pre-v0.94.4 hammer OSDs running as the error states?
>
> Yes, as this is the first one I tried to upgrade.
> The other ones are running hammer
>
> Thanks
>
>
>
> On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar  wrote:
>
>> Hi,
>>
>> I did change the ownership to user ceph. In fact, OSD processes are
>> running
>>
>> ps aux | grep ceph
>> ceph2199  0.0  2.7 1729044 918792 ?  Ssl  Mar27   0:21
>> /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
>> ceph2200  0.0  2.7 1721212 911084 ?  Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
>> ceph2212  0.0  2.8 1732532 926580 ?  Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
>> ceph2215  0.0  2.8 1743552 935296 ?  Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
>> ceph2341  0.0  2.7 1715548 908312 ?  Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
>> ceph2383  0.0  2.7 1694944 893768 ?  Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
>> [...]
>>
>> If I run one of the osd increasing debug
>>
>> ceph-osd --debug_osd 5 -i 31
>>
>> this is what I get in logs
>>
>> [...]
>>
>> 0 osd.31 14016 done with init, starting boot process
>> 2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy,
>> booting
>> 2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates
>> one or more pre-v0.94.4 hammer OSDs is running
>> [...]
>>
>> It seems the osd is running but ceph is not aware of it
>>
>> Thanks
>> Jaime
>>
>>
>>
>>
>> On 27/03/17 21:56, George Mihaiescu wrote:
>>
>>> Make sure the OSD processes on the Jewel node are running. If you didn't
>>> change the ownership to user ceph, they won't start.
>>>
>>>
>>> On Mar 27, 2017, at 11:53, Jaime Ibar  wrote:

 Hi all,

 I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.

 The ceph cluster has 3 servers (one mon and one mds each) and another 6
 servers with
 12 osds each.
 The mons and mds have been successfully upgraded to the latest jewel
 release; however,
 after upgrading the first osd server (12 osds), ceph is not aware of them
 and they are marked as down

 ceph -s

 cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
  health HEALTH_WARN
 [...]
 12/72 in osds are down
 noout flag(s) set
  osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
 flags noout
 [...]

 ceph osd tree

 3   3.64000 osd.3  down  1.0 1.0
 8   3.64000 osd.8  down  1.0 1.0
 14   3.64000 osd.14 down  1.0 1.0
 18   3.64000 osd.18 down  1.0  1.0
 21   3.64000 osd.21 down  1.0  1.0
 28   3.64000 osd.28 down  1.0  1.0
 31   3.64000 osd.31 down  1.0  1.0
 37   3.64000 osd.37 down  1.0  1.0
 42   3.64000 osd.42 down  1.0  1.0
 47   3.64000 osd.47 down  1.0  1.0
 51   3.64000 osd.51 down  1.0  1.0
 56   3.64000 osd.56 down  1.0  1.0

 If I run this command with one of the down osd
 ceph osd in 14
 osd.14 is already in.
 however ceph doesn't mark it as up and the cluster health remains
 in degraded state.

 Do I have to upgrade all the osds to jewel first?
 Any help as I'm running out of ideas?

 Thanks
 Jaime

 --

 Jaime Ibar
 High Performance & Research Computing, IS Services
 Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
 http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
 Tel: +353-1-896-3725

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>> --
>>
>> Jaime Ibar
>> High Performance & Research Computing, IS Services
>> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
>> http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
>> Tel: +353-1-896-3725
>>
>> ___
>> ceph-users mailing list
>> 

Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-28 Thread Jaime Ibar



On 28/03/17 14:41, Brian Andrus wrote:

What does
# ceph tell osd.* version

ceph tell osd.21 version
Error ENXIO: problem getting command descriptions from osd.21


reveal? Any pre-v0.94.4 hammer OSDs running as the error states?

Yes, as this is the first one I tried to upgrade.
The other ones are running hammer

Thanks



On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar wrote:


Hi,

I did change the ownership to user ceph. In fact, OSD processes
are running

ps aux | grep ceph
ceph2199  0.0  2.7 1729044 918792 ?  Ssl  Mar27  0:21
/usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph
--setgroup ceph
ceph2200  0.0  2.7 1721212 911084 ?  Ssl  Mar27  0:20
/usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph
--setgroup ceph
ceph2212  0.0  2.8 1732532 926580 ?  Ssl  Mar27  0:20
/usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup
ceph
ceph2215  0.0  2.8 1743552 935296 ?  Ssl  Mar27  0:20
/usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph
--setgroup ceph
ceph2341  0.0  2.7 1715548 908312 ?  Ssl  Mar27  0:20
/usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph
--setgroup ceph
ceph2383  0.0  2.7 1694944 893768 ?  Ssl  Mar27  0:20
/usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph
--setgroup ceph
[...]

If I run one of the osd increasing debug

ceph-osd --debug_osd 5 -i 31

this is what I get in logs

[...]

0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are
healthy, booting
2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap
indicates one or more pre-v0.94.4 hammer OSDs is running
[...]

It seems the osd is running but ceph is not aware of it

Thanks
Jaime




On 27/03/17 21:56, George Mihaiescu wrote:

Make sure the OSD processes on the Jewel node are running. If
you didn't change the ownership to user ceph, they won't start.


On Mar 27, 2017, at 11:53, Jaime Ibar wrote:

Hi all,

I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds each)
and another 6 servers with
12 osds each.
The mons and mds have been successfully upgraded to
the latest jewel release; however,
after upgrading the first osd server (12 osds), ceph is not
aware of them and they
are marked as down

ceph -s

cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
 health HEALTH_WARN
[...]
12/72 in osds are down
noout flag(s) set
 osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
flags noout
[...]

ceph osd tree

3   3.64000 osd.3  down  1.0 1.0
8   3.64000 osd.8  down  1.0 1.0
14   3.64000 osd.14 down  1.0 1.0
18   3.64000 osd.18 down  1.0  
1.0
21   3.64000 osd.21 down  1.0  
1.0
28   3.64000 osd.28 down  1.0  
1.0
31   3.64000 osd.31 down  1.0  
1.0
37   3.64000 osd.37 down  1.0  
1.0
42   3.64000 osd.42 down  1.0  
1.0
47   3.64000 osd.47 down  1.0  
1.0
51   3.64000 osd.51 down  1.0  
1.0
56   3.64000 osd.56 down  1.0  
1.0


If I run this command with one of the down osd
ceph osd in 14
osd.14 is already in.
however ceph doesn't mark it as up and the cluster health
remains
in degraded state.

Do I have to upgrade all the osds to jewel first?
Any help as I'm running out of ideas?

Thanks
Jaime

-- 


Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie

Tel: +353-1-896-3725 

___
ceph-users mailing list
ceph-users@lists.ceph.com 

Re: [ceph-users] systemd and ceph-mon autostart on Ubuntu 16.04

2017-03-28 Thread David Welch
We also ran into this problem on upgrading Ubuntu from 14.04 to 16.04. 
The service file is not being automatically created. The issue was 
resolved with the following steps:


$ sudo systemctl enable ceph-mon@your-hostname
Created symlink from
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@your-hostname.service
to /lib/systemd/system/ceph-mon@.service.

$ sudo systemctl start ceph-mon@your-hostname

Now it should start and join the cluster.
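
To double-check (substitute your actual hostname; this is only a sketch):

  systemctl is-enabled ceph-mon@your-hostname
  systemctl status ceph-mon@your-hostname
  ceph -s   # the monitor should be back in quorum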

-David



On 01/25/2017 02:35 PM, Wido den Hollander wrote:

On 25 January 2017 at 20:25, Patrick Donnelly wrote:


On Wed, Jan 25, 2017 at 2:19 PM, Wido den Hollander  wrote:

Hi,

I thought this issue was resolved a while ago, but while testing Kraken with 
BlueStore I ran into the problem again.

My monitors are not being started on boot:

Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-59-generic x86_64)

  * Documentation:  https://help.ubuntu.com
  * Management: https://landscape.canonical.com
  * Support:https://ubuntu.com/advantage
Last login: Wed Jan 25 15:08:57 2017 from 2001:db8::100
root@bravo:~# systemctl status ceph-mon.target
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service 
instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor 
preset: enabled)
Active: inactive (dead)
root@bravo:~#

If I enable ceph-mon.target my Monitors start just fine on boot:

root@bravo:~# systemctl enable ceph-mon.target
Created symlink from 
/etc/systemd/system/multi-user.target.wants/ceph-mon.target to 
/lib/systemd/system/ceph-mon.target.
Created symlink from /etc/systemd/system/ceph.target.wants/ceph-mon.target to 
/lib/systemd/system/ceph-mon.target.
root@bravo:~# ceph -v
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
root@bravo:~#

Anybody else seeing this before I start digging into the .deb packaging?

Are you wanting ceph-mon.target to automatically be enabled on package
install? That doesn't sound good to me but I'm not familiar with
Ubuntu's packaging rules. I would think the sysadmin must enable the
services they install themselves.


Under Ubuntu that usually happens, yes. This system, however, was installed with 
ceph-deploy (1.5.37). OSDs started on boot, but the MONs didn't.

The OSDs were started by udev/ceph-disk however.

I checked my ceph-deploy log and I found:

[2017-01-23 18:56:56,370][alpha][INFO  ] Running command: systemctl enable 
ceph.target
[2017-01-23 18:56:56,394][alpha][WARNING] Created symlink from 
/etc/systemd/system/multi-user.target.wants/ceph.target to 
/lib/systemd/system/ceph.target.
[2017-01-23 18:56:56,487][alpha][INFO  ] Running command: systemctl enable 
ceph-mon@alpha
[2017-01-23 18:56:56,504][alpha][WARNING] Created symlink from 
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@alpha.service to 
/lib/systemd/system/ceph-mon@.service.
[2017-01-23 18:56:56,656][alpha][INFO  ] Running command: systemctl start 
ceph-mon@alpha

It doesn't seem to enable ceph-mon.target, so the MON is not enabled to start on 
boot.

This small cluster runs inside VirtualBox with the machines alpha, bravo and 
charlie.

Wido


--
Patrick Donnelly

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
~~
David Welch
DevOps
ARS
http://thinkars.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-28 Thread Brian Andrus
What does
# ceph tell osd.* version

reveal? Any pre-v0.94.4 hammer OSDs running as the error states?


On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar  wrote:

> Hi,
>
> I did change the ownership to user ceph. In fact, OSD processes are running
>
> ps aux | grep ceph
> ceph2199  0.0  2.7 1729044 918792 ?  Ssl  Mar27   0:21
> /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
> ceph2200  0.0  2.7 1721212 911084 ?  Ssl  Mar27   0:20
> /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
> ceph2212  0.0  2.8 1732532 926580 ?  Ssl  Mar27   0:20
> /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
> ceph2215  0.0  2.8 1743552 935296 ?  Ssl  Mar27   0:20
> /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
> ceph2341  0.0  2.7 1715548 908312 ?  Ssl  Mar27   0:20
> /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
> ceph2383  0.0  2.7 1694944 893768 ?  Ssl  Mar27   0:20
> /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
> [...]
>
> If I run one of the osd increasing debug
>
> ceph-osd --debug_osd 5 -i 31
>
> this is what I get in logs
>
> [...]
>
> 0 osd.31 14016 done with init, starting boot process
> 2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy,
> booting
> 2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates
> one or more pre-v0.94.4 hammer OSDs is running
> [...]
>
> It seems the osd is running but ceph is not aware of it
>
> Thanks
> Jaime
>
>
>
>
> On 27/03/17 21:56, George Mihaiescu wrote:
>
>> Make sure the OSD processes on the Jewel node are running. If you didn't
>> change the ownership to user ceph, they won't start.
>>
>>
>> On Mar 27, 2017, at 11:53, Jaime Ibar  wrote:
>>>
>>> Hi all,
>>>
>>> I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.
>>>
>>> The ceph cluster has 3 servers (one mon and one mds each) and another 6
>>> servers with
>>> 12 osds each.
>>> The mons and mds have been successfully upgraded to the latest jewel
>>> release; however,
>>> after upgrading the first osd server (12 osds), ceph is not aware of them
>>> and they are marked as down
>>>
>>> ceph -s
>>>
>>> cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
>>>  health HEALTH_WARN
>>> [...]
>>> 12/72 in osds are down
>>> noout flag(s) set
>>>  osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
>>> flags noout
>>> [...]
>>>
>>> ceph osd tree
>>>
>>> 3   3.64000 osd.3  down  1.0 1.0
>>> 8   3.64000 osd.8  down  1.0 1.0
>>> 14   3.64000 osd.14 down  1.0 1.0
>>> 18   3.64000 osd.18 down  1.0  1.0
>>> 21   3.64000 osd.21 down  1.0  1.0
>>> 28   3.64000 osd.28 down  1.0  1.0
>>> 31   3.64000 osd.31 down  1.0  1.0
>>> 37   3.64000 osd.37 down  1.0  1.0
>>> 42   3.64000 osd.42 down  1.0  1.0
>>> 47   3.64000 osd.47 down  1.0  1.0
>>> 51   3.64000 osd.51 down  1.0  1.0
>>> 56   3.64000 osd.56 down  1.0  1.0
>>>
>>> If I run this command with one of the down osd
>>> ceph osd in 14
>>> osd.14 is already in.
>>> however ceph doesn't mark it as up and the cluster health remains
>>> in degraded state.
>>>
>>> Do I have to upgrade all the osds to jewel first?
>>> Any help as I'm running out of ideas?
>>>
>>> Thanks
>>> Jaime
>>>
>>> --
>>>
>>> Jaime Ibar
>>> High Performance & Research Computing, IS Services
>>> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
>>> http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
>>> Tel: +353-1-896-3725
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
> --
>
> Jaime Ibar
> High Performance & Research Computing, IS Services
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
> Tel: +353-1-896-3725
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD network with IPv6 SLAAC networks?

2017-03-28 Thread Wido den Hollander

> On 27 March 2017 at 21:49, Richard Hesse wrote:
> 
> 
> Has anyone run their Ceph OSD cluster network on IPv6 using SLAAC? I know
> that ceph supports IPv6, but I'm not sure how it would deal with the
> address rotation in SLAAC, permanent vs outgoing address, etc. It would be
> very nice for me, as I wouldn't have to run any kind of DHCP server or use
> static addressing -- just configure RA's and go.
> 

Yes, I do in many clusters. Works fine! SLAAC doesn't generate random addresses 
which change over time. That's a feature called 'Privacy Extensions' and is 
controlled on Linux by:

- net.ipv6.conf.all.use_tempaddr
- net.ipv6.conf.default.use_tempaddr
- net.ipv6.conf.X.use_tempaddr

Set this to 0 and the kernel will generate one address based on the MAC-Address 
(EUI64) of the interface. This address is stable and will not change.
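
For example, a minimal sysctl snippet to make this persistent could look like 
the following (the file name is arbitrary; this is only a sketch):

  # /etc/sysctl.d/99-ipv6-no-tempaddr.conf
  # keep only the stable EUI-64 SLAAC address, no temporary/privacy addresses
  net.ipv6.conf.all.use_tempaddr = 0
  net.ipv6.conf.default.use_tempaddr = 0

  # apply without a reboot
  sysctl --system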

I like this very much as I don't have any static or complex network 
configurations on the hosts. It moves the whole responsibility of networking 
and addresses to the network. A host just boots and obtains an IP.

The OSDs contact the MONs on boot and they will tell them their address. OSDs 
do not need a fixed address for Ceph.

However, using SLAAC without Privacy Extensions means that in practice the 
address will not change of a machine, so you don't need to worry about it that 
much.

The biggest system I have running this way is 400 nodes running IPv6-only. 10 
racks, 40 nodes per rack. Each rack has a Top-of-Rack switch running in Layer 3 
and a /64 is assigned per rack.

Layer 3 routing is used between the racks, so based on the IPv6 address we can 
even determine in which rack a host/OSD is.

Layer 2 domains don't extend beyond a single rack, which makes a rack a true 
failure domain in our case.

Wido

> On that note, does anyone have any experience with running ceph in a mixed
> v4 and v6 environment?
> 
> Thanks,
> -richard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-28 Thread Jaime Ibar

Hi,

I did change the ownership to user ceph. In fact, OSD processes are running

ps aux | grep ceph
ceph2199  0.0  2.7 1729044 918792 ?  Ssl  Mar27   0:21 
/usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
ceph2200  0.0  2.7 1721212 911084 ?  Ssl  Mar27   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
ceph2212  0.0  2.8 1732532 926580 ?  Ssl  Mar27   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
ceph2215  0.0  2.8 1743552 935296 ?  Ssl  Mar27   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
ceph2341  0.0  2.7 1715548 908312 ?  Ssl  Mar27   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
ceph2383  0.0  2.7 1694944 893768 ?  Ssl  Mar27   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph

[...]

If I run one of the osd increasing debug

ceph-osd --debug_osd 5 -i 31

this is what I get in logs

[...]

0 osd.31 14016 done with init, starting boot process
2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy, 
booting
2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates 
one or more pre-v0.94.4 hammer OSDs is running

[...]

It seems the osd is running but ceph is not aware of it

Thanks
Jaime



On 27/03/17 21:56, George Mihaiescu wrote:

Make sure the OSD processes on the Jewel node are running. If you didn't change 
the ownership to user ceph, they won't start.



On Mar 27, 2017, at 11:53, Jaime Ibar  wrote:

Hi all,

I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds each) and another 6 servers 
with
12 osds each.
The mons and mds have been successfully upgraded to the latest jewel release; 
however,
after upgrading the first osd server (12 osds), ceph is not aware of them and
they are marked as down

ceph -s

cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
 health HEALTH_WARN
[...]
12/72 in osds are down
noout flag(s) set
 osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
flags noout
[...]

ceph osd tree

3   3.64000 osd.3  down  1.0 1.0
8   3.64000 osd.8  down  1.0 1.0
14   3.64000 osd.14 down  1.0 1.0
18   3.64000 osd.18 down  1.0  1.0
21   3.64000 osd.21 down  1.0  1.0
28   3.64000 osd.28 down  1.0  1.0
31   3.64000 osd.31 down  1.0  1.0
37   3.64000 osd.37 down  1.0  1.0
42   3.64000 osd.42 down  1.0  1.0
47   3.64000 osd.47 down  1.0  1.0
51   3.64000 osd.51 down  1.0  1.0
56   3.64000 osd.56 down  1.0  1.0

If I run this command with one of the down osd
ceph osd in 14
osd.14 is already in.
however ceph doesn't mark it as up and the cluster health remains
in degraded state.

Do I have to upgrade all the osds to jewel first?
Any help as I'm running out of ideas?

Thanks
Jaime

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Marius Vaitiekunas
On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

> I can't guarantee it's the same as my issue, but from that it sounds the
> same.
>
> Jewel 10.2.4, 10.2.5 tested
> hypervisors are proxmox qemu-kvm, using librbd
> 3 ceph nodes with mon+osd on each
>
> -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
> and bw limits on client side, jumbo frames, etc. all improve/smooth out
> performance and mitigate the hangs, but don't prevent it.
> -hangs are usually associated with blocked requests (I set the complaint
> time to 5s to see them)
> -hangs are very easily caused by rbd snapshot + rbd export-diff to do
> incremental backup (one snap persistent, plus one more during backup)
> -when qemu VM io hangs, I have to kill -9 the qemu process for it to
> stop. Some broken VMs don't appear to be hung until I try to live
> migrate them (live migrating all VMs helped test solutions)
>
> Finally I have a workaround... disable exclusive-lock, object-map, and
> fast-diff rbd features (and restart clients via live migrate).
> (object-map and fast-diff appear to have no effect on diff or export-diff
> ... so I don't miss them). I'll file a bug at some point (after I move
> all VMs back and see if it is still stable). And one other user on IRC
> said this solved the same problem (also using rbd snapshots).
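
For anyone who wants to try the same workaround, the commands are roughly as
follows (a sketch only; pool and image names are placeholders, and the features
have to be disabled in dependency order):

  rbd feature disable rbd/vm-disk-1 fast-diff
  rbd feature disable rbd/vm-disk-1 object-map
  rbd feature disable rbd/vm-disk-1 exclusive-lock
  rbd info rbd/vm-disk-1 | grep features   # verify

After that, restart the clients (e.g. via live migration) so they reopen the
image without those features.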
>
> And strangely, they don't seem to hang if I put back those features,
> until a few days later (making testing much less easy...but now I'm very
> sure removing them prevents the issue)
>
> I hope this works for you (and maybe gets some attention from devs too),
> so you don't waste months like me.
>
> On 03/27/17 19:31, Hall, Eric wrote:
> > In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
> ceph hosts, we occasionally see hung processes (usually during boot, but
> otherwise as well), with errors reported in the instance logs as shown
> below.  Configuration is vanilla, based on openstack/ceph docs.
> >
> > Neither the compute hosts nor the ceph hosts appear to be overloaded in
> terms of memory or network bandwidth, none of the 67 osds are over 80%
> full, nor do any of them appear to be overwhelmed in terms of IO.  Compute
> hosts and ceph cluster are connected via a relatively quiet 1Gb network,
> with an IBoE net between the ceph nodes.  Neither network appears
> overloaded.
> >
> > I don’t see any related (to my eye) errors in client or server logs,
> even with 20/20 logging from various components (rbd, rados, client,
> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
> (currently 64k... overkill for sure.)
> >
> > It “feels” like a performance problem, but I can’t find any capacity
> issues or constraining bottlenecks.
> >
> > Any suggestions or insights into this situation are appreciated.  Thank
> you for your time,
> > --
> > Eric
> >
> >
> > [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more
> than 120 seconds.
> > [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> > [Fri Mar 24 20:30:40 2017] "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> > [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0
>  226  2 0x
> > [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046
> 88042690 88003728bfd8
> > [Fri Mar 24 20:30:40 2017]  00013180 00013180
> 88042690 88043fd13a18
> > [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002
> 811ef7c0 88003728bc50
> > [Fri Mar 24 20:30:40 2017] Call Trace:
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> > [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> > [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  []
> out_of_line_wait_on_bit+0x77/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> autoremove_wake_function+0x40/0x40
> > [Fri Mar 24 20:30:40 2017]  []
> __wait_on_buffer+0x2a/0x30
> > [Fri Mar 24 20:30:40 2017]  [] jbd2_journal_commit_
> transaction+0x185d/0x1ab0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> try_to_del_timer_sync+0x4f/0x70
> > [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> > [Fri Mar 24 20:30:40 2017]  [] ?
> prepare_to_wait_event+0x100/0x100
> > [Fri Mar 24 20:30:40 2017]  [] ?
> commit_timeout+0x10/0x10
> > [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> > [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > 

Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-28 Thread Brad Hubbard
On Tue, Mar 28, 2017 at 4:22 PM, Marcus Furlong  wrote:
> On 22 March 2017 at 19:36, Brad Hubbard  wrote:
>> On Wed, Mar 22, 2017 at 5:24 PM, Marcus Furlong  wrote:
>
>>> [435339.965817] [ cut here ]
>>> [435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244
>>> xfs_vm_releasepage+0xcb/0x100 [xfs]()
>>> [435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas
>>> mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter
>>> dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod
>>> ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
>>> scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
>>> rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
>>> intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul
>>> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
>>> cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr
>>> ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si
>>> ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>>> grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200
>>> i2c_algo_bit
>>> [435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper
>>> crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm
>>> bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas
>>> nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
>>> [435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted
>>> 3.10.0-514.10.2.el7.x86_64 #1
>>> [435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS
>>> 2.3.4 11/08/2016
>>> [435339.965994]   6ea9561d 881ffc2c7aa0
>>> 816863ef
>>> [435339.965998]  881ffc2c7ad8 81085940 ea00015d4e20
>>> ea00015d4e00
>>> [435339.966000]  880f4d7c5af8 881ffc2c7da0 ea00015d4e00
>>> 881ffc2c7ae8
>>> [435339.966003] Call Trace:
>>> [435339.966010]  [] dump_stack+0x19/0x1b
>>> [435339.966015]  [] warn_slowpath_common+0x70/0xb0
>>> [435339.966018]  [] warn_slowpath_null+0x1a/0x20
>>> [435339.966060]  [] xfs_vm_releasepage+0xcb/0x100 [xfs]
>>> [435339.966120]  [] try_to_release_page+0x32/0x50
>>> [435339.966128]  [] shrink_active_list+0x3d6/0x3e0
>>> [435339.966133]  [] shrink_lruvec+0x3f1/0x770
>>> [435339.966138]  [] shrink_zone+0x76/0x1a0
>>> [435339.966143]  [] balance_pgdat+0x48c/0x5e0
>>> [435339.966147]  [] kswapd+0x173/0x450
>>> [435339.966155]  [] ? wake_up_atomic_t+0x30/0x30
>>> [435339.966158]  [] ? balance_pgdat+0x5e0/0x5e0
>>> [435339.966161]  [] kthread+0xcf/0xe0
>>> [435339.966165]  [] ? kthread_create_on_node+0x140/0x140
>>> [435339.966170]  [] ret_from_fork+0x58/0x90
>>> [435339.966173]  [] ? kthread_create_on_node+0x140/0x140
>>> [435339.966175] ---[ end trace 58233bbca77fd5e2 ]---
>>
>> With regards to the above stack trace,
>> https://bugzilla.redhat.com/show_bug.cgi?id=1079818 was opened, and
>> remains open, for the same stack. I would suggest discussing this
>> issue with your kernel support organisation as it is likely unrelated
>> to the sizing issue IIUC.
>
> Hi Brad,
>
> Thanks for clarifying that. That bug is not public. Is there any
> workaround mentioned in it?

No, there isn't. The upstream fix is
http://oss.sgi.com/pipermail/xfs/2016-July/050281.html

>
> Cheers,
> Marcus.
>
> --
> Marcus Furlong



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-28 Thread Marcus Furlong
On 22 March 2017 at 19:36, Brad Hubbard  wrote:
> On Wed, Mar 22, 2017 at 5:24 PM, Marcus Furlong  wrote:

>> [435339.965817] [ cut here ]
>> [435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244
>> xfs_vm_releasepage+0xcb/0x100 [xfs]()
>> [435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas
>> mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter
>> dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod
>> ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
>> scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
>> rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
>> intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul
>> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
>> cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr
>> ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si
>> ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200
>> i2c_algo_bit
>> [435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper
>> crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm
>> bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas
>> nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
>> [435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted
>> 3.10.0-514.10.2.el7.x86_64 #1
>> [435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS
>> 2.3.4 11/08/2016
>> [435339.965994]   6ea9561d 881ffc2c7aa0
>> 816863ef
>> [435339.965998]  881ffc2c7ad8 81085940 ea00015d4e20
>> ea00015d4e00
>> [435339.966000]  880f4d7c5af8 881ffc2c7da0 ea00015d4e00
>> 881ffc2c7ae8
>> [435339.966003] Call Trace:
>> [435339.966010]  [] dump_stack+0x19/0x1b
>> [435339.966015]  [] warn_slowpath_common+0x70/0xb0
>> [435339.966018]  [] warn_slowpath_null+0x1a/0x20
>> [435339.966060]  [] xfs_vm_releasepage+0xcb/0x100 [xfs]
>> [435339.966120]  [] try_to_release_page+0x32/0x50
>> [435339.966128]  [] shrink_active_list+0x3d6/0x3e0
>> [435339.966133]  [] shrink_lruvec+0x3f1/0x770
>> [435339.966138]  [] shrink_zone+0x76/0x1a0
>> [435339.966143]  [] balance_pgdat+0x48c/0x5e0
>> [435339.966147]  [] kswapd+0x173/0x450
>> [435339.966155]  [] ? wake_up_atomic_t+0x30/0x30
>> [435339.966158]  [] ? balance_pgdat+0x5e0/0x5e0
>> [435339.966161]  [] kthread+0xcf/0xe0
>> [435339.966165]  [] ? kthread_create_on_node+0x140/0x140
>> [435339.966170]  [] ret_from_fork+0x58/0x90
>> [435339.966173]  [] ? kthread_create_on_node+0x140/0x140
>> [435339.966175] ---[ end trace 58233bbca77fd5e2 ]---
>
> With regards to the above stack trace,
> https://bugzilla.redhat.com/show_bug.cgi?id=1079818 was opened, and
> remains open, for the same stack. I would suggest discussing this
> issue with your kernel support organisation as it is likely unrelated
> to the sizing issue IIUC.

Hi Brad,

Thanks for clarifying that. That bug is not public. Is there any
workaround mentioned in it?

Cheers,
Marcus.

-- 
Marcus Furlong
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-rest-api's behavior

2017-03-28 Thread Brad Hubbard
I've copied Dan who may have some thoughts on this and has been
involved with this code.

On Tue, Mar 28, 2017 at 3:58 PM, Mika c  wrote:
> Hi Brad,
> Thanks for your help. I found that was my problem. I had forgotten to append
> the word "keyring" to the file name.
>
> And sorry to bother you again. Is it possible to create a minimum privilege
> client for the api to run?
>
>
>
> Best wishes,
> Mika
>
>
> 2017-03-24 19:32 GMT+08:00 Brad Hubbard :
>>
>> On Fri, Mar 24, 2017 at 8:20 PM, Mika c  wrote:
>> > Hi Brad,
>> >  Thanks for your reply. The keyring file was already created and put in
>> > /etc/ceph, but it is not working.
>>
>> What was it called?
>>
>> > I have to write config into ceph.conf like below.
>> >
>> > ---ceph.conf start---
>> > [client.symphony]
>> > log_file = /var/log/ceph/rest-api.log
>> > keyring = /etc/ceph/ceph.client.symphony
>> > public addr = 0.0.0.0:5000
>> > restapi base url = /api/v0.1
>> > ---ceph.conf end---
>> >
>> >
>> > Another question: do I have to set capabilities for this client like
>> > admin? I just want to get some information like health or df.
>> >
>> > If this client is set with particular capabilities like:
>> > ------
>> > client.symphony
>> >key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
>> >caps: [mon] allow r
>> >caps: [osd] allow rx
>> > ------
>> > Error list:
>> > Traceback (most recent call last):
>> >  File "/usr/bin/ceph-rest-api", line 59, in 
>> >rest,
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
>> > generate_a
>> > pp
>> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
>> > args)
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
>> > api_setup
>> >target=('osd', int(osdid)))
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
>> > get_command
>> > _descriptions
>> >raise EnvironmentError(ret, err)
>> > EnvironmentError: [Errno -1] Can't get command descriptions:
>> >
>> >
>> >
>> >
>> > Best wishes,
>> > Mika
>> >
>> >
>> > 2017-03-24 16:21 GMT+08:00 Brad Hubbard :
>> >>
>> >> On Fri, Mar 24, 2017 at 4:06 PM, Mika c  wrote:
>> >> > Hi all,
>> >> >  Same question with CEPH 10.2.3 and 11.2.0.
>> >> >   Is this command only for client.admin ?
>> >> >
>> >> > client.symphony
>> >> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
>> >> >caps: [mon] allow *
>> >> >caps: [osd] allow *
>> >> >
>> >> > Traceback (most recent call last):
>> >> >  File "/usr/bin/ceph-rest-api", line 43, in 
>> >> >rest,
>> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504,
>> >> > in
>> >> > generate_a
>> >> > pp
>> >> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
>> >> > args)
>> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106,
>> >> > in
>> >> > api_setup
>> >> >app.ceph_cluster.connect()
>> >> >  File "rados.pyx", line 811, in rados.Rados.connect
>> >> > (/tmp/buildd/ceph-11.2.0/obj-x
>> >> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
>> >> > rados.ObjectNotFound: error connecting to the cluster
>> >>
>> >> # strace -eopen /bin/ceph-rest-api |& grep keyring
>> >> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
>> >> (No such file or directory)
>> >> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >>
>> >> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
>> >> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
>> >>
>> >> # /bin/ceph-rest-api
>> >>  * Running on http://0.0.0.0:5000/
>> >>
>> >> >
>> >> >
>> >> >
>> >> > Best wishes,
>> >> > Mika
>> >> >
>> >> >
>> >> > 2016-03-03 12:25 GMT+08:00 Shinobu Kinjo :
>> >> >>
>> >> >> Yes.
>> >> >>
>> >> >> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick  wrote:
>> >> >> > Is the client.test-admin key in the keyring read by ceph-rest-api?
>> >> >> >
>> >> >> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
>> >> >> >> Does anyone have any idea about that?
>> >> >> >>
>> >> >> >> Rgds,
>> >> >> >> Shinobu
>> >> >> >>
>> >> >> >> - Original Message -
>> >> >> >> From: "Shinobu Kinjo" 
>> >> >> >> To: "ceph-users" 
>> >> >> >> Sent: Friday, January 22, 2016 7:15:36 AM
>> >> >> >> Subject: ceph-rest-api's behavior
>> >> >> >>
>> >> >> >> Hello,
>> >> >> >>
>> >> >> >> "ceph-rest-api" works greatly with client.admin.
>> >> >> 

Re: [ceph-users] ceph-rest-api's behavior

2017-03-28 Thread Mika c
Hi Brad,
   Thanks for your help. I found that was my problem. I had forgotten to append
the word "keyring" to the file name.

And sorry to bother you again. Is it possible to create a minimum privilege
client for the api to run?
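
One way to experiment (just a sketch; the exact minimum caps ceph-rest-api
needs are not confirmed here, and the client name is arbitrary) is to create a
candidate client and test it with the plain ceph CLI first:

  ceph auth get-or-create client.restapi-ro mon 'allow r' osd 'allow r' \
  > /etc/ceph/ceph.client.restapi-ro.keyring
  # verify what the key can do before wiring it into ceph-rest-api
  ceph --id restapi-ro health
  ceph --id restapi-ro df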



Best wishes,
Mika


2017-03-24 19:32 GMT+08:00 Brad Hubbard :

> On Fri, Mar 24, 2017 at 8:20 PM, Mika c  wrote:
> > Hi Brad,
> >  Thanks for your reply. The keyring file was already created and put in
> > /etc/ceph, but it is not working.
>
> What was it called?
>
> > I have to write config into ceph.conf like below.
> >
> > ---ceph.conf start---
> > [client.symphony]
> > log_file = /var/log/ceph/rest-api.log
> > keyring = /etc/ceph/ceph.client.symphony
> > public addr = 0.0.0.0:5000
> > restapi base url = /api/v0.1
> > ---ceph.conf end---
> >
> >
> > Another question: do I have to set capabilities for this client like
> > admin? I just want to get some information like health or df.
> >
> > If this client is set with particular capabilities like:
> > ------
> > client.symphony
> >key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
> >caps: [mon] allow r
> >caps: [osd] allow rx
> > ------
> > Error list:
> > Traceback (most recent call last):
> >  File "/usr/bin/ceph-rest-api", line 59, in 
> >rest,
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
> > generate_a
> > pp
> >addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
> > api_setup
> >target=('osd', int(osdid)))
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
> > get_command
> > _descriptions
> >raise EnvironmentError(ret, err)
> > EnvironmentError: [Errno -1] Can't get command descriptions:
> >
> >
> >
> >
> > Best wishes,
> > Mika
> >
> >
> > 2017-03-24 16:21 GMT+08:00 Brad Hubbard :
> >>
> >> On Fri, Mar 24, 2017 at 4:06 PM, Mika c  wrote:
> >> > Hi all,
> >> >  Same question with CEPH 10.2.3 and 11.2.0.
> >> >   Is this command only for client.admin ?
> >> >
> >> > client.symphony
> >> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
> >> >caps: [mon] allow *
> >> >caps: [osd] allow *
> >> >
> >> > Traceback (most recent call last):
> >> >  File "/usr/bin/ceph-rest-api", line 43, in 
> >> >rest,
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504,
> in
> >> > generate_a
> >> > pp
> >> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
> >> > args)
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106,
> in
> >> > api_setup
> >> >app.ceph_cluster.connect()
> >> >  File "rados.pyx", line 811, in rados.Rados.connect
> >> > (/tmp/buildd/ceph-11.2.0/obj-x
> >> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
> >> > rados.ObjectNotFound: error connecting to the cluster
> >>
> >> # strace -eopen /bin/ceph-rest-api |& grep keyring
> >> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
> >> (No such file or directory)
> >> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >>
> >> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
> >> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
> >>
> >> # /bin/ceph-rest-api
> >>  * Running on http://0.0.0.0:5000/
> >>
> >> >
> >> >
> >> >
> >> > Best wishes,
> >> > Mika
> >> >
> >> >
> >> > 2016-03-03 12:25 GMT+08:00 Shinobu Kinjo :
> >> >>
> >> >> Yes.
> >> >>
> >> >> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick  wrote:
> >> >> > Is the client.test-admin key in the keyring read by ceph-rest-api?
> >> >> >
> >> >> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
> >> >> >> Does anyone have any idea about that?
> >> >> >>
> >> >> >> Rgds,
> >> >> >> Shinobu
> >> >> >>
> >> >> >> - Original Message -
> >> >> >> From: "Shinobu Kinjo" 
> >> >> >> To: "ceph-users" 
> >> >> >> Sent: Friday, January 22, 2016 7:15:36 AM
> >> >> >> Subject: ceph-rest-api's behavior
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> "ceph-rest-api" works greatly with client.admin.
> >> >> >> But with client.test-admin which I created just after building the
> >> >> >> Ceph
> >> >> >> cluster , it does not work.
> >> >> >>
> >> >> >>  ~$ ceph auth get-or-create client.test-admin mon 'allow *' mds
> >> >> >> 'allow
> >> >> >> *' osd 'allow *'
> >> >> >>
> >> >> >>  ~$ sudo ceph auth list
> >> >> >>  installed auth entries:
> >> >> >>