Re: [ceph-users] IOPS requirements

2016-06-17 Thread Christian Balzer

Hello,

On Fri, 17 Jun 2016 14:51:08 +0200 Gandalf Corvotempesta wrote:

> 2016-06-17 10:03 GMT+02:00 Christian Balzer :
> > I'm unfamiliar with Xen and Xenserver (the latter doesn't support RBD,
> > btw), but if you can see all the combined activity of your VMs on your
> > HW in the dom0 like with KVM/qemu, a simple "iostat" or "iostat -x"
> > will give you the average IOPS of a device.
> > Same of course within a VM.
> 
> I'm able to see the combined activity directly from Dom0.
>
Good.
 
> With "iostat" should I look for 'tps' column ?
>
Yes, though you want to look at "iostat -x" to get a breakdown between
reads and writes.
What's the output from a single run (averages since boot)?
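
For example, a single run looks roughly like this (exact column names vary
a bit with the sysstat version; "r/s" and "w/s" are the per-device read and
write IOPS):

  iostat -x
  # Device:  rrqm/s  wrqm/s  r/s  w/s  rkB/s  wkB/s  ...  %util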

> > However that's the average, you're likely to have peaks much higher
> > than that.
> > For this you'll either have to collect and graph that data
> > (collectd/graphite, etc) and/or run something like atop during peak
> > hours and watch it or have it write logs with a high sample rate.
> > As in, atop can keep a log of all states, but the default interval of
> > 10 minutes with Debian is likely too coarse to spot real peaks.
> > See the atop documentation.
> 
> Running "iostat" every seconds, i can see about 800-1000 tps
> (transactions per seconds)
That sounds extremely high, is that more or less consistent?
How many VMs is that for?
What are you looking at, as in are those individual disks/SSDs, a raid
(what kind)?

> I can try to install "atop" but I'm totally new to this; I don't know
> how to use it. Any hints?
For starters, run atop in a large window; you will see a lot of details,
and the field names are both pretty obvious and explained in the documentation.

> Currently I'm reading the "atop" man page, but I don't want to run it
> for the whole weekend with the wrong parameters.
> 
Weekend would be the wrong time to look for peaks anyway, wouldn't it?
With Debian there's a /etc/default/atop config file; you would want to
change the INTERVAL from 600 to at least 60 or even less and then run it
for a few hours during peak times.
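
Something along these lines, for example (the log path and date follow the
Debian defaults and are just an example; atop's -b/-e take hhmm times):

  # /etc/default/atop
  INTERVAL=60

  # afterwards, replay the recorded samples for the peak window
  atop -r /var/log/atop/atop_20160617 -b 1200 -e 1400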

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reweight command

2016-06-17 Thread Christian Balzer
On Fri, 17 Jun 2016 16:29:31 +0530 M Ranga Swami Reddy wrote:

> Hello,
> what is the diff between below reweight command and which one is
> preferable to use?
> 
> ceph osd reweight <osd-id> <weight>
> 
> ceph osd crush reweight <osd-name> <weight>
> 
Never mind the quite recent and frequent discussions about this,
googling for
"ceph differece between reweight and crush reweight"

gives you these primary results:
http://ceph.com/planet/difference-between-ceph-osd-reweight-and-ceph-osd-crush-reweight/
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040967.html
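
In a nutshell (the OSD id and weight values below are just placeholders):

  # temporary override between 0.0 and 1.0; reset when the OSD is marked
  # out and back in
  ceph osd reweight 2 0.8
  # persistent CRUSH weight, by convention the disk size in TB
  ceph osd crush reweight osd.2 1.81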

Google is your friend (when they're not giving your mails to the NSA).

Christian
> 
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros Cons

2016-06-17 Thread Lazuardi Nasution
Hi Wido,

Do you mean the TCP connection overhead is on the OSD nodes or on the Ceph
clients? If it is the TCP connections on the Ceph clients, I think the
maximum number will be no more than the number of OSDs, and no matter the
chunk/stripe size, the number of connections will stay the same if the
image is spread across all the OSDs. Isn't it?

In your 64MB stripes case, how about the performance of random I/O access?
Any suggestion on chunk/stripe size for a database block image?
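
(For reference, the object size is 2^order bytes, so order 22 = 4MB and
order 23 = 8MB; a hypothetical example of setting it at creation time:

  rbd create mypool/dbimage --size 102400 --order 20   # 1MB objects
)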

Best regards,



> Date: Fri, 17 Jun 2016 12:45:22 +0200 (CEST)
> From: Wido den Hollander 
> To: ceph-users@lists.ceph.com, Lazuardi Nasution
> 
> Subject: Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros
> Cons
> Message-ID: <100239228.47.1466160323...@ox.pcextreme.nl>
> Content-Type: text/plain; charset=UTF-8
>
>
> > On 17 June 2016 at 12:12, Lazuardi Nasution <
> mrxlazuar...@gmail.com> wrote:
> >
> >
> > Hi Mark,
> >
> > What overhead do you mean? Can it be negligible if I use a 4KB (extreme
> > case, the same as the I/O size) stripe/chunk size to make sure that all
> > random I/O will be spread across all OSDs?
> >
>
> Keep in mind that this involves opening additional TCP connections to
> OSDs. That will come with some overhead. Especially when new connections
> have to go through the handshake process.
>
> I am using 64MB stripes in a case with a customer. They only need
> sequential writes and reads at high speed. Works great for them.
>
> Wido
>
> > Anyway, I love coffee too :)
> >
> > Best regards,
> >
> >
> > > Date: Thu, 16 Jun 2016 04:01:37 -0500
> > > From: Mark Nelson 
> > > To: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros
> > > Cons
> > > Message-ID: 
> > > Content-Type: text/plain; charset=windows-1252; format=flowed
> > >
> > >
> > >
> > > On 06/16/2016 03:54 AM, Mark Nelson wrote:
> > > > Hi,
> > > >
> > > > larger stripe size (to an extent) will generally improve large
> > > > sequential read and write performance.
> > >
> > > Oops, I should have had my coffee. I missed a sentence here.  larger
> > > stripe size will generally improve large sequential read and write
> > > performance.  Smaller stripe size can provide some of the advantages
> you
> > > mention below, but there's overhead though.  Ok fixed, now back to find
> > > coffee. :)
> > >
> > > > There's overhead though.  It
> > > > means more objects which can slow things down at the filestore level
> > > > when PG splits occur and also potentially means more inodes fall out
> of
> > > > cache, longer syncfs, etc.  On the other hand, if using cache
> tiering,
> > > > smaller objects means less data to promote which can be a big win for
> > > > small IO.
> > > >
> > > > Basically the answer is that there are pluses and minuses, and the
> exact
> > > > behavior will depend on your kernel configuration, hardware, and use
> > > > case.  I think 4MB has been a fairly good default thus far (might
> change
> > > > with bluestore), but tuning for a specific use case may mean a
> smaller
> > > > or larger size is better.
> > > >
> > > > Mark
> > > >
> > > > On 06/16/2016 03:20 AM, Lazuardi Nasution wrote:
> > > >> Hi,
> > > >>
> > > >> I'm looking for some pros and cons related to the RBD stripe/chunk
> > > >> size indicated by the image order number. The default is 4MB (order
> > > >> 22), but OpenStack uses 8MB (order 23) as default. If we use a
> > > >> smaller size (lower order number), isn't there more chance that
> > > >> image objects are spread across OSDs and cached in OSD node RAM? If
> > > >> we use a bigger size (higher order number), isn't there more chance
> > > >> that image objects are cached as contiguous blocks and may have a
> > > >> read-ahead advantage? Please give your opinion and reason.
> > > >>
> > > >> Best regards,
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph OSD journal utilization

2016-06-17 Thread EP Komarla
Hi,

I am looking for a way to monitor the utilization of OSD journals - by 
observing the utilization pattern over time, I can determine if I have over 
provisioned them or not. Is there a way to do this?
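
One possible starting point, assuming FileStore OSDs and local access to
the admin socket (counter names vary by release), is sampling the per-OSD
journal perf counters:

  ceph daemon osd.0 perf dump | python -m json.tool | grep journal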

When I googled on this topic, I saw one similar request about 4 years back.  I 
am wondering if there has been some traction on this topic since then.

Thanks a lot.

- epk

Legal Disclaimer:
The information contained in this message may be privileged and confidential. 
It is intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any distribution of this message, in 
any form, is strictly prohibited. If you have received this message in error, 
please immediately notify the sender and delete or destroy any copy of this 
message!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Installing ceph monitor on Ubuntu denial: segmentation fault

2016-06-17 Thread Brad Hubbard
On Fri, May 20, 2016 at 7:32 PM, Daniel Wilhelm  wrote:
> Hi
>
>
>
> I am relieved to have found a solution to this problem.
>
>
>
> The ansible script for generating the key did not pass the key to the
> following command line, and therefore sent an empty string to this script
> (see monitor_secret).
>
>
>
> ceph-authtool /var/lib/ceph/tmp/keyring.mon.{{ monitor_name }}
> --create-keyring --name=mon. --add-key={{ monitor_secret }} --cap mon 'allow
> *'

This issue is being handled in existing tracker
http://tracker.ceph.com/issues/2904
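
For reference, one way to generate a valid monitor key is:

  ceph-authtool --gen-print-key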

>
>
>
> Now when this invalid key is being used to create the ceph file systems it
> seems to be copied to the location indicated below
> (/var/lib/ceph/mon/ceph-control01/keyring), and is crashing the ceph command
> below.
>
>
>
> Maybe a developer should have a look into this. It seems to me as if a
> base64 decoding went wrong in this case and crashed the process.

I was able to reproduce this and have created a patch. I've opened
http://tracker.ceph.com/issues/16266
for it.

Cheers,
Brad

>
>
>
> Thanks anyway
>
>
>
> Cheers
>
>
>
> Daniel
>
>
>
>
>
> From: Daniel Wilhelm
> Sent: Thursday, 19 May 2016 12:00
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Installing ceph monitor on Ubuntu denial: segmentation
> fault
>
>
>
> Hi
>
>
>
> I am trying to install ceph with the ceph ansible role:
> https://github.com/shieldwed/ceph-ansible.
>
>
>
> I had to fix some ansible tasks to work correctly with ansible 2.0.2.0 but
> now it seems to work quite well.
>
> Sadly I have now come across a bug, I cannot solve myself:
>
>
>
> When ansible is starting the service ceph-mon@ceph-control01.service,
> ceph-create-keys@control01.service gets started as a dependency to create
> the admin key.
>
>
>
> Within the unit log the following lines are shown:
>
>
>
> May 19 11:42:14 control01 ceph-create-keys[21818]:
> INFO:ceph-create-keys:Talking to monitor...
>
> May 19 11:42:14 control01 ceph-create-keys[21818]:
> INFO:ceph-create-keys:Cannot get or create admin key
>
> May 19 11:42:15 control01 ceph-create-keys[21818]:
> INFO:ceph-create-keys:Talking to monitor...
>
> May 19 11:42:15 control01 ceph-create-keys[21818]:
> INFO:ceph-create-keys:Cannot get or create admin key
>
>
>
> And so on.
>
>
>
> Since this script is calling “ceph --cluster=ceph --name=mon.
> --keyring=/var/lib/ceph/mon/ceph-control01/keyring auth get-or-create
> client.admin mon allow * osd allow * mds allow *”
>
>
>
> I tried to call this command myself and got this as a result:
>
> Segmentation fault (core dumped)
>
>
>
> As for the ceph versions, I tried two different with the same result:
>
> ·   Ubuntu integrated: ceph 10.1.2
>
> ·   Official stable repo: http://download.ceph.com/debian-jewel so:
> 10.2.1
>
>
>
> How can I circumvent this problem? Or is there any solution to that?
>
>
>
> Thanks
>
>
>
> Daniel
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance Testing

2016-06-17 Thread David
On 17 Jun 2016 3:33 p.m., "Carlos M. Perez"  wrote:

>
> Hi,
>
>
>
> I found the following on testing performance  -
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
and have a few questions:
>
>
>
> -  By testing the block device: do the performance tests take the
overall cluster performance into account (how long it takes the data to
replicate to the other nodes based on copies, etc.), or is it just a local
portion, ignoring the backend/external ceph processes?  We’re using ceph as
block devices for proxmox storage for kvms/containers.
>

I'm not sure what you mean by "local portion"; are you doing the
benchmarking directly on an OSD node? When writing with rbd bench or fio,
the writes will be distributed across the cluster according to your cluster
config so the performance will reflect the various attributes of your
cluster (replication count, journal speed, network latency etc.).

>
>
> -  If the above is as a whole, is there a way to test the “local”
storage independently of the cluster/pool as a whole.  Basically, I’m
testing a few different journal drive options (Intel S3700, Samsung SM863)
and controllers (ICH, LSI, Adaptec) and would prefer to change hardware in
one node (also limits purchasing requirements for testing), rather than
having to replicate it in all nodes.  Getting close enough numbers to a
fully deployed setup is good enough for .  We currently have three nodes,
two pools, 6 OSDs per node, and are trying to find an appropriate drive
before we scale the system and start putting workloads on it.
>

If I understand correctly, you're doing your rbd testing on an OSD node and
you want to test just the performance of the OSDs in that node. Localising
in this way isn't really a common use case for Ceph. You could potentially
create a new pool containing just the OSDs in the node, but you would need
to play around with your crush map to get that working, e.g. changing the
'osd crush chooseleaf type'; see the sketch below.
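
Roughly along these lines (the host, rule and pool names here are made up):

  # a CRUSH rule rooted at a single host bucket, replicating across its OSDs
  ceph osd crush rule create-simple local-node1 node1 osd
  ceph osd pool create localtest 128 128 replicated local-node1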

>
>
> -  Write Cache – In most benchmarking scenarios, it’s said to
disable write caching on the drive.  However, according to this (
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance)
it seems to indicate that “Newer kernels should work fine” – does this mean
that on a “modern” kernel this setting is not necessary since it’s
accounted for during the use of the journal, or that the disabling should
work fine?  We’ve seen vast differences using Sebastien Han’s guide (
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/)
but that uses fio directly to the device (which will clear out the
partitions on a “live” journal…yes it was a test system so nothing major,
just an unexpected issue of the OSD’s not coming up after reboot).  We’ve
been disabling it but just want to check to see if this is an unnecessary
step, or a “best practice” step that should be done regardless.
>

I think you meant this link. It is saying that on kernels newer than
2.6.33 there is no need to
disable the write cache on a raw disk being used for a journal. That is
because the data is properly flushed to the disk before it sends an ACK.
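
For reference, the journal-style test from Sebastien's guide looks roughly
like this (it writes directly to the device and is destructive, so only run
it on an unused disk or partition):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test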


>
>
> Thanks in advance….
>
>
>
> Carlos M. Perez
>
> CMP Consulting Services
>
> 305-669-1515
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mysterious cache-tier flushing behavior

2016-06-17 Thread Gregory Farnum
Oh, space available drops, not space consumed. Not sure then; caching
has changed a bunch since I worked with it.

On Fri, Jun 17, 2016 at 9:49 AM, Christian Balzer  wrote:
>
>
> Hello Greg,
>
> The opposite, space is consumed:
>
> http://i.imgur.com/ALBR5dj.png
>
> I can assure you, in that cluster objects don't get deleted.
>
> Christian
>
> On Fri, 17 Jun 2016 08:57:31 -0700 Gregory Farnum wrote:
>
>> Sounds like you've got deleted objects in the cache tier getting flushed
>> (i.e., deleted) in the base tier.
>> -Greg
>>
>> On Thursday, June 16, 2016, Christian Balzer  wrote:
>>
>> >
>> > Hello devs and other sage(sic) people,
>> >
>> > Ceph 0.94.5, cache tier in writeback mode.
>> >
>> > As mentioned before, I'm running a cron job every day at 23:40 dropping
>> > the flush dirty target by 4% (0.60 to 0.56) and then re-setting it to
>> > the previous value 10 minutes later.
>> > The idea is to have all the flushing done during off-peak hours and
>> > that works beautifully.
>> > No flushes during day time, only lightweight evicts.
>> >
>> > Now I'm graphing all kinds of Ceph and system related info with
>> > graphite and noticed something odd.
>> >
>> > When the flushes are initiated, the HDD space of the OSDs in the
>> > backing store drops by a few GB, pretty much the amount of dirty
>> > objects over the threshold accumulated during a day, so no surprise
>> > there. This happens every time when that cron job runs.
>> >
>> > However only on some days this drop (more pronounced on those days) is
>> > accompanied by actual:
>> > a) flushes according to the respective Ceph counters
>> > b) network traffic from the cache-tier to the backing OSDs
>> > c) HDD OSD writes (both from OSD perspective and actual HDD)
>> > d) cache pool SSD reads (both from OSD perspective and actual SSD)
>> >
>> > So what is happening on the other days?
>> >
>> > The space clearly is gone and triggered by the "flush", but no data was
>> > actually transferred to the HDD OSD nodes, nor was there anything
>> > (newly) written.
>> >
>> > Dazed and confused,
>> >
>> > Christian
>> > --
>> > Christian Balzer        Network/Systems Engineer
>> > ch...@gol.com Global OnLine Japan/Rakuten
>> > Communications
>> > http://www.gol.com/
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com 
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mysterious cache-tier flushing behavior

2016-06-17 Thread Christian Balzer


Hello Greg,

The opposite, space is consumed:

http://i.imgur.com/ALBR5dj.png

I can assure you, in that cluster objects don't get deleted.

Christian

On Fri, 17 Jun 2016 08:57:31 -0700 Gregory Farnum wrote:

> Sounds like you've got deleted objects in the cache tier getting flushed
> (i.e., deleted) in the base tier.
> -Greg
> 
> On Thursday, June 16, 2016, Christian Balzer  wrote:
> 
> >
> > Hello devs and other sage(sic) people,
> >
> > Ceph 0.94.5, cache tier in writeback mode.
> >
> > As mentioned before, I'm running a cron job every day at 23:40 dropping
> > the flush dirty target by 4% (0.60 to 0.56) and then re-setting it to
> > the previous value 10 minutes later.
> > The idea is to have all the flushing done during off-peak hours and
> > that works beautifully.
> > No flushes during day time, only lightweight evicts.
> >
> > Now I'm graphing all kinds of Ceph and system related info with
> > graphite and noticed something odd.
> >
> > When the flushes are initiated, the HDD space of the OSDs in the
> > backing store drops by a few GB, pretty much the amount of dirty
> > objects over the threshold accumulated during a day, so no surprise
> > there. This happens every time when that cron job runs.
> >
> > However only on some days this drop (more pronounced on those days) is
> > accompanied by actual:
> > a) flushes according to the respective Ceph counters
> > b) network traffic from the cache-tier to the backing OSDs
> > c) HDD OSD writes (both from OSD perspective and actual HDD)
> > d) cache pool SSD reads (both from OSD perspective and actual SSD)
> >
> > So what is happening on the other days?
> >
> > The space clearly is gone and triggered by the "flush", but no data was
> > actually transferred to the HDD OSD nodes, nor was there anything
> > (newly) written.
> >
> > Dazed and confused,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com Global OnLine Japan/Rakuten
> > Communications
> > http://www.gol.com/
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Slack-IRC integration

2016-06-17 Thread Patrick McGarry
Hey cephers,

For those who have been asking for an official Ceph Slack channel, we
now have a bridge between #ceph/#ceph-devel and corresponding Slack
channels.

https://ceph-storage.slack.com/signup

Right now the email domains that will auto-accept registration are
relatively limited, but any slack member can invite you. If you would
like to use Slack feel free to ask in #ceph / #ceph-devel, or just
reply to this email and I’ll get you added.

As a note, we will not be moving away from an open platform like IRC,
this is just extending our communication channels to allow a broader
audience to participate if they so choose. If you have questions or
concerns please feel free to contact me. Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mysterious cache-tier flushing behavior

2016-06-17 Thread Gregory Farnum
Sounds like you've got deleted objects in the cache tier getting flushed
(i.e., deleted) in the base tier.
-Greg

On Thursday, June 16, 2016, Christian Balzer  wrote:

>
> Hello devs and other sage(sic) people,
>
> Ceph 0.94.5, cache tier in writeback mode.
>
> As mentioned before, I'm running a cron job every day at 23:40 dropping
> the flush dirty target by 4% (0.60 to 0.56) and then re-setting it to the
> previous value 10 minutes later.
> The idea is to have all the flushing done during off-peak hours and that
> works beautifully.
> No flushes during day time, only lightweight evicts.
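>
> Concretely, that is just two pool settings driven from cron; a sketch,
> with a made-up cache pool name:
>
>   ceph osd pool set cache-pool cache_target_dirty_ratio 0.56  # 23:40 job
>   ceph osd pool set cache-pool cache_target_dirty_ratio 0.60  # 23:50 job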
>
> Now I'm graphing all kinds of Ceph and system related info with graphite
> and noticed something odd.
>
> When the flushes are initiated, the HDD space of the OSDs in the backing
> store drops by a few GB, pretty much the amount of dirty objects over the
> threshold accumulated during a day, so no surprise there.
> This happens every time when that cron job runs.
>
> However only on some days this drop (more pronounced on those days) is
> accompanied by actual:
> a) flushes according to the respective Ceph counters
> b) network traffic from the cache-tier to the backing OSDs
> c) HDD OSD writes (both from OSD perspective and actual HDD)
> d) cache pool SSD reads (both from OSD perspective and actual SSD)
>
> So what is happening on the other days?
>
> The space clearly is gone and triggered by the "flush", but no data was
> actually transferred to the HDD OSD nodes, nor was there anything (newly)
> written.
>
> Dazed and confused,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten
> Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-17 Thread George Shuklin

(please reply to the maillist)

Next step is check status for ceph (ceph status). If there is issues 
with OSD/placement, IO may hang indefinitely.


If HEALTH_OK, try reduce pool min_size (ceph osd pool set data min_size 
1), and retry.


On 06/17/2016 04:01 PM, Ishmael Tsoaela wrote:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ sudo rbd -p data ls
data_01


Network connectivity is fine as per the ICMP output; is there any other
way I can confirm this?


On Fri, Jun 17, 2016 at 2:59 PM, George Shuklin wrote:


What did

sudo rbd -p data ls

show?

If it freezes too, issue is with pool itself (ceph health) or
network connectivity.


On 06/17/2016 03:37 PM, Ishmael Tsoaela wrote:

Hi,

Thank you for the response but with sudo all it does is freeze:

rbd map data_01 --pool data


cluster-admin@nodeB:~/.ssh/ceph-cluster$ date && sudo rbd map data_01 --pool data && date
Fri Jun 17 14:36:41 SAST 2016





On Fri, Jun 17, 2016 at 2:01 PM, Ishmael Tsoaela wrote:

Hi,

Will someone please assist, I am new to Ceph and I am trying
to map an image and this happens:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg |
tail" or so.
rbd: map failed: (13) Permission denied

If someone could help it would be great

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
Release: 14.04




___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Performance Testing

2016-06-17 Thread Carlos M. Perez
Hi,

I found the following on testing performance  - 
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance 
and have a few questions:


-  By testing the block device: do the performance tests take the
overall cluster performance into account (how long it takes the data to
replicate to the other nodes based on copies, etc.), or is it just a local
portion, ignoring the backend/external ceph processes?  We're using ceph as
block devices for proxmox storage for kvms/containers.



-  If the above is as a whole, is there a way to test the "local" 
storage independently of the cluster/pool as a whole.  Basically, I'm testing a 
few different journal drive options (Intel S3700, Samsung SM863) and 
controllers (ICH, LSI, Adaptec) and would prefer to change hardware in one node 
(also limits purchasing requirements for testing), rather than having to 
replicate it in all nodes.  Getting close enough numbers to a fully deployed 
setup is good enough for .  We currently have three nodes, two pools, 6 OSDs 
per node, and are trying to find an appropriate drive before we scale the
system and start putting workloads on it.



-  Write Cache - In most benchmarking scenarios, it's said to disable 
write caching on the drive.  However, according to this 
(http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance) 
it seems to indicate that "Newer kernels should work fine" - does this mean 
that on a "modern" kernel this setting is not necessary since it's accounted 
for during the use of the journal, or that the disabling should work fine?  
We've seen vast differences using Sebastien Han's guide 
(http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/)
 but that uses fio directly to the device (which will clear out the partitions 
on a "live" journal...yes it was a test system so nothing major, just an 
unexpected issue of the OSD's not coming up after reboot).  We've been 
disabling it but just want to check to see if this is an unnecessary step, or a 
"best practice" step that should be done regardless.

Thanks in advance

Carlos M. Perez
CMP Consulting Services
305-669-1515

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Want to present at FISL Brazil?

2016-06-17 Thread Patrick McGarry
Hey cephers,

As a part of my budget this year I am starting the process of raising
awareness around Ceph in the LATAM region. One of the things I am
starting with is having the Ceph Community sponsor FISL in Porto
Alegre, Brazil (13-16 July).

Thankfully, through our sponsorship negotiations we have been blessed with
an unexpected additional speaking slot. Rather than monopolize this
within the confines of Red Hat, I wanted to extend an offer to the
community to come present and socialize at the booth with us. If
anyone is interested please let me know as soon as humanly possible.

Speaking slots appear to be ~1hr and the talk can be anything
Ceph-related. If you are interested please send me:

1) Name
2) Org
3) Talk Title
4) Abstract
5) Speaker Bio

US citizens, keep in mind the visa process for Brazil is quite
onerous, so it may be costly in both time and money to get it
processed in time.

Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-17 Thread Jason Gress
I don't think it's the symlink that's the problem, but the path
permissions being something other than open.  That may be why you didn't
see this.  I am hoping symlinks still work, as I know we will need them
for our application.

Jason

On 6/17/16, 1:50 AM, "ceph-users on behalf of Oliver Dzombic"

wrote:

>Hi,
>
>just to verify this:
>
>no symlink usage == no problem/bug
>
>right ?
>
>-- 
>Mit freundlichen Gruessen / Best regards
>
>Oliver Dzombic
>IP-Interactive
>
>mailto:i...@ip-interactive.de
>
>Address:
>
>IP Interactive UG ( haftungsbeschraenkt )
>Zum Sonnenberg 1-3
>63571 Gelnhausen
>
>HRB 93402 at Amtsgericht Hanau
>Managing director: Oliver Dzombic
>
>Tax no.: 35 236 3622 1
>VAT ID: DE274086107
>
>
>Am 17.06.2016 um 06:11 schrieb Yan, Zheng:
>> On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress 
>>wrote:
>>> This is the latest default kernel with CentOS7.  We also tried a newer
>>> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
>>> that is it.  Thank you for the suggestion though.
>>>
>>> We upgraded our cluster to the 10.2.2 release today, and it didn't
>>>resolve
>>> all of the issues.  It's possible that a related issue is actually
>>> permissions.  Something may not be right with our config (or a bug)
>>>here.
>>>
>>> While testing we noticed that there may actually be two issues here.
>>>I am
>>> unsure, as we noticed that the most consistent way to reproduce our
>>>issue
>>> is to use vim or sed -i, which do in-place renames:
>>>
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx--   1 root root 2044 Jun 16 15:50 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>>> -rw---   1 root root 2044 Jun 16 13:47 root
>>> [root@ftp01 cron]# sed -i 's/^/#/' file
>>> sed: cannot rename ./sedfB2CkO: Permission denied
>>>
>>>
>>> Strangely, adding or deleting files works fine, it's only renaming that
>>> fails.  And strangely I was able to successfully edit the file on
>>>ftp02:
>>>
>>> [root@ftp02 cron]# sed -i 's/^/#/' file
>>> [root@ftp02 cron]# ls -la
>>> total 3
>>> drwx--   1 root root 2044 Jun 16 15:49 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>>> -rw---   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Then it worked on ftp01 this time:
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx--   1 root root 2357 Jun 16 15:49 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>>> -rw---   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>>
>>> [root@ftp01 cron]# sed -i 's/^/#/' file
>>> sed: cannot rename ./sedfB2CkO: Permission denied
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx--   1 root root 2044 Jun 16 15:51 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>>> -rw---   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> And now we have the zero file problem again:
>>>
>>> [root@ftp02 cron]# ls -la
>>> total 2
>>> drwx--   1 root root 2044 Jun 16 15:51 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root0 Jun 16 15:50 file
>>> -rw---   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Anyway, I wonder how much of this issue is related to that cannot
>>>rename
>>> issue above.  Here are our security settings:
>>>
>>> client.ftp01
>>> key: 
>>> caps: [mds] allow r, allow rw path=/ftp
>>> caps: [mon] allow r
>>> caps: [osd] allow rw pool=cephfs_metadata, allow rw
>>>pool=cephfs_data
>>> client.ftp02
>>> key: 
>>> caps: [mds] allow r, allow rw path=/ftp
>>> caps: [mon] allow r
>>> caps: [osd] allow rw pool=cephfs_metadata, allow rw
>>>pool=cephfs_data
>>>
>>>
>>> /ftp is the directory on cephfs under which cron lives; the full path
>>>is
>>> /ftp/cron .
>>>
>>> I hope this helps and thank you for your time!
>> 
>> I opened  ticket http://tracker.ceph.com/issues/16358. The bug is in
>> path restriction code. For now, the workaround is updating client caps
>> to not use path restriction.
>> 
>> Regards
>> Yan, Zheng
>> 
>>>
>>> Jason
>>>
>>> On 6/15/16, 4:43 PM, "John Spray"  wrote:
>>>
 On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress 
 wrote:
> While trying to use CephFS as a clustered filesystem, we stumbled
>upon a
> reproducible bug that is unfortunately pretty serious, as it leads to
> data
> loss.  Here is the situation:
>
> We have two systems, named ftp01 and ftp02.  They are both running
> CentOS
> 7.2, with this kernel release and ceph packages:
>
> kernel-3.10.0-327.18.2.el7.x86_64

That is an old-ish kernel to be using with cephfs.  It may well be the
source of your issues.

[ceph-users] cluster ceph -s error

2016-06-17 Thread Ishmael Tsoaela
Hi All,

Please assist with fixing this error:

1 X admin
2 X admin (hosting admin as well)

4 OSDs per node


cluster a04e9846-6c54-48ee-b26f-d6949d8bacb4
 health HEALTH_ERR
819 pgs are stuck inactive for more than 300 seconds
883 pgs degraded
64 pgs stale
819 pgs stuck inactive
245 pgs stuck unclean
883 pgs undersized
17 requests are blocked > 32 sec
recovery 2/8 objects degraded (25.000%)
recovery 2/8 objects misplaced (25.000%)
crush map has legacy tunables (require argonaut, min is firefly)
crush map has straw_calc_version=0
 monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
election epoch 7, quorum 0 nodeB
 osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
flags sortbitwise
  pgmap v480: 1064 pgs, 3 pools, 6454 bytes data, 4 objects
25791 MB used, 4627 GB / 4652 GB avail
2/8 objects degraded (25.000%)
2/8 objects misplaced (25.000%)
 819 undersized+degraded+peered
 181 active
  64 stale+active+undersized+degraded
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-17 Thread Jason Gress
Thank you very much - we will be testing this soon.

Jason

On 6/16/16, 11:11 PM, "Yan, Zheng"  wrote:

>On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress  wrote:
>> This is the latest default kernel with CentOS7.  We also tried a newer
>> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
>> that is it.  Thank you for the suggestion though.
>>
>> We upgraded our cluster to the 10.2.2 release today, and it didn't
>>resolve
>> all of the issues.  It's possible that a related issue is actually
>> permissions.  Something may not be right with our config (or a bug)
>>here.
>>
>> While testing we noticed that there may actually be two issues here.  I
>>am
>> unsure, as we noticed that the most consistent way to reproduce our
>>issue
>> is to use vim or sed -i, which do in-place renames:
>>
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:50 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>>
>>
>> Strangely, adding or deleting files works fine, it's only renaming that
>> fails.  And strangely I was able to successfully edit the file on ftp02:
>>
>> [root@ftp02 cron]# sed -i 's/^/#/' file
>> [root@ftp02 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then it worked on ftp01 this time:
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2357 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> And now we have the zero file problem again:
>>
>> [root@ftp02 cron]# ls -la
>> total 2
>> drwx--   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root0 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Anyway, I wonder how much of this issue is related to that cannot rename
>> issue above.  Here are our security settings:
>>
>> client.ftp01
>> key: 
>> caps: [mds] allow r, allow rw path=/ftp
>> caps: [mon] allow r
>> caps: [osd] allow rw pool=cephfs_metadata, allow rw
>>pool=cephfs_data
>> client.ftp02
>> key: 
>> caps: [mds] allow r, allow rw path=/ftp
>> caps: [mon] allow r
>> caps: [osd] allow rw pool=cephfs_metadata, allow rw
>>pool=cephfs_data
>>
>>
>> /ftp is the directory on cephfs under which cron lives; the full path is
>> /ftp/cron .
>>
>> I hope this helps and thank you for your time!
>
>I opened  ticket http://tracker.ceph.com/issues/16358. The bug is in
>path restriction code. For now, the workaround is updating client caps
>to not use path restriction.
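>For example, something along these lines (pool names as in your caps):
>
>  ceph auth caps client.ftp01 mds 'allow rw' mon 'allow r' \
>      osd 'allow rw pool=cephfs_metadata, allow rw pool=cephfs_data'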
>
>Regards
>Yan, Zheng
>
>>
>> Jason
>>
>> On 6/15/16, 4:43 PM, "John Spray"  wrote:
>>
>>>On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress 
>>>wrote:
 While trying to use CephFS as a clustered filesystem, we stumbled
upon a
 reproducible bug that is unfortunately pretty serious, as it leads to
data
 loss.  Here is the situation:

 We have two systems, named ftp01 and ftp02.  They are both running
CentOS
 7.2, with this kernel release and ceph packages:

 kernel-3.10.0-327.18.2.el7.x86_64
>>>
>>>That is an old-ish kernel to be using with cephfs.  It may well be the
>>>source of your issues.
>>>
 [root@ftp01 cron]# rpm -qa | grep ceph
 ceph-base-10.2.1-0.el7.x86_64
 ceph-deploy-1.5.33-0.noarch
 ceph-mon-10.2.1-0.el7.x86_64
 libcephfs1-10.2.1-0.el7.x86_64
 ceph-selinux-10.2.1-0.el7.x86_64
 ceph-mds-10.2.1-0.el7.x86_64
 ceph-common-10.2.1-0.el7.x86_64
 ceph-10.2.1-0.el7.x86_64
 python-cephfs-10.2.1-0.el7.x86_64
 ceph-osd-10.2.1-0.el7.x86_64

 Mounted like so:
 XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
 _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
 And:
 XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
 _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0

 This filesystem has 234GB worth of data on it, and I created another
 subdirectory and mounted it, NFS style.

 Here were the steps to reproduce:

Re: [ceph-users] image map failed

2016-06-17 Thread George Shuklin

What did

sudo rbd -p data ls

show?

If it freezes too, issue is with pool itself (ceph health) or network 
connectivity.


On 06/17/2016 03:37 PM, Ishmael Tsoaela wrote:

Hi,

Thank you for the response but with sudo all it does is freeze:

rbd map data_01 --pool data


cluster-admin@nodeB:~/.ssh/ceph-cluster$ date && sudo rbd map data_01 
--pool data && date

Fri Jun 17 14:36:41 SAST 2016





On Fri, Jun 17, 2016 at 2:01 PM, Ishmael Tsoaela wrote:


Hi,

Will someone please assist, I am new to Ceph and I am trying to
map an image and this happens:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail"
or so.
rbd: map failed: (13) Permission denied

If someone could help it would be great

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
Release: 14.04




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
2016-06-17 10:03 GMT+02:00 Christian Balzer :
> I'm unfamiliar with Xen and Xenserver (the latter doesn't support RBD, btw),
> but if you can see all the combined activity of your VMs on your HW in the
> dom0 like with KVM/qemu, a simple "iostat" or "iostat -x" will give you the
> average IOPS of a device.
> Same of course within a VM.

I'm able to see the combined activity directly from Dom0.

With "iostat" should I look for 'tps' column ?

> However that's the average, you're likely to have peaks much higher than
> that.
> For this you'll either have to collect and graph that data
> (collectd/graphite, etc) and/or run something like atop during peak hours
> and watch it or have it write logs with a high sample rate.
> As in, atop can keep a log of all states, but the default interval of 10
> minutes with Debian is likely too coarse to spot real peaks.
> See the atop documentation.

Running "iostat" every seconds, i can see about 800-1000 tps
(transactions per seconds)
I can try to install "atop" but i'm totally new to this, I don't know
how use. Any hint?
Currently i'm reading the "atop" man page but i don't want to run in
for the whole weekend
with wrong parameters.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-17 Thread Ilya Dryomov
On Fri, Jun 17, 2016 at 2:37 PM, Ishmael Tsoaela  wrote:
> Hi,
>
> Thank you for the response but with sudo all it does is freeze:
>
> rbd map data_01 --pool data
>
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ date && sudo rbd map data_01 --pool
> data && date
> Fri Jun 17 14:36:41 SAST 2016

What's the output of "dmesg | tail"?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-17 Thread Ishmael Tsoaela
Hi,

Thank you for the response but with sudo all it does is freeze:

rbd map data_01 --pool data


cluster-admin@nodeB:~/.ssh/ceph-cluster$ date && sudo rbd map data_01
--pool data && date
Fri Jun 17 14:36:41 SAST 2016





On Fri, Jun 17, 2016 at 2:01 PM, Ishmael Tsoaela 
wrote:

> Hi,
>
> Will someone please assist, I am new to Ceph and I am trying to map an
> image and this happens:
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (13) Permission denied
>
> If someone could help it would be great
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
> ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
> Release: 14.04
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Debugging OSD startup

2016-06-17 Thread George Shuklin

Hello.

I'm trying to debug why OSD does not getting up.

It stops at:

2016-06-17 12:28:55.174468 7f0e60fd78c0 -1 osd.6 6366 log_to_monitors 
{default=true}
2016-06-17 12:28:55.185917 7f0e60fd78c0  0 osd.6 6366 done with init, 
starting boot process


If I enable debug (debug osd = 20 debug ms = 1) and restart it, it loops 
with messages:


2016-06-17 12:30:29.310433 7f5970e52700 10 osd.6 6366 tick_without_osd_lock
2016-06-17 12:30:29.310435 7f5970e52700 20 osd.6 6366 
scrub_random_backoff lost coin flip, randomly backing off
2016-06-17 12:30:29.810251 7f59584c3700 20 osd.6 6366 update_osd_stat 
osd_stat(6312 MB used, 85842 MB avail, 96281 MB total, peers []/[] op 
hist [])
2016-06-17 12:30:29.810289 7f59584c3700  5 osd.6 6366 heartbeat: 
osd_stat(6312 MB used, 85842 MB avail, 96281 MB total, peers []/[] op 
hist [])

2016-06-17 12:30:30.310604 7f5971653700 10 osd.6 6366 tick
2016-06-17 12:30:30.310650 7f5971653700 10 osd.6 6366 do_waiters -- start
2016-06-17 12:30:30.310652 7f5971653700 10 osd.6 6366 do_waiters -- finish
2016-06-17 12:30:30.310691 7f5970e52700 10 osd.6 6366 tick_without_osd_lock
2016-06-17 12:30:30.310693 7f5970e52700 20 osd.6 6366 
scrub_random_backoff lost coin flip, randomly backing off

2016-06-17 12:30:31.310762 7f5970e52700 10 osd.6 6366 tick_without_osd_lock
2016-06-17 12:30:31.310788 7f5970e52700 20 osd.6 6366 
scrub_random_backoff lost coin flip, randomly backing off

2016-06-17 12:30:31.310808 7f5971653700 10 osd.6 6366 tick
2016-06-17 12:30:31.310812 7f5971653700 10 osd.6 6366 do_waiters -- start
2016-06-17 12:30:31.310814 7f5971653700 10 osd.6 6366 do_waiters -- finish

Full log: https://gist.github.com/amarao/d02516a78657ab9b2a3ddab2f5952641

I see established network connection with monitor: ESTABLISHED 
4718/ceph-mon (on monitor host).



How I should continue debug? Any ideas what's wrong?


Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-17 Thread Rakesh Parkiti
Hi Ishmael,

You have to apply the correct client keyring permissions to the pool on
which you are trying to create the rbd image.

From the Admin/Management Node, create a client keyring:

1. Admin Node:

   ceph auth get-or-create client.rbd mon 'allow r' osd 'allow rwx pool=PoolA'
   [client.rbd]
           key = AQB2kVpXjhq9OBAACIHEUeRs04UqJsbQeNyLRg==

2. Get the keyring from the auth list and push it to the client node:

   ceph auth get-or-create client.rbd | ssh user@clientA1 sudo tee /etc/ceph/ceph.client.rbd.keyring
   [client.rbd]
           key = AQB2kVpXjhq9OBAACIHEUeRs04UqJsbQeNyLRg==

3. On the client node:

   cat /etc/ceph/ceph.client.rbd.keyring >> /etc/ceph/keyring
   ceph -s --name client.rbd

user@client:$ rbd create PoolA/PoolA_image1 -s 100G --image-format 2 --object-size 32K --image-feature layering --name client.rbd
user@client:$ sudo rbd map --image PoolA/PoolA_image1 --name client.rbd
/dev/rbd0
-- 
Rakesh Parkiti
-- Forwarded message --
From: Ishmael Tsoaela 
Date: Fri, Jun 17, 2016 at 5:31 PM
Subject: [ceph-users] image map failed
To: ceph-users@lists.ceph.com


Hi,

Will someone please assist, I am new to Ceph and I am trying to map an image
and this happens:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied

If someone could help it would be great.

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
Release: 14.04

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






-- 
Thanks,
Rakesh Parkiti
Senior Test Engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] image map failed

2016-06-17 Thread Ilya Dryomov
On Fri, Jun 17, 2016 at 2:01 PM, Ishmael Tsoaela  wrote:
> Hi,
>
> Will someone please assist, I am new to Ceph and I am trying to map an
> image and this happens:
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (13) Permission denied
>
> If someone could help it would be great
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
> ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
> Release: 14.04

You need sudo privileges for rbd map: sudo rbd map ... or so.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] image map failed

2016-06-17 Thread Ishmael Tsoaela
Hi,

Will someone please assist, I am new to Ceph and I am trying to map an image
and this happens:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ rbd map data_01 --pool data
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied

If someone could help it would be great

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

cluster-admin@nodeB:~/.ssh/ceph-cluster$ lsb_release -r
Release: 14.04
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] reweight command

2016-06-17 Thread M Ranga Swami Reddy
Hello,
what is the diff between below reweight command and which one is
preferable to use?

ceph osd reweight <osd-id> <weight>

ceph osd crush reweight <osd-name> <weight>


Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros Cons

2016-06-17 Thread Wido den Hollander

> On 17 June 2016 at 12:12, Lazuardi Nasution wrote:
> 
> 
> Hi Mark,
> 
> What overhead do you mean? Can it be negligible if I use a 4KB (extreme
> case, the same as the I/O size) stripe/chunk size to make sure that all
> random I/O will be spread across all OSDs?
> 

Keep in mind that this involves opening additional TCP connections to OSDs. 
That will come with some overhead. Especially when new connections have to go 
through the handshake process.

I am using 64MB stripes in a case with a customer. They only need sequential 
writes and reads at high speed. Works great for them.
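
For illustration, that simply means a larger object size at image creation
time, e.g. (pool and image names made up):

  rbd create mypool/seqimage --size 102400 --order 26   # 2^26 = 64MB objects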

Wido

> Anyway, I love coffee too :)
> 
> Best regards,
> 
> 
> > Date: Thu, 16 Jun 2016 04:01:37 -0500
> > From: Mark Nelson 
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros
> > Cons
> > Message-ID: 
> > Content-Type: text/plain; charset=windows-1252; format=flowed
> >
> >
> >
> > On 06/16/2016 03:54 AM, Mark Nelson wrote:
> > > Hi,
> > >
> > > larger stripe size (to an extent) will generally improve large
> > > sequential read and write performance.
> >
> > Oops, I should have had my coffee. I missed a sentence here.  larger
> > stripe size will generally improve large sequential read and write
> > performance.  Smaller stripe size can provide some of the advantages you
> > mention below, but there's overhead though.  Ok fixed, now back to find
> > coffee. :)
> >
> > > There's overhead though.  It
> > > means more objects which can slow things down at the filestore level
> > > when PG splits occur and also potentially means more inodes fall out of
> > > cache, longer syncfs, etc.  On the other hand, if using cache tiering,
> > > smaller objects means less data to promote which can be a big win for
> > > small IO.
> > >
> > > Basically the answer is that there are pluses and minuses, and the exact
> > > behavior will depend on your kernel configuration, hardware, and use
> > > case.  I think 4MB has been a fairly good default thus far (might change
> > > with bluestore), but tuning for a specific use case may mean a smaller
> > > or larger size is better.
> > >
> > > Mark
> > >
> > > On 06/16/2016 03:20 AM, Lazuardi Nasution wrote:
> > >> Hi,
> > >>
> > >> I'm looking for some pros and cons related to the RBD stripe/chunk size
> > >> indicated by the image order number. The default is 4MB (order 22), but
> > >> OpenStack uses 8MB (order 23) as default. If we use a smaller size
> > >> (lower order number), isn't there more chance that image objects are
> > >> spread across OSDs and cached in OSD node RAM? If we use a bigger size
> > >> (higher order number), isn't there more chance that image objects are
> > >> cached as contiguous blocks and may have a read-ahead advantage? Please
> > >> give your opinion and reason.
> > >>
> > >> Best regards,
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros Cons

2016-06-17 Thread Lazuardi Nasution
Hi Mark,

What overhead do you mean? Can it be negligible if I use a 4KB (extreme
case, the same as the I/O size) stripe/chunk size to make sure that all
random I/O will be spread across all OSDs?

Anyway, I love coffee too :)

Best regards,


> Date: Thu, 16 Jun 2016 04:01:37 -0500
> From: Mark Nelson 
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD Stripe/Chunk Size (Order Number) Pros
> Cons
> Message-ID: 
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
>
>
> On 06/16/2016 03:54 AM, Mark Nelson wrote:
> > Hi,
> >
> > larger stripe size (to an extent) will generally improve large
> > sequential read and write performance.
>
> Oops, I should have had my coffee. I missed a sentence here.  larger
> stripe size will generally improve large sequential read and write
> performance.  Smaller stripe size can provide some of the advantages you
> mention below, but there's overhead though.  Ok fixed, now back to find
> coffee. :)
>
> > There's overhead though.  It
> > means more objects which can slow things down at the filestore level
> > when PG splits occur and also potentially means more inodes fall out of
> > cache, longer syncfs, etc.  On the other hand, if using cache tiering,
> > smaller objects means less data to promote which can be a big win for
> > small IO.
> >
> > Basically the answer is that there are pluses and minuses, and the exact
> > behavior will depend on your kernel configuration, hardware, and use
> > case.  I think 4MB has been a fairly good default thus far (might change
> > with bluestore), but tuning for a specific use case may mean a smaller
> > or larger size is better.
> >
> > Mark
> >
> > On 06/16/2016 03:20 AM, Lazuardi Nasution wrote:
> >> Hi,
> >>
> >> I'm looking for some pros and cons related to the RBD stripe/chunk size
> >> indicated by the image order number. The default is 4MB (order 22), but
> >> OpenStack uses 8MB (order 23) as default. If we use a smaller size
> >> (lower order number), isn't there more chance that image objects are
> >> spread across OSDs and cached in OSD node RAM? If we use a bigger size
> >> (higher order number), isn't there more chance that image objects are
> >> cached as contiguous blocks and may have a read-ahead advantage? Please
> >> give your opinion and reason.
> >>
> >> Best regards,
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Down a osd and bring it Up

2016-06-17 Thread Kanchana. P
Thanks for the reply. The service is still showing as failed; how do I
bring the OSD service up? "ceph osd tree" shows all OSDs as UP.

 [root@Admin ceph]# systemctl restart ceph-osd@osd.2.service
[root@Admin ceph]# systemctl status ceph-osd@osd.2.service
● ceph-osd@osd.2.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled;
vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2016-06-17 14:34:25 IST;
3s ago
  Process: 8112 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i
--setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 8071 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster
${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 8112 (code=exited, status=1/FAILURE)

Jun 17 14:34:25 Admin ceph-osd[8112]: --debug_ms N  set message debug
level (e.g. 1)
Jun 17 14:34:25 Admin systemd[1]: ceph-osd@osd.2.service: main process
exited, code=exited, status=...ILURE
Jun 17 14:34:25 Admin systemd[1]: Unit ceph-osd@osd.2.service entered
failed state.
Jun 17 14:34:25 Admin systemd[1]: ceph-osd@osd.2.service failed.
Jun 17 14:34:25 Admin ceph-osd[8112]: 2016-06-17 14:34:25.003696
7f3f58664800 -1 must specify '-i #'...mber
Jun 17 14:34:25 Admin systemd[1]: ceph-osd@osd.2.service holdoff time over,
scheduling restart.
Jun 17 14:34:25 Admin systemd[1]: start request repeated too quickly for
ceph-osd@osd.2.service
Jun 17 14:34:25 Admin systemd[1]: Failed to start Ceph object storage
daemon.
Jun 17 14:34:25 Admin systemd[1]: Unit ceph-osd@osd.2.service entered
failed state.
Jun 17 14:34:25 Admin systemd[1]: ceph-osd@osd.2.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
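The failures above point at the unit instance name rather than the OSD
itself: the ExecStart line shown in the status output passes the instance
string straight to "ceph-osd --id %i", so an instance of "osd.2" hands the
daemon "--id osd.2" and triggers the "must specify '-i #'" abort. The
instance should be the bare OSD number:

    # note: ceph-osd@2, not ceph-osd@osd.2
    systemctl start ceph-osd@2.service
    systemctl status ceph-osd@2.service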


[root@Admin ceph]# systemctl -a | grep ceph
  ceph-osd@osd.0.service     loaded  inactive  dead    Ceph object storage daemon
● ceph-osd@osd.1.service     loaded  failed    failed  Ceph object storage daemon
  ceph-osd@osd.10.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.11.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.12.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.13.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.14.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.15.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.16.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.17.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.18.service    loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.19.service    loaded  inactive  dead    Ceph object storage daemon
● ceph-osd@osd.2.service     loaded  failed    failed  Ceph object storage daemon
● ceph-osd@osd.3.service     loaded  failed    failed  Ceph object storage daemon
  ceph-osd@osd.4.service     loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.5.service     loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.6.service     loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.7.service     loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.8.service     loaded  inactive  dead    Ceph object storage daemon
  ceph-osd@osd.9.service     loaded  inactive  dead    Ceph object storage daemon


On Thu, Jun 16, 2016 at 8:03 PM, Joshua M. Boniface 
wrote:

> RHEL 7.2 and Jewel should be using the systemd unit files by default, so
> you'd do something like:
>
> > sudo systemctl stop ceph-osd@
>
> and then
>
> > sudo systemctl start ceph-osd@
>
> when you're done.
>
> --
> Joshua M. Boniface
> Linux System Ærchitect
> Sigmentation fault. Core dumped.
>
> On 16/06/16 09:44 AM, Kanchana. P wrote:
> >
> > Hi,
> >
> > How can I down an OSD and bring it back up on RHEL 7.2 with Ceph
> > version 10.2.2?
> >
> > "sudo start ceph-osd id=1" fails with "sudo: start: command not found".
> >
> > I have 5 OSDs on each node and I want to down one particular OSD ("sudo
> > stop ceph-osd id=1" also fails) and see whether replicas are written to
> > other OSDs without any issues.
> >
> > Thanks in advance.
> >
> > –kanchana.
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IOPS requirements

2016-06-17 Thread Christian Balzer

Hello,

On Fri, 17 Jun 2016 09:10:10 +0200 Gandalf Corvotempesta wrote:

> As I'm planning a new cluster to move all my virtual machines to
> (currently on local storage on each hypervisor), I would like to evaluate
> the current IOPS on each server.
> 
> Knowing the current IOPS, I'll be able to tell how many IOPS I need on
> Ceph.
> 
> I'm not an expert; do you know how to get this info from each virtual
> machine (Linux) or directly from each hypervisor (XenServer)?

I'm unfamiliar with Xen and XenServer (the latter doesn't support RBD,
btw), but if you can see the combined activity of all your VMs on your
hardware in the dom0, as with KVM/qemu, a simple "iostat" or "iostat -x"
will give you the average IOPS of a device.
The same of course applies within a VM.

However, that's the average; you're likely to have peaks much higher than
that.
For this you'll either have to collect and graph that data
(collectd/graphite, etc.) and/or run something like atop during peak hours
and watch it, or have it write logs with a high sample rate.
As in, atop can keep a log of all states, but the default interval of 10
minutes with Debian is likely too coarse to spot real peaks.
See the atop documentation.
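A minimal sketch of both approaches, assuming the sysstat and atop packages
on Debian (the config file and log path are the distribution defaults):

    # one-second extended samples; the r/s and w/s columns together
    # are the per-device IOPS
    iostat -x 1

    # drop atop's logging interval from 10 minutes to 1 minute
    sed -i 's/^INTERVAL=600/INTERVAL=60/' /etc/default/atop
    service atop restart

    # later, replay a day's log and step through the samples
    atop -r /var/log/atop/atop_20160617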

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IOPS requirements

2016-06-17 Thread Oliver Dzombic
Hi,

The most accurate way should be to check, on each host machine, how many
IOPS are flowing through.

You can also visualize this with, for example, munin.

This way you can also see the peaks.
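If munin-node is already on the hypervisors, its stock disk plugins will
graph per-device IOPS over time, peaks included. A quick check of what
would be enabled (a sketch, assuming a standard munin-node install):

    # list the plugins munin would enable for this host
    munin-node-configure --suggest | grep -i disk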

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Address:

IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, registered at the district court of Hanau
Management: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


On 17.06.2016 at 09:10, Gandalf Corvotempesta wrote:
> As I'm planning a new cluster to move all my virtual machines to
> (currently on local storage on each hypervisor), I would like to evaluate
> the current IOPS on each server.
> 
> Knowing the current IOPS, I'll be able to tell how many IOPS I need on
> Ceph.
> 
> I'm not an expert; do you know how to get this info from each virtual
> machine (Linux) or directly from each hypervisor (XenServer)?
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
As I'm planning a new cluster to move all my virtual machines to (currently
on local storage on each hypervisor), I would like to evaluate the current
IOPS on each server.

Knowing the current IOPS, I'll be able to tell how many IOPS I need on Ceph.

I'm not an expert; do you know how to get this info from each virtual
machine (Linux) or directly from each hypervisor (XenServer)?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Bug found with CentOS 7.2

2016-06-17 Thread Oliver Dzombic
Hi,

just to verify this:

no symlink usage == no problem/bug

right?

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Address:

IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, registered at the district court of Hanau
Management: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


Am 17.06.2016 um 06:11 schrieb Yan, Zheng:
> On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress  wrote:
>> This is the latest default kernel with CentOS 7. We also tried a newer
>> kernel (a 4.4 from elrepo) that has the same problem, so I don't think
>> that is it. Thank you for the suggestion, though.
>>
>> We upgraded our cluster to the 10.2.2 release today, and it didn't
>> resolve all of the issues. It's possible that a related issue is
>> actually permissions; something may not be right with our config (or
>> there may be a bug) here.
>>
>> While testing we noticed that there may actually be two issues here. I am
>> unsure, as we noticed that the most consistent way to reproduce our issue
>> is to use vim or "sed -i", both of which write a temporary file and
>> rename it over the original:
>>
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:50 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>>
>>
>> Strangely, adding or deleting files works fine; it's only renaming that
>> fails. And yet I was able to successfully edit the file on ftp02:
>>
>> [root@ftp02 cron]# sed -i 's/^/#/' file
>> [root@ftp02 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then it worked on ftp01 this time:
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2357 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx--   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> And now we have the zero file problem again:
>>
>> [root@ftp02 cron]# ls -la
>> total 2
>> drwx--   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root0 Jun 16 15:50 file
>> -rw---   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Anyway, I wonder how much of this issue is related to that cannot rename
>> issue above.  Here are our security settings:
>>
>> client.ftp01
>> key: 
>> caps: [mds] allow r, allow rw path=/ftp
>> caps: [mon] allow r
>> caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>> client.ftp02
>> key: 
>> caps: [mds] allow r, allow rw path=/ftp
>> caps: [mon] allow r
>> caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>>
>>
>> /ftp is the directory on CephFS under which cron lives; the full path is
>> /ftp/cron.
>>
>> I hope this helps and thank you for your time!
> 
> I opened ticket http://tracker.ceph.com/issues/16358. The bug is in the
> path restriction code. For now, the workaround is updating the client
> caps to not use a path restriction.
> 
> Regards
> Yan, Zheng
> 
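For reference, loosening the caps as suggested could look like the sketch
below (the client names are the ones quoted underneath; the mon and osd
caps are left unchanged):

    # drop the "path=/ftp" restriction from the MDS cap as a workaround;
    # repeat for client.ftp02
    ceph auth caps client.ftp01 mds 'allow rw' mon 'allow r' \
        osd 'allow rw pool=cephfs_metadata, allow rw pool=cephfs_data'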
>>
>> Jason
>>
>> On 6/15/16, 4:43 PM, "John Spray"  wrote:
>>
>>> On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress 
>>> wrote:
 While trying to use CephFS as a clustered filesystem, we stumbled upon a
 reproducible bug that is unfortunately pretty serious, as it leads to
 data
 loss.  Here is the situation:

 We have two systems, named ftp01 and ftp02.  They are both running
 CentOS
 7.2, with this kernel release and ceph packages:

 kernel-3.10.0-327.18.2.el7.x86_64
>>>
>>> That is an old-ish kernel to be using with cephfs.  It may well be the
>>> source of your issues.
>>>
 [root@ftp01 cron]# rpm -qa | grep ceph
 ceph-base-10.2.1-0.el7.x86_64
 ceph-deploy-1.5.33-0.noarch
 ceph-mon-10.2.1-0.el7.x86_64
 libcephfs1-10.2.1-0.el7.x86_64
 ceph-selinux-10.2.1-0.el7.x86_64
 ceph-mds-10.2.1-0.el7.x86_64
 ceph-common-10.2.1-0.el7.x86_64
 ceph-10.2.1-0.el7.x86_64
 python-cephfs-10.2.1-0.el7.x86_64
 ceph-osd-10.2.1-0.el7.x86_64

 Mounted like so:
 XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph