[ceph-users] IO-500 now accepting submissions

2017-10-27 Thread John Bent
Hello Ceph community,

After BoFs at last year's SC and the last two ISCs, the IO-500 has been
formalized and is now accepting submissions in preparation for our first
IO-500 list at this year's SC BoF:
http://sc17.supercomputing.org/presentation/?id=bof108=sess319

The goal of the IO-500 is simple: to improve parallel file systems by
ensuring that sites publish results of both "hero" and "anti-hero" runs and
by sharing the tuning and configuration they applied to achieve those
results.

After receiving feedback from a few trial users, the framework is
significantly improved:
> git clone https://github.com/VI4IO/io-500-dev
> cd io-500-dev
> ./utilities/prepare.sh
> ./io500.sh
> # tune and rerun
> # email results to sub...@io500.org

This, perhaps with a bit of tweaking (please consult our 'doc' directory
for troubleshooting), should get a very small toy problem up and running
quickly.  It then becomes a bit more challenging to tune the problem size,
as well as the underlying file system configuration (e.g. striping
parameters), to get a valid, and impressive, result.

The basic format of the benchmark is to run both a "hero" and "antihero"
IOR test as well as a "hero" and "antihero" mdtest.  The write/create phase
of these tests must last for at least five minutes to ensure that the test
is not measuring cache speeds.
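
To make the distinction concrete, the workloads look roughly like the
following (illustrative only -- io500.sh drives IOR and mdtest for you with
its own parameters; the flags shown are standard IOR/mdtest options, with
-D 300 giving the five-minute stonewall, and $WORKDIR standing in for your
configured working directory):
> mpirun -np 64 ior -w -a POSIX -F -t 2m -b 16g -D 300 -o $WORKDIR/ior_easy  # "hero": large file-per-process writes
> mpirun -np 64 ior -w -a POSIX -t 4k -b 4g -D 300 -o $WORKDIR/ior_hard      # "anti-hero": small writes to a shared file
> mpirun -np 64 mdtest -C -F -n 100000 -u -d $WORKDIR/mdt_easy               # "hero" create: many files, per-rank dirs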

One of the more challenging aspects is that there is a requirement to
search through the metadata of the files that this benchmark creates.
Currently we provide a simple serial version of this test (i.e. the GNU
find command) as well as a simple python MPI parallel tree walking
program.  Even with the MPI program, the find can take an extremely long
time to finish.  You are encouraged to replace these provided tools with
anything of your own devising that satisfies the required
functionality.  This is one area where we particularly hope to foster
innovation as we have heard from many file system admins that metadata
search in current parallel file systems can be painfully slow.
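
As a point of comparison, even a crude shell fan-out over the top-level
directories can beat a single serial find on some systems. This is not the
provided MPI tool, just a single-node sketch assuming GNU find/xargs;
$IO500_WORKDIR and $TIMESTAMP_FILE are placeholder names, and the -newer
predicate is purely illustrative, not the benchmark's actual matching rule:
> find "$IO500_WORKDIR" -mindepth 1 -maxdepth 1 -type d -print0 \
>   | xargs -0 -P 16 -I{} find {} -type f -newer "$TIMESTAMP_FILE" -print \
>   | wc -l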

Now is your chance to show the community just how awesome we all know Ceph
to be.  We are excited to introduce this benchmark and foster this
community.  We hope you give the benchmark a try and join our community if
you haven't already.  Please let us know right away, in any of our various
communication channels (as described in our documentation), if you
encounter any problems with the benchmark, have questions about tuning, or
have suggestions for others.

We hope to see your results in email and to see you in person at the SC BoF.

Thanks,

IO 500 Committee
John Bent, Julian Kunkel, Jay Lofstead
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Russell Glaue
Yes, several have recommended the fio test now.
I cannot perform a fio test at this time, because the post referred to
directs us to write the fio test data directly to the disk device, e.g.
/dev/sdj. I'd have to take an OSD completely out in order to perform the
test, and I am not ready to do that at this time. Perhaps after I attempt
the hardware firmware updates, if I still do not have an answer, I will
then take an OSD out of the cluster to run the fio test.
Also, our M500 disks on the two newest machines are all running version
MU05, the latest firmware. On the older two machines, the disks are behind
a RAID0, but I suspect they might be on MU03 firmware.
-RG


On Fri, Oct 27, 2017 at 4:12 PM, Brian Andrus 
wrote:

> I would be interested in seeing the results from the post mentioned by an
> earlier contributor:
>
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-
> test-if-your-ssd-is-suitable-as-a-journal-device/
>
> Test an "old" M500 and a "new" M500 and see if the performance is A)
> acceptable and B) comparable. Find hardware revision or firmware revision
> in case of A=Good and B=different.
>
> If the "old" device doesn't test well in fio/dd testing, then the drives
> are (as expected) not a great choice for journals and you might want to
> look at hardware/backplane/RAID configuration differences that are somehow
> allowing them to perform adequately.
>
> On Fri, Oct 27, 2017 at 12:36 PM, Russell Glaue  wrote:
>
>> Yes, all the M500s we use are both journal and OSD, even the older ones.
>> We have a 3-year lifecycle and move older nodes from one ceph cluster to
>> another.
>> On old systems with 3-year-old M500s, they run as RAID0, and run faster
>> than our current problem system with 1-year-old M500s, run as non-RAID
>> pass-through on the controller.
>>
>> All disks are SATA and are connected to a SAS controller. We were
>> wondering if the SAS/SATA conversion is an issue. Yet, the older systems
>> don't exhibit a problem.
>>
>> I found what I wanted to know from a colleague, that when the current
>> ceph cluster was put together, the SSDs tested at 300+MB/s, and ceph
>> cluster writes at 30MB/s.
>>
>> Using SMART tools, the reserved cells in all drives is nearly 100%.
>>
>> Restarting the OSDs slightly improved performance. Still betting on
>> hardware issues that a firmware upgrade may resolve.
>>
>> -RG
>>
>>
>> On Oct 27, 2017 1:14 PM, "Brian Andrus" 
>> wrote:
>>
>> @Russell, are your "older Crucial M500"s being used as journals?
>>
>> Crucial M500s are not to be used as a Ceph journal in my last experience
>> with them. They make good OSDs with an NVMe in front of them perhaps, but
>> not much else.
>>
>> Ceph uses O_DSYNC for journal writes and these drives do not handle them
>> as expected. It's been many years since I've dealt with the M500s
>> specifically, but it has to do with the capacitor/power save feature and
>> how it handles those types of writes. I'm sorry I don't have the emails
>> with specifics around anymore, but last I remember, this was a hardware
>> issue and could not be resolved with firmware.
>>
>> Paging Kyle Bader...
>>
>> On Fri, Oct 27, 2017 at 9:24 AM, Russell Glaue  wrote:
>>
>>> We have older Crucial M500 disks operating without such problems. So, I
>>> have to believe it is a hardware firmware issue.
>>> And it's peculiar seeing performance boost slightly, even 24 hours later,
>>> when I stop then start the OSDs.
>>>
>>> Our actual writes are low, as most of our Ceph Cluster based images are
>>> low-write, high-memory. So a 20GB/day life/write capacity is a non-issue
>>> for us. Only write speed is the concern. Our write-intensive images are
>>> locked on non-ceph disks.
>>>
>>> What are others using for SSD drives in their Ceph cluster?
>>> With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37 models
>>> seem to be the best for the price today.
>>>
>>>
>>>
>>> On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar 
>>> wrote:
>>>
 It is quite likely related; things are pointing to bad disks. Probably
 the best thing is to plan for disk replacement, the sooner the better as it
 could get worse.



 On 2017-10-27 02:22, Christian Wuerdig wrote:

 Hm, not necessarily directly related to your performance problem,
 however: These SSDs have a listed endurance of 72TB total data written
 - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
 that you run the journal for each OSD on the same disk, that's
 effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
 know many who'd run a cluster on disks like those. Also it means these
 are pure consumer drives which have a habit of exhibiting random
 performance at times (based on unquantified anecdotal personal
 experience with other consumer model SSDs). I wouldn't touch these
 with a long stick for anything but small toy-test clusters.

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Brian Andrus
I would be interested in seeing the results from the post mentioned by an
earlier contributor:

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

Test an "old" M500 and a "new" M500 and see if the performance is A)
acceptable and B) comparable. Find hardware revision or firmware revision
in case of A=Good and B=different.

If the "old" device doesn't test well in fio/dd testing, then the drives
are (as expected) not a great choice for journals and you might want to
look at hardware/backplane/RAID configuration differences that are somehow
allowing them to perform adequately.
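
For reference, the test in that post is a single-threaded O_DSYNC 4k write
run against the raw device, something along these lines (quoted from memory,
so double-check the article; note it overwrites data on /dev/sdX, so only
run it against a drive you can wipe):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test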

On Fri, Oct 27, 2017 at 12:36 PM, Russell Glaue  wrote:

> Yes, all the M500s we use are both journal and OSD, even the older ones.
> We have a 3-year lifecycle and move older nodes from one ceph cluster to
> another.
> On old systems with 3-year-old M500s, they run as RAID0, and run faster
> than our current problem system with 1-year-old M500s, run as non-RAID
> pass-through on the controller.
>
> All disks are SATA and are connected to a SAS controller. We were
> wondering if the SAS/SATA conversion is an issue. Yet, the older systems
> don't exhibit a problem.
>
> I found what I wanted to know from a colleague, that when the current ceph
> cluster was put together, the SSDs tested at 300+MB/s, and ceph cluster
> writes at 30MB/s.
>
> Using SMART tools, the reserved cells in all drives is nearly 100%.
>
> Restarting the OSDs slightly improved performance. Still betting on
> hardware issues that a firmware upgrade may resolve.
>
> -RG
>
>
> On Oct 27, 2017 1:14 PM, "Brian Andrus" 
> wrote:
>
> @Russell, are your "older Crucial M500"s being used as journals?
>
> Crucial M500s are not to be used as a Ceph journal in my last experience
> with them. They make good OSDs with an NVMe in front of them perhaps, but
> not much else.
>
> Ceph uses O_DSYNC for journal writes and these drives do not handle them
> as expected. It's been many years since I've dealt with the M500s
> specifically, but it has to do with the capacitor/power save feature and
> how it handles those types of writes. I'm sorry I don't have the emails
> with specifics around anymore, but last I remember, this was a hardware
> issue and could not be resolved with firmware.
>
> Paging Kyle Bader...
>
> On Fri, Oct 27, 2017 at 9:24 AM, Russell Glaue  wrote:
>
>> We have older Crucial M500 disks operating without such problems. So, I
>> have to believe it is a hardware firmware issue.
>> And it's peculiar seeing performance boost slightly, even 24 hours later,
>> when I stop then start the OSDs.
>>
>> Our actual writes are low, as most of our Ceph Cluster based images are
>> low-write, high-memory. So a 20GB/day life/write capacity is a non-issue
>> for us. Only write speed is the concern. Our write-intensive images are
>> locked on non-ceph disks.
>>
>> What are others using for SSD drives in their Ceph cluster?
>> With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37 models
>> seem to be the best for the price today.
>>
>>
>>
>> On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar 
>> wrote:
>>
>>> It is quite likely related; things are pointing to bad disks. Probably
>>> the best thing is to plan for disk replacement, the sooner the better as it
>>> could get worse.
>>>
>>>
>>>
>>> On 2017-10-27 02:22, Christian Wuerdig wrote:
>>>
>>> Hm, not necessarily directly related to your performance problem,
>>> however: These SSDs have a listed endurance of 72TB total data written
>>> - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
>>> that you run the journal for each OSD on the same disk, that's
>>> effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
>>> know many who'd run a cluster on disks like those. Also it means these
>>> are pure consumer drives which have a habit of exhibiting random
>>> performance at times (based on unquantified anecdotal personal
>>> experience with other consumer model SSDs). I wouldn't touch these
>>> with a long stick for anything but small toy-test clusters.
>>>
>>> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote:
>>>
>>>
>>> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar 
>>> wrote:
>>>
>>>
>>> It depends on what stage you are in:
>>> in production, probably the best thing is to set up a monitoring tool
>>> (collectd/graphite/prometheus/grafana) to monitor both ceph stats as
>>> well as
>>> resource load. This will, among other things, show you if you have
>>> slowing
>>> disks.
>>>
>>>
>>> I am monitoring Ceph performance with ceph-dash
>>> (http://cephdash.crapworks.de/), that is why I knew to look into the
>>> slow
>>> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
>>> monitor system resources, including Disk I/O.
>>>
>>> However, though I can monitor individual disk performance at the system
>>> 

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Russell Glaue
Yes, all the M500s we use are both journal and OSD, even the older ones.
We have a 3-year lifecycle and move older nodes from one ceph cluster to
another.
On old systems with 3-year-old M500s, they run as RAID0, and run faster
than our current problem system with 1-year-old M500s, run as non-RAID
pass-through on the controller.

All disks are SATA and are connected to a SAS controller. We were wondering
if the SAS/SATA conversion is an issue. Yet, the older systems don't
exhibit a problem.

I found what I wanted to know from a colleague, that when the current ceph
cluster was put together, the SSDs tested at 300+MB/s, and ceph cluster
writes at 30MB/s.

Using SMART tools, the reserved cells in all drives is nearly 100%.

Restarting the OSDs slightly improved performance. Still betting on hardware
issues that a firmware upgrade may resolve.

-RG


On Oct 27, 2017 1:14 PM, "Brian Andrus"  wrote:

@Russell, are your "older Crucial M500"s being used as journals?

Crucial M500s are not to be used as a Ceph journal in my last experience
with them. They make good OSDs with an NVMe in front of them perhaps, but
not much else.

Ceph uses O_DSYNC for journal writes and these drives do not handle them as
expected. It's been many years since I've dealt with the M500s
specifically, but it has to do with the capacitor/power save feature and
how it handles those types of writes. I'm sorry I don't have the emails
with specifics around anymore, but last I remember, this was a hardware
issue and could not be resolved with firmware.

Paging Kyle Bader...

On Fri, Oct 27, 2017 at 9:24 AM, Russell Glaue  wrote:

> We have older Crucial M500 disks operating without such problems. So, I
> have to believe it is a hardware firmware issue.
> And it's peculiar seeing performance boost slightly, even 24 hours later,
> when I stop then start the OSDs.
>
> Our actual writes are low, as most of our Ceph Cluster based images are
> low-write, high-memory. So a 20GB/day life/write capacity is a non-issue
> for us. Only write speed is the concern. Our write-intensive images are
> locked on non-ceph disks.
>
> What are others using for SSD drives in their Ceph cluster?
> With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37 models
> seem to be the best for the price today.
>
>
>
> On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar 
> wrote:
>
>> It is quite likely related; things are pointing to bad disks. Probably
>> the best thing is to plan for disk replacement, the sooner the better as it
>> could get worse.
>>
>>
>>
>> On 2017-10-27 02:22, Christian Wuerdig wrote:
>>
>> Hm, not necessarily directly related to your performance problem,
>> however: These SSDs have a listed endurance of 72TB total data written
>> - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
>> that you run the journal for each OSD on the same disk, that's
>> effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
>> know many who'd run a cluster on disks like those. Also it means these
>> are pure consumer drives which have a habit of exhibiting random
>> performance at times (based on unquantified anecdotal personal
>> experience with other consumer model SSDs). I wouldn't touch these
>> with a long stick for anything but small toy-test clusters.
>>
>> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote:
>>
>>
>> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar 
>> wrote:
>>
>>
>> It depends on what stage you are in:
>> in production, probably the best thing is to set up a monitoring tool
>> (collectd/graphite/prometheus/grafana) to monitor both ceph stats as well
>> as
>> resource load. This will, among other things, show you if you have slowing
>> disks.
>>
>>
>> I am monitoring Ceph performance with ceph-dash
>> (http://cephdash.crapworks.de/), that is why I knew to look into the slow
>> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
>> monitor system resources, including Disk I/O.
>>
>> However, though I can monitor individual disk performance at the system
>> level, it seems Ceph does not tax any disk more than the worst disk. So in
>> my monitoring charts, all disks have the same performance.
>> All four nodes are base-lining at 50 writes/sec during the cluster's
>> normal
>> load, with the non-problem hosts spiking up to 150, and the problem host
>> only spikes up to 100.
>> But during the window of time I took the problem host OSDs down to run the
>> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
>> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.
>>
>> Before production you should first make sure your SSDs are suitable for
>> Ceph, either by being recommended by other Ceph users or by testing them
>> yourself for sync write performance using the fio tool as outlined earlier.
>> Then after you build your cluster you can use rados and/or rbd benchmark
>> 

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Brian Andrus
@Russell, are your "older Crucial M500"s being used as journals?

Crucial M500s are not to be used as a Ceph journal in my last experience
with them. They make good OSDs with an NVMe in front of them perhaps, but
not much else.

Ceph uses O_DSYNC for journal writes and these drives do not handle them as
expected. It's been many years since I've dealt with the M500s
specifically, but it has to do with the capacitor/power save feature and
how it handles those types of writes. I'm sorry I don't have the emails
with specifics around anymore, but last I remember, this was a hardware
issue and could not be resolved with firmware.

Paging Kyle Bader...

On Fri, Oct 27, 2017 at 9:24 AM, Russell Glaue  wrote:

> We have older Crucial M500 disks operating without such problems. So, I
> have to believe it is a hardware firmware issue.
> And it's peculiar seeing performance boost slightly, even 24 hours later,
> when I stop then start the OSDs.
>
> Our actual writes are low, as most of our Ceph Cluster based images are
> low-write, high-memory. So a 20GB/day life/write capacity is a non-issue
> for us. Only write speed is the concern. Our write-intensive images are
> locked on non-ceph disks.
>
> What are others using for SSD drives in their Ceph cluster?
> With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37 models
> seem to be the best for the price today.
>
>
>
> On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar 
> wrote:
>
>> It is quite likely related; things are pointing to bad disks. Probably
>> the best thing is to plan for disk replacement, the sooner the better as it
>> could get worse.
>>
>>
>>
>> On 2017-10-27 02:22, Christian Wuerdig wrote:
>>
>> Hm, not necessarily directly related to your performance problem,
>> however: These SSDs have a listed endurance of 72TB total data written
>> - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
>> that you run the journal for each OSD on the same disk, that's
>> effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
>> know many who'd run a cluster on disks like those. Also it means these
>> are pure consumer drives which have a habit of exhibiting random
>> performance at times (based on unquantified anecdotal personal
>> experience with other consumer model SSDs). I wouldn't touch these
>> with a long stick for anything but small toy-test clusters.
>>
>> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote:
>>
>>
>> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar 
>> wrote:
>>
>>
>> It depends on what stage you are in:
>> in production, probably the best thing is to set up a monitoring tool
>> (collectd/graphite/prometheus/grafana) to monitor both ceph stats as well
>> as
>> resource load. This will, among other things, show you if you have slowing
>> disks.
>>
>>
>> I am monitoring Ceph performance with ceph-dash
>> (http://cephdash.crapworks.de/), that is why I knew to look into the slow
>> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
>> monitor system resources, including Disk I/O.
>>
>> However, though I can monitor individual disk performance at the system
>> level, it seems Ceph does not tax any disk more than the worst disk. So in
>> my monitoring charts, all disks have the same performance.
>> All four nodes are base-lining at 50 writes/sec during the cluster's
>> normal
>> load, with the non-problem hosts spiking up to 150, and the problem host
>> only spikes up to 100.
>> But during the window of time I took the problem host OSDs down to run the
>> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
>> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.
>>
>> Before production you should first make sure your SSDs are suitable for
>> Ceph, either by being recommended by other Ceph users or by testing them
>> yourself for sync write performance using the fio tool as outlined earlier.
>> Then after you build your cluster you can use rados and/or rbd benchmark
>> tests to benchmark your cluster and find bottlenecks using
>> atop/sar/collectl
>> which will help you tune your cluster.
>>
>>
>> All 36 OSDs are: Crucial_CT960M500SSD1
>>
>> Rados bench tests were done at the beginning. The speed was much faster
>> than
>> it is now. I cannot recall the test results; someone else on my team ran
>> them. Recently, I had thought the slow disk problem was a configuration
>> issue with Ceph - before I posted here. Now we are hoping it may be
>> resolved
>> with a firmware update. (If it is firmware related, rebooting the problem
>> node may temporarily resolve this)
>>
>>
>> Though you did see better improvements, your cluster with 27 SSDs should
>> give much higher numbers than 3k iops. If you are running rados bench
>> while
>> you have other client ios, then obviously the reported number by the tool
>> will be less than what the cluster is actually giving...which you can find
>> out via 

Re: [ceph-users] Kernel version recommendation

2017-10-27 Thread David Turner
If you can do an ssh session to the IPMI console and then do that inside of
a screen, you can save the output of the screen to a file and look at what
was happening on the console when the server locked up.  That's how I track
kernel panics.
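
For example (host and credentials are placeholders; screen -L appends
everything, including the final panic trace, to a screenlog file in the
working directory):

screen -L -S bmc-console \
    ipmitool -I lanplus -H <bmc-host> -U <user> -P <password> sol activate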

On Fri, Oct 27, 2017 at 1:53 PM Bogdan SOLGA  wrote:

> Thank you very much for the reply, Ilya!
>
> The server was completely frozen / hard lockup, we had to restart it via
> IPMI. We grepped the logs trying to find the culprit, but to no avail.
> Any hint on how to troubleshoot the (eventual) freezes is highly
> appreciated.
>
> Understood on the kernel recommendation. We'll continue to use 4.10, then.
>
> Thanks, a lot!
>
> Kind regards,
> Bogdan
>
>
>
>
> On Fri, Oct 27, 2017 at 8:04 PM, Ilya Dryomov  wrote:
>
>> On Fri, Oct 27, 2017 at 6:33 PM, Bogdan SOLGA 
>> wrote:
>> > Hello, everyone!
>> >
>> > We have recently upgraded our Ceph pool to the latest Luminous release.
>> On
>> > one of the servers that we used as Ceph clients we had several freeze
>> > issues, which we empirically linked to the concurrent usage of some I/O
>> > operations - writing in an LXD container (backed by Ceph) while there
>> was an
>> > ongoing PG rebalancing. We searched for the issue's cause through the
>> logs,
>> > but we haven't found anything useful.
>>
>> What kind of freezes -- temporary slowdowns or hard lockups?  Did they
>> resolve on their own or did you have to intervene?
>>
>> >
>> > At that time the server was running Ubuntu 16 with a 4.5 kernel. We
>> thought
>> > an upgrade to the latest HWE kernel (4.10) would help, but we had the
>> same
>> > freezing issues after the kernel upgrade. Of course, we're aware that we
>> > have tried to fix / avoid the issue without understanding its cause.
>> >
>> > After seeing the OS recommendations from the Ceph page, we reinstalled
>> the
>> > server (and got the 4.4 kernel), we ran into a feature set mismatch
>> issue
>> > when mounting a RBD image. We concluded that the feature set requires a
>> > kernel > 4.5.
>> >
>> > Our question - how would you recommend us to proceed? Shall we
>> re-upgrade to
>> > the HWE kernel (4.10) or to another kernel version? Would you recommend
>> an
>> > alternative solution?
>>
>> The OS recommendations page lists upstream kernels, as a general
>> guidance.  As long as the kernel is fairly recent and maintained
>> (either upstream or by the distributor), it should be fine.  4.10 is
>> certainly better than 4.4-based kernels, at least for the kernel
>> client.
>>
>> Thanks,
>>
>> Ilya
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2017-10-27 Thread David Turner
Your client needs to tell the cluster that the objects have been deleted.
'-o discard' is my go-to because I'm lazy and it works well enough for me.
If you're in need of more performance, then fstrim is the other option.
Nothing on the Ceph side can be configured to know when a client no longer
needs the contents of an object.  It just acts like a normal hard drive in
that the filesystem on top of the RBD removed the pointers to the objects,
but the disk just lets the file stay where it is until it is eventually
overwritten.

Utilizing discard or fstrim cleans up the objects immediately, but at the
cost of cluster iops.  If you know that a particular RBD overwrites its
data all the time, then you can skip using fstrim on it as it will
constantly be using the same objects anyway.
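
For example (device and mount point are placeholders):

mount -o discard /dev/rbd0 /mnt/rbd    # continuous discard, at some iops cost
fstrim -v /mnt/rbd                     # or trim in batches, e.g. from cron or a systemd timer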

On Fri, Oct 27, 2017 at 1:17 PM nigel davies  wrote:

> Hey all
>
> I am new to Ceph and made a test Ceph cluster that supports
> S3 and RBDs (the RBDs are linked using iSCSI).
>
> I've been looking around and noticed that the space is not decreasing when I
> delete a file, and in turn I filled up my cluster's OSDs.
>
> I have been doing some reading and see people recommend
> adding "-o discard" to every RBD mount (which can be a performance hit)
> or using fstrim. When I try them, both options work, but is there a
> better/another option?
>
> Thanks
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] (no subject)

2017-10-27 Thread nigel davies
Hey all

I am new to Ceph and made a test Ceph cluster that supports
S3 and RBDs (the RBDs are linked using iSCSI).

I've been looking around and noticed that the space is not decreasing when I
delete a file, and in turn I filled up my cluster's OSDs.

I have been doing some reading and see people recommend
adding "-o discard" to every RBD mount (which can be a performance hit) or
using fstrim. When I try them, both options work, but is there a
better/another option?

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel version recommendation

2017-10-27 Thread Bogdan SOLGA
Thank you very much for the reply, Ilya!

The server was completely frozen / hard lockup, we had to restart it via
IPMI. We grepped the logs trying to find the culprit, but to no avail.
Any hint on how to troubleshoot the (eventual) freezes is highly
appreciated.

Understood on the kernel recommendation. We'll continue to use 4.10, then.

Thanks, a lot!

Kind regards,
Bogdan



On Fri, Oct 27, 2017 at 8:04 PM, Ilya Dryomov  wrote:

> On Fri, Oct 27, 2017 at 6:33 PM, Bogdan SOLGA 
> wrote:
> > Hello, everyone!
> >
> > We have recently upgraded our Ceph pool to the latest Luminous release.
> On
> > one of the servers that we used as Ceph clients we had several freeze
> > issues, which we empirically linked to the concurrent usage of some I/O
> > operations - writing in an LXD container (backed by Ceph) while there
> was an
> > ongoing PG rebalancing. We searched for the issue's cause through the
> logs,
> > but we haven't found anything useful.
>
> What kind of freezes -- temporary slowdowns or hard lockups?  Did they
> resolve on their own or did you have to intervene?
>
> >
> > At that time the server was running Ubuntu 16 with a 4.5 kernel. We
> thought
> > an upgrade to the latest HWE kernel (4.10) would help, but we had the
> same
> > freezing issues after the kernel upgrade. Of course, we're aware that we
> > have tried to fix / avoid the issue without understanding its cause.
> >
> > After seeing the OS recommendations from the Ceph page, we reinstalled
> the
> > server (and got the 4.4 kernel), we ran into a feature set mismatch issue
> > when mounting a RBD image. We concluded that the feature set requires a
> > kernel > 4.5.
> >
> > Our question - how would you recommend us to proceed? Shall we
> re-upgrade to
> > the HWE kernel (4.10) or to another kernel version? Would you recommend
> an
> > alternative solution?
>
> The OS recommendations page lists upstream kernels, as a general
> guidance.  As long as the kernel is fairly recent and maintained
> (either upstream or by the distributor), it should be fine.  4.10 is
> certainly better than 4.4-based kernels, at least for the kernel
> client.
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel version recommendation

2017-10-27 Thread Ilya Dryomov
On Fri, Oct 27, 2017 at 6:33 PM, Bogdan SOLGA  wrote:
> Hello, everyone!
>
> We have recently upgraded our Ceph pool to the latest Luminous release. On
> one of the servers that we used as Ceph clients we had several freeze
> issues, which we empirically linked to the concurrent usage of some I/O
> operations - writing in an LXD container (backed by Ceph) while there was an
> ongoing PG rebalancing. We searched for the issue's cause through the logs,
> but we haven't found anything useful.

What kind of freezes -- temporary slowdowns or hard lockups?  Did they
resolve on their own or did you have to intervene?

>
> At that time the server was running Ubuntu 16 with a 4.5 kernel. We thought
> an upgrade to the latest HWE kernel (4.10) would help, but we had the same
> freezing issues after the kernel upgrade. Of course, we're aware that we
> have tried to fix / avoid the issue without understanding its cause.
>
> After seeing the OS recommendations from the Ceph page, we reinstalled the
> server (and got the 4.4 kernel), we ran into a feature set mismatch issue
> when mounting a RBD image. We concluded that the feature set requires a
> kernel > 4.5.
>
> Our question - how would you recommend us to proceed? Shall we re-upgrade to
> the HWE kernel (4.10) or to another kernel version? Would you recommend an
> alternative solution?

The OS recommendations page lists upstream kernels, as a general
guidance.  As long as the kernel is fairly recent and maintained
(either upstream or by the distributor), it should be fine.  4.10 is
certainly better than 4.4-based kernels, at least for the kernel
client.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crush optimize does not work

2017-10-27 Thread David Turner
What does your crush map look like?  Also a `ceph df` output.  You're
optimizing your map for pool #5; if there are other pools with a
significant amount of data, then you're going to be off on your cluster
balance.

A big question for balancing a cluster is how big are your PGs?  If your
primary data pool has PGs that are 100GB in size, then even if you balanced
the cluster such that all of the osds were within 1 PG of each other
(assuming your average OSD size <1TB), then your OSDs could still be 100GB,
or about 10%, apart in disk usage from each other.
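
A quick way to estimate that is to divide each pool's usage by its PG count,
e.g. (jq assumed to be installed; 'rbd' is just an example pool name):

ceph df -f json | jq '.pools[] | {pool: .name, used_gb: (.stats.bytes_used / 1073741824)}'
ceph osd pool get rbd pg_num    # average PG size ~= used_gb / pg_num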

On Thu, Oct 26, 2017 at 4:44 AM Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> Hello,
>
> while trying to optimize a ceph cluster running jewel i get the
> following output:
> 2017-10-26 10:43:27,615 argv = optimize --crushmap
> /home/spriebe/ceph.report --out-path /home/spriebe/optimized.crush
> --pool 5 --pool=5 --choose-args=5 --replication-count=3 --pg-num=4096
> --pgp-num=4096 --rule=data --out-version=j --no-positions
> 2017-10-26 10:43:27,646 root optimizing
> 2017-10-26 10:43:29,329 root already optimized
> 2017-10-26 10:43:29,337 cloud1-1475 optimizing
> 2017-10-26 10:43:29,348 cloud1-1474 optimizing
> 2017-10-26 10:43:29,353 cloud1-1473 optimizing
> 2017-10-26 10:43:29,361 cloud1-1467 optimizing
> 2017-10-26 10:43:30,118 cloud1-1473 already optimized
> 2017-10-26 10:43:30,126 cloud1-1472 optimizing
> 2017-10-26 10:43:30,177 cloud1-1474 already optimized
> 2017-10-26 10:43:30,178 cloud1-1467 already optimized
> 2017-10-26 10:43:30,185 cloud1-1471 optimizing
> 2017-10-26 10:43:30,193 cloud1-1470 optimizing
> 2017-10-26 10:43:30,301 cloud1-1475 already optimized
> 2017-10-26 10:43:30,310 cloud1-1469 optimizing
> 2017-10-26 10:43:30,855 cloud1-1472 already optimized
> 2017-10-26 10:43:30,864 cloud1-1468 optimizing
> 2017-10-26 10:43:31,020 cloud1-1470 already optimized
> 2017-10-26 10:43:31,075 cloud1-1471 already optimized
> 2017-10-26 10:43:31,079 cloud1-1469 already optimized
> 2017-10-26 10:43:31,460 cloud1-1468 already optimized
>
>
> But this one is heavily imbalanced if you look at the AVAIL GB.
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
> 31 1.0  1.0   931G   756G   174G 81.24 1.24 204
> 32 1.0  1.0   931G   750G   180G 80.58 1.23 202
> 33 1.0  1.0   931G   774G   157G 83.13 1.27 209
> 34 1.0  1.0   931G   743G   187G 79.89 1.22 200
> 48 1.0  1.0   931G   780G   150G 83.79 1.28 210
> 49 1.0  1.0   931G   778G   152G 83.64 1.27 209
> 50 0.7  1.0   819G   631G   188G 77.00 1.17 170
> 21 1.0  0.95001   931G   729G   201G 78.32 1.19 197
> 22 1.0  1.0   931G   756G   174G 81.28 1.24 204
> 23 1.0  1.0   931G   778G   152G 83.57 1.27 210
> 24 1.0  0.95001   931G   760G   171G 81.63 1.24 205
> 46 1.0  1.0   931G   756G   174G 81.30 1.24 203
> 47 1.0  1.0   931G   768G   162G 82.57 1.26 207
> 55 0.7  1.0   819G   602G   217G 73.46 1.12 162
> 11 1.0  1.0   931G   628G   302G 67.47 1.03 169
> 12 1.0  1.0   931G   630G   300G 67.68 1.03 170
> 13 1.0  1.0   931G   629G   301G 67.63 1.03 170
> 14 1.0  1.0   931G   644G   286G 69.20 1.05 173
> 45 1.0  1.0   931G   635G   295G 68.23 1.04 171
> 56 0.7  1.0   819G   509G   309G 62.18 0.95 137
> 40 1.72670  1.0  1768G   774G   994G 43.78 0.67 209
> 51 1.0  0.95001   931G   784G   146G 84.26 1.28 212
> 52 1.0  1.0   931G   768G   162G 82.52 1.26 207
> 53 1.0  1.0   931G   752G   178G 80.78 1.23 203
> 54 1.0  1.0   931G   750G   180G 80.58 1.23 202
> 38 1.0  1.0   931G   768G   162G 82.58 1.26 207
> 39 1.0  1.0   931G   768G   162G 82.51 1.26 207
> 57 0.7  1.0   819G   616G   203G 75.17 1.15 167
> 43 1.0  1.0   931G   636G   294G 68.41 1.04 171
> 44 1.0  1.0   931G   630G   300G 67.70 1.03 170
> 36 1.0  1.0   931G   626G   304G 67.33 1.03 170
> 37 1.0  1.0   931G   637G   293G 68.43 1.04 171
> 58 0.7  1.0   819G   508G   311G 62.03 0.95 137
> 41 0.87000  1.0   888G   549G   339G 61.79 0.94 148
> 42 1.72670  1.0  1768G   382G  1385G 21.65 0.33 103
> 65 1.72670  1.0  1768G   383G  1384G 21.69 0.33 103
>  1 1.0  1.0   931G   760G   170G 81.70 1.24 205
>  2 1.0  1.0   931G   768G   162G 82.57 1.26 207
>  3 1.0  0.95001   931G   732G   198G 78.71 1.20 197
> 30 1.0  1.0   931G   771G   159G 82.84 1.26 209
> 35 1.0  1.0   931G   751G   179G 80.70 1.23 202
> 59 0.7  1.0   819G   599G   220G 73.08 1.11 162
>  0 0.84999  1.0   874G   662G   211G 75.81 1.16 179
>  4 1.0  1.0   931G   633G   297G 68.07 1.04 171
>  5 1.0  1.0   931G   632G   298G 67.94 1.04 171
> 60 0.7  1.0   819G   509G   310G 62.12 0.95 137
> 29 0.87000  1.0   888G   553G   335G 62.29 0.95 149
>  7 0.87000  1.0   888G   554G   333G 62.44 0.95 149
> 28 0.87000  1.0   888G   551G   336G 62.09 0.95 

Re: [ceph-users] Kernel version recommendation

2017-10-27 Thread Bogdan SOLGA
Thanks a lot for the recommendation, David!

Are you aware of any drawbacks and / or known issues with using rbd-nbd?

On Fri, Oct 27, 2017 at 7:47 PM, David Turner  wrote:

> rbd-nbd is gaining a lot of followers for use as mapping rbds.  The kernel
> driver for RBD's has taken a while to support features of current ceph
> versions.  The nice thing with rbd-nbd is that it has feature parity with
> the version of ceph you are using and can enable all of the rbd features
> you want to.
>
> When I do use the kernel driver, I usually find the kernel I want and then
> disable RBD features until the RBD is compatible to be mapped by that
> kernel.
>
> On Fri, Oct 27, 2017 at 12:34 PM Bogdan SOLGA 
> wrote:
>
>> Hello, everyone!
>>
>> We have recently upgraded our Ceph pool to the latest Luminous release.
>> On one of the servers that we used as Ceph clients we had several freeze
>> issues, which we empirically linked to the concurrent usage of some I/O
>> operations - writing in an LXD container (backed by Ceph) while there was
>> an ongoing PG rebalancing. We searched for the issue's cause through the
>> logs, but we haven't found anything useful.
>>
>> At that time the server was running Ubuntu 16 with a 4.5 kernel. We
>> thought an upgrade to the latest HWE kernel (4.10) would help, but we had
>> the same freezing issues after the kernel upgrade. Of course, we're aware
>> that we have tried to fix / avoid the issue without understanding its
>> cause.
>>
>> After seeing the OS recommendations from the Ceph page,
>> we reinstalled the server (and got the 4.4 kernel), we ran into a feature
>> set mismatch issue when mounting a RBD image. We concluded
>> that the feature set requires a kernel > 4.5.
>>
>> Our question - how would you recommend us to proceed? Shall we re-upgrade
>> to the HWE kernel (4.10) or to another kernel version? Would you recommend
>> an alternative solution?
>>
>> Thank you very much, we're looking forward for your advice.
>>
>> Kind regards,
>> Bogdan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel version recommendation

2017-10-27 Thread David Turner
rbd-nbd is gaining a lot of followers for use as mapping rbds.  The kernel
driver for RBD's has taken a while to support features of current ceph
versions.  The nice thing with rbd-nbd is that it has feature parity with
the version of ceph you are using and can enable all of the rbd features
you want to.

When I do use the kernel driver, I usually find the kernel I want and then
disable RBD features until the RBD is compatible to be mapped by that
kernel.
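
For example (pool/image names are placeholders, and which features you
actually need to drop, and in what order, depends on the image and the
kernel in question):

rbd feature disable <pool>/<image> fast-diff object-map exclusive-lock deep-flatten
rbd map <pool>/<image>
# or keep the features and map through NBD instead:
rbd-nbd map <pool>/<image>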

On Fri, Oct 27, 2017 at 12:34 PM Bogdan SOLGA 
wrote:

> Hello, everyone!
>
> We have recently upgraded our Ceph pool to the latest Luminous release. On
> one of the servers that we used as Ceph clients we had several freeze
> issues, which we empirically linked to the concurrent usage of some I/O
> operations - writing in an LXD container (backed by Ceph) while there was
> an ongoing PG rebalancing. We searched for the issue's cause through the
> logs, but we haven't found anything useful.
>
> At that time the server was running Ubuntu 16 with a 4.5 kernel. We
> thought an upgrade to the latest HWE kernel (4.10) would help, but we had
> the same freezing issues after the kernel upgrade. Of course, we're aware
> that we have tried to fix / avoid the issue without understanding its
> cause.
>
> After seeing the OS recommendations from the Ceph page,
> we reinstalled the server (and got the 4.4 kernel), we ran into a feature
> set mismatch issue when mounting a RBD image. We concluded
> that the feature set requires a kernel > 4.5.
>
> Our question - how would you recommend us to proceed? Shall we re-upgrade
> to the HWE kernel (4.10) or to another kernel version? Would you recommend
> an alternative solution?
>
> Thank you very much, we're looking forward for your advice.
>
> Kind regards,
> Bogdan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kernel version recommendation

2017-10-27 Thread Bogdan SOLGA
Hello, everyone!

We have recently upgraded our Ceph pool to the latest Luminous release. On
one of the servers that we used as Ceph clients we had several freeze
issues, which we empirically linked to the concurrent usage of some I/O
operations - writing in an LXD container (backed by Ceph) while there was
an ongoing PG rebalancing. We searched for the issue's cause through the
logs, but we haven't found anything useful.

At that time the server was running Ubuntu 16 with a 4.5 kernel. We thought
an upgrade to the latest HWE kernel (4.10) would help, but we had the same
freezing issues after the kernel upgrade. Of course, we're aware that we
have tried to fix / avoid the issue without understanding its cause.

After seeing the OS recommendations from the Ceph page,
we reinstalled the server (and got the 4.4 kernel); we then ran into a feature
set mismatch issue when mounting an RBD image. We concluded
that the feature set requires a kernel > 4.5.

Our question - how would you recommend us to proceed? Shall we re-upgrade
to the HWE kernel (4.10) or to another kernel version? Would you recommend
an alternative solution?

Thank you very much, we're looking forward for your advice.

Kind regards,
Bogdan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd map hangs when using systemd-automount

2017-10-27 Thread Bjoern Laessig
Hi Cephers,

I have multiple RBDs to map and mount, and the bootup hangs forever
while running the rbdmap.service script. This was my mount entry for
/etc/fstab:

/dev/rbd/ptxdev/WORK_CEPH_BLA /ptx/work/ceph/bla xfs 
noauto,x-systemd.automount,defaults,noatime,_netdev,logbsize=256k,nofail  0  0

(the mount is activated at boot time by an NFS server that exports this
filesystem)
And I have a lot of these RBD mounts. Via systemd's debug-shell.service
I found out that the boot hangs at rbdmap.service. I added a 'set -x'
to /usr/bin/rbdmap and it showed me that it hangs at

  mount --fake /dev/rbd/$DEV >>/dev/null 2>&1

Why is this called there? Why is this done one RBD at a time?

As there was no mention of it in the manual mounting documentation, I
masked rbdmap.service and created a rbdmap@.service instead:


[Unit]
Description=Map RBD device ptxdev/%i

After=network-online.target local-fs.target
Wants=network-online.target local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rbd map %I --id dev --keyring 
/etc/ceph/ceph.client.dev.keyring
ExecStop=/usr/bin/rbd unmap /dev/rbd/%I


and added the option 
  x-systemd.requires=rbdmap@ptxdev-WORK_CEPH_BLA.service
 to my fstab-entry.

Now systemd is able to finish the boot process, but this is clearly
only a workaround as there is now duplicated configuration data in the
service template and in /etc/ceph/rbdmap.

To do this right, there should be a systemd.generator(7) that reads
/etc/ceph/rbdmap at boot time and generates the rbdmap@ptxdev-
WORK_CEPH_BLA.service files.
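
Something along these lines might be a starting point: an untested sketch of
a generator (assumed path /usr/lib/systemd/system-generators/rbdmap-generator)
that reads the usual /etc/ceph/rbdmap format ("pool/image  key=value,key=value"
per line) and writes one mapping unit per image into the generator output
directory, so the options live only in /etc/ceph/rbdmap; the fstab entries
would then reference the generated unit names via x-systemd.requires.

#!/bin/sh
outdir="$1"
grep -v '^[[:space:]]*\(#\|$\)' /etc/ceph/rbdmap | while read -r spec opts; do
    # "id=dev,keyring=/etc/ceph/ceph.client.dev.keyring" -> "--id dev --keyring ..."
    args=$(printf '%s' "$opts" | tr ',' '\n' | sed 's/^/--/;s/=/ /' | tr '\n' ' ')
    unit="$outdir/rbdmap-$(systemd-escape "$spec").service"
    {
        echo "[Unit]"
        echo "Description=Map RBD device $spec"
        echo "After=network-online.target"
        echo "Wants=network-online.target"
        echo ""
        echo "[Service]"
        echo "Type=oneshot"
        echo "RemainAfterExit=yes"
        echo "ExecStart=/usr/bin/rbd map $spec $args"
        echo "ExecStop=/usr/bin/rbd unmap /dev/rbd/$spec"
    } > "$unit"
done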

Is this the correct way?

have a nice weekend
Björn Lässig
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Russell Glaue
We have older Crucial M500 disks operating without such problems. So, I
have to believe it is a hardware firmware issue.
And it's peculiar seeing performance boost slightly, even 24 hours later,
when I stop then start the OSDs.

Our actual writes are low, as most of our Ceph Cluster based images are
low-write, high-memory. So a 20GB/day life/write capacity is a non-issue
for us. Only write speed is the concern. Our write-intensive images are
locked on non-ceph disks.

What are others using for SSD drives in their Ceph cluster?
With 0.50+ DWPD (Drive Writes Per Day), the Kingston SEDC400S37 models
seem to be the best for the price today.



On Fri, Oct 27, 2017 at 6:34 AM, Maged Mokhtar  wrote:

> It is quite likely related; things are pointing to bad disks. Probably the
> best thing is to plan for disk replacement, the sooner the better as it
> could get worse.
>
>
>
> On 2017-10-27 02:22, Christian Wuerdig wrote:
>
> Hm, not necessarily directly related to your performance problem,
> however: These SSDs have a listed endurance of 72TB total data written
> - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
> that you run the journal for each OSD on the same disk, that's
> effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
> know many who'd run a cluster on disks like those. Also it means these
> are pure consumer drives which have a habit of exhibiting random
> performance at times (based on unquantified anecdotal personal
> experience with other consumer model SSDs). I wouldn't touch these
> with a long stick for anything but small toy-test clusters.
>
> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote:
>
>
> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar 
> wrote:
>
>
> It depends on what stage you are in:
> in production, probably the best thing is to set up a monitoring tool
> (collectd/graphite/prometheus/grafana) to monitor both ceph stats as well
> as
> resource load. This will, among other things, show you if you have slowing
> disks.
>
>
> I am monitoring Ceph performance with ceph-dash
> (http://cephdash.crapworks.de/), that is why I knew to look into the slow
> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
> monitor system resources, including Disk I/O.
>
> However, though I can monitor individual disk performance at the system
> level, it seems Ceph does not tax any disk more than the worst disk. So in
> my monitoring charts, all disks have the same performance.
> All four nodes are base-lining at 50 writes/sec during the cluster's normal
> load, with the non-problem hosts spiking up to 150, and the problem host
> only spikes up to 100.
> But during the window of time I took the problem host OSDs down to run the
> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.
>
> Before production you should first make sure your SSDs are suitable for
> Ceph, either by being recommended by other Ceph users or by testing them
> yourself for sync write performance using the fio tool as outlined earlier.
> Then after you build your cluster you can use rados and/or rbd benchmark
> tests to benchmark your cluster and find bottlenecks using
> atop/sar/collectl
> which will help you tune your cluster.
>
>
> All 36 OSDs are: Crucial_CT960M500SSD1
>
> Rados bench tests were done at the beginning. The speed was much faster
> than
> it is now. I cannot recall the test results; someone else on my team ran
> them. Recently, I had thought the slow disk problem was a configuration
> issue with Ceph - before I posted here. Now we are hoping it may be
> resolved
> with a firmware update. (If it is firmware related, rebooting the problem
> node may temporarily resolve this)
>
>
> Though you did see better improvements, your cluster with 27 SSDs should
> give much higher numbers than 3k iops. If you are running rados bench while
> you have other client ios, then obviously the reported number by the tool
> will be less than what the cluster is actually giving...which you can find
> out via ceph status command, it will print the total cluster throughput and
> iops. If the total is still low i would recommend running the fio raw disk
> test, maybe the disks are not suitable. When you removed your 9 bad disk
> from 36 and your performance doubled, you still had 2 other disk slowing
> you..meaning near 100% busy ? It makes me feel the disk type used is not
> good. For these near 100% busy disks can you also measure their raw disk
> iops at that load (i am not sure atop shows this, if not use
> sat/syssyat/iostat/collecl).
>
>
> I ran another bench test today with all 36 OSDs up. The overall performance
> was improved slightly compared to the original tests. Only 3 OSDs on the
> problem host were increasing to 101% disk busy.
> The iops reported from ceph status during this bench test ranged from 1.6k
> to 3.3k, the test 

Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Ian Bobbitt
On 10/27/17 8:22 AM, Félix Barbeira wrote:
> root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02

You're specifying the ICMP payload size. IPv6 has larger headers than IPv4, so 
you'll need to decrease the payload to
fit in a standard jumbo frame. Try 8952 instead of 8972.
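
(For a 9000-byte MTU the arithmetic is 9000 - 40 bytes of IPv6 header - 8
bytes of ICMPv6 echo header = 8952 bytes of payload, versus 9000 - 20 - 8 =
8972 for IPv4, hence the difference.) So:

ping6 -c 3 -M do -s 8952 ceph-node02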

-- Ian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-27 Thread David Turner
I had the exact same error when using --bypass-gc.  We too decided to
destroy this realm and start it fresh.  For us, 95% of the data in this
realm is backups for other systems and they're fine rebuilding it.  So our
plan is to migrate the 5% of the data to a temporary s3 location and then
rebuild this realm with brand-new pools, a fresh GC, and new settings. I
can add this realm to the offerings of tests to figure out options.  It's
running Jewel 10.2.7.

On Fri, Oct 27, 2017 at 11:26 AM Bryan Stillwell 
wrote:

> On Wed, Oct 25, 2017 at 4:02 PM, Yehuda Sadeh-Weinraub 
> wrote:
> >
> > On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell 
> wrote:
> > > That helps a little bit, but overall the process would take years at
> this
> > > rate:
> > >
> > > # for i in {1..3600}; do ceph df -f json-pretty |grep -A7
> '".rgw.buckets"' |grep objects; sleep 60; done
> > >  "objects": 1660775838
> > >  "objects": 1660775733
> > >  "objects": 1660775548
> > >  "objects": 1660774825
> > >  "objects": 1660774790
> > >  "objects": 1660774735
> > >
> > > This is on a hammer cluster.  Would upgrading to Jewel or Luminous
> speed up
> > > this process at all?
> >
> > I'm not sure it's going to help much, although the omap performance
> > might improve there. The big problem is that the omaps are just too
> > big, so that every operation on them takes considerable time. I think
> > the best way forward there is to take a list of all the rados objects
> > that need to be removed from the gc omaps, and then get rid of the gc
> > objects themselves (newer ones will be created, this time using the
> > new configurable). Then remove the objects manually (and concurrently)
> > using the rados command line tool.
> > The one problem I see here is that even just removal of objects with
> > large omaps can affect the availability of the osds that hold these
> > objects. I discussed that now with Josh, and we think the best way to
> > deal with that is not to remove the gc objects immediately, but to
> > rename the gc pool, and create a new one (with appropriate number of
> > pgs). This way new gc entries will now go into the new gc pool (with
> > higher number of gc shards), and you don't need to remove the old gc
> > objects (thus no osd availability problem). Then you can start
> > trimming the old gc objects (on the old renamed pool) by using the
> > rados command. It'll take a very very long time, but the process
> > should pick up speed slowly, as the objects shrink.
>
> That's fine for us.  We'll be tearing down this cluster in a few weeks
> and adding the nodes to the new cluster we created.  I just wanted to
> explore other options now that we can use it as a test cluster.
>
> The solution you described with renaming the .rgw.gc pool and creating a
> new one is pretty interesting.  I'll have to give that a try, but until
> then I've been trying to remove some of the other buckets with the
> --bypass-gc option and it keeps dying with output like this:
>
> # radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
> 2017-10-27 08:00:00.865993 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:04.385875 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:04.517241 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:05.791876 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:26.815081 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=1090645 stripe_ofs=1090645 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:46.757556 7f2b387228c0  0 RGWObjManifest::operator++():
> result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
> 2017-10-27 08:00:47.093813 7f2b387228c0 -1 ERROR: could not drain handles
> as aio completion returned with -2
>
>
> I can typically make further progress by running it again:
>
> # radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
> 2017-10-27 08:20:57.310859 7fae9c3d48c0  0 RGWObjManifest::operator++():
> result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
> 2017-10-27 08:20:57.406684 7fae9c3d48c0  0 RGWObjManifest::operator++():
> result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
> 2017-10-27 08:20:57.808050 7fae9c3d48c0 -1 ERROR: could not drain handles
> as aio completion returned with -2
>
>
> and again:
>
> # radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
> 2017-10-27 08:22:04.992578 7ff8071038c0  0 RGWObjManifest::operator++():
> result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
> 2017-10-27 

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-27 Thread Bryan Stillwell
On Wed, Oct 25, 2017 at 4:02 PM, Yehuda Sadeh-Weinraub  
wrote:
>
> On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell  
> wrote:
> > That helps a little bit, but overall the process would take years at this
> > rate:
> >
> > # for i in {1..3600}; do ceph df -f json-pretty |grep -A7 '".rgw.buckets"' 
> > |grep objects; sleep 60; done
> >  "objects": 1660775838
> >  "objects": 1660775733
> >  "objects": 1660775548
> >  "objects": 1660774825
> >  "objects": 1660774790
> >  "objects": 1660774735
> >
> > This is on a hammer cluster.  Would upgrading to Jewel or Luminous speed up
> > this process at all?
>
> I'm not sure it's going to help much, although the omap performance
> might improve there. The big problem is that the omaps are just too
> big, so that every operation on them takes considerable time. I think
> the best way forward there is to take a list of all the rados objects
> that need to be removed from the gc omaps, and then get rid of the gc
> objects themselves (newer ones will be created, this time using the
> new configurable). Then remove the objects manually (and concurrently)
> using the rados command line tool.
> The one problem I see here is that even just removal of objects with
> large omaps can affect the availability of the osds that hold these
> objects. I discussed that now with Josh, and we think the best way to
> deal with that is not to remove the gc objects immediately, but to
> rename the gc pool, and create a new one (with appropriate number of
> pgs). This way new gc entries will now go into the new gc pool (with
> higher number of gc shards), and you don't need to remove the old gc
> objects (thus no osd availability problem). Then you can start
> trimming the old gc objects (on the old renamed pool) by using the
> rados command. It'll take a very very long time, but the process
> should pick up speed slowly, as the objects shrink.

That's fine for us.  We'll be tearing down this cluster in a few weeks
and adding the nodes to the new cluster we created.  I just wanted to
explore other options now that we can use it as a test cluster.

The solution you described with renaming the .rgw.gc pool and creating a
new one is pretty interesting.  I'll have to give that a try, but until
then I've been trying to remove some of the other buckets with the
--bypass-gc option and it keeps dying with output like this:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:00:00.865993 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.385875 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:00:04.517241 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:00:05.791876 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:00:26.815081 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=1090645 stripe_ofs=1090645 part_ofs=0 rule->part_size=0
2017-10-27 08:00:46.757556 7f2b387228c0  0 RGWObjManifest::operator++(): 
result: ofs=1488744 stripe_ofs=1488744 part_ofs=0 rule->part_size=0
2017-10-27 08:00:47.093813 7f2b387228c0 -1 ERROR: could not drain handles as 
aio completion returned with -2


I can typically make further progress by running it again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:20:57.310859 7fae9c3d48c0  0 RGWObjManifest::operator++(): 
result: ofs=673900 stripe_ofs=673900 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.406684 7fae9c3d48c0  0 RGWObjManifest::operator++(): 
result: ofs=1179224 stripe_ofs=1179224 part_ofs=0 rule->part_size=0
2017-10-27 08:20:57.808050 7fae9c3d48c0 -1 ERROR: could not drain handles as 
aio completion returned with -2


and again:

# radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc
2017-10-27 08:22:04.992578 7ff8071038c0  0 RGWObjManifest::operator++(): 
result: ofs=566620 stripe_ofs=566620 part_ofs=0 rule->part_size=0
2017-10-27 08:22:05.726485 7ff8071038c0 -1 ERROR: could not drain handles as 
aio completion returned with -2


What does this error mean, and is there any way to keep it from dying
like this?  This cluster is running 0.94.10, but I can upgrade it to jewel
pretty easily if you would like.
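
Until the cause of the error is understood, one crude workaround in the
spirit of "running it again" is simply to loop the command until it exits
cleanly, e.g. (a sketch only, using the same bucket name as above):

until radosgw-admin bucket rm --bucket=sg2pl5000 --purge-objects --bypass-gc; do
    echo "bucket rm failed, retrying..."
    sleep 5
done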

Thanks,
Bryan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Nico Schottelius

Hello,

we are running everything IPv6 only. You just need to set up the MTU on
your devices (NICs, switches) correctly; nothing Ceph- or IPv6-specific is
required.

If you are using SLAAC (like we do), you can also announce the MTU via
RA.
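
As an example, with radvd as the RA daemon this can look roughly like the
following (a sketch only; the interface name and prefix are placeholders):

cat > /etc/radvd.conf <<'EOF'
interface eno1 {
    AdvSendAdvert on;
    AdvLinkMTU 9000;
    prefix 2001:db8::/64 {
    };
};
EOF
systemctl restart radvd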

Best,

Nico



Jack  writes:

> Or maybe you reach that ipv4 directly, and that ipv6 via a router, somehow
>
> Check your routing table and neighbor table
>
> On 27/10/2017 16:02, Wido den Hollander wrote:
>>
>>> On 27 October 2017 at 14:22, Félix Barbeira wrote:
>>>
>>>
>>> Hi,
>>>
>>> I'm trying to configure a Ceph cluster using IPv6 only, but I can't enable
>>> jumbo frames. I set the MTU in the
>>> 'interfaces' file and the value seems to be applied, but when I test it, it
>>> looks like it only works on IPv4, not IPv6.
>>>
>>> It works on IPv4:
>>>
>>> root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
>>>
>>> PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
>>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
>>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
>>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
>>>
>>
>> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
>>
>>> --- ceph-node02 ping statistics ---
>>> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
>>> rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
>>>
>>> root@ceph-node01:~#
>>>
>>> But *not* in IPv6:
>>>
>>> root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
>>> PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
>>> ping: local error: Message too long, mtu=1500
>>> ping: local error: Message too long, mtu=1500
>>> ping: local error: Message too long, mtu=1500
>>>
>>
>> Like Ronny already mentioned, check the switches and the receiver. There is 
>> a 1500 MTU somewhere configured.
>>
>> Wido
>>
>>> --- ceph-node02 ping statistics ---
>>> 4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms
>>>
>>> root@ceph-node01:~#
>>>
>>>
>>>
>>> root@ceph-node01:~# ifconfig
>>> eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
>>>   inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
>>>   inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
>>>   UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
>>>   RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
>>>   TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
>>>   collisions:0 txqueuelen:1000
>>>   RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)
>>>
>>> loLink encap:Local Loopback
>>>   inet addr:127.0.0.1  Mask:255.0.0.0
>>>   inet6 addr: ::1/128 Scope:Host
>>>   UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>>   RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
>>>   TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
>>>   collisions:0 txqueuelen:1
>>>   RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)
>>>
>>> root@ceph-node01:~#
>>>
>>> root@ceph-node01:~# cat /etc/network/interfaces
>>> # This file describes network interfaces available on your system
>>> # and how to activate them. For more information, see interfaces(5).
>>>
>>> source /etc/network/interfaces.d/*
>>>
>>> # The loopback network interface
>>> auto lo
>>> iface lo inet loopback
>>>
>>> # The primary network interface
>>> auto eno1
>>> iface eno1 inet6 auto
>>>post-up ifconfig eno1 mtu 9000
>>> root@ceph-node01:#
>>>
>>>
>>> Please help!
>>>
>>> --
>>> Félix Barbeira.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Jack
Or maybe you reach that ipv4 directly, and that ipv6 via a router, somehow

Check your routing table and neighbor table

On 27/10/2017 16:02, Wido den Hollander wrote:
> 
>> On 27 October 2017 at 14:22, Félix Barbeira wrote:
>>
>>
>> Hi,
>>
>> I'm trying to configure a Ceph cluster using IPv6 only, but I can't enable
>> jumbo frames. I set the MTU in the
>> 'interfaces' file and the value seems to be applied, but when I test it, it
>> looks like it only works on IPv4, not IPv6.
>>
>> It works on IPv4:
>>
>> root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
>>
>> PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
>> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
>>
> 
> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
> 
>> --- ceph-node02 ping statistics ---
>> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
>> rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
>>
>> root@ceph-node01:~#
>>
>> But *not* in IPv6:
>>
>> root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
>> PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
>> ping: local error: Message too long, mtu=1500
>> ping: local error: Message too long, mtu=1500
>> ping: local error: Message too long, mtu=1500
>>
> 
> Like Ronny already mentioned, check the switches and the receiver. There is a 
> 1500 MTU somewhere configured.
> 
> Wido
> 
>> --- ceph-node02 ping statistics ---
>> 4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms
>>
>> root@ceph-node01:~#
>>
>>
>>
>> root@ceph-node01:~# ifconfig
>> eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
>>   inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
>>   inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
>>   UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
>>   RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
>>   TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1000
>>   RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)
>>
>> loLink encap:Local Loopback
>>   inet addr:127.0.0.1  Mask:255.0.0.0
>>   inet6 addr: ::1/128 Scope:Host
>>   UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>   RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
>>   TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1
>>   RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)
>>
>> root@ceph-node01:~#
>>
>> root@ceph-node01:~# cat /etc/network/interfaces
>> # This file describes network interfaces available on your system
>> # and how to activate them. For more information, see interfaces(5).
>>
>> source /etc/network/interfaces.d/*
>>
>> # The loopback network interface
>> auto lo
>> iface lo inet loopback
>>
>> # The primary network interface
>> auto eno1
>> iface eno1 inet6 auto
>>post-up ifconfig eno1 mtu 9000
>> root@ceph-node01:#
>>
>>
>> Please help!
>>
>> -- 
>> Félix Barbeira.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Wido den Hollander

> On 27 October 2017 at 14:22, Félix Barbeira wrote:
> 
> 
> Hi,
> 
> I'm trying to configure a Ceph cluster using IPv6 only, but I can't enable
> jumbo frames. I set the MTU in the
> 'interfaces' file and the value seems to be applied, but when I test it, it
> looks like it only works on IPv4, not IPv6.
> 
> It works on IPv4:
> 
> root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
> 
> PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
> 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
> 

Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.

> --- ceph-node02 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
> 
> root@ceph-node01:~#
> 
> But *not* in IPv6:
> 
> root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
> PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
> ping: local error: Message too long, mtu=1500
> ping: local error: Message too long, mtu=1500
> ping: local error: Message too long, mtu=1500
> 

Like Ronny already mentioned, check the switches and the receiver. There is a 
1500 MTU somewhere configured.

Wido

> --- ceph-node02 ping statistics ---
> 4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms
> 
> root@ceph-node01:~#
> 
> 
> 
> root@ceph-node01:~# ifconfig
> eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
>   inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
>   inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
>   RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)
> 
> loLink encap:Local Loopback
>   inet addr:127.0.0.1  Mask:255.0.0.0
>   inet6 addr: ::1/128 Scope:Host
>   UP LOOPBACK RUNNING  MTU:65536  Metric:1
>   RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1
>   RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)
> 
> root@ceph-node01:~#
> 
> root@ceph-node01:~# cat /etc/network/interfaces
> # This file describes network interfaces available on your system
> # and how to activate them. For more information, see interfaces(5).
> 
> source /etc/network/interfaces.d/*
> 
> # The loopback network interface
> auto lo
> iface lo inet loopback
> 
> # The primary network interface
> auto eno1
> iface eno1 inet6 auto
>post-up ifconfig eno1 mtu 9000
> root@ceph-node01:#
> 
> 
> Please help!
> 
> -- 
> Félix Barbeira.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Ronny Aasen

On 27. okt. 2017 14:22, Félix Barbeira wrote:

Hi,

I'm trying to configure a Ceph cluster using IPv6 only, but I can't
enable jumbo frames. I set the MTU in the
'interfaces' file and the value seems to be applied, but when I test
it, it looks like it only works on IPv4, not IPv6.


It works on IPv4:

root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02

PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms

--- ceph-node02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms

root@ceph-node01:~#

But *not* in IPv6:

root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

--- ceph-node02 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms

root@ceph-node01:~#



root@ceph-node01:~# ifconfig
eno1      Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
           inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
           inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST *MTU:9000*  Metric:1
           RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
           TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)

lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING  MTU:65536  Metric:1
           RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
           TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1
           RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)

root@ceph-node01:~#

root@ceph-node01:~# cat /etc/network/interfaces
# This file describes network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet6 auto
    post-up ifconfig eno1 mtu 9000
root@ceph-node01:#


Please help!





hello

have you changed it on all nodes?

also, the IPv6 ICMPv6 protocol can advertise a link MTU value.
the client will pick up this MTU value and store it
in /proc/sys/net/ipv6/conf/eth0/mtu

if /proc/sys/net/ipv6/conf/ens32/accept_ra_mtu is enabled.

you can perhaps change what MTU is advertised on the link by altering
your router or the device that advertises RAs.
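
A quick way to check what IPv6 is actually using on a node (the interface
name is just an example):

cat /proc/sys/net/ipv6/conf/eno1/mtu             # effective IPv6 MTU
cat /proc/sys/net/ipv6/conf/eno1/accept_ra_mtu   # 1 = MTU is taken from RAs

# either fix the MTU the router advertises, or stop accepting it from RAs
# and set it locally:
sysctl -w net.ipv6.conf.eno1.accept_ra_mtu=0
ip link set dev eno1 mtu 9000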



kind regards
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Félix Barbeira
Hi,

I'm trying to configure a Ceph cluster using IPv6 only, but I can't enable
jumbo frames. I set the MTU in the
'interfaces' file and the value seems to be applied, but when I test it, it
looks like it only works on IPv4, not IPv6.

It works on IPv4:

root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02

PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms

--- ceph-node02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms

root@ceph-node01:~#

But *not* in IPv6:

root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

--- ceph-node02 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms

root@ceph-node01:~#



root@ceph-node01:~# ifconfig
eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
  inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
  inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
  RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
  TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
  TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1
  RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)

root@ceph-node01:~#

root@ceph-node01:~# cat /etc/network/interfaces
# This file describes network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet6 auto
   post-up ifconfig eno1 mtu 9000
root@ceph-node01:#


Please help!

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Maged Mokhtar
It is quite likely related; things are pointing to bad disks. Probably
the best thing is to plan for disk replacement, the sooner the better, as
it could get worse. 

On 2017-10-27 02:22, Christian Wuerdig wrote:

> Hm, not necessarily directly related to your performance problem,
> however: These SSDs have a listed endurance of 72TB total data written
> - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
> that you run the journal for each OSD on the same disk, that's
> effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
> know many who'd run a cluster on disks like those. Also it means these
> are pure consumer drives which have a habit of exhibiting random
> performance at times (based on unquantified anecdotal personal
> experience with other consumer model SSDs). I wouldn't touch these
> with a long stick for anything but small toy-test clusters.
> 
> On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote: 
> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar  wrote: 
> It depends on what stage you are in:
> in production, probably the best thing is to set up a monitoring tool
> (collectd/graphite/prometheus/grafana) to monitor both Ceph stats and
> resource load. This will, among other things, show you if you have slowing
> disks. 
> I am monitoring Ceph performance with ceph-dash
> (http://cephdash.crapworks.de/), that is why I knew to look into the slow
> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
> monitor system resources, including Disk I/O.
> 
> However, though I can monitor individual disk performance at the system
> level, it seems Ceph does not tax any disk more than the worst disk. So in
> my monitoring charts, all disks have the same performance.
> All four nodes are base-lining at 50 writes/sec during the cluster's normal
> load, with the non-problem hosts spiking up to 150, and the problem host
> only spikes up to 100.
> But during the window of time I took the problem host OSDs down to run the
> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.
> 
> Before production you should first make sure your SSDs are suitable for
> Ceph, either by their being recommended by other Ceph users or by testing them
> yourself for sync-write performance using the fio tool as outlined earlier.
> Then after you build your cluster you can use rados and/or rbd benchmark
> tests to benchmark your cluster and find bottlenecks using atop/sar/collectl,
> which will help you tune your cluster. 
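
A commonly used fio sync-write test for this (a sketch only; the device name
is a placeholder, and the test overwrites the disk, so run it only on a drive
that has been taken out of the cluster):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-sync-test
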
> All 36 OSDs are: Crucial_CT960M500SSD1
> 
> Rados bench tests were done at the beginning. The speed was much faster than
> it is now. I cannot recall the test results, someone else on my team ran
> them. Recently, I had thought the slow disk problem was a configuration
> issue with Ceph - before I posted here. Now we are hoping it may be resolved
> with a firmware update. (If it is firmware related, rebooting the problem
> node may temporarily resolve this)
> 
> Though you did see better improvements, your cluster with 27 SSDs should
> give much higher numbers than 3k iops. If you are running rados bench while
> you have other client I/O, then obviously the number reported by the tool
> will be less than what the cluster is actually delivering, which you can find
> out via the ceph status command; it will print the total cluster throughput and
> iops. If the total is still low I would recommend running the fio raw disk
> test; maybe the disks are not suitable. When you removed your 9 bad disks
> from 36 and your performance doubled, you still had 2 other disks slowing
> you, meaning near 100% busy? It makes me feel the disk type used is not
> good. For these near-100%-busy disks can you also measure their raw disk
> iops at that load (I am not sure atop shows this; if not, use
> sar/sysstat/iostat/collectl). 
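
For the raw per-disk numbers, something along these lines while the benchmark
is running will show per-device iops (r/s, w/s) and %util (device names are
examples):

iostat -x sdj sdk 5
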
> I ran another bench test today with all 36 OSDs up. The overall performance
> was improved slightly compared to the original tests. Only 3 OSDs on the
> problem host were increasing to 101% disk busy.
> The iops reported from ceph status during this bench test ranged from 1.6k
> to 3.3k, the test yielding 4k iops.
> 
> Yes, the two other OSDs/disks that were the bottleneck were at 101% disk
> busy. The other OSD disks on the same host were sailing along at like 50-60%
> busy.
> 
> All 36 OSD disks are exactly the same disk. They were all purchased at the
> same time. All were installed at the same time.
> I cannot believe it is a problem with the disk model. A failed/bad disk,
> perhaps is possible. But the disk model itself cannot be the problem based
> on what I am seeing. If I am seeing bad performance on all disks on one ceph
> node/host, but not on another ceph node with these same disks, it has to be
> some other factor. This is why I am now guessing a firmware upgrade is
> needed.
> 
> Also, as I alluded to earlier, I took down all 

Re: [ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-27 Thread Haomai Wang
On Fri, Oct 27, 2017 at 5:03 PM, Ragan, Tj (Dr.)
 wrote:
> Hi Haomai,
>
> According to the documentation, and a brief test to confirm, the lz4
> compression plugin isn’t distributed in the official release.  I’ve tried
> asking google how to add it back to no avail, so how have you added the
> plugin?  Is it simply a matter of putting a symlink in the right place or
> will I have to recompile?

yep, maybe we could add lz4 as the default build option
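
For reference, a sketch of what that currently means if you are willing to
rebuild (the WITH_LZ4 cmake switch is an assumption -- check the
CMakeLists.txt of your release), followed by the per-pool settings once the
plugin is in place:

./do_cmake.sh -DWITH_LZ4=ON
cd build && make && make install

ceph osd pool set <pool> compression_algorithm lz4
ceph osd pool set <pool> compression_mode aggressive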

>
> Any suggestions or pointers would be gratefully received.
>
> -TJ Ragan
>
>
>
> On 26 Oct 2017, at 07:44, Haomai Wang  wrote:
>
>
Stefan Priebe - Profihost AG wrote on Thursday, 26 October 2017 at 17:06:
>>
>> Hi Sage,
>>
>> Am 25.10.2017 um 21:54 schrieb Sage Weil:
>> > On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote:
>> >> Hello,
>> >>
>> >> in the luminous release notes it is stated that zstd is not supported by
>> >> bluestore due to performance reasons. I'm wondering why btrfs instead
>> >> states that zstd is as fast as lz4 but compresses as well as zlib.
>> >>
>> >> Why is zlib then supported by bluestore? And why does btrfs / facebook
>> >> behave differently?
>> >>
>> >> "BlueStore supports inline compression using zlib, snappy, or LZ4.
>> >> (Ceph
>> >> also supports zstd for RGW compression but zstd is not recommended for
>> >> BlueStore for performance reasons.)"
>> >
>> > zstd will work but in our testing the performance wasn't great for
>> > bluestore in particular.  The problem was that for each compression run
>> > there is a relatively high start-up cost initializing the zstd
>> > context/state (IIRC a memset of a huge memory buffer) that dominated the
>> > execution time... primarily because bluestore is generally compressing
>> > pretty small chunks of data at a time, not big buffers or streams.
>> >
>> > Take a look at unittest_compression timings on compressing 16KB buffers
>> > (smaller than bluestore usually needs, but illustrative of the problem):
>> >
>> > [ RUN  ] Compressor/CompressorTest.compress_16384/0
>> > [plugin zlib (zlib/isal)]
>> > [   OK ] Compressor/CompressorTest.compress_16384/0 (294 ms)
>> > [ RUN  ] Compressor/CompressorTest.compress_16384/1
>> > [plugin zlib (zlib/noisal)]
>> > [   OK ] Compressor/CompressorTest.compress_16384/1 (1755 ms)
>> > [ RUN  ] Compressor/CompressorTest.compress_16384/2
>> > [plugin snappy (snappy)]
>> > [   OK ] Compressor/CompressorTest.compress_16384/2 (169 ms)
>> > [ RUN  ] Compressor/CompressorTest.compress_16384/3
>> > [plugin zstd (zstd)]
>> > [   OK ] Compressor/CompressorTest.compress_16384/3 (4528 ms)
>> >
>> > It's an order of magnitude slower than zlib or snappy, which probably
>> > isn't acceptable--even if it is a bit smaller.
>> >
>> > We just updated to a newer zstd the other day but I haven't been paying
>> > attention to the zstd code changes.  When I was working on this the
>> > plugin
>> > was initially also misusing the zstd API, but it was also pointed out
>> > that the size of the memset is dependent on the compression level.
>> > Maybe a different (default) choice there would help.
>> >
>> > https://github.com/facebook/zstd/issues/408#issuecomment-252163241
>>
>> thanks for the fast reply. Btrfs uses a default compression level of 3,
>> but I think this is the default anyway.
>>
>> Does the zstd plugin of ceph already use the mentioned
>> ZSTD_resetCStream instead of creating and initializing a new one every
>> time?
>>
>> So if performance matters, would ceph recommend snappy?
>
>
>
> in our test, lz4 is better than snappy
>>
>>
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-27 Thread Ragan, Tj (Dr.)
Hi Haomai,

According to the documentation, and a brief test to confirm, the lz4 
compression plugin isn’t distributed in the official release.  I’ve tried 
asking google how to add it back to no avail, so how have you added the plugin? 
 Is it simply a matter of putting a symlink in the right place or will I have 
to recompile?

Any suggestions or pointers would be gratefully received.

-TJ Ragan



On 26 Oct 2017, at 07:44, Haomai Wang wrote:


Stefan Priebe - Profihost AG wrote on Thursday, 26 October 2017 at 17:06:
Hi Sage,

Am 25.10.2017 um 21:54 schrieb Sage Weil:
> On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> in the luminous release notes it is stated that zstd is not supported by
>> bluestore due to performance reasons. I'm wondering why btrfs instead
>> states that zstd is as fast as lz4 but compresses as well as zlib.
>>
>> Why is zlib then supported by bluestore? And why does btrfs / facebook
>> behave differently?
>>
>> "BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
>> also supports zstd for RGW compression but zstd is not recommended for
>> BlueStore for performance reasons.)"
>
> zstd will work but in our testing the performance wasn't great for
> bluestore in particular.  The problem was that for each compression run
> there is a relatively high start-up cost initializing the zstd
> context/state (IIRC a memset of a huge memory buffer) that dominated the
> execution time... primarily because bluestore is generally compressing
> pretty small chunks of data at a time, not big buffers or streams.
>
> Take a look at unittest_compression timings on compressing 16KB buffers
> (smaller than bluestore usually needs, but illustrative of the problem):
>
> [ RUN  ] Compressor/CompressorTest.compress_16384/0
> [plugin zlib (zlib/isal)]
> [   OK ] Compressor/CompressorTest.compress_16384/0 (294 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/1
> [plugin zlib (zlib/noisal)]
> [   OK ] Compressor/CompressorTest.compress_16384/1 (1755 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/2
> [plugin snappy (snappy)]
> [   OK ] Compressor/CompressorTest.compress_16384/2 (169 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/3
> [plugin zstd (zstd)]
> [   OK ] Compressor/CompressorTest.compress_16384/3 (4528 ms)
>
> It's an order of magnitude slower than zlib or snappy, which probably
> isn't acceptable--even if it is a bit smaller.
>
> We just updated to a newer zstd the other day but I haven't been paying
> attention to the zstd code changes.  When I was working on this the plugin
> was initially also misusing the zstd API, but it was also pointed out
> that the size of the memset is dependent on the compression level.
> Maybe a different (default) choice there would help.
>
> https://github.com/facebook/zstd/issues/408#issuecomment-252163241

thanks for the fast reply. Btrfs uses a default compression level of 3,
but I think this is the default anyway.

Does the zstd plugin of ceph already use the mentioned
ZSTD_resetCStream instead of creating and initializing a new one every time?

So if performance matters, would ceph recommend snappy?


in our test, lz4 is better than snappy

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-27 Thread GiangCoi Mr
Dear Denes

I searched on the Internet and found one way to resolve it:
http://m.it610.com/article/5024659.htm

2017-10-27 14:23 GMT+07:00 Denes Dolhay :

> Hi,
>
> According to the documentation it should have been created by "ceph-deploy
> new". Maybe a small problem due to Fedora not being on the recommended OS
> list(?)
>
> Either way, there is a document on manual deployment, and it contains the
> step to generate ceph.client.admin.keyring too:
>
> http://docs.ceph.com/docs/master/rados/configuration/auth-config-ref/
>
> You will probably have to go through all the steps, because you are missing
> the mds, osd, rgw, mgr keyring too.
>
>
> Cheers,
>
> Denes.
>
>
>
> On 10/27/2017 04:07 AM, GiangCoi Mr wrote:
>
> Hi Denes Dolhay,
This is the error when I run the command: ceph-deploy mon create-initial
>
> [ceph_deploy.mon][INFO  ] mon.ceph-node1 monitor has reached quorum!
> [ceph_deploy.mon][INFO  ] all initial monitors are running and have formed
> quorum
> [ceph_deploy.mon][INFO  ] Running gatherkeys...
> [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for
> /etc/ceph/ceph.client.admin.keyring
> [ceph-node1][DEBUG ] connected to host: ceph-node1
> [ceph-node1][DEBUG ] detect platform information from remote host
> [ceph-node1][DEBUG ] detect machine type
> [ceph-node1][DEBUG ] fetch remote file
>
> [ceph_deploy.gatherkeys][WARNIN] Unable to find
> /etc/ceph/ceph.client.admin.keyring on ceph-node1
> [ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file:
> /etc/ceph/ceph.client.admin.keyring on host ceph-node1
>
> It cannot find /etc/ceph/ceph.client.admin.keyring, and in the directory where I
> run ceph-deploy, there are only 3 files: ceph.conf, ceph-deploy-ceph.log,
> ceph.mon.keyring
>
> Regards,
> GiangLT
>
>
>
> 2017-10-26 22:47 GMT+07:00 Denes Dolhay :
>
>> Hi,
>>
>> If you ssh to ceph-node1, what are the rights, owner, group, content of
>> /etc/ceph/ceph.client.admin.keyring ?
>>
>> [you should mask out the key, just show us that it is there]
>>
>> On 10/26/2017 05:41 PM, GiangCoi Mr wrote:
>>
>> Hi Denes.
>> I created with command: ceph-deploy new ceph-node1
>>
>> Sent from my iPhone
>>
>> On Oct 26, 2017, at 10:34 PM, Denes Dolhay  wrote:
>>
>> Hi,
>>
>> Did you create a cluster first?
>>
>> ceph-deploy new {initial-monitor-node(s)}
>>
>> Cheers,
>> Denes.
>>
>> On 10/26/2017 05:25 PM, GiangCoi Mr wrote:
>>
>> Dear Alan Johnson
>> I installed with the command: ceph-deploy install ceph-node1 --no-adjust-repos.
>> When the install succeeded, I ran the command: ceph-deploy mon ceph-node1, and it
>> errored because it didn't find the file ceph.client.admin.keyring. So how do I set
>> permissions for this file?
>>
>> Sent from my iPhone
>>
>>
>> On Oct 26, 2017, at 10:18 PM, Alan Johnson  
>>  wrote:
>>
>> If using defaults try
>> chmod +r /etc/ceph/ceph.client.admin.keyring
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com 
>> ] On Behalf Of GiangCoi Mr
>> Sent: Thursday, October 26, 2017 11:09 AM
>> To: ceph-us...@ceph.com
>> Subject: [ceph-users] Install Ceph on Fedora 26
>>
>> Hi all
>> I am installing Ceph Luminous on Fedora 26. The Ceph Luminous install
>> succeeded, but when I install the ceph mon, it errors: it doesn't find
>> client.admin.keyring. How can I fix it? Thanks so much
>>
>> Regard,
>> GiangLT
>>
>> Sent from my iPhone
>> ___
>> ceph-users mailing 
>> listceph-us...@lists.ceph.comhttps://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwIGaQ=4DxX-JX0i28X6V65hK0ft5M-1rZQeWgdMry9v8-eNr4=eqMv5yFFe6-lAM9jJfUusNFzzcFAGwmoAez_acfPOtw=YEG8qsLFsc0XjSKKJCIlkSn9C_WtCejsaUPv2p5ieRk=orrv_azJsm9kAmXQLjUHM6ClwXx-8oQFN89GyknIeN0=
>>
>> ___
>> ceph-users mailing 
>> listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-27 Thread Denes Dolhay

Hi,

According to the documentation it should have been created by
"ceph-deploy new". Maybe a small problem due to Fedora not being on the
recommended OS list(?)


Either way, there is a document on manual deployment, and it contains 
the step to generate ceph.client.admin.keyring too:


http://docs.ceph.com/docs/master/rados/configuration/auth-config-ref/

You will probably have to go through all the steps, because you are
missing the mds, osd, rgw, mgr keyring too.
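
As a minimal sketch of that first missing piece, along the lines of the
manual deployment guide (adjust paths and caps as needed):

sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring \
    --gen-key -n client.admin \
    --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'

# and import it into the monitor keyring before (re)creating the mon:
sudo ceph-authtool /tmp/ceph.mon.keyring \
    --import-keyring /etc/ceph/ceph.client.admin.keyring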



Cheers,

Denes.



On 10/27/2017 04:07 AM, GiangCoi Mr wrote:

Hi Denes Dolhay,
This is the error when I run the command: ceph-deploy mon create-initial

[ceph_deploy.mon][INFO  ] mon.ceph-node1 monitor has reached quorum!
[ceph_deploy.mon][INFO  ] all initial monitors are running and have 
formed quorum

[ceph_deploy.mon][INFO  ] Running gatherkeys...
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for 
/etc/ceph/ceph.client.admin.keyring

[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] Unable to find
/etc/ceph/ceph.client.admin.keyring on ceph-node1
[ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file:
/etc/ceph/ceph.client.admin.keyring on host ceph-node1


It cannot find /etc/ceph/ceph.client.admin.keyring, and in the
directory where I run ceph-deploy, there are only 3 files: ceph.conf,
ceph-deploy-ceph.log, ceph.mon.keyring


Regards,
GiangLT


2017-10-26 22:47 GMT+07:00 Denes Dolhay:


Hi,

If you ssh to ceph-node1, what are the rights, owner, group,
content of /etc/ceph/ceph.client.admin.keyring ?

[you should mask out the key, just show us that it is there]


On 10/26/2017 05:41 PM, GiangCoi Mr wrote:

Hi Denes.
I created with command: ceph-deploy new ceph-node1

Sent from my iPhone

On Oct 26, 2017, at 10:34 PM, Denes Dolhay wrote:


Hi,

Did you create a cluster first?

ceph-deploy new {initial-monitor-node(s)}

Cheers,
Denes.
On 10/26/2017 05:25 PM, GiangCoi Mr wrote:

Dear Alan Johnson
I installed with the command: ceph-deploy install ceph-node1 --no-adjust-repos.
When the install succeeded, I ran the command: ceph-deploy mon ceph-node1, and it
errored because it didn't find the file ceph.client.admin.keyring. So how do I set
permissions for this file?

Sent from my iPhone


On Oct 26, 2017, at 10:18 PM, Alan Johnson wrote:

If using defaults try
chmod +r /etc/ceph/ceph.client.admin.keyring

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com
] On Behalf Of GiangCoi Mr
Sent: Thursday, October 26, 2017 11:09 AM
To:ceph-us...@ceph.com 
Subject: [ceph-users] Install Ceph on Fedora 26

Hi all
I am installing Ceph Luminous on Fedora 26. The Ceph Luminous install
succeeded, but when I install the ceph mon, it errors: it doesn't find
client.admin.keyring. How can I fix it? Thanks so much

Regard,
GiangLT

Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com 

https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwIGaQ=4DxX-JX0i28X6V65hK0ft5M-1rZQeWgdMry9v8-eNr4=eqMv5yFFe6-lAM9jJfUusNFzzcFAGwmoAez_acfPOtw=YEG8qsLFsc0XjSKKJCIlkSn9C_WtCejsaUPv2p5ieRk=orrv_azJsm9kAmXQLjUHM6ClwXx-8oQFN89GyknIeN0=
  

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com