Re: [ceph-users] Ceph OSDs advice

2017-02-15 Thread John Petrini
You should subtract buffers and cache from the used memory to get a more
accurate picture of how much memory is actually available to processes. In
this case that puts you at around 22G of used - or a better term might be
unavailable - memory. Buffers and cache can be reclaimed whenever they're
needed; it's just Linux taking advantage of memory on the theory that if
it's there, why not use it? Memory is fast, so Linux puts it to work.

With 72 OSDs, 22G of memory works out to roughly 300MB per daemon, which is
below the 500MB/daemon figure you mentioned, so I don't think you have
anything to be concerned about.
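
As a rough sanity check (the figures below are taken from the free -hw
output quoted further down; substitute your own numbers):

  # Memory genuinely consumed = used - buffers - cache; divide by the OSD
  # count for a per-daemon estimate (58G used, ~36G cache, 72 OSDs here).
  free -hw
  echo "$(( (58 - 36) * 1024 / 72 )) MB per OSD"    # ~312 MB per daemon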

___

John Petrini


On Tue, Feb 14, 2017 at 11:24 PM, Khang Nguyễn Nhật <
nguyennhatkhang2...@gmail.com> wrote:

> Hi Sam,
> Thanks for your reply. I use the BTRFS file system on my OSDs.
> Here is result of "*free -hw*":
>
>              total     used     free   shared  buffers    cache  available
> Mem:          125G      58G      31G     1.2M     3.7M      36G        60G
>
> and "*ceph df*":
>
> GLOBAL:
>     SIZE     AVAIL     RAW USED     %RAW USED
>     523T     522T     1539G        0.29
> POOLS:
>     NAME                         ID     USED     %USED     MAX AVAIL     OBJECTS
>     default.rgw.buckets.data     92     597G     0.15      391T          84392
>
>
> I received this a few minutes ago.
>
> 2017-02-15 10:50 GMT+07:00 Sam Huracan <nowitzki.sa...@gmail.com>:
>
>> Hi Khang,
>>
>> What file system do you use in OSD node?
>> XFS always uses memory for caching data before writing to disk.
>>
>> So don't worry - it will always hold as much memory in your system as
>> possible.
>>
>>
>>
>> 2017-02-15 10:35 GMT+07:00 Khang Nguyễn Nhật <
>> nguyennhatkhang2...@gmail.com>:
>>
>>> Hi all,
>>> My Ceph OSDs run on Fedora Server 24 with the following configuration:
>>> 128GB DDR3 RAM, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, and 72 OSDs
>>> (8TB per OSD). The cluster is used as a Ceph object gateway with the S3
>>> API. It currently holds 500GB of data but is already using > 50GB of RAM.
>>> I'm worried my OSDs will die if I keep putting files into the cluster. I
>>> have read "OSDs do not require as much RAM for regular operations (e.g.,
>>> 500MB of RAM per daemon instance); however, during recovery they need
>>> significantly more RAM (e.g., ~1GB per 1TB of storage per daemon)." in the
>>> Ceph Hardware Recommendations. Can someone give me advice on this issue?
>>> Thanks
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] Consultation about ceph storage cluster architecture

2017-01-20 Thread John Petrini
Here's a really good write up on how to cluster NFS servers backed by RBD
volumes. It could be adapted to use CephFS with relative ease.

https://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

___

John Petrini


On Fri, Jan 20, 2017 at 12:35 PM, Joao Eduardo Luis <j...@suse.de> wrote:

> Hi,
>
> This email is better suited for the 'ceph-users' list (CC'ed).
>
> You'll likely find more answers there.
>
>   -Joao
>
> On 01/20/2017 04:33 PM, hen shmuel wrote:
>
>> I'm new to Ceph and I want to build a Ceph storage cluster at my work site
>> to provide NAS services to our clients - NFS for our Linux server clients
>> and CIFS for our Windows server clients. To my understanding, in order to
>> do that with Ceph I need to:
>>
>>  1. build a full ceph storage cluster
>>  2. create CephFS "volumes" on my ceph cluster
>>  3. mount the CephFS on a Linux server that will be used as a "gateway"
>> to export the CephFS as NFS to Linux servers and as CIFS to Windows
>> servers
>>  4. on my Linux "gateway" server, install an NFS server and an SMB server
>> to do the export part
>>  5. to overcome the "single point of failure" of this Linux "gateway"
>> server, I will need to build a cluster of them and use a clustered NFS
>> server and CTDB or something like that
>>
>>
>> I wanted to know if I understand this properly and if this is the right
>> way to do it, or if there is a simpler way to achieve the NAS services
>> I want.
>>
>> thanks for any help!
>>
>>
>>
>> ___
>> Ceph-community mailing list
>> ceph-commun...@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
I've had lots of success running monitors in VM's. Never tried the
container route but there is a ceph-docker project
https://github.com/ceph/ceph-docker if you want to give it a shot. I don't
know how highly recommended that is, though - I've got no personal experience
with it.

No matter what, you want to make sure you don't have a single point of
failure. There's not much point in having three monitors if they are all
going to run in containers/VMs on the same host.

___

John Petrini


On Tue, Nov 15, 2016 at 9:25 AM, Matteo Dacrema <mdacr...@enter.eu> wrote:

> Hi,
>
> does anyone ever tried to run ceph monitors in containers?
> Could it lead to performance issues?
> Can I run monitor containers on the OSD nodes?
>
> I don’t want to buy 3 dedicated servers. Is there any other solution?
>
> Thanks
> Best regards
>
> Matteo Dacrema
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
I forgot to mention that we are running 2 of our 3 monitors in VM's on our
OSD nodes. It's a small cluster with only two OSD nodes. The third monitor
is on a VM on a separate host. It works well, but we made sure the OSD nodes
had plenty of extra resources to accommodate the VMs and the host OS is
running on SSD. I should also mention that this is a small cluster used only
for non-critical backups, which is why we're comfortable with it.

___

John Petrini


On Tue, Nov 15, 2016 at 5:14 PM, Matt Taylor <mtay...@mty.net.au> wrote:

> I think you may need to re-evaluate your situation. If you aren't willing
> to spend the $ on 3 Dedicated Servers, is your platform big enough to
> warrant the need for Ceph?
>
>
>
>
> On 16/11/16 01:25, Matteo Dacrema wrote:
>
>> Hi,
>>
>> does anyone ever tried to run ceph monitors in containers?
>> Could it lead to performance issues?
>> Can I run monitor containers on the OSD nodes?
>>
>> I don’t want to buy 3 dedicated servers. Is there any other solution?
>>
>> Thanks
>> Best regards
>>
>> Matteo Dacrema
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
What command are you using to start your OSD's?

___

John Petrini


On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci <mi...@flowjo.com> wrote:

> I was able to bring the OSDs up by looking at my other OSD node, which is
> the exact same hardware/disks, and figuring out which disks map where. But I
> still can't bring up any of the ceph-disk@dev-sd* services... When I first
> installed the cluster and got the OSDs up, I had to run the following:
>
> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
>
> Do i need to run that again?
>
>
> Cheers,
>
> Mike
>
> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond <sean.redmo...@gmail.com>
> wrote:
>
>> Normally they mount based upon the gpt label, if it's not working you can
>> mount the disk under /mnt and then cat the file called whoami to find out
>> the osd number
>>
>> On 29 Nov 2016 23:56, "Mike Jacobacci" <mi...@flowjo.com> wrote:
>>
>>> OK I am in some trouble now and would love some help!  After updating
>>> none of the OSDs on the node will come back up:
>>>
>>> ● ceph-disk@dev-sdb1.service
>>>loaded failed failedCeph disk activation: /dev/sdb1
>>> ● ceph-disk@dev-sdb2.service
>>>loaded failed failedCeph disk activation: /dev/sdb2
>>> ● ceph-disk@dev-sdb3.service
>>>loaded failed failedCeph disk activation: /dev/sdb3
>>> ● ceph-disk@dev-sdb4.service
>>>loaded failed failedCeph disk activation: /dev/sdb4
>>> ● ceph-disk@dev-sdb5.service
>>>loaded failed failedCeph disk activation: /dev/sdb5
>>> ● ceph-disk@dev-sdc1.service
>>>loaded failed failedCeph disk activation: /dev/sdc1
>>> ● ceph-disk@dev-sdc2.service
>>>loaded failed failedCeph disk activation: /dev/sdc2
>>> ● ceph-disk@dev-sdc3.service
>>>loaded failed failedCeph disk activation: /dev/sdc3
>>> ● ceph-disk@dev-sdc4.service
>>>loaded failed failedCeph disk activation: /dev/sdc4
>>> ● ceph-disk@dev-sdc5.service
>>>loaded failed failedCeph disk activation: /dev/sdc5
>>> ● ceph-disk@dev-sdd1.service
>>>loaded failed failedCeph disk activation: /dev/sdd1
>>> ● ceph-disk@dev-sde1.service
>>>loaded failed failedCeph disk activation: /dev/sde1
>>> ● ceph-disk@dev-sdf1.service
>>>loaded failed failedCeph disk activation: /dev/sdf1
>>> ● ceph-disk@dev-sdg1.service
>>>loaded failed failedCeph disk activation: /dev/sdg1
>>> ● ceph-disk@dev-sdh1.service
>>>loaded failed failedCeph disk activation: /dev/sdh1
>>> ● ceph-disk@dev-sdi1.service
>>>loaded failed failedCeph disk activation: /dev/sdi1
>>> ● ceph-disk@dev-sdj1.service
>>>loaded failed failedCeph disk activation: /dev/sdj1
>>> ● ceph-disk@dev-sdk1.service
>>>loaded failed failedCeph disk activation: /dev/sdk1
>>&

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
Also, don't run sgdisk again; that just sets the partition type codes for the
journal partitions. ceph-disk is a service used for prepping and activating
disks - only the OSD services need to be running as far as I know. Are the
ceph-osd@x services running now that you've mounted the disks?
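
If they aren't, a few things worth trying on the OSD node (the OSD ID and
device below are examples, and this assumes a Jewel-era ceph-disk setup):

  systemctl status ceph-osd@0       # check an individual OSD unit
  ceph-disk activate /dev/sdb1      # re-trigger activation of one prepared data partition
  ceph-disk activate-all            # or activate every prepared partition ceph-disk can find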

___

John Petrini


On Tue, Nov 29, 2016 at 7:27 PM, John Petrini <jpetr...@coredial.com> wrote:

> What command are you using to start your OSD's?
>
> ___
>
> John Petrini
>
>
> On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>
>> I was able to bring the osd's up by looking at my other OSD node which is
>> the exact same hardware/disks and finding out which disks map.  But I still
>> cant bring up any of the start ceph-disk@dev-sd* services... When I
>> first installed the cluster and got the OSD's up, I had to run the
>> following:
>>
>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>>
>> Do i need to run that again?
>>
>>
>> Cheers,
>>
>> Mike
>>
>> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond <sean.redmo...@gmail.com>
>> wrote:
>>
>>> Normally they mount based upon the gpt label, if it's not working you
>>> can mount the disk under /mnt and then cat the file called whoami to find
>>> out the osd number
>>>
>>> On 29 Nov 2016 23:56, "Mike Jacobacci" <mi...@flowjo.com> wrote:
>>>
>>>> OK I am in some trouble now and would love some help!  After updating
>>>> none of the OSDs on the node will come back up:
>>>>
>>>> ● ceph-disk@dev-sdb1.service
>>>>loaded failed failedCeph disk activation: /dev/sdb1
>>>> ● ceph-disk@dev-sdb2.service
>>>>loaded failed failedCeph disk activation: /dev/sdb2
>

Re: [ceph-users] OSDs down after reboot

2016-12-09 Thread John Petrini
Try using systemctl start ceph-osd*

I usually refer to this documentation for ceph + systemd
https://www.suse.com/documentation/ses-1/book_storage_admin/data/ceph_operating_services.html
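
For reference, the per-daemon units and the umbrella target look like this
on a systemd-based Jewel install (the OSD ID is just an example):

  systemctl start ceph-osd@1.service     # start a single OSD by ID
  systemctl start ceph-osd.target        # or start all OSDs on the host
  systemctl status ceph-osd@1.service    # see why a unit failed
  journalctl -u ceph-osd@1.service       # full log for that unit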

___

John Petrini


On Fri, Dec 9, 2016 at 11:13 AM, sandeep.cool...@gmail.com <
sandeep.cool...@gmail.com> wrote:

> Hi,
>
> I'm using the Jewel (10.2.4) release on CentOS 7.2. After rebooting one of
> the OSD nodes, the OSDs don't start, even after trying 'systemctl start
> ceph-osd@.service'.
> Do we have to make an fstab entry for our Ceph OSD directories or does Ceph
> handle that automatically?
>
> I then mounted the correct partitions on my disks and tried 'systemctl
> start ceph-osd@.service', but the OSDs still don't come up.
>
> But when I try 'ceph-osd -i 1', the OSD comes up.
>
> I tried searching online but couldn't find anything concrete on this
> problem. Do the systemd scripts have some bugs, or am I missing something here?
>
> Regards,
> Sandeep
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Calamari or Alternative

2017-01-12 Thread John Petrini
I used Calamari before making the move to Ubuntu 16.04 and upgrading to
Jewel. At the time I tried to install it on 16.04 but couldn't get it
working.

I'm now using ceph-dash along with the nagios plugin check_ceph_dash and
I've found that this gets me everything I need: a nice looking dashboard,
graphs, and alerting on the most important stats.

Another plus is that it's incredibly easy to set up; you can have the
dashboard up and running in five minutes.


On Fri, Jan 13, 2017 at 12:06 AM, Tu Holmes  wrote:

> Hey Cephers.
>
> Question for you.
>
> Do you guys use Calamari or an alternative?
>
> If so, why has the installation of Calamari not really gotten much better
> recently.
>
> Are you still building the vagrant installers and building packages?
>
> Just wondering what you are all doing.
>
> Thanks.
>
> //Tu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding second interface to storage network - issue

2016-11-30 Thread John Petrini
For redundancy I would suggest bonding the interfaces using LACP; that way
both ports are combined under the same interface with the same IP. They will
both send and receive traffic, and if one link goes down the other continues
to work. The ports will need to be configured for LACP on the switch as well.
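
As an illustrative sketch only - the interface names, address, and the
Debian/Ubuntu ifupdown syntax below are assumptions (on RHEL/CentOS you'd
build the equivalent ifcfg-bond0 files instead):

  # /etc/network/interfaces (the slave interfaces also need "inet manual" stanzas)
  auto bond0
  iface bond0 inet static
      address 10.0.0.11
      netmask 255.255.255.0
      bond-slaves ens1f0 ens1f1
      bond-mode 802.3ad              # LACP; the switch ports must be in a matching LACP group
      bond-miimon 100
      bond-xmit-hash-policy layer3+4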

___

John Petrini


On Wed, Nov 30, 2016 at 12:15 PM, Mike Jacobacci <mi...@flowjo.com> wrote:

> I ran into an interesting issue last night when I tried to add a second
> storage interface.  The original 10gb storage interface on the OSD node was
> only set at 1500 MTU, so the plan was to bump it to 9000 and configure the
> second interface the same way with a diff IP and reboot. Once I did that,
> for some reason the original interface showed active but would not respond
> to ping from the other OSD nodes, the second interface I added came up and
> was reachable.  So even though the node could still communicate to the
> others on the second interface, PG's would start remapping and would get
> stuck at about 300 (of 1024).  I resolved the issue by changing the config
> back on the original interface and disabling the second.  After a Reboot,
> PG's recovered very quickly.
>
> It seemed that the remapping would only go partially because the first
> node could reach the others, but they couldn't reach the original interface
> and didn't use the newly added second. So for my questions:
>
> Is there a proper way to add an additional interface (for redundancy) to
> the storage network so that it's recognized by the cluster?
>
> If IPV6 is enabled on a storage interface when the cluster was created,
> would it be a problem to disable it now?
>
> Cheers,
> Mike
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding second interface to storage network - issue

2016-11-30 Thread John Petrini
Yes, that should work. Though I'd be wary of increasing the MTU to 9000 as
this could introduce other issues. Jumbo frames don't provide a very
significant performance increase, so I wouldn't recommend it unless you have
a very good reason to make the change. If you do want to go down that path
I'd suggest getting LACP configured on all of the nodes before upping the
MTU, and even then make sure you understand the requirements of a larger MTU
before introducing it on your network.
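
If you do end up raising the MTU, verify it end to end before trusting it.
A quick check between two storage nodes (the IP is an example; 8972 = 9000
minus 28 bytes of IP/ICMP header):

  ping -M do -s 8972 10.0.0.12    # must succeed without fragmentation from every node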

___

John Petrini


On Wed, Nov 30, 2016 at 1:01 PM, Mike Jacobacci <mi...@flowjo.com> wrote:

> Hi John,
>
> Thanks that makes sense... So I take it if I use the same IP for the bond,
> I shouldn't run into the issues I ran into last night?
>
> Cheers,
> Mike
>
> On Wed, Nov 30, 2016 at 9:55 AM, John Petrini <jpetr...@coredial.com>
> wrote:
>
>> For redundancy I would suggest bonding the interfaces using LACP that way
>> both ports are combined under the same interface with the same IP. They
>> will both send and receive traffic and if one link goes down the other
>> continues to work. The ports will need to be configured for LACP on the
>> switch as well.
>>
>> ___
>>
>> John Petrini
>>
>>
>> On Wed, Nov 30, 2016 at 12:15 PM, Mike Jacobacci <mi...@flowjo.com>
>> wrote:
>>
>>> I ran into an interesting issue last night when I tried to add a second
>>> storage interface.  The original 10gb storage interface on the OSD node was
>>> only set at 1500 MTU, so the plan was to bump it to 9000 and configure the
>>> second interface the same way with a diff IP and reboot. Once I did that,
>>> for some reason the original interface showed active but would not respond
>>> to ping from the other OSD nodes, the second interface I added came up and
>>> was reachable.  So even though the node could still communicate to the
>>> others on the second interface, PG's would start remapping and would get
>>> stuck at about 300 (of 1024).  I resolved the issue by changing the config
>>> back on the original interface and disabling the second.  After a Reboot,
>>> PG's recovered very quickly.
>>>
>>> It seemed that the remapping would only go partially because the first
>>> node could reach the others, but they couldn't reach the original interface
>>> and didn't use the newly added second. So fo

Re: [ceph-users] osd down detection broken in jewel?

2016-11-30 Thread John Petrini
It's right there in your config.

mon osd report timeout = 900

See:
http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/
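
If you want to confirm what the monitors are actually running with, you can
dump the live values from a mon's admin socket (assuming you're on a monitor
host with socket access; the mon ID is usually the short hostname):

  ceph daemon mon.$(hostname -s) config show | \
      grep -E 'mon_osd_report_timeout|mon_osd_min_down_reporters|osd_heartbeat_grace'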

___

John Petrini


On Wed, Nov 30, 2016 at 6:39 AM, Manuel Lausch <manuel.lau...@1und1.de>
wrote:

> Hi,
>
> In a test with ceph jewel we tested how long the cluster needs to detect
> and mark down OSDs after they are killed (with kill -9). The result -> 900
> seconds.
>
> In Hammer this took about 20 - 30 seconds.
>
> In the Logfile from the leader monitor are a lot of messeages like
> 2016-11-30 11:32:20.966567 7f158f5ab700  0 log_channel(cluster) log [DBG]
> : osd.7 10.78.43.141:8120/106673 reported failed by osd.272
> 10.78.43.145:8106/117053
> A deeper look at this shows that a lot of OSDs reported this exactly once.
> In Hammer the OSDs reported a down OSD a few more times.
>
> Finally there is the following, and the OSD is marked down.
> 2016-11-30 11:36:22.633253 7f158fdac700  0 log_channel(cluster) log [INF]
> : osd.7 marked down after no pg stats for 900.982893seconds
>
> In my ceph.conf I have the following lines in the global section
> mon osd min down reporters = 10
> mon osd min down reports = 3
> mon osd report timeout = 900
>
> It seems the parameter "mon osd min down reports" is removed in jewel but
> the documentation is not updated -> http://docs.ceph.com/docs/jewe
> l/rados/configuration/mon-osd-interaction/
>
>
> Can someone tell me how Ceph Jewel detects down OSDs and marks them down in
> an appropriate amount of time?
>
>
> The Cluster:
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> 24 hosts á 60 OSDs -> 1440 OSDs
> 2 pool with replication factor 4
> 65536 PGs
> 5 Mons
>
> --
> Manuel Lausch
>
> Systemadministrator
> Cloud Services
>
> 1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 | 76135
> Karlsruhe | Germany
> Phone: +49 721 91374-1847
> E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de
>
> Amtsgericht Montabaur, HRB 5452
>
> Geschäftsführer: Frank Einhellinger, Thomas Ludwig, Jan Oetjen
>
>
> Member of United Internet
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph - even filling disks

2016-12-01 Thread John Petrini
You can reweight the OSD's either automatically based on utilization (ceph
osd reweight-by-utilization) or by hand.

See:
https://ceph.com/planet/ceph-osd-reweight/
http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem

It's probably not ideal to have OSD's of such different sizes on a node.
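
For example (the OSD ID, weight, and threshold here are illustrative only):

  ceph osd df                            # see how utilization is spread across OSDs
  ceph osd reweight 5 0.85               # gradually lower the weight of an over-full OSD by hand
  ceph osd reweight-by-utilization 120   # or let Ceph trim any OSD above 120% of average utilization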

___

John Petrini


On Fri, Dec 2, 2016 at 12:36 AM, Волков Павел (Мобилон) <vol...@mobilon.ru>
wrote:

> Good day.
>
> I have set up the Ceph repository and created several pools on 4TB HDDs.
> My problem is that the HDDs are filling unevenly.
>
>
>
> root@ceph-node1:~# df -H
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> /dev/sda1   236G  2.7G  221G   2% /
>
> none4.1k 0  4.1k   0% /sys/fs/cgroup
>
> udev 30G  4.1k   30G   1% /dev
>
> tmpfs   6.0G  1.1M  6.0G   1% /run
>
> none5.3M 0  5.3M   0% /run/lock
>
> none 30G  8.2k   30G   1% /run/shm
>
> none105M 0  105M   0% /run/user
>
> */dev/sdf1   4.0T  1.7T  2.4T  42% /var/lib/ceph/osd/ceph-4*
>
> /dev/sdg1   395G  329G   66G  84% /var/lib/ceph/osd/ceph-5
>
> /dev/sdi1   195G  152G   44G  78% /var/lib/ceph/osd/ceph-7
>
> */dev/sdd1   4.0T  1.7T  2.4T  41% /var/lib/ceph/osd/ceph-2*
>
> /dev/sdh1   395G  330G   65G  84% /var/lib/ceph/osd/ceph-6
>
> */dev/sdb1   4.0T  1.9T  2.2T  46% /var/lib/ceph/osd/ceph-0*
>
> */dev/sde1   4.0T  2.1T  2.0T  51% /var/lib/ceph/osd/ceph-3*
>
> */dev/sdc1   4.0T  1.8T  2.3T  45% /var/lib/ceph/osd/ceph-1*
>
>
>
>
>
> On the test machine, this leads to an overflow error CDM and further
> incorrect operation.
>
> How can I make all of the HDDs fill equally?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread John Petrini
Thank you both for the tools and suggestions. I expected the response "there
are many variables" but this gives me a place to start in determining what
our configuration is capable of.

___

John Petrini


On Wed, Jan 4, 2017 at 11:58 AM, Maged Mokhtar <mmokh...@petasan.org> wrote:

>
> if you are asking about what tools to use:
> http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
>
> You should run many concurrent processes on different clients
>
>
> *From:* Maged Mokhtar <mmokh...@petasan.org>
> *Sent:* Wednesday, January 04, 2017 6:45 PM
> *To:* John Petrini <jpetr...@coredial.com> ; ceph-users
> <ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Estimate Max IOPS of Cluster
>
>
> Max iops  depends on the hardware type/configuration for disks/cpu/network.
>
> For disks, the theoretical iops limit is
> read  = physical disk iops x number of disks
> write (with journal on same disk) = physical disk iops x number of disks /
> num of replicas / 3
> in practice real benchmarks will vary widely from this, I've seen numbers
> from 30 to 80 % of theoretical value.
>
> When the number of disks/cpu cores is high, the cpu bottleneck kicks in,
> again it depends on hardware but you could use a performance tool such as
> atop to know when this happens on your setup. There is no theoretical
> measure of this, but one good analysis i find is Nick Fisk:
> http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/
>
>
> Cheers
> /Maged
>
> *From:* John Petrini <jpetr...@coredial.com>
> *Sent:* Tuesday, January 03, 2017 10:15 PM
> *To:* ceph-users <ceph-users@lists.ceph.com>
> *Subject:* [ceph-users] Estimate Max IOPS of Cluster
>
> Hello,
>
> Does any one have a reasonably accurate way to determine the max IOPS of a
> Ceph cluster?
>
> Thank You,
>
> ___
>
> John Petrini
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Estimate Max IOPS of Cluster

2017-01-03 Thread John Petrini
Hello,

Does any one have a reasonably accurate way to determine the max IOPS of a
Ceph cluster?

Thank You,

___

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2017-04-01 Thread John Petrini
Just ntp.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2017-04-01 Thread John Petrini
Hello,

I'm also curious about the impact of clock drift. We see the same on both
of our clusters despite trying various NTP servers including our own local
servers. Ultimately we just ended up adjusting our monitoring to be less
sensitive to it since the clock drift always resolves on its own. Is this a
dangerous practice?

___

John Petrini


On Sat, Apr 1, 2017 at 9:12 AM, mj <li...@merit.unu.edu> wrote:

> Hi,
>
> On 04/01/2017 02:10 PM, Wido den Hollander wrote:
>
>> You could try the chrony NTP daemon instead of ntpd and make sure all
>> MONs are peers from each other.
>>
> I understand now what that means. I have set it up according to your
> suggestion.
>
> Curious to see how this works out, thanks!
>
>
> MJ
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High iowait on OSD node

2017-07-27 Thread John Petrini
Hello list,

Just curious if anyone has ever seen this behavior and might have some
ideas on how to troubleshoot it.

We're seeing very high iowait in iostat across all OSDs on a single OSD
host. It's very spiky - dropping to zero and then shooting up to as high as
400 in some cases. Despite this it does not seem to be having a major
impact on the performance of the cluster as a whole.

Some more details:
3x OSD Nodes - Dell R730's: 24 cores @2.6GHz, 256GB RAM, 20x 1.2TB 10K SAS
OSD's per node.

We're running ceph hammer.

Here's the output of iostat. Note that this is from a period when the
cluster is not very busy but you can still see high spikes on a few OSD's.
It's much worse during high load.

Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.50      0.00     6.00    24.00     0.00    8.00    0.00    8.00   8.00   0.40
sdb        0.00    0.00    0.00   60.00      0.00   808.00    26.93     0.00    0.07    0.00    0.07   0.03   0.20
sdc        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   67.00      0.00  1010.00    30.15     0.01    0.09    0.00    0.09   0.09   0.60
sde        0.00    0.00    0.00   93.00      0.00   868.00    18.67     0.00    0.04    0.00    0.04   0.04   0.40
sdf        0.00    0.00    0.00   57.50      0.00   572.00    19.90     0.00    0.03    0.00    0.03   0.03   0.20
sdg        0.00    1.00    0.00    3.50      0.00    22.00    12.57     0.75   16.00    0.00   16.00   2.86   1.00
sdh        0.00    0.00    1.50   25.50      6.00   458.50    34.41     2.03   75.26    0.00   79.69   3.04   8.20
sdi        0.00    0.00    0.00   30.50      0.00   384.50    25.21     2.36   77.51    0.00   77.51   3.28  10.00
sdj        0.00    1.00    1.50  105.00      6.00   925.75    17.50    10.85  101.84    8.00  103.18   2.35  25.00
sdl        0.00    0.00    2.00    0.00    320.00     0.00   320.00     0.01    3.00    3.00    0.00   2.00   0.40
sdk        0.00    1.00    0.00   55.00      0.00   334.50    12.16     7.92  136.91    0.00  136.91   2.51  13.80
sdm        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdn        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdo        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
sdp        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdq        0.50    0.00  756.00    0.00  93288.00     0.00   246.79     1.47    1.95    1.95    0.00   1.17  88.60
sdr        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
sds        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdt        0.00    0.00    0.00   36.50      0.00   643.50    35.26     3.49   95.73    0.00   95.73   2.63   9.60
sdu        0.00    0.00    0.00   21.00      0.00   323.25    30.79     0.78   37.24    0.00   37.24   2.95   6.20
sdv        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdw        0.00    0.00    0.00   31.00      0.00   689.50    44.48     2.48   80.06    0.00   80.06   3.29  10.20
sdx        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0       0.00    0.00    0.00    0.50      0.00     6.00    24.00     0.00    8.00    0.00    8.00   8.00   0.40
dm-1       0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread John Petrini
We use the following in our ceph.conf for MDS failover. We're running one
active and one standby. Last time it failed over there was about 2 minutes
of downtime before the mounts started responding again but it did recover
gracefully.

[mds]
max_mds = 1
mds_standby_for_rank = 0
mds_standby_replay = true
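
To confirm the standby is actually registered you can check the MDS map
(Jewel-era command; the output shown is just an example):

  ceph mds stat
  # e.g. "e12: 1/1/1 up {0=mds-a=up:active}, 1 up:standby"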

___

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS | flapping OSD locked up NFS

2017-06-19 Thread John Petrini
Hi David,

While I have no personal experience with this, from what I've been told, if
you're going to export CephFS over NFS it's recommended that you use a
userspace implementation of NFS (like nfs-ganesha) rather than
nfs-kernel-server. This may be the source of your issues and might be worth
testing. I'd be interested to hear the results if you do.

___

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High Load and High Apply Latency

2017-12-14 Thread John Petrini
Anyone have any ideas on this?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High Load and High Apply Latency

2017-12-18 Thread John Petrini
Hi David,

Thanks for the info. The controller in the server (perc h730) was just
replaced and the battery is at full health. Prior to replacing the
controller I was seeing very high iowait when running iostat but I no
longer see that behavior - just apply latency when running ceph osd perf.
Since there's no iowait it makes me believe that the latency is not being
introduced by the hardware; though I'm not ruling it out completely. I'd
like to know what I can do to get a better understanding of what the OSD
processes are so busy doing because they are working much harder on this
server than the others.
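
So far the obvious first steps I know of are along these lines (illustrative
commands, run on the busy node):

  ceph osd perf                          # commit/apply latency per OSD
  top -H -p "$(pgrep -d, ceph-osd)"      # per-thread CPU usage of the OSD daemons
  perf top -p "$(pgrep -d, ceph-osd)"    # sample which functions the OSDs are burning CPU in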




On Thu, Dec 14, 2017 at 11:33 AM, David Turner <drakonst...@gmail.com>
wrote:

> We show high disk latencies on a node when the controller's cache battery
> dies.  This is assuming that you're using a controller with cache enabled
> for your disks.  In any case, I would look at the hardware on the server.
>
> On Thu, Dec 14, 2017 at 10:15 AM John Petrini <jpetr...@coredial.com>
> wrote:
>
>> Anyone have any ideas on this?
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High Load and High Apply Latency

2017-12-18 Thread John Petrini
Another strange thing I'm seeing is that two of the nodes in the cluster
have some OSD's with almost no activity. If I watch top long enough I'll
eventually see CPU utilization on these OSDs, but for the most part they sit
at 0% CPU utilization. I'm not sure if this is expected behavior or not,
though. I have another cluster running the same version of ceph that has
the same symptom but the osds in our jewel cluster always show activity.


John Petrini

On Mon, Dec 18, 2017 at 11:51 AM, John Petrini <jpetr...@coredial.com>
wrote:

> Hi David,
>
> Thanks for the info. The controller in the server (perc h730) was just
> replaced and the battery is at full health. Prior to replacing the
> controller I was seeing very high iowait when running iostat but I no
> longer see that behavior - just apply latency when running ceph osd perf.
> Since there's no iowait it makes me believe that the latency is not being
> introduced by the hardware; though I'm not ruling it out completely. I'd
> like to know what I can do to get a better understanding of what the OSD
> processes are so busy doing because they are working much harder on this
> server than the others.
>
>
>
>
>
> On Thu, Dec 14, 2017 at 11:33 AM, David Turner <drakonst...@gmail.com>
> wrote:
>
>> We show high disk latencies on a node when the controller's cache battery
>> dies.  This is assuming that you're using a controller with cache enabled
>> for your disks.  In any case, I would look at the hardware on the server.
>>
>> On Thu, Dec 14, 2017 at 10:15 AM John Petrini <jpetr...@coredial.com>
>> wrote:
>>
>>> Anyone have any ideas on this?
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High Load and High Apply Latency

2017-12-11 Thread John Petrini
Hi List,

I've got a 5 OSD node cluster running hammer. All of the OSD servers are
identical but one has about 3-4x higher load than the others and the OSD's
in this node are reporting high apply latency.

The cause of the load appears to be the OSD processes. About half of the
OSD processes are using between 100-185% CPU, keeping the CPU pegged at
around 85% utilization overall. In comparison, the other servers in the
cluster are sitting at around 30% CPU utilization and report ~1.5ms of
apply latency.

A few days ago I restarted the OSD processes and the problem went away but
now three days later it has returned. I don't see anything in the logs and
there's no iowait on the disks.

Anyone have any ideas on how I can troubleshoot this further?

Thank You,

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Public IP

2018-01-07 Thread John Petrini
I think what you're looking for is the public bind addr option.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Public IP

2018-01-08 Thread John Petrini
Ceph will always bind to the local IP. It can't bind to an IP that isn't
assigned directly to the server, such as a NAT'd IP. So your public network
should be the local network that's configured on each server. If your
cluster network is 10.128.0.0/16, for instance, your public network might be
10.129.0.0/16.

The public addr / public bind addr options let a monitor advertise a
different address than the one it binds to locally: the monitor binds to its
local IP (public bind addr) while advertising the NAT'd IP (public addr), so
your clients know to reach the monitors at their NAT'd IPs rather than their
local IPs.

This does NOT apply for OSD IP's. Your clients must be able to route to the
OSD's directly. If your OSD servers are behind a NAT I don't think that
configuration is possible nor do I think it would be a good idea to route
your storage traffic through a NAT.
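
A minimal sketch of what I mean (every address below is made up, and the
public addr / public bind addr pairing only matters for NAT'd monitors):

  [global]
  public network  = 10.129.0.0/16    # client-facing network the daemons bind to
  cluster network = 10.128.0.0/16    # OSD replication/heartbeat traffic

  [mon.a]
  public addr      = 203.0.113.5     # the NAT'd address advertised to clients
  public bind addr = 10.129.0.5      # the local address the monitor actually binds to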
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd crush reweight 0 on "out" OSD causes backfilling?

2018-02-13 Thread John Petrini
The rule of thumb is to crush reweight to 0 prior to marking the OSD out.
This should avoid moving the data twice, as you're experiencing.
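
Something along these lines (osd.12 and the intermediate weight are just
examples; let backfill settle between steps):

  ceph osd crush reweight osd.12 0.5   # step the CRUSH weight down gradually
  ceph osd crush reweight osd.12 0
  ceph osd out 12                      # marking it out now causes no additional movement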
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High Load and High Apply Latency

2018-02-16 Thread John Petrini
I thought I'd follow up on this just in case anyone else experiences
similar issues. We ended up increasing the tcmalloc thread cache size and
saw a huge improvement in latency. This got us out of the woods because we
were finally in a state where performance was good enough that it was no
longer impacting services.

The tcmalloc issues are pretty well documented on this mailing list and I
don't believe they impact newer versions of Ceph but I thought I'd at least
give a data point. After making this change our average apply latency
dropped to 3.46ms during peak business hours. To give you an idea of how
significant that is here's a graph of the apply latency prior to the
change: https://imgur.com/KYUETvD
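
For anyone else hitting this, the change itself is an environment variable
read by tcmalloc when the daemons start. The value below is illustrative
(128MB is a commonly used setting), and the file it lives in depends on your
packaging and init system - /etc/sysconfig/ceph on EL systems,
/etc/default/ceph on Debian/Ubuntu:

  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
  # restart the OSDs on the node afterwards for it to take effect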

This however did not resolve all of our issues. We were still seeing high
iowait (repeated spikes up to 400ms) on three of our OSD nodes on all
disks. We tried replacing the RAID controller (PERC H730) on these nodes
and while this resolved the issue on one server the two others remained
problematic. These two nodes were configured differently than the rest.
They'd been configured in non-raid mode while the others were configured as
individual raid-0. This turned out to be the problem. We ended up removing
the two nodes one at a time and rebuilding them with their disks configured
in independent raid-0 instead of non-raid. After this change iowait rarely
spikes above 15ms and averages <1ms.

I was really surprised at the performance impact when using non-raid mode.
While I realize non-raid bypasses the controller cache I still would have
never expected such high latency. Dell has a whitepaper that recommends
using individual raid-0 but their own tests show only a small performance
advantage over non-raid. Note that we are running SAS disks, they actually
recommend non-raid mode for SATA but I have not tested this. You can view
the whitepaper here:
http://en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download

I hope this helps someone.

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread John Petrini
I just wanted to add that even if you only provide one monitor IP the
client will learn about the other monitors on mount so failover will still
work. This only presents a problem when you try to remount or reboot a
client while the monitor it's using is unavailable.
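
For example, a kernel-client fstab entry that lists all three monitors
(addresses and secret file path are placeholders) avoids that problem at
mount/boot time:

    10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0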
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High Load and High Apply Latency

2017-12-20 Thread John Petrini
Hello,

Looking at perf top it looks as though Ceph is spending most of it's CPU
cycles on tcmalloc. Looking around online i found that this is a known
issue and in fact I found this guide on how to increase the tcmalloc thread
cache size:
https://swamireddy.wordpress.com/2017/01/27/increase-tcmalloc-thread-cache-bytes/.
Is this the right step to take toward fixing this issue?

Here's the output of perf report that shows this behavior.
http://paste.openstack.org/show/629490/

Thanks,

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Least impact when adding PG's

2018-08-06 Thread John Petrini
Hello List,

We're planning to add a couple new OSD nodes to one of our clusters but
we've reached the point where we need to increase PG's before doing so. Our
ratio is currently 52pg's per OSD.

Based on the PG calc we need to make the following increases:

compute - 1024 => 4096
images 512 => 2048

My question is what is the safest, least impactful way to do this. Our
workload is very latency sensitive so when we do maintenance on the cluster
we always try to minimize impact best we can. When adding/removing disks or
nodes we accomplish this by gradually reweighting. Is there a similar
technique we can use in this case?

Also I'd like to know how many PG's is recommended to increase at a time.

Any advice is appreciated.

Thanks,

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Least impact when adding PG's

2018-08-07 Thread John Petrini
Hi All,

Any advice?

Thanks,

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread John Petrini
I had a recent battle with performance on two of our nodes and it turned
out to be a result of using non-raid mode. We ended up rebuilding them one
by one in raid-0 with controller cache enabled on the OSD disks. I
discussed it on the mailing list:
https://www.spinics.net/lists/ceph-users/msg42756.html. The r730 controller
has a battery so I don't think there's a reason to be concerned about
moving to raid-0 w/cache.


John Petrini
Platforms Engineer


On Thu, Mar 15, 2018 at 3:09 PM, Tim Bishop <tim-li...@bishnet.net> wrote:

> Thank you Christian, David, and Reed for your responses.
>
> My servers have the Dell H730 RAID controller in them, but I have the
> OSD disks in Non-RAID mode. When initially testing I compared single
> RAID-0 containers with Non-RAID and the Non-RAID performance was
> acceptable, so I opted for the configuration with less components
> between Ceph and the disks. This seemed to be the safer approach at the
> time.
>
> What I obviously hadn't realised was that the drive caches were enabled.
> Without those caches the difference is much greater, and the latency
> is now becoming a problem.
>
> My reading of the documentation led me to think along the lines
> Christian mentions below - that is, that data in flight would be lost,
> but that the disks should be consistent and still usable. But it would
> be nice to get confirmation of whether that holds for Bluestore.
> However, it looks like this wasn't the case for Reed, although perhaps
> that was at an earlier time when Ceph and/or Linux didn't handle things
> as well?
>
> I had also thought that our power supply was pretty safe - redundant
> PSUs with independent feeds, redundant UPSs, and a generator. But Reed's
> experiences certainly highlight that even that can fail, so it was good
> to hear that from someone else rather than experience it first hand.
>
> I do have tape backups, but recovery would be a pain, so based on all
> your comments I'll leave the drive caches off and look at using the RAID
> controller cache with its BBU instead.
>
> Tim.
>
> On Thu, Mar 15, 2018 at 04:13:49PM +0900, Christian Balzer wrote:
> > Hello,
> >
> > what has been said by others before is essentially true, as in if you
> want:
> >
> > - as much data conservation as possible and have
> > - RAID controllers with decent amounts of cache and a BBU
> >
> > then disabling the on disk cache is the way to go.
> >
> > But as you found out, w/o those caches and a controller cache to replace
> > them, performance will tank.
> >
> > And of course any data only in the pagecache (dirty) and not yet flushed
> > to the controller/disks is lost anyway in a power failure.
> >
> > All current FS _should_ be powerfail safe (barriers) in the sense that
> you
> > may lose the data in the disk caches (if properly exposed to the OS and
> > the controller or disk not lying about having written data to disk) but
> > the FS will be consistent and not "all will be lost".
> >
> > I'm hoping that this is true for Bluestore, but somebody needs to do that
> > testing.
> >
> > So if you can live with the loss of the in-transit data in the disk
> caches
> > in addition to the pagecache and/or you trust your DC never to lose
> > power, go ahead and re-enable the disk caches.
> >
> > If you have the money and need for a sound happy sleep, do the BBU
> > controller cache dance.
> > Some controllers (Areca comes to mind) actually manage to IT mode style
> > exposure of the disks and still use their HW cache.
> >
> > Christian
>
> --
> Tim Bishop
> http://www.bishnet.net/tim/
> PGP Key: 0x6C226B37FDF38D55
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New Ceph cluster design

2018-03-09 Thread John Petrini
What you linked was only a two-week test. When Ceph is healthy it does not
need a lot of RAM; it's during recovery that OOM appears, and that's when
you'll find yourself upgrading the RAM on your nodes just to stop OOM and
allow the cluster to recover. Look through the mailing list and you'll see
that this is one of the most common mistakes made when spec'ing hardware
for Ceph.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best way to remove an OSD node

2018-04-16 Thread John Petrini
There's a gentle reweight python script floating around on the net that
does this. It gradually reduces the weight of each osd one by one waiting
for rebalance to complete each time.

I've never used it and it may not work on all versions so I'd make sure to
test it.

That or do it manually, but that's a tedious process.

On Mon, Apr 16, 2018, 06:38 Caspar Smit  wrote:

> Hi All,
>
> What would be the best way to remove an entire OSD node from a cluster?
>
> I've ran into problems removing OSD's from that node 1 by 1, eventually
> the last few OSD's are overloaded with data.
>
> Setting the crush weight of all these OSD's to 0 at once seems a bit
> rigorous
> Is there also a gentle (balanced) way to slowly move data off that node?
>
> Something like:
>
> ceph osd crush reweight  0.8(all at once?)
> then to 60%, 40%, 20% and eventually 0
>
> Kind regards,
> Caspar
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor Recovery

2018-10-24 Thread John Petrini
Thanks Wido. That seems to have worked. I just had to pass the keyring
and monmap when calling mkfs. I saved the keyring from the monitors
data directory and used that, then I obtained the monmap using ceph
mon getmap -o /var/tmp/monmap.

After starting the monitor it synchronized and recreated the leveldb.
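
For the archives, the steps looked roughly like this (the mon ID and paths are
placeholders):

    ceph mon getmap -o /var/tmp/monmap
    # keyring taken from the failed monitor's data directory
    # (ceph auth get mon. -o /var/tmp/keyring would also work)
    ceph-mon -i mon3 --mkfs --monmap /var/tmp/monmap --keyring /var/tmp/keyring
    systemctl start ceph-mon@mon3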
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor Recovery

2018-10-23 Thread John Petrini
Hi List,

I've got a monitor that won't stay up. It comes up and joins the
cluster but crashes within a couple of minutes with no info in the
logs. At this point I'd prefer to just give up on it and assume it's
in a bad state and recover it from the working monitors. What's the
best way to go about this?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd reweight = pgs stuck unclean

2018-11-07 Thread John Petrini
Hello,

I've got a small development cluster that shows some strange behavior
that I'm trying to understand.

If I reduce the weight of an OSD using ceph osd reweight X 0.9 for
example Ceph will move data but recovery stalls and a few pg's remain
stuck unclean. If I reset them all back to 1 ceph goes healthy again.

This is running an older version 0.94.6.

Here's the OSD tree:

ID WEIGHT  TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.24982 root default
-2 2.74994 host node-10
11 0.54999 osd.11up  1.0  1.0
 3 0.54999 osd.3 up  1.0  1.0
12 0.54999 osd.12up  1.0  1.0
 0 0.54999 osd.0 up  1.0  1.0
 6 0.54999 osd.6 up  1.0  1.0
-3 2.74994 host node-11
 8 0.54999 osd.8 up  1.0  1.0
15 0.54999 osd.15up  1.0  1.0
 9 0.54999 osd.9 up  1.0  1.0
 2 0.54999 osd.2 up  1.0  1.0
13 0.54999 osd.13up  1.0  1.0
-4 2.74994 host node-3
 4 0.54999 osd.4 up  1.0  1.0
 5 0.54999 osd.5 up  1.0  1.0
 7 0.54999 osd.7 up  1.0  1.0
 1 0.54999 osd.1 up  1.0  1.0
10 0.54999 osd.10up  1.0  1.0
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] too few PGs per OSD

2018-10-01 Thread John Petrini
You need to set the pg number before setting the pgp number; it's a
two-step process.

ceph osd pool set cephfs_data pg_num 64

Setting the pg number creates new placement groups by splitting
existing ones but keeps them on the local OSD. Setting the pgp number
allows ceph to move the new pg's to different OSD's and will trigger
re-balancing and data movement as a result.
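
Once the new PGs exist you can then run the second step for the same pool:

    ceph osd pool set cephfs_data pgp_num 64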

OSD size has no effect on the PG-per-OSD count.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount cephfs from a public network ip of mds

2018-10-01 Thread John Petrini
Multiple subnets are supported.

http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/#id1
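
For example (the subnets are illustrative), public network takes a
comma-separated list in ceph.conf:

    [global]
    public network = 10.128.0.0/24, 192.168.10.0/24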
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-20 Thread John Petrini
I would disable cache on the controller for your journals. Use write
through and no read ahead. Did you make sure the disk cache is disabled?
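
On a MegaRAID-based controller such as the PERC, the equivalent changes can
usually be made with MegaCli (a sketch only -- the binary name and scope flags
may differ in your environment, and perccli/omconfig have matching options;
-LAll hits every logical drive, use -Lx to target just the journal LDs):

    MegaCli64 -LDSetProp WT -LAll -aAll            # write-through instead of write-back
    MegaCli64 -LDSetProp NORA -LAll -aAll          # disable read-ahead
    MegaCli64 -LDSetProp -DisDskCache -LAll -aAll  # disable the physical disk cache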

On Tuesday, November 20, 2018, Alex Litvak 
wrote:
> I went through raid controller firmware update.  I replaced a pair  of
SSDs with new ones.  Nothing have changed.  Per controller card utility it
shows that no patrol reading happens and battery backup is in a good
shape.  Cache policy is WriteBack.  I am aware on the bad battery effect
but it doesn't seem to be the case unless controller is lying to me.
>
>
> On 11/19/2018 2:39 PM, Brendan Moloney wrote:
>>
>> Hi,
>>
>>> Raid card for journal disks is Perc H730 (Megaraid), RAID 1, battery
back cache is on
>>>
>>> Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache
if Bad BBU
>>> Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache
if Bad BBU
>>>
>>> I have  2 other nodes with older Perc H710 and similar SSDs with
slightly higher wear (6.3% vs 5.18%) but from observation they hardly hit
1.5 ms on rear occasion
>>> Cache, RAID, and battery situation is the same.
>>
>> I would take a closer look at the RAID card.  Are you sure the BBU is
ok?  In the past I noticed the Megaraid cards would do periodic battery
tests that would completely drain the battery and thus disable the write
cache until they reached some threshold of charge again.  They also can do
periodic "patrol reads" and "consistency checks" that can hurt performance.
Or the card could just be failing, I have almost gone through more RAID
cards than HDDs. The unreliability and black box nature of hardware RAID
cards is one of the things that first got me looking into Ceph (although
even mdadm is a big improvement in my opinion).
>>
>> For journals you are better off putting half your OSDs on one SSD and
half on the other instead of RAID1.
>>
>> -Brendan
>>
>
>
> _______
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 


John Petrini
Platforms Engineer

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
The iostat isn't very helpful because there are not many writes. I'd
recommend disabling C-states entirely; I'm not sure that's your problem, but
it's good practice, and if your cluster goes as idle as your iostat suggests
it could be the culprit.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
I'd take a look at cstates if it's only happening during periods of
low activity. If your journals are on SSD you should also check their
health. They may have exceeded their write endurance - high apply
latency is a telltale sign of this and you'd see high iowait on those
disks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
You can check if cstates are enabled with cat /proc/acpi/processor/info.
Look for power management: yes/no.

If they are enabled then you can check the current cstate of each core. 0
is the CPU's normal operating range, any other state means the processor is
in a power saving mode. cat /proc/acpi/processor/CPU?/power.

cstates are configured in the bios so a reboot is required to change them.
I know with Dell servers you can trigger the change with omconfig and then
issue a reboot for it to take effect. Otherwise you'll need to disable it
directly in the bios.
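
If getting into the BIOS is a pain, the same effect can usually be had from the
OS side (an alternative approach, not Dell-specific):

    # add to GRUB_CMDLINE_LINUX in /etc/default/grub, regenerate grub.cfg, reboot
    intel_idle.max_cstate=0 processor.max_cstate=1
    # or, on RHEL/CentOS, apply a latency-oriented tuned profile:
    tuned-adm profile latency-performance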

As for the SSD's I would just run iostat and check the iowait. If you see
small disk writes causing high iowait then your SSD's are probably at the
end of their life. Ceph journaling is good at destroying SSD's.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Okay that makes more sense, I didn't realize the WAL functioned in a
similar manner to filestore journals (though now that I've had another read
of Sage's blog post, New in Luminous: BlueStore, I notice he does cover
this). Is this to say that writes are acknowledged as soon as they hit the
WAL?

Also this raises another question regarding sizing. The Ceph documentation
suggests allocating as much available space as possible to blocks.db but
what about WAL? We'll have 120GB per OSD available on each SSD. Any
suggestion on how we might divvy that between the WAL and DB?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd journal disk in RAID#1?

2019-02-14 Thread John Petrini
You can but it's usually not recommended. When you replace a failed
disk the RAID rebuild is going to drag down the performance of the
remaining disk and subsequently all OSD's that are backed by it. This
can hamper the performance of the entire cluster. You could probably
tune rebuild priority in the RAID controller to limit the impact but
this will come at the expense of longer rebuild times which might not
be ideal.

Ideally losing a journal disk should not be a cause for concern. As
long as you don't have too many OSD's per journal your cluster should
keep humming along just fine until you rebuild those OSD's with a
replacement journal.

Cost and available disk slots are also worth considering since you'll
burn a lot more by going RAID-1, which again really isn't necessary.
This may be the most convincing reason not to bother.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Anyone have any insight to offer here? Also I'm now curious to hear
about experiences with 512e vs 4kn drives.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore HDD Cluster Advice

2019-02-01 Thread John Petrini
Hello,

We'll soon be building out four new luminous clusters with Bluestore.
Our current clusters are running filestore so we're not very familiar
with Bluestore yet and I'd like to have an idea of what to expect.

Here are the OSD hardware specs (5x per cluster):
2x 3.0GHz 18c/36t
22x 1.8TB 10K SAS (RAID1 OS + 20 OSD's)
5x 480GB Intel S4610 SSD's (WAL and DB)
192 GB RAM
4X Mellanox 25GB NIC
PERC H730p

With filestore we've found that we can achieve sub-millisecond write
latency by running very fast journals (currently Intel S4610's). My
main concern is that Bluestore doesn't use journals and instead writes
directly to the higher latency HDD; in theory resulting in slower acks
and higher write latency. How does Bluestore handle this? Can we
expect similar or better performance then our current filestore
clusters?

I've heard it repeated that Bluestore performs better than Filestore
but I've also heard some people claiming this is not always the case
with HDD's. Is there any truth to that and if so is there a
configuration we can use to achieve this same type of performance with
Bluestore?

Thanks all.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-02 Thread John Petrini
Hi Martin,

Hardware has already been acquired and was spec'd to mostly match our
current clusters which perform very well for us. I'm really just hoping to
hear from anyone who may have experience moving from filestore => bluestore
with and HDD cluster. Obviously we'll be doing testing but it's always
helpful to hear firsthand experience.

That said there is reasoning behind our choices.

CPU: Buys us some additional horsepower for collocating RGW. We run 12c
currently and they stay very busy. Since we're adding an additional
workload it seemed warranted.
Memory: The Intel Procs in the R740's are 6 channel instead of 4 so the
bump to 192GB was the result of that change. We run 128 today.
NIC's: A few reasons

   - Microbursts, our workload seems to generate them pretty regularly and
   we've had a tough time taming them using buffers, 25G should eliminate that
   even though we won't ever use the sustained bandwidth.
   - Port waste: We're running large compute nodes so the choice came down
   to 4x10G or 2x25G per compute. The switches are more expensive (though not
   terribly) but we get the benefit of using fewer ports.
   - Features: The 25G switches support some features we were looking for
   such as EVPN, VxLAN etc.

Disk: 512e. I'm interested to hear about the performance difference here.
Does Ceph not recognize the physical sector size as being 4k?

Thanks,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-07 Thread John Petrini
iops? And is the advantage more or less when your sata
hdd's are
> > >slower?
> > >
> > >
> > >-Original Message-
> > >From: Stolte, Felix [mailto:f.sto...@fz-juelich.de]
> > >Sent: donderdag 6 juni 2019 10:47
> > >To: ceph-users
> > >Subject: [ceph-users] Expected IO in luminous Ceph Cluster
> > >
> > >Hello folks,
> > >
> > >we are running a ceph cluster on Luminous consisting of 21
OSD Nodes
> > >with 9 8TB SATA drives and 3 Intel 3700 SSDs for Bluestore
WAL and DB
> > >
> > >(1:3 Ratio). OSDs have 10Gb for Public and Cluster
Network. The
> > > cluster
> > >is running stable for over a year. We didn’t had a closer
look on IO
> > >until one of our customers started to complain about a VM
we migrated
> > >
> > >from VMware with Netapp Storage to our Openstack Cloud
with ceph
> > >storage. He sent us a sysbench report from the machine,
which I could
> > >
> > >reproduce on other VMs as well as on a mounted RBD on
physical
> > > hardware:
> > >
> > >sysbench --file-fsync-freq=1 --threads=16 fileio
--file-total-size=1G
> > >
> > >--file-test-mode=rndrw --file-rw-ratio=2 run sysbench
1.0.11 (using
> > >system LuaJIT 2.1.0-beta3)
> > >
> > >Running the test with following options:
> > >Number of threads: 16
> > >Initializing random number generator from current time
> > >
> > >Extra file open flags: 0
> > >128 files, 8MiB each
> > >1GiB total file size
> > >Block size 16KiB
> > >Number of IO requests: 0
> > >Read/Write ratio for combined random IO test: 2.00
Periodic FSYNC
> > >enabled, calling fsync() each 1 requests.
> > >Calling fsync() at the end of test, Enabled.
> > >Using synchronous I/O mode
> > >Doing random r/w test
> > >
> > >File operations:
> > >reads/s:  36.36
> > >writes/s: 18.18
> > >fsyncs/s: 2318.59
> > >
> > >Throughput:
> > >read, MiB/s:  0.57
> > >written, MiB/s:   0.28
> > >
> > >General statistics:
> > >total time:  10.0071s
> > >total number of events:  23755
> > >
> > >Latency (ms):
> > > min:  0.01
> > > avg:  6.74
> > > max:   1112.58
> > > 95th percentile: 26.68
> > > sum: 160022.67
> > >
> > >Threads fairness:
> > >events (avg/stddev):   1484.6875/52.59
> > >execution time (avg/stddev):   10.0014/0.00
> > >
> > >Are these numbers reasonable for a cluster of our size?
> > >
> > >Best regards
> > >Felix
> > >IT-Services
> > >Telefon 02461 61-9243
> > >E-Mail: f.sto...@fz-juelich.de
> > >
> > >
 
> > >-
> > >
> > >
 
> > >-
> > >Forschungszentrum Juelich GmbH
> > >52425 Juelich
> > >Sitz der Gesellschaft: Juelich
> > >Eingetragen im Handelsregister des Amtsgerichts Dueren Nr.
HR B 3498
> > >Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen
Huthmacher
> > >Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt
(Vorsitzender),
> > >
> >  

Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-11 Thread John Petrini
I certainly would, particularly on your SSD's. I'm not familiar with
those Toshibas but disabling disk cache has improved performance on my
clusters and others on this list.

Does the LSI controller you're using provide read/write cache and do
you have it enabled? 72k spinners are likely to see a huge performance
gain from controller cache, especially in regards to latency. Only
enable caching if the controller has a battery and make sure to enable
force write-through in the event that the battery fails. If your
controller doesn't have cache you may want to seriously consider
upgrading to controllers that do otherwise those 72k disks are going
to be a major limiting factor in terms of performance.

Regarding your db partition, the latest advice seems to be that your
db should be 2x the biggest RocksDB level (at least 60GB) to avoid spillover
to the OSD during compaction. See:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54628.html.
With 72k disks you'll want to avoid small writes hitting them directly
if possible, especially if you have no controller cache.

It would be useful to see iowait on your cluster. iostat -x 2 and let
it run for a few cycles while the cluster is busy. If there's high
iowait on your SSD's disabling disk cache may show an improvement. If
there's high iowait on the HDD's, controller cache and/or increasing
your db size may help.
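
If you do end up turning the on-disk caches off, it's typically done with
hdparm/sdparm (device names are placeholders; SATA and SAS use different tools):

    hdparm -W /dev/sda                  # SATA: show the current write-cache setting
    hdparm -W0 /dev/sda                 # SATA: disable the write cache
    sdparm --get WCE /dev/sdb           # SAS: show the write-cache-enable bit
    sdparm --set WCE=0 --save /dev/sdb  # SAS: disable it persistently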




John Petrini
Platforms Engineer


On Tue, Jun 11, 2019 at 3:35 AM Stolte, Felix  wrote:
>
> Hi John,
>
> I have 9 HDDs and 3 SSDs behind a SAS3008 PCI-Express Fusion-MPT SAS-3 from 
> LSI. HDDs are HGST HUH721008AL (8TB, 7200k rpm), SSDs are Toshiba PX05SMB040 
> (400GB). OSDs are bluestore format, 3 HDDs have their wal and db on one SSD 
> (DB Size 50GB, wal 10 GB). I did not change any cache settings.
>
> I disabled cstates which improved performance slightly. Do you suggest to 
> turn off caching on disks?
>
> Regards
> Felix
>
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> -
> ---------
>
>
> Von: John Petrini 
> Datum: Freitag, 7. Juni 2019 um 15:49
> An: "Stolte, Felix" 
> Cc: Sinan Polat , ceph-users 
> Betreff: Re: [ceph-users] Expected IO in luminous Ceph Cluster
>
> How's iowait look on your disks?
>
> How have you configured your disks and what are your cache settings?
>
> Did you disable cstates?
>
> On Friday, June 7, 2019, Stolte, Felix <mailto:f.sto...@fz-juelich.de> wrote:
> > Hi Sinan,
> >
> > thanks for the numbers. I am a little bit surprised that your SSD pool has 
> > nearly the same stats as you SAS pool.
> >
> > Nevertheless I would expect our pools to perform like your SAS pool, at 
> > least regarding to writes since all our write ops should be placed on our 
> > SSDs. But since I only achieve 10% of your numbers I need to figure out my 
> > bottle neck. For now I have no clue. According to our monitoring system 
> > network bandwith, ram or cpu usage is even close to be saturated.
> >
> > Could someone advice me on where to look?
> >
> > Regards Felix
> > -
> > -
> > Forschungszentrum Juelich GmbH
> > 52425 Juelich
> > Sitz der Gesellschaft: Juelich
> > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> > Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> > Prof. Dr. Sebastian M. Schmid

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread John Petrini
It's been suggested here in the past to disable deep scrubbing temporarily
before running the repair because it does not execute immediately but gets
queued up behind deep scrubs.
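
Something along these lines (the PG ID is a placeholder):

    ceph osd set nodeep-scrub     # stop queuing new deep scrubs
    ceph pg repair 2.1f
    # once the repair has actually run:
    ceph osd unset nodeep-scrub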
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2019-04-25 Thread John Petrini
+1 to Janne's suggestion. Also, how many time sources are you using? More
tend to be better, and by default chrony puts a fairly low cap on the number
of sources it will use from a pool (maxsources defaults to 4). You can adjust
it by adding maxsources to the pool line.

pool pool.ntp.org iburst maxsources 8
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread John Petrini
Try ceph-disk list
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New best practices for osds???

2019-07-17 Thread John Petrini
Dell has a whitepaper that compares Ceph performance using JBOD and RAID-0
per disk that recommends RAID-0 for HDD's:
en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download

After switching from JBOD to RAID-0 we saw a huge reduction in latency, the
difference was much more significant than that whitepaper shows. RAID-0
allows us to leverage the controller cache which has major performance
improvements when used with HDD's. We also disable the disk cache on our
HDD's and SSD's as we had inconsistent performance with disk cache enabled.

As always I'd suggest testing various configurations with your own hardware
but I wouldn't shy away from RAID-0 simply because of "best practice".
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread John Petrini
Do those SSD's have capacitors (aka power loss protection)? I took a
look at the spec sheet on Samsung's site and I don't see it mentioned.
If that's the case it could certainly explain the performance you're
seeing. Not all enterprise SSD's have it, and it's a must-have for Ceph
since it syncs every write directly to disk.

You may also want to look for something with a higher DWPD so you can
get more life out of them.
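
A quick way to see what PLP (or the lack of it) does is a single-job sync-write
fio run against the drive (the file path and parameters are just a common
starting point, not a formal benchmark):

    fio --name=synctest --filename=/mnt/test/fio.tmp --size=1G --bs=4k \
        --rw=write --iodepth=1 --numjobs=1 --fsync=1 --runtime=60 --time_based

Drives without capacitors usually show dramatically lower IOPS here because
every fsync has to reach the flash itself.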
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com