Re: [ceph-users] Ceph OSDs advice

2017-02-15 Thread John Petrini
- it's just Linux taking advantage of memory under the theory of "why not use it if it's there?" Memory is fast, so Linux will take advantage of it. With 72 OSDs, 22G of memory puts you below the 500MB/daemon that you've mentioned, so I don't think you have anything to be concerned about.
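A quick way to separate reclaimable page cache from what the daemons themselves hold (a rough sketch; field names vary slightly by distro):
    free -h                      # 'buff/cache' is reclaimable page cache, not OSD usage
    ps -o rss,comm -C ceph-osd   # resident memory actually held by each OSD daemon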

Re: [ceph-users] [Ceph-community] Consultation about ceph storage cluster architecture

2017-01-20 Thread John Petrini
Here's a really good write up on how to cluster NFS servers backed by RBD volumes. It could be adapted to use CephFS with relative ease. https://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
want to make sure you don't have a single point of failure. There's not much point in having three monitors if they are all going to run in containers/VMs on the same host.

Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
is running on SSD. I should also mention that this is a small cluster only used for non-critical backups, which is why we're comfortable with it.

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
What command are you using to start your OSDs?

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
Also, don't run sgdisk again; that's just for creating the journal partitions. ceph-disk is a service used for prepping disks; only the OSD services need to be running, as far as I know. Are the ceph-osd@x services running now that you've mounted the disks?

Re: [ceph-users] OSDs down after reboot

2016-12-09 Thread John Petrini
Try using systemctl start ceph-osd*. I usually refer to this documentation for Ceph + systemd: https://www.suse.com/documentation/ses-1/book_storage_admin/data/ceph_operating_services.html
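A minimal sketch of the relevant systemd units, assuming a Jewel-or-later systemd-managed host (the OSD id 3 is only an example):
    systemctl start ceph-osd.target   # start every OSD configured on this host
    systemctl start ceph-osd@3        # start a single OSD by id
    systemctl status ceph-osd@3       # confirm it came up and stayed up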

Re: [ceph-users] Calamari or Alternative

2017-01-12 Thread John Petrini
I used Calamari before making the move to Ubuntu 16.04 and upgrading to Jewel. At the time I tried to install it on 16.04 but couldn't get it working. I'm now using ceph-dash along with the nagios plugin check_ceph_dash

Re: [ceph-users] Adding second interface to storage network - issue

2016-11-30 Thread John Petrini
as well.

Re: [ceph-users] Adding second interface to storage network - issue

2016-11-30 Thread John Petrini
suggest getting LACP configured on all of the nodes before upping the MTU, and even then, make sure you understand the requirements of a larger MTU before introducing it on your network.

Re: [ceph-users] osd down detection broken in jewel?

2016-11-30 Thread John Petrini
It's right there in your config: mon osd report timeout = 900. See: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/
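For context, a sketch of where that setting lives; 900 seconds means the monitors wait that long without hearing from an OSD before marking it down:
    [mon]
    mon osd report timeout = 900   # grace period, in seconds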

Re: [ceph-users] ceph - even filling disks

2016-12-01 Thread John Petrini
sizes on a node.

Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread John Petrini
Thank you both for the tools and suggestions. I expected the response "there are many variables" but this gives me a place to start in determining what our configuration is capable of.

[ceph-users] Estimate Max IOPS of Cluster

2017-01-03 Thread John Petrini
Hello, Does anyone have a reasonably accurate way to determine the max IOPS of a Ceph cluster? Thank you,

Re: [ceph-users] clock skew

2017-04-01 Thread John Petrini
Just ntp.

Re: [ceph-users] clock skew

2017-04-01 Thread John Petrini
. Is this a dangerous practice?

[ceph-users] High iowait on OSD node

2017-07-27 Thread John Petrini
Hello list, Just curious if anyone has ever seen this behavior and might have some ideas on how to troubleshoot it. We're seeing very high iowait in iostat across all OSDs on a single OSD host. It's very spiky, dropping to zero and then shooting up to as high as 400 in some cases. Despite

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-12 Thread John Petrini
= true

Re: [ceph-users] CephFS | flapping OSD locked up NFS

2017-06-19 Thread John Petrini
testing. I'd be interested to hear the results if you do.

Re: [ceph-users] High Load and High Apply Latency

2017-12-14 Thread John Petrini
Anyone have any ideas on this?

Re: [ceph-users] High Load and High Apply Latency

2017-12-18 Thread John Petrini
e, I would look at the hardware on the server.

Re: [ceph-users] High Load and High Apply Latency

2017-12-18 Thread John Petrini
or not though. I have another cluster running the same version of Ceph that has the same symptom, but the OSDs in our Jewel cluster always show activity.

[ceph-users] High Load and High Apply Latency

2017-12-11 Thread John Petrini
Hi List, I've got a 5 OSD node cluster running Hammer. All of the OSD servers are identical, but one has about 3-4x higher load than the others and the OSDs on this node are reporting high apply latency. The cause of the load appears to be the OSD processes. About half of the OSD processes are

Re: [ceph-users] Ceph on Public IP

2018-01-07 Thread John Petrini
I think what you're looking for is the public bind addr option.

Re: [ceph-users] Ceph on Public IP

2018-01-08 Thread John Petrini
Ceph will always bind to the local IP. It can't bind to an IP that isn't assigned directly to the server, such as a NAT'd IP. So your public network should be the local network that's configured on each server. If your cluster network is 10.128.0.0/16, for instance, your public network might be
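A minimal ceph.conf sketch of the two settings; the subnets here are hypothetical and should match networks actually assigned on each node:
    [global]
    public network  = 192.168.10.0/24   # client/monitor-facing network, local to every node
    cluster network = 10.128.0.0/16     # replication and backfill traffic between OSDs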

Re: [ceph-users] osd crush reweight 0 on "out" OSD causes backfilling?

2018-02-13 Thread John Petrini
The rule of thumb is to reweight to 0 prior to marking out. This should avoid causing data movement twice, as you're experiencing.
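A sketch of that order of operations (osd.12 is only an example id):
    ceph osd crush reweight osd.12 0   # drain PGs off the OSD first
    # wait for backfill to finish, then:
    ceph osd out 12                    # marking it out now shouldn't trigger a second rebalance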

Re: [ceph-users] High Load and High Apply Latency

2018-02-16 Thread John Petrini
end non-RAID mode for SATA but I have not tested this. You can view the whitepaper here: http://en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download I hope this helps someone.

Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread John Petrini
I just wanted to add that even if you only provide one monitor IP, the client will learn about the other monitors on mount, so failover will still work. This only presents a problem when you try to remount or reboot a client while the monitor it's using is unavailable.
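For that reason it's still worth listing all of the monitors in the mount; a sketch with hypothetical addresses:
    mount -t ceph 192.168.10.1:6789,192.168.10.2:6789,192.168.10.3:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret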

Re: [ceph-users] High Load and High Apply Latency

2017-12-20 Thread John Petrini
Hello, Looking at perf top it looks as though Ceph is spending most of its CPU cycles on tcmalloc. Looking around online I found that this is a known issue, and in fact I found this guide on how to increase the tcmalloc thread cache size:
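The usual approach (a sketch; the exact file and value depend on your distro and packaging) is to raise the cache from tcmalloc's typical 32 MB default and restart the OSDs:
    # /etc/sysconfig/ceph (RHEL/CentOS) or /etc/default/ceph (Debian/Ubuntu)
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MB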

[ceph-users] Least impact when adding PG's

2018-08-06 Thread John Petrini
Hello List, We're planning to add a couple of new OSD nodes to one of our clusters, but we've reached the point where we need to increase PGs before doing so. Our ratio is currently 52 PGs per OSD. Based on the PG calc we need to make the following increases: compute - 1024 => 4096 images 512 =>
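One common way to limit the impact (a sketch; the pool name and target come from the message above, the step size is an assumption) is to grow in stages and keep recovery throttled:
    ceph tell osd.* injectargs '--osd-max-backfills 1'   # keep backfill gentle
    ceph osd pool set compute pg_num 2048                # step toward 4096 rather than jumping
    ceph osd pool set compute pgp_num 2048
    # wait for HEALTH_OK, then repeat with the next step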

Re: [ceph-users] Least impact when adding PG's

2018-08-07 Thread John Petrini
Hi All, Any advice? Thanks, John

Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread John Petrini
.html. The R730 controller has a battery, so I don't think there's a reason to be concerned about moving to RAID-0 w/cache.

Re: [ceph-users] New Ceph cluster design

2018-03-09 Thread John Petrini
What you linked was only a two-week test. When Ceph is healthy it does not need a lot of RAM; it's during recovery that OOM appears, and that's when you'll find yourself upgrading the RAM on your nodes just to stop OOM and allow the cluster to recover. Look through the mailing list and you'll see

Re: [ceph-users] Best way to remove an OSD node

2018-04-16 Thread John Petrini
There's a gentle reweight Python script floating around on the net that does this. It gradually reduces the weight of each OSD one by one, waiting for rebalance to complete each time. I've never used it and it may not work on all versions, so I'd make sure to test it. That or do it manually, but

Re: [ceph-users] Monitor Recovery

2018-10-24 Thread John Petrini
Thanks Wido. That seems to have worked. I just had to pass the keyring and monmap when calling mkfs. I saved the keyring from the monitor's data directory and used that, then obtained the monmap using ceph mon getmap -o /var/tmp/monmap. After starting the monitor it synchronized and recreated
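Roughly the sequence described above, as a sketch (the mon id mon03 and the saved keyring path are only examples):
    ceph mon getmap -o /var/tmp/monmap
    ceph-mon --mkfs -i mon03 --monmap /var/tmp/monmap --keyring /var/tmp/keyring
    systemctl start ceph-mon@mon03   # it should resync its store from the healthy monitors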

[ceph-users] Monitor Recovery

2018-10-23 Thread John Petrini
Hi List, I've got a monitor that won't stay up. It comes up and joins the cluster but crashes within a couple of minutes with no info in the logs. At this point I'd prefer to just give up on it and assume it's in a bad state and recover it from the working monitors. What's the best way to go

[ceph-users] osd reweight = pgs stuck unclean

2018-11-07 Thread John Petrini
Hello, I've got a small development cluster that shows some strange behavior that I'm trying to understand. If I reduce the weight of an OSD using ceph osd reweight X 0.9, for example, Ceph will move data, but recovery stalls and a few PGs remain stuck unclean. If I reset them all back to 1 ceph

Re: [ceph-users] too few PGs per OSD

2018-10-01 Thread John Petrini
You need to set the pg number before setting the pgp number; it's a two-step process. ceph osd pool set cephfs_data pg_num 64 Setting the pg number creates new placement groups by splitting existing ones but keeps them on the local OSD. Setting the pgp number allows Ceph to move the new PGs to
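The two commands back to back, as a sketch using the pool from the message above:
    ceph osd pool set cephfs_data pg_num 64    # split PGs in place
    ceph osd pool set cephfs_data pgp_num 64   # now the new PGs can be placed onto other OSDs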

Re: [ceph-users] mount cephfs from a public network ip of mds

2018-10-01 Thread John Petrini
Multiple subnets are supported. http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/#id1

Re: [ceph-users] Huge latency spikes

2018-11-20 Thread John Petrini
g into Ceph (although even mdadm is a big improvement in my opinion). > For journals you are better off putting half your OSDs on one SSD and half on the other instead of RAID1. > -Brendan

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
The iostat isn't very helpful because there are not many writes. I'd recommend disabling C-states entirely; I'm not sure it's your problem, but it's good practice, and if your cluster goes as idle as your iostat suggests it could be the culprit.

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
I'd take a look at C-states if it's only happening during periods of low activity. If your journals are on SSD, you should also check their health. They may have exceeded their write endurance; high apply latency is a telltale sign of this, and you'd see high iowait on those disks.
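A quick way to check wear, as a sketch (/dev/sdX is a placeholder; the attribute names vary by vendor):
    smartctl -a /dev/sdX | grep -i -E 'wear|percent|endur'
    # e.g. Media_Wearout_Indicator, Wear_Leveling_Count, Percent_Lifetime_Remain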

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
You can check if C-states are enabled with cat /proc/acpi/processor/info. Look for power management: yes/no. If they are enabled, then you can check the current C-state of each core. 0 is the CPU's normal operating state; any other state means the processor is in a power-saving mode. cat
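On newer kernels the ACPI procfs interface may be missing, so here is a hedged alternative; the boot parameters assume Intel CPUs and should be validated for your platform:
    cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name   # idle states the kernel can enter
    # to keep cores out of deep C-states, one common option is the kernel cmdline:
    #   intel_idle.max_cstate=0 processor.max_cstate=1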

Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Okay, that makes more sense; I didn't realize the WAL functioned in a similar manner to filestore journals (though now that I've had another read of Sage's blog post, New in Luminous: BlueStore, I notice he does cover this). Is this to say that writes are acknowledged as soon as they hit the WAL?

Re: [ceph-users] ceph osd journal disk in RAID#1?

2019-02-14 Thread John Petrini
You can, but it's usually not recommended. When you replace a failed disk, the RAID rebuild is going to drag down the performance of the remaining disk and, subsequently, all OSDs that are backed by it. This can hamper the performance of the entire cluster. You could probably tune rebuild priority in

Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Anyone have any insight to offer here? Also, I'm now curious to hear about experiences with 512e vs 4kn drives.

[ceph-users] Bluestore HDD Cluster Advice

2019-02-01 Thread John Petrini
Hello, We'll soon be building out four new Luminous clusters with Bluestore. Our current clusters are running filestore, so we're not very familiar with Bluestore yet, and I'd like to have an idea of what to expect. Here are the OSD hardware specs (5x per cluster): 2x 3.0GHz 18c/36t, 22x 1.8TB 10K

Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-02 Thread John Petrini
Hi Martin, Hardware has already been acquired and was spec'd to mostly match our current clusters, which perform very well for us. I'm really just hoping to hear from anyone who may have experience moving from filestore => bluestore with an HDD cluster. Obviously we'll be doing testing but it's

Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-07 Thread John Petrini

Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-11 Thread John Petrini
is busy. If there's high iowait on your SSDs, disabling disk cache may show an improvement. If there's high iowait on the HDDs, controller cache and/or increasing your db size may help.
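A sketch of the checks and the cache toggle (/dev/sdX is a placeholder; hdparm applies to SATA drives):
    iostat -x 1             # watch %util and await per device to see which tier is saturated
    hdparm -W 0 /dev/sdX    # disable the drive's volatile write cache on the SSDs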

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread John Petrini
It's been suggested here in the past to disable deep scrubbing temporarily before running the repair, because the repair does not execute immediately but instead gets queued up behind deep scrubs.
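Roughly, as a sketch (the PG id is a placeholder):
    ceph osd set nodeep-scrub     # stop new deep scrubs so the repair isn't stuck in the queue
    ceph pg repair <pgid>
    ceph osd unset nodeep-scrub   # re-enable once the repair has completed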

Re: [ceph-users] clock skew

2019-04-25 Thread John Petrini
+1 to Janne's suggestion. Also, how many time sources are you using? More tend to be better, and by default chrony has a pretty low limit on the number of sources if you're using a pool (3 or 4, I think?). You can adjust it by adding maxsources to the pool line. pool pool.ntp.org iburst maxsources
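For example, in chrony's config (the path and the value 8 are only illustrative; chrony's pool default is 4 sources):
    # /etc/chrony/chrony.conf or /etc/chrony.conf, depending on distro
    pool pool.ntp.org iburst maxsources 8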

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread John Petrini
Try ceph-disk list
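A sketch of tracking an OSD back to its physical disk (osd.12 and /dev/sdX are placeholders; ceph-volume applies to newer releases that no longer ship ceph-disk):
    ceph-disk list | grep osd.12           # shows the data and journal devices backing the OSD
    ceph-volume lvm list                   # equivalent listing on ceph-volume based deployments
    smartctl -i /dev/sdX | grep -i serial  # match the serial number to the physical drive bay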

Re: [ceph-users] New best practices for osds???

2019-07-17 Thread John Petrini
Dell has a whitepaper comparing Ceph performance using JBOD and RAID-0 per disk that recommends RAID-0 for HDDs: en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download After switching from JBOD to RAID-0 we saw a huge reduction in latency; the difference was much

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread John Petrini
Do those SSDs have capacitors (a.k.a. power-loss protection)? I took a look at the spec sheet on Samsung's site and I don't see it mentioned. If that's the case, it could certainly explain the performance you're seeing. Not all enterprise SSDs have it, and it's a must-have for Ceph since it syncs