[ceph-users] Getting rid of prometheus messages in /var/log/messages

2019-10-21 Thread Vladimir Brik
Hello. /var/log/messages on machines in our Ceph cluster is inundated with entries from Prometheus scraping ("GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.11.1"). Is it possible to configure Ceph not to send those to syslog? If not, can I configure something so that none of ceph-mgr
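
If the scrape lines are reaching syslog through the daemon's normal logging, two possible knobs (a sketch, not from the thread; the rsyslog file name is hypothetical):

    # stop ceph-mgr from forwarding its log to syslog at all
    ceph config set mgr log_to_syslog false

    # or drop just the Prometheus access-log lines in rsyslog
    # (e.g. in a file like /etc/rsyslog.d/10-ceph-prometheus.conf)
    :msg, contains, "GET /metrics HTTP/1.1" stop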

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-09 Thread Vladimir Brik
On 10/9/19 11:51 AM, Gregory Farnum wrote: On Mon, Oct 7, 2019 at 7:20 AM Vladimir Brik wrote: > Do you have statistics on the size of the OSDMaps or count of them > which were being maintained by the OSDs? No, I don't think so. How can I find this information? Hmm I don't know if we
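
One way to see the OSDMap range an OSD is holding (a sketch; osd.0 stands in for any OSD id):

    ceph daemon osd.0 status    # reports oldest_map and newest_map for that OSD
    ceph report | grep -E 'osdmap_(first|last)_committed'    # the range the monitors still keep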

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-07 Thread Vladimir Brik
having noout set would change that if all the nodes were alive, but that's my bet. -Greg On Thu, Oct 3, 2019 at 7:04 AM Vladimir Brik wrote: And, just as unexpectedly, things have returned to normal overnight https://icecube.wisc.edu/~vbrik/graph-1.png The change seems to have coi

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-03 Thread Vladimir Brik
though. Vlad On 10/2/19 3:43 PM, Vladimir Brik wrote: Hello I am running a Ceph 14.2.2 cluster and a few days ago, memory consumption of our OSDs started to unexpectedly grow on all 5 nodes, after being stable for about 6 months. Node memory consumption: https://icecube.wisc.edu/~vbrik

[ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-02 Thread Vladimir Brik
Hello I am running a Ceph 14.2.2 cluster and a few days ago, memory consumption of our OSDs started to unexpectedly grow on all 5 nodes, after being stable for about 6 months. Node memory consumption: https://icecube.wisc.edu/~vbrik/graph.png Average OSD resident size:
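
For a breakdown of where OSD memory is going (a sketch; osd.0 is just an example), the mempool dump shows, among other things, how much is held by osdmaps and pg logs, and osd_memory_target is what the bluestore cache autotuner aims for:

    ceph daemon osd.0 dump_mempools
    ceph config get osd.0 osd_memory_target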

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-26 Thread Vladimir Brik
or on its own. I’m going to check with others who’re more familiar with this code path. Begin forwarded message: From: Vladimir Brik <vladimir.b...@icecube.wisc.edu> Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred Date: August 21, 2019

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
ta ] Warning: Processed 574207 events and lost 4 chunks! Check IO/CPU overload! [ perf record: Captured and wrote 58.866 MB perf.data (233750 samples) ] Vlad On 8/21/19 11:16 AM, J. Eric Ivancich wrote: On 8/21/19 10:22 AM, Mark Nelson wrote: Hi Vladimir, On 8/21/19 8:54 AM, Vladimir Brik

[ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-21 Thread Vladimir Brik
Hello After increasing number of PGs in a pool, ceph status is reporting "Degraded data redundancy (low space): 1 pg backfill_toofull", but I don't understand why, because all OSDs seem to have enough space. ceph health detail says: pg 40.155 is active+remapped+backfill_toofull, acting
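
backfill_toofull is judged against the backfillfull ratio of the backfill target OSDs (default 0.90), not the full ratio, so it can appear while every OSD still looks comfortably below full. A sketch of how to check and, if appropriate, loosen it (0.91 is only an example value):

    ceph osd dump | grep ratio              # full_ratio / backfillfull_ratio / nearfull_ratio in effect
    ceph osd set-backfillfull-ratio 0.91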

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Correction: the number of threads stuck using 100% of a CPU core varies from 1 to 5 (it's not always 5) Vlad On 8/21/19 8:54 AM, Vladimir Brik wrote: Hello I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5

[ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Hello I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers (nothing in radosgw logs, couple of KB/s of network). This
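
To see which radosgw threads are spinning and what they are executing (a sketch; assumes a single radosgw process on the host):

    top -H -p $(pgrep radosgw | head -1)    # per-thread CPU usage
    perf top -p $(pgrep radosgw | head -1)  # hottest functions in those threads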

[ceph-users] radosgw daemons constantly reading default.rgw.log pool

2019-05-03 Thread Vladimir Brik
Hello I have set up rados gateway using "ceph-deploy rgw create" (default pools, 3 machines acting as gateways) on Ceph 13.2.5. For over 2 weeks now, the three rados gateways have been generating a constant ~30 MB/s and ~4K ops/s of read I/O on default.rgw.log even though nothing is using the rados
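
To see what the gateways are actually reading there (a sketch):

    rados -p default.rgw.log ls | head -50    # which log/gc/lc objects exist in the pool
    ceph osd pool stats default.rgw.log       # confirm the read rate is landing on this pool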

[ceph-users] Restricting access to RadosGW/S3 buckets

2019-05-02 Thread Vladimir Brik
Hello I am trying to figure out a way to restrict access to S3 buckets. Is it possible to create a RadosGW user that can only access specific bucket(s)? Thanks, Vlad
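
RGW supports S3 bucket policies, so one route is a per-bucket policy naming the user (a sketch; "alice", "mybucket", and the non-tenanted ARN form are assumptions for illustration):

    # policy.json (hypothetical file)
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/alice"]},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
      }]
    }

    # applied by the bucket owner:
    s3cmd setpolicy policy.json s3://mybucket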

[ceph-users] Bluestore nvme DB/WAL size

2018-12-20 Thread Vladimir Brik
Hello I am considering using logical volumes of an NVMe drive as DB or WAL devices for OSDs on spinning disks. The documentation recommends against DB devices smaller than 4% of slow disk size. Our servers have 16x 10TB HDDs and a single 1.5TB NVMe, so dividing it equally will result in
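
For scale (matching the numbers in the question): 1.5 TB split across 16 OSDs is roughly 90-95 GB of DB per OSD, versus the ~400 GB the 4%-of-10TB guideline suggests. A sketch of carving it up anyway, with hypothetical VG/LV and device names:

    vgcreate ceph-db /dev/nvme0n1
    lvcreate -L 90G -n db-sda ceph-db
    ceph-volume lvm create --data /dev/sda --block.db ceph-db/db-sda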

[ceph-users] Scrub behavior

2018-12-20 Thread Vladimir Brik
Hello I am experimenting with how Ceph (13.2.2) deals with on-disk data corruption, and I've run into some unexpected behavior. I am wondering if somebody could comment on whether I understand things correctly. In my tests I would dd /dev/urandom onto an OSD's disk and see what would
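
The usual commands for exercising this (pgid 1.2f is only a placeholder):

    ceph pg deep-scrub 1.2f                                  # force a deep scrub of the PG
    rados list-inconsistent-obj 1.2f --format=json-pretty    # what the scrub flagged
    ceph pg repair 1.2f                                      # ask the primary to repair it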

Re: [ceph-users] What could cause mon_osd_full_ratio to be exceeded?

2018-11-26 Thread Vladimir Brik
> Why didn't it stop at mon_osd_full_ratio (90%)? Should be 95%. Vlad On 11/26/18 9:28 AM, Vladimir Brik wrote: Hello I am doing some Ceph testing on a near-full cluster, and I noticed that, after I brought down a node, some OSDs' utilization reached osd_failsafe_full_ratio (97%).

[ceph-users] What could cause mon_osd_full_ratio to be exceeded?

2018-11-26 Thread Vladimir Brik
Hello I am doing some Ceph testing on a near-full cluster, and I noticed that, after I brought down a node, some OSDs' utilization reached osd_failsafe_full_ratio (97%). Why didn't it stop at mon_osd_full_ratio (90%) if mon_osd_backfillfull_ratio is 90%? Thanks, Vlad
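
The ratios actually enforced live in the OSDMap and can be checked directly (a sketch; osd.0 is an example):

    ceph osd dump | grep ratio                             # full_ratio, backfillfull_ratio, nearfull_ratio
    ceph daemon osd.0 config get osd_failsafe_full_ratio   # the 0.97 hard stop observed here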

[ceph-users] How many PGs per OSD is too many?

2018-11-14 Thread Vladimir Brik
Hello I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs and 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 400 PGs each (a lot more pools use SSDs than HDDs). Servers are fairly powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet. The impression
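
To see the actual per-OSD counts next to the warning threshold (a sketch):

    ceph osd df tree                         # PGS column = placement groups per OSD
    ceph config get mon mon_max_pg_per_osd   # threshold the monitors warn and limit on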

[ceph-users] Erasure coding with more chunks than servers

2018-10-04 Thread Vladimir Brik
Hello I have a 5-server cluster and I am wondering if it's possible to create pool that uses k=5 m=2 erasure code. In my experiments, I ended up with pools whose pgs are stuck in creating+incomplete state even when I created the erasure code profile with --crush-failure-domain=osd. Assuming that
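
A k=5 m=2 profile needs 7 OSDs per PG, so with 5 hosts the failure domain has to be osd (or a custom rule). A sketch with hypothetical profile and pool names:

    ceph osd erasure-code-profile set ec52 k=5 m=2 crush-failure-domain=osd
    ceph osd pool create ecpool 64 64 erasure ec52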

[ceph-users] NVMe SSD not assigned "nvme" device class

2018-10-01 Thread Vladimir Brik
Hello, It looks like Ceph (13.2.2) assigns device class "ssd" to our Samsung PM1725a NVMe SSDs instead of "nvme". Is that a bug or is the "nvme" class reserved for a different kind of device? Vlad
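
The class can be overridden by hand (a sketch; osd.12 is an example id, and the existing class has to be removed first):

    ceph osd crush rm-device-class osd.12
    ceph osd crush set-device-class nvme osd.12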

Re: [ceph-users] Problems after increasing number of PGs in a pool

2018-10-01 Thread Vladimir Brik
> Check "ceph osd df tree" to see how many PGs per OSD you got. > Try increasing these two options to "fix" it. > mon max pg per osd > osd max pg per osd hard ratio > Paul > On Fri, Sep 28, 2018 at 18:05 Vladimir Brik
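
With centralized config those can be raised cluster-wide (a sketch; 400 and 4 are only example values):

    ceph osd df tree                                        # PGS column: current PGs per OSD
    ceph config set global mon_max_pg_per_osd 400
    ceph config set global osd_max_pg_per_osd_hard_ratio 4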

[ceph-users] Problems after increasing number of PGs in a pool

2018-09-28 Thread Vladimir Brik
Hello I've attempted to increase the number of placement groups of the pools in our test cluster and now ceph status (below) is reporting problems. I am not sure what is going on or how to fix this. Troubleshooting scenarios in the docs don't seem to quite match what I am seeing. I have no idea

[ceph-users] CephFS on a mixture of SSDs and HDDs

2018-09-06 Thread Vladimir Brik
Hello I am setting up a new ceph cluster (probably Mimic) made up of servers that have a mixture of solid state and spinning disks. I'd like CephFS to store data of some of our applications only on SSDs, and store data of other applications only on HDDs. Is there a way of doing this without
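
One common pattern is device-class CRUSH rules plus per-directory file layouts (a sketch; rule, pool, filesystem, and mount-point names are hypothetical):

    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool create cephfs_data_ssd 64 64 replicated ssd-rule
    ceph fs add_data_pool cephfs cephfs_data_ssd
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/ssd-apps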