Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-18 Thread Ravi Patel
We’ve been debugging this a while. The data pool was originally EC-backed with the bucket indexes on HDD pools. Moving the metadata to SSD-backed pools improved usability and consistency, and the change from EC to replicated improved the RADOS-layer IOPS by 4x, but didn't seem to affect RGW
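
For anyone following along, a quick way to confirm which CRUSH rule (and hence which device class) each RGW pool currently maps to is something like the following; the pool names assume a default zone and may differ on your cluster:

  # list the RGW pools and see which rule each one uses
  ceph osd lspools | grep rgw
  ceph osd pool get default.rgw.buckets.index crush_rule
  ceph osd pool get default.rgw.buckets.data crush_rule
  # inspect the rules to see which root / device class they select
  ceph osd crush rule dump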

Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-18 Thread Paul Emmerich
On Thu, Jul 18, 2019 at 3:44 AM Robert LeBlanc wrote: > I'm pretty new to RGW, but I'm needing to get max performance as well. > Have you tried moving your RGW metadata pools to nvme? Carve out a bit of > NVMe space and then pin the pool to the SSD class in CRUSH, that way the > small metadata
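
A minimal sketch of the pinning Robert describes, assuming an ssd (or nvme) device class already exists and that the default RGW pool names are in use:

  # create a replicated rule restricted to the ssd device class
  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  # point the small metadata/index pools at it
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd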

Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-18 Thread Burkhard Linke
Hi, On 7/18/19 8:57 AM, Ravi Patel wrote: We’ve been debugging this a while.  The data pool was originally EC backed with the bucket indexes on HDD pools. Moving the metadata to SSD backed pools improved usability and consistency and the change from EC to replicated improved the rados layer

Re: [ceph-users] Allocation recommendations for separate blocks.db and WAL

2019-07-18 Thread Lars Marowsky-Bree
On 2019-07-17T11:56:25, Robert LeBlanc wrote: > So, I see the recommendation for 4% of OSD space for blocks.db/WAL and the > corresponding discussion regarding the 3/30/300GB vs 6/60/600GB allocation. > > How does this change when WAL is separate from blocks.db? > > Reading [0] it seems that
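
For reference, creating an OSD with WAL separated from blocks.db looks roughly like this (device paths are placeholders); if --block.wal is omitted the WAL simply lives inside the blocks.db device:

  ceph-volume lvm create --bluestore \
      --data /dev/sdc \
      --block.db /dev/nvme0n1p1 \
      --block.wal /dev/nvme0n1p2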

[ceph-users] OSD replacement causes slow requests

2019-07-18 Thread Eugen Block
Hi list, we're facing an unexpected recovery behavior of an upgraded cluster (Luminous -> Nautilus). We added new servers with Nautilus to the existing Luminous cluster, so we could first replace the MONs step by step. Then we moved the old servers to a new root in the crush map and then
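
A common way to keep client I/O responsive while moving hosts between CRUSH roots is to gate the data movement; a rough sketch, with host and root names as examples only:

  # pause data movement while the crush changes are made
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd crush move cephnode1 root=new-root
  # throttle recovery, then let it run
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  ceph osd unset nobackfill
  ceph osd unset norebalance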

Re: [ceph-users] pools limit

2019-07-18 Thread M Ranga Swami Reddy
Hi - I can start using 64 PGs, as I have 10 nodes with 18 OSDs per node. On Tue, Jul 16, 2019 at 9:01 PM Janne Johansson wrote: > On Tue, 16 July 2019 at 16:16, M Ranga Swami Reddy < > swamire...@gmail.com> wrote: > >> Hello - I have created a 10-node Ceph cluster with 14.x version.
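
For what it's worth, the usual back-of-the-envelope PG sizing for a cluster of that size (targeting roughly 100 PGs per OSD at replica 3) looks like this; the pool name and per-pool split are only examples:

  # 10 nodes * 18 OSDs            = 180 OSDs
  # 180 OSDs * 100 PGs / 3 copies = ~6000 PGs for the whole cluster
  # spread across ~10 pools       = ~600 per pool, rounded to a power of two
  ceph osd pool create mypool 512 512 replicated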

[ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Geoffrey Rhodes
Hi Cephers, I've been having an issue since upgrading my cluster to Mimic 6 months ago (previously installed with Luminous 12.2.1). All nodes that have the same PCIe network card seem to lose network connectivity randomly (frequency ranges from a few days to weeks per host node). The affected

Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Konstantin Shalygin
I've been having an issue since upgrading my cluster to Mimic 6 months ago (previously installed with Luminous 12.2.1). All nodes that have the same PCIe network card seem to lose network connectivity randomly (frequency ranges from a few days to weeks per host node). The affected nodes only

Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Geoffrey Rhodes
Sure, also attached.

cephuser@cephnode6:~$ ethtool -S enp3s0f0
NIC statistics:
     rx_packets: 3103528
     tx_packets: 20954382
     rx_bytes: 1385006975
     tx_bytes: 30063866207
     rx_broadcast: 8
     tx_broadcast: 2
     rx_multicast: 14098
     tx_multicast: 476
     multicast: 14098
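
The counters above all look clean; when chasing this kind of issue it is usually the error/drop counters that matter, e.g.:

  ethtool -S enp3s0f0 | grep -Ei 'err|drop|fifo|miss'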

[ceph-users] Ceph Day London - October 24 (Call for Papers!)

2019-07-18 Thread Wido den Hollander
Hi, We will be having Ceph Day London on October 24th! https://ceph.com/cephdays/ceph-day-london-2019/ The CFP is now open for you to get your Ceph-related content in front of the Ceph community, across all levels of expertise:

Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Paul Emmerich
Hi, Intel 82576 is bad. I've seen quite a few problems with these older igb family NICs, but losing the PCIe link is a new one. I usually see them getting stuck with a message like "tx queue X hung, resetting device..." Try to disable offloading features using ethtool; that sometimes helps
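
A sketch of what Paul suggests, assuming the interface name from earlier in the thread; check which features are enabled first, then turn off the usual suspects:

  # show current offload settings
  ethtool -k enp3s0f0
  # disable the common offloads
  ethtool -K enp3s0f0 tso off gso off gro off sg off
  ethtool -K enp3s0f0 rx off tx off    # rx/tx checksum offload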

[ceph-users] reproducible rbd-nbd crashes

2019-07-18 Thread Marc Schöchlin
Hello cephers, rbd-nbd crashes in a reproducible way here. I created the following bug report: https://tracker.ceph.com/issues/40822 Do you also experience this problem? Do you have suggestions for in-depth debug data collection? I invoke the following command on a freshly mapped rbd and

[ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Pelletier, Robert
How do I find the physical disk in a Ceph Luminous cluster in order to replace it? osd.9 is down in my cluster, which resides on the ceph-osd1 host. If I run lsblk -io KNAME,TYPE,SIZE,MODEL,SERIAL I can get the serial numbers of all the physical disks, for example: sdb disk 1.8T ST2000DM001-1CH1
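
One generic way to go from a serial number to a device node (and back) without relying on lsblk column formatting; device and serial are just the examples from above:

  # serial numbers are embedded in the persistent device names
  ls -l /dev/disk/by-id/ | grep ST2000DM001
  # or ask the drive directly
  smartctl -i /dev/sdb | grep -i serial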

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread John Petrini
Try ceph-disk list

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-18 Thread Jason Dillaman
On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: > > Hello cephers, > > rbd-nbd crashes in a reproducible way here. I don't see a crash report in the log below. Is it really crashing or is it shutting down? If it is crashing and it's reproducible, can you install the debuginfo packages,
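
If it does turn out to be a real crash, one generic way to capture something useful (the debuginfo/dbgsym package names differ per distro) is to enable core dumps and verbose client logging before reproducing; just a sketch:

  # allow core dumps for the rbd-nbd process
  ulimit -c unlimited
  # in ceph.conf on the client:
  #   [client]
  #   debug rbd = 20
  #   log file = /var/log/ceph/$name.$pid.log
  # reproduce, then look for the core with systemd-coredump
  coredumpctl list rbd-nbd
  coredumpctl gdb rbd-nbd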

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Paul Emmerich
On Thu, Jul 18, 2019 at 8:10 PM John Petrini wrote: > Try ceph-disk list > No, this system is running ceph-volume, not ceph-disk, because the mountpoints are in tmpfs. Use: ceph-volume lvm list. But it looks like the disk is just completely broken and has disappeared from the system. -- Paul Emmerich

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Reed Dier
You can use ceph-volume to get the LV ID:
> # ceph-volume lvm list
>
> == osd.24 ==
>
>   [block]  /dev/ceph-edeb727e-c6d3-4347-bfbb-b9ce7f60514b/osd-block-1da5910e-136a-48a7-8cf1-1c265b7b612a
>
>       type    block
>       osd id  24
>

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread ☣Adam
The block device can be found in /var/lib/ceph/osd/ceph-$ID/block

# ls -l /var/lib/ceph/osd/ceph-9/block

In my case it links to /dev/sdbvg/sdb, which makes it pretty obvious which drive this is, but the volume group and logical volume could be named anything. To see what physical disk(s) make up
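
Putting the pieces from this thread together, the chain from OSD id to physical device generally looks like this (names are taken from the examples above and will differ locally):

  # 1. the OSD's block symlink points at a logical volume
  ls -l /var/lib/ceph/osd/ceph-9/block
  # 2. resolve which physical volume backs that LV
  lvs -o +devices
  pvs
  # 3. map the device to a serial number you can hand to the datacenter
  smartctl -i /dev/sdb | grep -i serial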

[ceph-users] Bluestore Runaway Memory

2019-07-18 Thread Brett Kelly
Hello, We have a Nautilus cluster exhibiting what looks like this bug: https://tracker.ceph.com/issues/39618 No matter what is set as the osd_memory_target (currently 2147483648), each OSD process will surpass this value and peak around 4.0 GB, then eventually start using swap. Cluster
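
For comparison, these are the usual places to check what the daemon thinks its target is and where the memory is actually going; osd.0 is just an example:

  # what the running daemon is actually using as its target
  ceph daemon osd.0 config get osd_memory_target
  # where bluestore/OSD memory is allocated
  ceph daemon osd.0 dump_mempools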

Re: [ceph-users] Bluestore Runaway Memory

2019-07-18 Thread Mark Nelson
Hi Brett, Can you enable debug_bluestore = 5 and debug_prioritycache = 5 on one of the OSDs that's showing the behavior?  You'll want to look in the logs for lines that look like this: 2019-07-18T19:34:42.587-0400 7f4048b8d700  5 prioritycache tune_memory target: 4294967296 mapped:
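
To turn that on without restarting the OSD, something like the following should work on Nautilus (osd.0 as an example), after which you can grep the OSD log for the prioritycache lines Mark mentions:

  ceph config set osd.0 debug_bluestore 5/5
  ceph config set osd.0 debug_prioritycache 5/5
  grep prioritycache /var/log/ceph/ceph-osd.0.log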

Re: [ceph-users] Changing the release cadence

2019-07-18 Thread Konstantin Shalygin
Arch Linux packager for Ceph here o/ I'd like to take this opportunity to raise something that is not about Ceph packaging itself, but is Arch Linux + Ceph related. With the current Arch Linux packaging it is impossible to build a "Samba CTDB cluster with CephFS backend". This is caused by a lack of build options,

Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Konstantin Shalygin
On 7/18/19 7:43 PM, Geoffrey Rhodes wrote: Sure, also attached. Try to disable flow control via `ethtool -K rx off tx off`. k
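
As a small sketch of both knobs (interface name assumed from earlier in the thread): pause-frame flow control is controlled with ethtool -A (and inspected with -a), while -K toggles offload features:

  # inspect and disable pause-frame flow control
  ethtool -a enp3s0f0
  ethtool -A enp3s0f0 rx off tx off
  # rx/tx checksum offload, if that is what was meant
  ethtool -K enp3s0f0 rx off tx off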