[ceph-users] Ceph recommendations for ALL SSD
Hi,

Any suggestions/recommendations for an all-SSD Ceph deployment? I occasionally see SSD freezes on SATA drives, which create latency spikes; the cluster recovers after a brief pause of 20-30 seconds. Any best practices appreciated: colocated journals or not, I/O schedulers, hdparm settings, etc. Working on 1.3.

Regards,
Rama
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Multipath Support on Infernalis
Hi,

It appears that multipath support works with a 512-byte sector size but not with 4k. This is on RHEL 7.1. Can someone please confirm?

4k sector size
==
Nov 13 16:20:16 colusa5-ceph kernel: device-mapper: table: 253:60: len=5119745 not aligned to h/w logical block size 4096 of dm-16
[ceph-node][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[ceph-node][WARNIN] INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/mapper/mpathba
[ceph-node][WARNIN] device-mapper: resume ioctl on mpathba2 failed: Invalid argument
[ceph-node][WARNIN] device-mapper: remove ioctl on mpathba2 failed: No such device or address
[ceph-node][WARNIN] Traceback (most recent call last):
[ceph-node][WARNIN]   File "/usr/sbin/ceph-disk", line 3576, in 
[ceph-node][WARNIN]     main(sys.argv[1:])
[ceph-node][WARNIN]   File "/usr/sbin/ceph-disk", line 3530, in main
[ceph-node][WARNIN]     args.func(args)
[ceph-node][WARNIN]   File "/usr/sbin/ceph-disk", line 1863, in main_prepare
[ceph-node][WARNIN]     luks=luks
[ceph-node][WARNIN]   File "/usr/sbin/ceph-disk", line 1465, in prepare_journal
[ceph-node][WARNIN]     return prepare_journal_dev(data, journal, journal_size, journal_uuid, journal_dm_keypath, cryptsetup_parameters, luks)
[ceph-node][WARNIN]   File "/usr/sbin/ceph-disk", line 1419, in prepare_journal_dev
[ceph-node][WARNIN]     raise Error(e)
[ceph-node][WARNIN] __main__.Error: Error: Command '['/usr/sbin/partprobe', '/dev/mapper/mpathba']' returned non-zero exit status 1
[ceph-node][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/mapper/mpathba
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

512 bytes
===
[ceph-node1][WARNIN] DEBUG:ceph-disk:Creating xfs fs on /dev/dm-20
[ceph-node1][WARNIN] INFO:ceph-disk:Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/dm-20
[ceph-node1][DEBUG ] meta-data=/dev/dm-20 isize=2048 agcount=4, agsize=242908597 blks
[ceph-node1][DEBUG ]          =                       sectsz=512  attr=2, projid32bit=1
[ceph-node1][DEBUG ]          =                       crc=0       finobt=0
[ceph-node1][DEBUG ] data     =                       bsize=4096  blocks=971634385, imaxpct=5
[ceph-node1][DEBUG ]          =                       sunit=0     swidth=0 blks
[ceph-node1][DEBUG ] naming   =version 2              bsize=4096  ascii-ci=0 ftype=0
[ceph-node1][DEBUG ] log      =internal log           bsize=4096  blocks=474430, version=2
[ceph-node1][DEBUG ]          =                       sectsz=512  sunit=0 blks, lazy-count=1
[ceph-node1][DEBUG ] realtime =none                   extsz=4096  blocks=0, rtextents=0
[ceph-node1][WARNIN] DEBUG:ceph-disk:Mounting /dev/dm-20 on /var/lib/ceph/tmp/mnt._dvVgI with options inode64,noatime,logbsize=256k

Regards,
Rama
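The kernel message above can be reproduced with simple arithmetic: device-mapper reports partition lengths in 512-byte sectors, and a partition is only usable on a 4096-byte-sector device when its length is a whole number of 4k blocks, i.e. a multiple of 8 sectors. A minimal sketch of the check, using the length from the log:

```shell
#!/bin/sh
# Partition length as reported by device-mapper, in 512-byte sectors.
len_sectors=5119745

# A 4096-byte logical block is 8 x 512-byte sectors, so the partition is
# aligned only when (len_sectors * 512) is an exact multiple of 4096.
remainder=$(( (len_sectors * 512) % 4096 ))

if [ "$remainder" -eq 0 ]; then
    echo "aligned"
else
    echo "not aligned: $remainder bytes past the last 4096-byte block"
fi
```

For 5119745 sectors the remainder is 512 bytes, which is exactly the "len=5119745 not aligned" complaint; re-partitioning so every partition length is a multiple of 8 sectors would make the message go away.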
Re: [ceph-users] Poor RBD performance as LIO iSCSI target
Hi Dave,

Did you say iSCSI only? The tracker issue does not say. I am on Giant, with both client and Ceph on RHEL 7, and it seems to work OK, unless I am missing something here. RBD on bare metal with kmod-rbd and caching disabled.

[root@compute4 ~]# time fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=200
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/853.0MB/0KB /s] [0/853/0 iops] [eta 00m:00s]
...
Disk stats (read/write):
  rbd0: ios=184/204800, merge=0/0, ticks=70/16164931, in_queue=16164942, util=99.98%

real    1m56.175s
user    0m18.115s
sys     0m10.430s

Regards,
Rama

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Moreau Simard
Sent: Tuesday, November 18, 2014 3:49 PM
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target

Testing without the cache tiering is the next test I want to do when I have time. When it's hanging, there is no activity at all on the cluster: nothing in ceph -w, nothing in ceph osd pool stats. I'll provide an update when I have a chance to test without tiering.
--
David Moreau Simard

On Nov 18, 2014, at 3:28 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi David,

Have you tried on a normal replicated pool with no cache? I've seen a number of threads recently where caching is causing various things to block/hang. It would be interesting to see if this still happens without the caching layer; at least it would rule it out. Also, is there any sign that as the test passes ~50GB the cache might start flushing to the backing pool, causing slow performance? I am planning a deployment very similar to yours, so I am following this with great interest.
I'm hoping to build a single node test cluster shortly, so I might be in a position to work with you on this issue and hopefully get it resolved.

Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Moreau Simard
Sent: 18 November 2014 19:58
To: Mike Christie
Cc: ceph-users@lists.ceph.com; Christopher Spearman
Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target

Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and chatted with dis on #ceph-devel. I ran a LOT of tests on a LOT of combinations of kernels (sometimes with tunables legacy). I haven't found a magical combination in which the following test does not hang:

fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio

Either directly on a mapped rbd device, on a mounted filesystem (over rbd), or exported through iSCSI: nothing. I guess that rules out a potential issue with iSCSI overhead. Now, something I noticed out of pure luck is that I am unable to reproduce the issue if I drop the size of the test to 50GB: tests will complete in under 2 minutes. 75GB will hang right at the end and take more than 10 minutes.

TL;DR of tests:
- 3x fio --name=writefile --size=50G --filesize=50G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio -- 1m44s, 1m49s, 1m40s
- 3x fio --name=writefile --size=75G --filesize=75G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio -- 10m12s, 10m11s, 10m13s

Details of tests here: http://pastebin.com/raw.php?i=3v9wMtYP

Does that ring a bell?
--
David Moreau Simard

On Nov 13, 2014, at 3:31 PM, Mike Christie mchri...@redhat.com wrote:

On 11/13/2014 10:17 AM, David Moreau Simard wrote:

Running into weird issues here as well in a test environment. I don't have a solution either, but perhaps we can find some things in common.

Setup in a nutshell:
- Ceph cluster: Ubuntu 14.04, kernel 3.16.7, Ceph 0.87-1 (OSDs with separate public/cluster networks at 10 Gbps)
- iSCSI proxy node (targetcli/LIO): Ubuntu 14.04, kernel 3.16.7, Ceph 0.87-1 (10 Gbps)
- Client node: Ubuntu 12.04, kernel 3.11 (10 Gbps)

Relevant cluster config: writeback cache tiering with NVMe PCI-E cards (2 replicas) in front of an erasure-coded pool (k=3, m=2) backed by spindles.

I'm following the instructions here: http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices

No issues with
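A back-of-envelope calculation from the timings reported earlier in this thread (my own arithmetic, not from the posts) makes the performance cliff concrete, and is at least consistent with Nick's theory that the cache tier starts flushing somewhere past 50GB:

```shell
#!/bin/sh
# Effective write throughput from the reported fio run times
# (integer MB/s is precise enough to show the cliff).
size_50_mb=$(( 50 * 1024 ))   # 50 GiB test
time_50_s=104                 # ~1m44s, the fastest reported 50G run
size_75_mb=$(( 75 * 1024 ))
time_75_s=612                 # ~10m12s, a typical reported 75G run

rate_50=$(( size_50_mb / time_50_s ))
rate_75=$(( size_75_mb / time_75_s ))

echo "50G: ${rate_50} MB/s, 75G: ${rate_75} MB/s"
```

Roughly 490 MB/s versus roughly 125 MB/s, a fourfold drop from adding 50% more data, which looks much more like a behavior change in the cluster (e.g. cache flushing or blocking) than like ordinary throughput variance.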
Re: [ceph-users] osds fails to start with mismatch in id
Hi,

It appears that, with pre-created partitions, ceph-deploy osd create is unable to change the partition GUIDs; the parted GUID remains as it is. I ran sgdisk manually on each partition, as:

sgdisk --change-name=2:"ceph data" --partition-guid=2:${osd_uuid} --typecode=2:${ptype2} /dev/${i}

The typecodes for journal and data were picked up from ceph-disk-udev. udev is working fine now after reboot, with no changes required in fstab. All OSDs are up too.

ceph -s
    cluster 9c6cd1ae-66bf-45ce-b7ba-0256b572a8b7
     health HEALTH_OK
     osdmap e358: 60 osds: 60 up, 60 in
      pgmap v1258: 4096 pgs, 1 pools, 0 bytes data, 0 objects
            2802 MB used, 217 TB / 217 TB avail
                4096 active+clean

Thanks to all who responded.

Regards,
Rama

From: Daniel Schwager [mailto:daniel.schwa...@dtnet.de]
Sent: Monday, November 10, 2014 10:39 PM
To: 'Irek Fasikhov'; Ramakrishna Nishtala (rnishtal); 'Gregory Farnum'
Cc: 'ceph-us...@ceph.com'
Subject: RE: [ceph-users] osds fails to start with mismatch in id

Hi Ramakrishna,

we use the physical path (containing the serial number) to a disk to prevent complexity and wrong mapping. This path will never change:

/etc/ceph/ceph.conf
[osd.16]
    devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z0SDCY-part1
    osd_journal = /dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...

regards
Danny

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Irek Fasikhov
Sent: Tuesday, November 11, 2014 6:36 AM
To: Ramakrishna Nishtala (rnishtal); Gregory Farnum
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

Hi, Ramakrishna.

I think you understand what the problem is:

[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami
56
[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami
57

Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) rnish...@cisco.com:

Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.
After installation, everything works fine until I reboot. I do observe udevadm getting triggered in the logs, but the devices do not come up after reboot. Exact issue as http://tracker.ceph.com/issues/5194, but that has been fixed a while back per the case details. As a workaround, I copied the contents of /proc/mounts to fstab, and that's where I landed in the issue. After your suggestion I defined them by UUID in fstab, but hit a similar problem. blkid.tab has now moved to tmpfs and also isn't consistent even after issuing blkid explicitly to get the UUIDs, in line with the ceph-disk comments. Decided to reinstall, dd the partitions, zap disks etc. Did not help.

Very weird that the links below change in /dev/disk/by-uuid, /dev/disk/by-partuuid etc.

Before reboot:
lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2

After reboot:
lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2

Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I hadn't partitioned sdh at all before the test. The only difference from the standard procedure is probably that I pre-created the partitions for the journal and data with parted.
The osd rules in /lib/udev/rules.d match four different partition type GUIDs:

45b0969e-9b03-4f30-b4c6-5ec00ceff106
45b0969e-9b03-4f30-b4c6-b4b80ceff106
4fbd7e29-9d25-41b8-afd0-062c0ceff05d
4fbd7e29-9d25-41b8-afd0-5ec00ceff05d

but all my journal/data partitions have ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as their partition type GUID. Appreciate any help.

Regards,
Rama
=
-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) rnish...@cisco.com wrote:

Hi,

I am on ceph 0.87, RHEL 7. Out of 60 OSDs, a few start and the rest complain about a mismatch in ids
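The manual re-stamping described at the top of this thread can be sketched as a dry run. The typecode below is the plain "ceph data" GUID from the list in the thread; the disk names are hypothetical, and the script deliberately prints the sgdisk commands instead of running them, since stamping the wrong disk is destructive:

```shell
#!/bin/sh
# Dry-run sketch: print the sgdisk invocations that would re-stamp
# pre-created data partitions with the type GUID the ceph udev rules
# match on. Nothing is written to disk; pipe the output through `sh`
# only after double-checking the target devices.
OSD_TYPE_GUID="4fbd7e29-9d25-41b8-afd0-062c0ceff05d"   # plain "ceph data"
# (journal partitions would use 45b0969e-9b03-4f30-b4c6-b4b80ceff106)

# Print the sgdisk command for partition 2 of one disk.
stamp_data_cmd() {
    disk=$1
    osd_uuid=$2
    printf 'sgdisk --change-name=2:"ceph data" --partition-guid=2:%s --typecode=2:%s /dev/%s\n' \
        "$osd_uuid" "$OSD_TYPE_GUID" "$disk"
}

for disk in sda sdb sdc; do                 # hypothetical disk list
    # Each data partition gets a fresh, unique partition GUID.
    stamp_data_cmd "$disk" "$(cat /proc/sys/kernel/random/uuid)"
done
```

Each run prints one sgdisk command per disk with a freshly generated partition GUID, matching the shape of the command Rama reports running by hand.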
Re: [ceph-users] osds fails to start with mismatch in id
Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.

After installation, everything works fine until I reboot. I do observe udevadm getting triggered in the logs, but the devices do not come up after reboot. Exact issue as http://tracker.ceph.com/issues/5194, but that has been fixed a while back per the case details. As a workaround, I copied the contents of /proc/mounts to fstab, and that's where I landed in the issue. After your suggestion I defined them by UUID in fstab, but hit a similar problem. blkid.tab has now moved to tmpfs and also isn't consistent even after issuing blkid explicitly to get the UUIDs, in line with the ceph-disk comments. Decided to reinstall, dd the partitions, zap disks etc. Did not help.

Very weird that the links below change in /dev/disk/by-uuid, /dev/disk/by-partuuid etc.

Before reboot:
lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2

After reboot:
lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2

Essentially, the transformation here is sdb2 -> sdh2 and sdc2 -> sdb2. In fact I hadn't partitioned sdh at all before the test.
The only difference from the standard procedure is probably that I pre-created the partitions for the journal and data with parted.

The osd rules in /lib/udev/rules.d match four different partition type GUIDs:

45b0969e-9b03-4f30-b4c6-5ec00ceff106
45b0969e-9b03-4f30-b4c6-b4b80ceff106
4fbd7e29-9d25-41b8-afd0-062c0ceff05d
4fbd7e29-9d25-41b8-afd0-5ec00ceff05d

but all my journal/data partitions have ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as their partition type GUID. Appreciate any help.

Regards,
Rama
=
-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osds fails to start with mismatch in id

On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) rnish...@cisco.com wrote:

Hi,

I am on ceph 0.87, RHEL 7. Out of 60 OSDs, a few start and the rest complain about a mismatch in ids as below.

2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53
2014-11-09 07:09:55.810048 7f636edf4880 -1 OSD id 57 != my id 54
2014-11-09 07:09:56.122957 7f459a766880 -1 OSD id 58 != my id 55
2014-11-09 07:09:56.429771 7f87f8e0c880 -1 OSD id 0 != my id 56
2014-11-09 07:09:56.741329 7fadd9b91880 -1 OSD id 2 != my id 57

Found one OSD id in /var/lib/ceph/cluster-id/keyring. To check this out I manually corrected it and turned authentication to none too, but it did not help. Any clues on how it can be corrected?

It sounds like maybe the symlinks to data and journal aren't matching up with where they're supposed to be. This is usually a result of using unstable /dev links that don't always match to the same physical disks. Have you checked that?
-Greg
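Why udev never activated these OSDs comes down to the GUID match: the ceph rules only fire on the four ceph-specific partition type GUIDs, while parted stamps new data partitions with the generic Linux GUID. A sketch of that match as a shell case statement (the plain/dmcrypt labels are my reading of ceph-disk's conventions, not confirmed by the thread):

```shell
#!/bin/sh
# Classify a partition type GUID the way the ceph udev rules do: only
# the four ceph-specific GUIDs trigger OSD/journal activation. parted's
# default for a data partition is the generic Linux GUID, which the
# rules ignore, so such OSDs never come up automatically at boot.
classify_ptype() {
    case "$1" in
        45b0969e-9b03-4f30-b4c6-b4b80ceff106) echo "ceph journal" ;;
        45b0969e-9b03-4f30-b4c6-5ec00ceff106) echo "ceph journal (dmcrypt)" ;;
        4fbd7e29-9d25-41b8-afd0-062c0ceff05d) echo "ceph data" ;;
        4fbd7e29-9d25-41b8-afd0-5ec00ceff05d) echo "ceph data (dmcrypt)" ;;
        *)                                    echo "ignored" ;;
    esac
}

classify_ptype ebd0a0a2-b9e5-4433-87c0-68b6b72699c7   # parted's default: ignored
classify_ptype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d   # ceph data
```

This is exactly why re-stamping the typecode with sgdisk, as described in the previous thread, makes the OSDs come up on reboot without fstab entries.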
[ceph-users] osds fails to start with mismatch in id
Hi,

I am on ceph 0.87, RHEL 7. Out of 60 OSDs, a few start and the rest complain about a mismatch in ids as below.

2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53
2014-11-09 07:09:55.810048 7f636edf4880 -1 OSD id 57 != my id 54
2014-11-09 07:09:56.122957 7f459a766880 -1 OSD id 58 != my id 55
2014-11-09 07:09:56.429771 7f87f8e0c880 -1 OSD id 0 != my id 56
2014-11-09 07:09:56.741329 7fadd9b91880 -1 OSD id 2 != my id 57

Found one OSD id in /var/lib/ceph/cluster-id/keyring. To check this out I manually corrected it and turned authentication to none too, but it did not help. Any clues on how it can be corrected? A few OSDs are up though.

    cluster 580f6503-2271-44b0-8ee6-e95c8f1c87c6
     health HEALTH_WARN 3451 pgs stale; 3451 pgs stuck stale; 7/17 in osds are down
     monmap e1: 1 mons at {host=192.168.30.201:6789/0}, election epoch 1, quorum 0 host
     osdmap e410: 60 osds: 10 up, 17 in

Regards,
Rama
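The "OSD id 56 != my id 53" errors mean the whoami file inside a mounted OSD data directory disagrees with the id implied by its mount point, i.e. the wrong disk got mounted there. A small diagnostic sketch (it only reports, never fixes; the function name and the option to point it at an alternate root are mine):

```shell
#!/bin/sh
# Flag OSD data directories whose whoami file does not match the id in
# the directory name -- the situation behind "OSD id 56 != my id 53".
# Pass an alternate root (e.g. a copied tree) as $1 for safe inspection;
# the default is the usual live path.
check_whoami() {
    root=${1:-/var/lib/ceph/osd}
    status=0
    for dir in "$root"/ceph-*; do
        [ -f "$dir/whoami" ] || continue
        id_from_name=${dir##*/ceph-}          # id implied by the mount point
        id_from_file=$(cat "$dir/whoami")     # id recorded on the disk itself
        if [ "$id_from_name" != "$id_from_file" ]; then
            echo "MISMATCH: $dir is mounted as osd.$id_from_name but whoami says $id_from_file"
            status=1
        fi
    done
    return $status
}
```

Running `check_whoami` on a healthy cluster prints nothing and exits zero; on a cluster with shuffled /dev links it prints one line per wrongly-mounted OSD, which points straight at Greg's unstable-device-link diagnosis below.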