Re: [ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-27 Thread Wido den Hollander
On 2/28/19 2:59 AM, Glen Baars wrote: > Hello Ceph Users, > > Has anyone found a way to improve the speed of the rbd du command on large > rbd images? I have object map and fast diff enabled - no invalid flags on the > image or its snapshots. > > We recently upgraded our Ubuntu 16.04 KVM
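
A quick way to confirm fast-diff is actually in effect is to check the image's features and flags, and rebuild the object map if it has been invalidated. A minimal sketch (pool and image names are placeholders):

  # rbd info rbd/vm-disk-1                 # "features" should list object-map, fast-diff; "flags" should be empty
  # rbd object-map rebuild rbd/vm-disk-1   # only needed if flags report the object map as invalid
  # rbd du rbd/vm-disk-1                   # with a valid object map this should return quickly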

Re: [ceph-users] Multi-Site Cluster RGW Sync issues

2019-02-27 Thread Matthew H
Hey Ben, Could you include the following? radosgw-admin mdlog list Thanks, From: ceph-users on behalf of Benjamin.Zieglmeier Sent: Tuesday, February 26, 2019 9:33 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Multi-Site Cluster RGW Sync issues
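
Alongside the mdlog, the overall sync state is usually the first thing worth capturing. A minimal sketch of the relevant commands (the zone name is a placeholder):

  # radosgw-admin sync status
  # radosgw-admin metadata sync status
  # radosgw-admin data sync status --source-zone=us-east-1
  # radosgw-admin mdlog list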

Re: [ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Matthew H
Hey Christian, I'm making a wild guess, but assuming this is 12.2.8. If so, is it possible for you to upgrade to 12.2.11? There have been rgw multisite bug fixes for metadata syncing and data syncing (both separate issues) that you could be hitting. Thanks,

[ceph-users] mon failed to return metadata for mds.ceph04: (2) No such file or directory

2019-02-27 Thread Ch Wan
Hi all, I have a ceph cluster (12.2.5); all servers run on CentOS 7. Recently I noticed some error logs in mgr > 2019-02-28 12:02:16.302888 7ff5665ff700 0 ms_deliver_dispatch: unhandled > message 0x7ff5913539c0 mgrreport(mds.ceph04 +24-0 packed 214) v5 from mds.0 > x.x.x.x:6800/3089687518 >
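
The mon/mgr side can be cross-checked by asking for the MDS metadata directly; if the daemon really has no metadata registered, the same ENOENT shows up here. A minimal sketch (daemon name taken from the log above):

  # ceph mds metadata ceph04   # returns the daemon's metadata, or (2) No such file or directory
  # ceph fs status             # shows which MDS daemons the cluster currently knows about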

Re: [ceph-users] osd exit common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

2019-02-27 Thread lin zhou
Thanks, Greg. Your replies are always so fast. I checked these limits on my system. # ulimit -a core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals

[ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-27 Thread Glen Baars
Hello Ceph Users, Has anyone found a way to improve the speed of the rbd du command on large rbd images? I have object map and fast diff enabled - no invalid flags on the image or its snapshots. We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu 18.04. The upgrades

[ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Christian Rice
Debian 9; ceph 12.2.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one zonegroup. Often we find either metadata or data sync behind, and it doesn’t look to ever recover until…we restart the endpoint radosgw target service. eg at 15:45:40: dc11-ceph-rgw1:/var/log/ceph#
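
For reference, the check-and-restart cycle described above usually looks something like this (the instance name in the systemd unit is a placeholder):

  # radosgw-admin sync status          # metadata/data sync lag per shard
  # radosgw-admin sync error list      # persistent errors that block catching up
  # systemctl restart ceph-radosgw@rgw.dc11-ceph-rgw1.service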

Re: [ceph-users] RBD poor performance

2019-02-27 Thread Mark Nelson
FWIW, I've got recent tests of a fairly recent master build (14.0.1-3118-gd239c2a) showing a single OSD hitting ~33-38K 4k randwrite IOPS with 3 client nodes running fio (io_depth = 32) both with RBD and with CephFS.  The OSD node had older gen CPUs (Xeon E5-2650 v3) and NVMe drives (Intel
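
A minimal fio invocation along those lines, using the rbd ioengine (pool and image names are placeholders; the client needs a keyring with access to the pool):

  # fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=bench-img --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
        --direct=1 --runtime=60 --time_based --group_reporting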

Re: [ceph-users] RBD poor performance

2019-02-27 Thread Vitaliy Filippov
By "maximum write iops of an osd" I mean total iops divided by the number of OSDs. For example, an expensive setup from Micron (https://www.micron.com/about/blog/2018/april/micron-9200-max-red-hat-ceph-storage-30-reference-architecture-block-performance) has got only 8750 peak write iops per

[ceph-users] Ceph 2 PGs Inactive and Incomplete after node reboot and OSD toast

2019-02-27 Thread Pardhiv Karri
Hi, We had a hardware failure of one node and when it came back we had one OSD, 489, that is showing as live but is not taking IO. We stopped the OSD and changed the crush weight to 0, but then those two PGs moved to 2 different OSDs (490 and 492). This caused rebalancing and 2 PGs being stuck inactive
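
When PGs go inactive/incomplete like this, the usual starting point is to see exactly which OSDs each PG wants and what it is blocked on. A minimal sketch (the PG id is a placeholder):

  # ceph health detail           # lists the inactive/incomplete PGs and any blocking OSDs
  # ceph pg dump_stuck inactive
  # ceph pg 41.1ff query         # "recovery_state" / "blocked_by" show what the PG is waiting for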

Re: [ceph-users] RBD poor performance

2019-02-27 Thread Marc Roos
At some point I would expect the CPU to be the bottleneck. People here have always been saying that for better latency you should get fast CPUs. It would be nice to know what GHz you are testing at, and how that scales. Rep 1-3, erasure probably also takes a hit. How do you test maximum iops of the osd? (Just

Re: [ceph-users] rbd space usage

2019-02-27 Thread Marc Roos
They are 'thin provisioned' meaning if you create a 10GB rbd, it does not use 10GB at the start. (afaik) -Original Message- From: solarflow99 [mailto:solarflo...@gmail.com] Sent: 27 February 2019 22:55 To: Ceph Users Subject: [ceph-users] rbd space usage using ceph df it looks as
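
The difference between provisioned and actually allocated space can be checked per image; a minimal sketch (pool and image names are placeholders):

  # rbd info rbd/myimage   # "size" is the provisioned --size, not what is allocated
  # rbd du rbd/myimage     # PROVISIONED vs USED per image and snapshot
  # ceph df                # pool-level view of what is really consumed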

[ceph-users] rbd space usage

2019-02-27 Thread solarflow99
using ceph df it looks as if RBD images can use the total free space available of the pool they belong to, 8.54% yet I know they are created with a --size parameter and that's what determines the actual space. I can't understand the difference I'm seeing, only 5T is being used but ceph df shows

Re: [ceph-users] RBD poor performance

2019-02-27 Thread Vitaliy Filippov
To me it seems Ceph's iops limit is ~10000 (maybe 15000 with BIS hardware) per OSD. After that number it starts to get stuck on CPU. I've tried to create a pool from 3 OSDs in loop devices over tmpfs and I've only got ~15000 iops :) good disks aren't the bottleneck, CPU is. -- With best

Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-02-27 Thread Vitaliy Filippov
I think this should not lead to blocked ops in any case, even if the performance is low... -- With best regards, Vitaliy Filippov

[ceph-users] ceph osd pg-upmap-items not working

2019-02-27 Thread Kári Bertilsson
Hello I am trying to diagnose why upmap stopped working where it was previously working fine. Trying to move pg 41.1 to OSD 123 has no effect and seems to be ignored. # ceph osd pg-upmap-items 41.1 23 123 set 41.1 pg_upmap_items mapping to [23->123] No rebalancing happens and if I run it again it
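
One common reason for upmap entries being silently ignored is the client compatibility gate; it is worth confirming the mapping was stored and that the cluster requires luminous-or-newer clients. A minimal sketch:

  # ceph osd dump | grep upmap                       # the pg_upmap_items entry should appear here
  # ceph features                                    # all connected clients must report luminous or newer
  # ceph osd set-require-min-compat-client luminous  # required before upmap mappings take effect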

Re: [ceph-users] luminous 12.2.11 on debian 9 requires nscd?

2019-02-27 Thread Gregory Farnum
This is probably a build issue of some kind, but I'm not quite sure how... The MDS (and all the Ceph code) is just invoking the getgrnam_r function, which is part of POSIX and implemented by glibc (or whatever other libc). So any dependency on nscd is being created "behind our backs" somewhere.

Re: [ceph-users] osd exit common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

2019-02-27 Thread Gregory Farnum
The OSD tried to create a new thread, and the kernel told it no. You probably need to turn up the limits on threads and/or file descriptors. -Greg On Wed, Feb 27, 2019 at 2:36 AM hnuzhoulin2 wrote: > Hi, guys > > So far, there have been 10 osd service exit because of this error. > the error
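
What "turn up the limits" tends to look like in practice, as a hedged sketch (the values and unit names are examples, not recommendations):

  # cat /proc/$(pidof -s ceph-osd)/limits      # current nofile / nproc limits of a running OSD
  # sysctl kernel.pid_max kernel.threads-max   # system-wide thread/PID ceilings
  # systemctl edit ceph-osd@.service           # add e.g. LimitNOFILE=1048576 and TasksMax=infinity under [Service]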

[ceph-users] PG Calculations Issue

2019-02-27 Thread Krishna Venkata
Greetings, I am having issues with the way PGs are calculated in https://ceph.com/pgcalc/ [Ceph PGs per Pool Calculator] and the formulae mentioned on the site. Below are my findings. The formula to calculate PGs as mentioned in https://ceph.com/pgcalc/: 1. Need to pick the highest
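
For comparison, the commonly cited form of the pgcalc formula, worked through with placeholder numbers:

  # PGs per pool = (target PGs per OSD * OSD count * %data) / replica size,
  #                rounded up to the next power of two
  # e.g. (100 * 48 * 1.00) / 3 = 1600  ->  next power of two = 2048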

Re: [ceph-users] [Ceph-community] How does ceph use the STS service?

2019-02-27 Thread Pritha Srivastava
Sorry I overlooked the ceph versions in the email. STS Lite is not a part of ceph version 12.2.11 or ceph version 13.2.2. Thanks, Pritha On Wed, Feb 27, 2019 at 9:09 PM Pritha Srivastava wrote: > You need to attach a policy to be able to invoke GetSessionToken. Please > read the documentation

Re: [ceph-users] [Ceph-community] How does ceph use the STS service?

2019-02-27 Thread Pritha Srivastava
You need to attach a policy to be able to invoke GetSessionToken. Please read the documentation below at: https://github.com/ceph/ceph/pull/24818/commits/512b6d8bd951239d44685b25dccaf904f19872b2 Thanks, Pritha On Wed, Feb 27, 2019 at 8:59 PM Sage Weil wrote: > Moving this to ceph-users. > >

Re: [ceph-users] [Ceph-community] How does ceph use the STS service?

2019-02-27 Thread Sage Weil
Moving this to ceph-users. On Wed, 27 Feb 2019, admin wrote: > I want to use the STS service to generate temporary credentials for use by > third-party clients. > > I configured STS lite based on the documentation. > http://docs.ceph.com/docs/master/radosgw/STSLite/ > > This is my
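
For context, the STS Lite setup described in that document comes down to a couple of rgw options; a minimal ceph.conf sketch (the section name and key value are placeholders):

  [client.rgw.gateway-node1]
  rgw sts key = abcdefghijklmnop   # key used to encrypt the session token; placeholder value
  rgw s3 auth use sts = true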

Re: [ceph-users] rbd unmap fails with error: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy

2019-02-27 Thread Ilya Dryomov
On Wed, Feb 27, 2019 at 12:00 PM Thomas <74cmo...@gmail.com> wrote: > > Hi, > I have noticed an error when writing to a mapped RBD. > Therefore I unmounted the block device. > Then I tried to unmap it w/o success: > ld2110:~ # rbd unmap /dev/rbd0 > rbd: sysfs write failed > rbd: unmap failed: (16)

Re: [ceph-users] ceph migration

2019-02-27 Thread John Hearns
We did a similar upgrade on a test system yesterday, from mimic to nautilus. All of the PGs stayed offline until we issued this command: ceph osd require-osd-release nautilus --yes-i-really-mean-it On Wed, 27 Feb 2019 at 12:19, Zhenshi Zhou wrote: > Hi, > > The servers have moved to the new
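
The flag can be verified before and after via the OSD map; a minimal sketch:

  # ceph osd dump | grep require_osd_release   # should report "nautilus" after the command above
  # ceph osd require-osd-release nautilus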

Re: [ceph-users] Mimic and cephfs

2019-02-27 Thread Hector Martin
On 26/02/2019 04:17, Andras Pataki wrote: Hi ceph users, As I understand, cephfs in Mimic had significant issues up to and including version 13.2.2.  With some critical patches in Mimic 13.2.4, is cephfs now production quality in Mimic?  Are there folks out there using it in a production

Re: [ceph-users] ceph migration

2019-02-27 Thread Zhenshi Zhou
Hi, The servers have moved to the new datacenter and I got the cluster online following the instructions. # ceph -s cluster: id: 7712ab7e-3c38-44b3-96d3-4e1de9da0ff6 health: HEALTH_OK services: mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 mgr: ceph-mon3(active), standbys:

Re: [ceph-users] Diskprediction - smart returns

2019-02-27 Thread John Hearns
…health metrics in Nautilus. Looks very > useful. http://docs.ceph.com/docs/nautilus/mgr/diskprediction/ > > On a Debian 9 system with smartctl version 6.6 2016-05-31 I get this: > > # ceph device get-health-metrics SEAGATE_ST1000NM0023_Z1W1ZB0P > {

[ceph-users] Diskprediction - smart returns

2019-02-27 Thread John Hearns
I am looking at the diskprediction health metrics in Nautilus. Looks very useful. http://docs.ceph.com/docs/nautilus/mgr/diskprediction/ On a Debian 9 system with smartctl version 6.6 2016-05-31 I get this: # ceph device get-health-metrics SEAGATE_ST1000NM0023_Z1W1ZB0P { "20190227-1
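
For anyone trying the same thing, the module and device-health commands involved are roughly the following (the device id is the one from the post; the module name assumes the local predictor rather than the cloud one):

  # ceph mgr module enable diskprediction_local   # or diskprediction_cloud
  # ceph device ls                                # lists device ids and the daemons using them
  # ceph device get-health-metrics SEAGATE_ST1000NM0023_Z1W1ZB0P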

[ceph-users] rbd unmap fails with error: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy

2019-02-27 Thread Thomas
Hi, I have noticed an error when writing to a mapped RBD. Therefore I unmounted the block device. Then I tried to unmap it w/o success: ld2110:~ # rbd unmap /dev/rbd0 rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy The same block device is mapped on another client and
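
EBUSY on unmap usually means something still holds the device open locally, or a watcher is still registered. A minimal sketch of the usual checks (pool/image names are placeholders; -o force should be a last resort):

  # findmnt /dev/rbd0             # still mounted somewhere?
  # lsof /dev/rbd0                # any process holding the device open?
  # rbd status rbd/myimage        # lists the watchers on the image
  # rbd unmap -o force /dev/rbd0  # force the unmap once nothing legitimate is using it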

Re: [ceph-users] Cephfs recursive stats | rctime in the future

2019-02-27 Thread Hector Martin
On 27/02/2019 19:22, David C wrote: Hi All I'm seeing quite a few directories in my filesystem with rctime years in the future. E.g ]# getfattr -d -m ceph.dir.* /path/to/dir getfattr: Removing leading '/' from absolute path names # file:  path/to/dir ceph.dir.entries="357" ceph.dir.files="1"

[ceph-users] osd exit common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

2019-02-27 Thread hnuzhoulin2
Hi, guysSo far, there have been 10 osd service exit because of this error.the error messages are all the same.2019-02-27 17:14:59.757146 7f89925ff700 0 -- 10.191.175.15:6886/192803 >> 10.191.175.49:6833/188731 pipe(0x55ebba819400 sd=741 :6886 s=0 pgs=0 cs=0 l=0

Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-02-27 Thread Igor Fedotov
Hi Uwe, AFAIR the Samsung 860 Pro isn't aimed at the enterprise market; you shouldn't use consumer SSDs for Ceph. I had some experience with the Samsung 960 Pro a while ago and it turned out that it handled fsync-ed writes very slowly (compared to the advertised performance). Which one can
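
The usual way to measure that fsync/journal-style behaviour is a single-job, queue-depth-1 sync write test with fio; a hedged sketch (the target device is a placeholder and the test writes to it destructively):

  # fio --name=sync-write --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based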

Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-02-27 Thread Joachim Kraftmayer
Hi Uwe, I can only recommend the use of enterprise SSDs. We've tested many consumer SSDs in the past, including your SSDs. Many of them are not suitable for long-term use and some wore out within 6 months. Cheers, Joachim Homepage: https://www.clyso.com On 27.02.2019 at 10:24,

[ceph-users] Cephfs recursive stats | rctime in the future

2019-02-27 Thread David C
Hi All I'm seeing quite a few directories in my filesystem with rctime years in the future. E.g. ]# getfattr -d -m ceph.dir.* /path/to/dir getfattr: Removing leading '/' from absolute path names # file: path/to/dir ceph.dir.entries="357" ceph.dir.files="1" ceph.dir.rbytes="35606883904011"
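
The recursive ctime can also be read on its own, and files stamped in the future (which drag rctime forward) can be hunted down with find; a minimal sketch (paths are placeholders, and -newermt assumes GNU find):

  ]# getfattr -n ceph.dir.rctime /path/to/dir
  ]# find /path/to/dir -newermt now   # anything whose mtime is in the future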

Re: [ceph-users] RBD poor performance

2019-02-27 Thread Ashley Merrick
The SSDs you're using are rated for around 16,000 random write IOPS. After you remove the 3-way replication across your 12 disks you only actually have 4 physical disks' worth of I/O, as the other disks would also be busy writing the 2nd and 3rd copy of the object (if this makes sense). So with the
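
Spelling the arithmetic out with the numbers from the post (a rough ceiling that ignores CPU, network and Ceph overhead):

  # 12 SSDs / 3-way replication        -> ~4 disks of effective write capacity
  # 4 * ~16,000 rated 4k random writes -> ~64,000 IOPS upper bound before overheads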

[ceph-users] RBD poor performance

2019-02-27 Thread Weird Deviations
Hello all, I'm facing poor performance on RBD images. First, my lab's hardware consists of 3 Intel servers with - 2 Intel Xeon E5-2660 v4 (all powersaving stuff turned off in BIOS) running on - an S2600TPR MOBO - 256 GB RAM - 4 SATA SSD Intel 960 GB model DC S3520 for OSD - 2 SATA SSD Intel 480 GB

Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-02-27 Thread Eneko Lacunza
Hi Uwe, We tried to use a Samsung 840 Pro SSD as an OSD some time ago and it was a no-go; it wasn't that performance was merely bad, it just didn't work for the kind of load an OSD puts on a disk. Any HDD was better than it (the disk was healthy and had been used in a software RAID-1 for a couple of years). I suggest