Re: [ceph-users] OSDs crashing

2018-09-25 Thread Brad Hubbard
On Tue, Sep 25, 2018 at 11:31 PM Josh Haft wrote: > > Hi cephers, > > I have a cluster of 7 storage nodes with 12 drives each and the OSD > processes are regularly crashing. All 84 have crashed at least once in > the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708, > kernel version

Re: [ceph-users] PG inconsistent, "pg repair" not working

2018-09-25 Thread Brad Hubbard
On Tue, Sep 25, 2018 at 7:50 PM Sergey Malinin wrote: > > # rados list-inconsistent-obj 1.92 > {"epoch":519,"inconsistents":[]} It's likely the epoch has changed since the last scrub and you'll need to run another scrub to repopulate this data. > > September 25, 2018 4:58 AM, "Brad Hubbard"
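A minimal sketch of the re-scrub cycle Brad describes (the PG id 1.92 comes from the thread; these commands assume a live cluster with an admin keyring, so treat this as an illustration rather than a recipe):

```shell
# Trigger a fresh deep scrub so the inconsistency data for the PG
# is repopulated under the current epoch.
ceph pg deep-scrub 1.92

# Wait for the scrub to complete (e.g. watch 'ceph -w'), then re-check:
rados list-inconsistent-obj 1.92

# If inconsistents are now listed, attempt the repair again:
ceph pg repair 1.92
```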

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
After I tried so many things, with so much help on IRC, my pool health is still in ERROR and I think I can't recover from this. https://paste.ubuntu.com/p/HbsFnfkYDT/ In the end, 2 of 3 mons crashed and restarted at the same time, and the pool went offline. Recovery takes more than 12 hours and it is way

Re: [ceph-users] ceph-fuse using excessive memory

2018-09-25 Thread Andras Pataki
Hi Zheng, Here is a debug dump: https://users.flatironinstitute.org/apataki/public_www/7f0011f676112cd4/ I have also included some other corresponding information (cache dump, mempool dump, perf dump and ceph.conf).  This corresponds to a 100GB ceph-fuse process while the client code is

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
Hi, Cluster is still down :( Up to now we have managed to stabilize the OSDs. 118 of 160 OSDs are stable and the cluster is still in the process of settling. Thanks to Be-El in the ceph IRC channel; Be-El helped a lot to make the flapping OSDs stable. What we have learned so far is that this is

Re: [ceph-users] ACL '+' not shown in 'ls' on kernel cephfs mount

2018-09-25 Thread Chad W Seys
P.S. kernel 4.18.6 # uname -a Linux tardis 4.18.0-1-amd64 #1 SMP Debian 4.18.6-1 (2018-09-06) x86_64 GNU/Linux ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] ACL '+' not shown in 'ls' on kernel cephfs mount

2018-09-25 Thread Chad W Seys
Hi all, It appears as though the '+' which indicates an extended ACL is not shown when 'ls'-ing cephfs is mounted by kernel. # ls -al total 9 drwxrwxr-x+ 4 root smbadmin4096 Aug 13 10:14 . drwxrwxr-x 5 root smbadmin4096 Aug 17 09:37 .. dr-xr-xr-x 4 root root 3 Sep 11 09:50

Re: [ceph-users] advice with erasure coding

2018-09-25 Thread Paul Emmerich
VMs on erasure coded SSDs with fast_read work fine since 12.2.2. Paul Am Sa., 8. Sep. 2018 um 18:17 Uhr schrieb David Turner : > > I tested running VMs on EC back in Hammer. The performance was just bad. I > didn't even need much io, but even performing standard maintenance was > annoying

[ceph-users] OSDs crashing

2018-09-25 Thread Josh Haft
Hi cephers, I have a cluster of 7 storage nodes with 12 drives each and the OSD processes are regularly crashing. All 84 have crashed at least once in the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708, kernel version 3.10.0-693.el7.x86_64. I rebooted one of the OSD nodes to see if

Re: [ceph-users] issued != cap->implemented in handle_cap_export

2018-09-25 Thread Yan, Zheng
> On Sep 25, 2018, at 20:24, Ilya Dryomov wrote: > > On Tue, Sep 25, 2018 at 2:05 PM 刘 轩 wrote: >> >> Hi Ilya: >> >> I have some questions about the commit >> d84b37f9fa9b23a46af28d2e9430c87718b6b044 about the function >> handle_cap_export. In which case issued != cap->implemented may

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Eugen Block
I would try to reduce recovery to a minimum, something like this helped us in in a small cluster (25 OSDs on 3 hosts) in case of recovery while operation continued without impact: ceph tell 'osd.*' injectargs '--osd-recovery-max-active 2' ceph tell 'osd.*' injectargs '--osd-max-backfills 8'

Re: [ceph-users] issued != cap->implemented in handle_cap_export

2018-09-25 Thread Ilya Dryomov
On Tue, Sep 25, 2018 at 2:05 PM 刘 轩 wrote: > > Hi Ilya: > > I have some questions about the commit > d84b37f9fa9b23a46af28d2e9430c87718b6b044 about the function > handle_cap_export. In which case issued != cap->implemented may occur. > > I encountered this kind of mistake in my cluster. Do

Re: [ceph-users] can we drop support of centos/rhel 7.4?

2018-09-25 Thread kefu chai
On Mon, Sep 24, 2018 at 11:39 PM Ken Dreyer wrote: > > On Thu, Sep 13, 2018 at 8:48 PM kefu chai wrote: > > my question is: is it okay to drop the support of centos/rhel 7.4? so > > we will solely build and test the supported Ceph releases (luminous, > > mimic) on 7.5 ? > > CentOS itself does

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
Reducing the recovery parameter values did not change much. There are still a lot of OSDs marked down. I don't know what I need to do after this point. [osd] osd recovery op priority = 63 osd client op priority = 1 osd recovery max active = 1 osd max scrubs = 1 ceph -s cluster: id:

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Sergey Malinin
Now you also have PGs in 'creating' state. Creating PGs is a very IO-intensive operation. To me, nothing special is going on there - recovery + deep scrubbing + creating PGs results in the expected degradation of performance. September 25, 2018 2:32 PM, "by morphin" wrote: > 29 creating+down > 4

[ceph-users] tiering vs bluestore blockdb

2018-09-25 Thread Fyodor Ustinov
Hi! Question: for what better use SSD? For tiering or for blockdb? I have this configuration: sdb - 512GB SSD sdc - 512GB SSD sdd - 10T HDD sdb splitted to sdb1 and sdb2. sdb1 used as blockdb for sdd. In the near future, it is planned to add one more HDD and sdb2 will be used as a blockdb for
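For the blockdb case, a sketch of how such an OSD is typically provisioned on Luminous with ceph-volume (the device names sdd and sdb1 follow the post; adjust them to your hardware, and treat this as an illustration, not a recommendation):

```shell
# Create a bluestore OSD with the data on the HDD and the RocksDB
# block.db on the SSD partition (device names taken from the post).
ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/sdb1
```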

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
The config didn't work; increasing the numbers caused even more OSD drops. ceph -s cluster: id: 89569e73-eb89-41a4-9fc9-d2a5ec5f4106 health: HEALTH_ERR norebalance,norecover flag(s) set 1 osds down 17/8839434 objects unfound (0.000%)

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Sergey Malinin
Settings that heavily affect recovery performance are: osd_recovery_sleep osd_recovery_sleep_[hdd|ssd] See this for details: http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/ September 25, 2018 1:57 PM, "by morphin" wrote: > Thank you for answer > > What do you think the
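The sleep settings Sergey names can be changed at runtime the same way as the other recovery knobs in this thread. The values below are illustrative only (the linked osd-config-ref page documents the defaults); a smaller sleep speeds up recovery at the cost of more client impact:

```shell
# Per-op recovery sleep, in seconds; applies to HDD- and SSD-backed
# OSDs respectively. Illustrative values, not recommendations.
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd=0.1'
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_ssd=0'
```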

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
Thank you for the answer. What do you think of this conf to speed up recovery? [osd] osd recovery op priority = 63 osd client op priority = 1 osd recovery max active = 16 osd max scrubs = 16 The user wrote the following on Tue, 25 Sep 2018 at 13:37: > > Just let it recover. > > data: >

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Caspar Smit
You can set: osd_scrub_during_recovery = false and in addition maybe set the noscrub and nodeep-scrub flags to let it settle. Kind regards, Caspar On Tue, Sep 25, 2018 at 12:39, Sergey Malinin wrote: > Just let it recover. > > data: > pools: 1 pools, 4096 pgs > objects: 8.95 M
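The flag and option changes Caspar suggests, sketched as commands against a live cluster (remember to unset the flags once recovery completes):

```shell
# Stop new scrubs cluster-wide while recovery is in progress.
ceph osd set noscrub
ceph osd set nodeep-scrub

# Also prevent scrubs from starting on PGs that are still recovering.
ceph tell 'osd.*' injectargs '--osd_scrub_during_recovery=false'

# After the cluster has settled, re-enable scrubbing:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```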

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Sergey Malinin
Just let it recover. data: pools: 1 pools, 4096 pgs objects: 8.95 M objects, 17 TiB usage: 34 TiB used, 577 TiB / 611 TiB avail pgs: 94.873% pgs not active 48475/17901254 objects degraded (0.271%) 1/8950627 objects unfound (0.000%)

[ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread by morphin
Hello. Half an hour ago, 7 of my 28 servers crashed (because of corosync! "2.4.4-3") and 2 of them were MONs; I have 3 MONs in my cluster. After they came back, I saw high disk utilization caused by the ceph-osd processes. My whole cluster is unresponsive right now! All of my OSDs are consuming

Re: [ceph-users] PG inconsistent, "pg repair" not working

2018-09-25 Thread Sergey Malinin
# rados list-inconsistent-obj 1.92 {"epoch":519,"inconsistents":[]} September 25, 2018 4:58 AM, "Brad Hubbard" wrote: > What does the output of the following command look like? > > $ rados list-inconsistent-obj 1.92

Re: [ceph-users] PG inconsistent, "pg repair" not working

2018-09-25 Thread Marc Roos
And where is the manual for bluestore? -Original Message- From: mj [mailto:li...@merit.unu.edu] Sent: dinsdag 25 september 2018 9:56 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] PG inconsistent, "pg repair" not working Hi, I was able to solve a similar issue on our

Re: [ceph-users] PG inconsistent, "pg repair" not working

2018-09-25 Thread mj
Hi, I was able to solve a similar issue on our cluster using this blog: https://ceph.com/geen-categorie/ceph-manually-repair-object/ It does help if you are running a 3/2 config. Perhaps it helps you as well. MJ On 09/25/2018 02:37 AM, Sergey Malinin wrote: Hello, During normal operation