On Tue, Sep 25, 2018 at 11:31 PM Josh Haft wrote:
>
> Hi cephers,
>
> I have a cluster of 7 storage nodes with 12 drives each and the OSD
> processes are regularly crashing. All 84 have crashed at least once in
> the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708,
> kernel version
On Tue, Sep 25, 2018 at 7:50 PM Sergey Malinin wrote:
>
> # rados list-inconsistent-obj 1.92
> {"epoch":519,"inconsistents":[]}
It's likely the epoch has changed since the last scrub and you'll need
to run another scrub to repopulate this data.
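For example (a sketch, assuming the PG in question is still 1.92):

```shell
# Kick off a fresh deep scrub so the inconsistency data is repopulated
ceph pg deep-scrub 1.92
# Once the scrub has completed, query again:
rados list-inconsistent-obj 1.92 --format=json-pretty
```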
>
> September 25, 2018 4:58 AM, "Brad Hubbard"
After trying many things, with a lot of help on IRC, my pool
health is still in ERROR and I think I can't recover from this.
https://paste.ubuntu.com/p/HbsFnfkYDT/
In the end, 2 of the 3 mons crashed and restarted at the same time, and
the pool went offline. Recovery takes more than 12 hours and it is way
Hi Zheng,
Here is a debug dump:
https://users.flatironinstitute.org/apataki/public_www/7f0011f676112cd4/
I have also included some other corresponding information (cache dump,
mempool dump, perf dump and ceph.conf). This corresponds to a 100GB
ceph-fuse process while the client code is
Hi,
Cluster is still down :(
Up to now we have managed to stabilize the OSDs. 118 of our 160 OSDs are
stable and the cluster is still in the process of settling. Thanks to
Be-El in the ceph IRC channel, who helped a lot in making the flapping
OSDs stable.
What we have learned so far is that this is
P.S. kernel 4.18.6
# uname -a
Linux tardis 4.18.0-1-amd64 #1 SMP Debian 4.18.6-1 (2018-09-06) x86_64
GNU/Linux
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Hi all,
It appears that the '+' which indicates an extended ACL is not
shown by 'ls' when CephFS is mounted via the kernel client.
# ls -al
total 9
drwxrwxr-x+ 4 root smbadmin 4096 Aug 13 10:14 .
drwxrwxr-x  5 root smbadmin 4096 Aug 17 09:37 ..
dr-xr-xr-x 4 root root 3 Sep 11 09:50
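One way to check whether the ACLs are actually in effect despite the missing '+' is to read them directly (a sketch; `getfacl` comes from the standard acl package):

```shell
# Even if ls does not show the '+', getfacl should still print the
# extended entries if the ACL is really there:
getfacl .
# Look for lines like "user:smbadmin:rwx" beyond the base
# owner/group/other entries; if they appear, only the ls indicator
# is missing, not the ACL itself.
```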
VMs on erasure coded SSDs with fast_read work fine since 12.2.2.
Paul
On Sat, Sep 8, 2018 at 6:17 PM David Turner wrote:
>
> I tested running VMs on EC back in Hammer. The performance was just bad. I
> didn't even need much IO, but even performing standard maintenance was
> annoying
Hi cephers,
I have a cluster of 7 storage nodes with 12 drives each and the OSD
processes are regularly crashing. All 84 have crashed at least once in
the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708,
kernel version 3.10.0-693.el7.x86_64. I rebooted one of the OSD nodes
to see if
> On Sep 25, 2018, at 20:24, Ilya Dryomov wrote:
>
> On Tue, Sep 25, 2018 at 2:05 PM 刘 轩 wrote:
>>
>> Hi Ilya:
>>
>> I have some questions about the commit
>> d84b37f9fa9b23a46af28d2e9430c87718b6b044 about the function
>> handle_cap_export. In which case, issued != cap->implemented may
I would try to reduce recovery to a minimum. Something like this
helped us in a small cluster (25 OSDs on 3 hosts) when recovery had to
run while operation continued, without impact:
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 2'
ceph tell 'osd.*' injectargs '--osd-max-backfills 8'
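Note that injectargs only changes the running daemons; assuming you want the same values to survive an OSD restart, they can also go into ceph.conf (a sketch, mirroring the commands above):

```ini
[osd]
osd recovery max active = 2
osd max backfills = 8
```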
On Tue, Sep 25, 2018 at 2:05 PM 刘 轩 wrote:
>
> Hi Ilya:
>
> I have some questions about the commit
> d84b37f9fa9b23a46af28d2e9430c87718b6b044 about the function
> handle_cap_export. In which case, issued != cap->implemented may occur.
>
> I encountered this kind of mistake in my cluster. Do
On Mon, Sep 24, 2018 at 11:39 PM Ken Dreyer wrote:
>
> On Thu, Sep 13, 2018 at 8:48 PM kefu chai wrote:
> > my question is: is it okay to drop the support of centos/rhel 7.4? so
> > we will solely build and test the supported Ceph releases (luminous,
> > mimic) on 7.5 ?
>
> CentOS itself does
After reducing the recovery parameter values, not much changed.
There are a lot of OSD still marked down.
I don't know what I need to do after this point.
[osd]
osd recovery op priority = 63
osd client op priority = 1
osd recovery max active = 1
osd max scrubs = 1
ceph -s
cluster:
id:
Now you also have PGs in 'creating' state. Creating PGs is a very
IO-intensive operation.
To me, nothing special going on there - recovery + deep scrubbing + creating
PGs results in expected degradation of performance.
September 25, 2018 2:32 PM, "by morphin" wrote:
> 29 creating+down
> 4
Hi!
Question: what is the better use for an SSD, cache tiering or the BlueStore block.db?
I have this configuration:
sdb - 512GB SSD
sdc - 512GB SSD
sdd - 10T HDD
sdb is split into sdb1 and sdb2. sdb1 is used as the block.db for sdd. In the near
future we plan to add one more HDD, and sdb2 will be used as the block.db
for
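For reference, a BlueStore OSD with its block.db on an SSD partition can be created with ceph-volume roughly like this (a sketch; device names taken from the layout above):

```shell
# HDD as the data device, SSD partition as the RocksDB (block.db) device
ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/sdb1
```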
The config didn't work; increasing the numbers led to more OSD drops.
ceph -s
cluster:
id: 89569e73-eb89-41a4-9fc9-d2a5ec5f4106
health: HEALTH_ERR
norebalance,norecover flag(s) set
1 osds down
17/8839434 objects unfound (0.000%)
Settings that heavily affect recovery performance are:
osd_recovery_sleep
osd_recovery_sleep_[hdd|ssd]
See this for details:
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
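These can be changed at runtime, e.g. (illustrative values, not a recommendation):

```shell
# A non-zero sleep throttles recovery in favour of client IO;
# 0 disables the throttle entirely
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0.1'
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_ssd 0'
```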
September 25, 2018 1:57 PM, "by morphin" wrote:
> Thank you for answer
>
> What do you think the
Thank you for answer
What do you think the conf for speed the recover?
[osd]
osd recovery op priority = 63
osd client op priority = 1
osd recovery max active = 16
osd max scrubs = 16
On Tue, Sep 25, 2018 at 1:37 PM, the user at that address wrote:
>
> Just let it recover.
>
> data:
>
You can set:
*osd_scrub_during_recovery = false*
and in addition maybe set the noscrub and nodeep-scrub flags to let it
settle.
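A minimal sketch of the above:

```shell
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph tell 'osd.*' injectargs '--osd_scrub_during_recovery=false'
# and once recovery has finished:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```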
Kind regards,
Caspar
On Tue, Sep 25, 2018 at 12:39, Sergey Malinin wrote:
> Just let it recover.
>
> data:
> pools: 1 pools, 4096 pgs
> objects: 8.95 M
Just let it recover.
data:
pools: 1 pools, 4096 pgs
objects: 8.95 M objects, 17 TiB
usage: 34 TiB used, 577 TiB / 611 TiB avail
pgs: 94.873% pgs not active
48475/17901254 objects degraded (0.271%)
1/8950627 objects unfound (0.000%)
Hello.
Half an hour ago, 7 of my 28 servers crashed (because of corosync
"2.4.4-3"!), and 2 of them were MONs; I have 3 MONs in my cluster.
After they came back, I saw high disk utilization caused by the ceph-osd
processes.
My whole cluster is not responding right now! All of my OSDs are
consuming
# rados list-inconsistent-obj 1.92
{"epoch":519,"inconsistents":[]}
September 25, 2018 4:58 AM, "Brad Hubbard" wrote:
> What does the output of the following command look like?
>
> $ rados list-inconsistent-obj 1.92
And where is the manual for bluestore?
-----Original Message-----
From: mj [mailto:li...@merit.unu.edu]
Sent: Tuesday, September 25, 2018 9:56
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG inconsistent, "pg repair" not working
Hi,
I was able to solve a similar issue on our
Hi,
I was able to solve a similar issue on our cluster using this blog:
https://ceph.com/geen-categorie/ceph-manually-repair-object/
It does help if you are running a 3/2 config.
Perhaps it helps you as well.
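Roughly, the approach in that post is the following (a sketch with hypothetical placeholders `<id>`, `<pgid>` and `<object>`; double-check against the blog before running anything, and note the journal/find steps apply to FileStore OSDs):

```shell
ceph health detail                   # identify the inconsistent PG
systemctl stop ceph-osd@<id>         # the OSD holding the bad replica
ceph-osd -i <id> --flush-journal     # FileStore only
# locate and remove the corrupt copy of the object on that OSD
find /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/ -name '<object>*' -delete
systemctl start ceph-osd@<id>
ceph pg repair <pgid>
```

With a 3/2 replicated pool the repair can then recover the object from one of the two healthy copies.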
MJ
On 09/25/2018 02:37 AM, Sergey Malinin wrote:
Hello,
During normal operation