On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote:
> I was on 12.2.5 for a couple weeks and started randomly seeing
> corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke
> loose. I panicked and moved to Mimic, and when that didn't solve the
> problem, only then did I start
Your issue is different: not only do the omap digests of all the
replicas fail to match the omap digest from the auth object info, they
also all differ from each other.
What is the min_size of pool 67, and what can you tell us about the
events leading up to this?
On Mon, Jul 16, 2018 at 7:06 PM,
Ceph doesn't shut down systems, as in kill or reboot the box, if
that's what you're asking.
On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard wrote:
> On Monday, 23 July 2018 at 11:07 +0700, Konstantin Shalygin wrote:
>> > I don't even have a fancy kernel or device, just plain standard Debian.
>> >
On Thu, Jul 19, 2018 at 12:47 PM, Troy Ablan wrote:
>
>
> On 07/18/2018 06:37 PM, Brad Hubbard wrote:
>> On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan wrote:
>>>
>>>
>>> On 07/17/2018 11:14 PM, Brad Hubbard wrote:
>>>>
>>>>
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> I'm currently testing a setup for a production environment based on the
> following OSD nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8-core Intel Xeon, 2.0 GHz
>
> - 96 GB RAM
>
> - 10x
rnel
exhibiting the problem.
>
> kind regards
>
> Ben
>
>> Brad Hubbard hat am 5. Juli 2018 um 01:16 geschrieben:
>>
>>
>> On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber
>> wrote:
>> > Hi @all,
>> >
>> > I'm currently testing a setup for
On Sun, Jan 14, 2018 at 4:41 AM, Dyweni - Ceph-Users <6exbab4fy...@dyweni.com>
wrote:
> Hi,
>
> GLIBC 2.25-r9
> GCC 6.4.0-r1
>
> When compiling Ceph 12.2.2, the compilation hangs (cc1plus goes into an
> infinite loop and never finishes, requiring the process to be killed
> manually) while
On Wed, Jan 17, 2018 at 2:20 AM, Nikos Kormpakis <nk...@noc.grnet.gr> wrote:
> On 01/16/2018 12:53 AM, Brad Hubbard wrote:
>> On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters <apet...@sphinx.at> wrote:
>>> i created the dump output but it looks very cryptic to me so
On Mon, Jan 22, 2018 at 10:37 PM, Hüseyin Atatür YILDIRIM <
hyildi...@havelsan.com.tr> wrote:
>
> Hi again,
>
>
>
> In the “journalctl -xe” output:
>
>
>
> Jan 22 15:29:18 mon02 ceph-osd-prestart.sh[1526]: OSD data directory
> /var/lib/ceph/osd/ceph-1 does not exist; bailing out.
>
>
>
> Also in
On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub wrote:
> "ceph pg repair" leads to:
> 5.7bd repair 2 errors, 0 fixed
>
> Only an empty list from:
> rados list-inconsistent-obj 5.7bd --format=json-pretty
>
> Inspired by http://tracker.ceph.com/issues/12577 , I tried again with
On Thu, Mar 8, 2018 at 5:01 PM, 赵贺东 wrote:
> Hi All,
>
> Every time after we activate an OSD, we get “Structure needs cleaning” in
> /var/lib/ceph/osd/ceph-xxx/current/meta.
>
>
> /var/lib/ceph/osd/ceph-xxx/current/meta
> # ls -l
> ls: reading directory .: Structure needs
On Thu, Mar 8, 2018 at 7:33 PM, 赵赵贺东 <zhaohed...@gmail.com> wrote:
> Hi Brad,
>
> Thank you for your attention.
>
>> On 8 March 2018, at 4:47 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>>
>> On Thu, Mar 8, 2018 at 5:01 PM, 赵贺东 <zhaohed...@gmail.com> wrote:
On Fri, Mar 9, 2018 at 3:54 AM, Subhachandra Chandra
wrote:
> I noticed a similar crash too. Unfortunately, I did not get much info in the
> logs.
>
> *** Caught signal (Segmentation fault) **
>
> Mar 07 17:58:26 data7 ceph-osd-run.sh[796380]: in thread 7f63a0a97700
>
On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata <
mbald...@hsamiata.it> wrote:
> Hi
>
> I monitor dmesg on each of the 3 nodes; no hardware issues are reported. And
> the problem happens with various different OSDs in different nodes, so to me
> it is clear it's not a hardware problem.
>
debug_osd that is... :)
On Tue, Mar 6, 2018 at 7:10 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>
>
> On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata <
> mbald...@hsamiata.it> wrote:
>
>> Hi
>>
>> I monitor dmesg in ea
method of http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-mons/
> (with id controller02)
>
> The logs provided are when the controller02 was added with the manual
> method.
>
> But the controller02 won't join the cluster
>
> Hope It helps understand
>
>
>
See the thread in this very ML titled "Ceph iSCSI is a prank?", last update
thirteen days ago.
If your questions are not answered by that thread let us know.
Please also remember that CentOS is not the only platform that ceph runs on
by a long shot, and that not all distros lag as much as it does (not
"NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this."
Have you ever wondered what this means and why it's there? :)
This is at least something you can try. It may provide useful
information, or it may not.
This stack looks like it is either corrupted, or possibly not in
d it probably will be correct again in the near future and, if
not, we can review and correct it as necessary.
> There is some confusion between the documentation's minimal
> requirements, what the dashboard suggests it is able to do, and what I read
> around about modded Ceph for ot
On Tue, Mar 27, 2018 at 9:46 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>
>
> On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins <m...@phoenixweb.it> wrote:
>
>> Hi Brad,
>>
>> that post was mine. I knew it quite well.
>>
> That Post was about
t us started. Getting late here for me so I'll
take a look at this tomorrow.
Thanks!
>
> http://tracker.ceph.com/issues/23431
>
> Maybe Oliver has something to add as well.
>
>
> Dietmar
>
>
> On 03/27/2018 11:37 AM, Brad Hubbard wrote:
>> "NOTE: a copy o
On Wed, Mar 28, 2018 at 6:53 PM, Max Cuttins <m...@phoenixweb.it> wrote:
> On 27/03/2018 13:46, Brad Hubbard wrote:
>
>
>
> On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins <m...@phoenixweb.it> wrote:
>>
>> Hi Brad,
>>
>> that post was mi
I'm not sure I completely understand your "test". What exactly are you
trying to achieve and what documentation are you following?
On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque
<julien.laves...@objectif-libre.com> wrote:
> Brad,
>
> Thanks for your answer
>
> On
Can you update with the result of the following commands from all of the MONs?
# ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
# ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status
On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek
"name": "controller03",
"addr": "172.18.8.7:6789\/0"
}
]
}
}
In the monmaps we are called 'controller02', not 'mon.controller02'.
These names need to be identical.
On Thu, Mar 29, 2018 at 7:23 PM, Julien Lav
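To inspect the names actually recorded in the monmap yourself, something
along these lines works (run from any node with cluster access):

```shell
# Fetch the cluster's current monmap and print it; each mon's
# name and address pair is listed in the output.
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap
```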
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote:
> On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote:
>> Blocked requests and slow requests are synonyms in ceph. They are 2 names
>> for the exact same thing.
>>
>>
>> On Thu, Mar 1, 2018,
provide from the time leading up to when the issue was first seen?
>
> Cheers
>
> Andrei
> - Original Message -
>> From: "Brad Hubbard"
>> To: "Andrei Mikhailovsky"
>> Cc: "ceph-users"
>> Sent: Thursday, 28 June, 2018 01:
What does "rados list-inconsistent-obj " say?
Note that you may have to do a deep scrub to populate the output.
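For reference, the sequence is roughly the following (the pgid is a
placeholder for the inconsistent pg):

```shell
# A deep scrub repopulates the inconsistency report; query it once
# the scrub has completed.
ceph pg deep-scrub <pgid>
rados list-inconsistent-obj <pgid> --format=json-pretty
```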
On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong wrote:
>
> Hi folks,
>
> I would appreciate any pointer as to how I can resolve a
> PG stuck in “active+clean+inconsistent” state. This has
>
What do you get if you send "help" (without quotes) to
majord...@vger.kernel.org ?
On Sun, Nov 11, 2018 at 10:15 AM Cranage, Steve <
scran...@deepspacestorage.com> wrote:
> Can anyone tell me the secret? A colleague tried and failed many times so
> I tried and got this:
>
>
>
>
>
> Steve
C. Wong
>> kcw...@verseon.com
>> M: +1 (408) 769-8235
>>
>> -
>> Confidentiality Notice:
>> This message contains confidential information. If you are not the
>> intended recipient and received this message
> Clearly, on osd.67, the “attrs” array is empty. The question is,
> how do I fix this?
>
> Many thanks in advance,
>
> -kc
>
> K.C. Wong
> kcw...@verseon.com
> M: +1 (408) 769-8235
>
> -
> Confidentiality Notice:
&
On Tue, Sep 25, 2018 at 11:31 PM Josh Haft wrote:
>
> Hi cephers,
>
> I have a cluster of 7 storage nodes with 12 drives each and the OSD
> processes are regularly crashing. All 84 have crashed at least once in
> the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708,
> kernel version
On Tue, Sep 25, 2018 at 7:50 PM Sergey Malinin wrote:
>
> # rados list-inconsistent-obj 1.92
> {"epoch":519,"inconsistents":[]}
It's likely the epoch has changed since the last scrub and you'll need
to run another scrub to repopulate this data.
>
> Septem
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
should still be current enough and makes good reading on the subject.
On Mon, Jan 21, 2019 at 8:46 PM Stijn De Weirdt wrote:
>
> hi marc,
>
> > - how to prevent the D state process to accumulate so much load?
> you can't. in
On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:
>
> Hi All
>
> I have a ceph cluster which has been working with out issues for about 2
> years now, it was upgrade about 6 month ago to 10.2.11
>
> root@blade3:/var/lib/ceph/mon# ceph status
> 2018-12-18 10:42:39.242217 7ff770471700 0 --
Can you provide the complete OOM message from the dmesg log?
On Sat, Dec 22, 2018 at 7:53 AM Pardhiv Karri wrote:
>
>
> Thank You for the quick response Dyweni!
>
> We are using FileStore as this cluster is upgraded from
> Hammer-->Jewel-->Luminous 12.2.8. 16x2TB HDD per node for all nodes.
https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
On Thu, Dec 6, 2018 at 8:11 PM Leon Robinson wrote:
>
> The most important thing to remember about CRUSH is that the H stands for
> hashing.
>
> If you hash the same object you're going to get the same result.
>
> e.g. cat
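Leon's point about determinism can be sketched in a few lines of Python.
This is only an illustration of "same input, same placement"; it is not
the real CRUSH/rjenkins algorithm, and all names and parameters here are
made up:

```python
import hashlib

def toy_place(object_name, num_pgs=64, num_osds=12, replicas=3):
    """Toy stand-in for CRUSH placement: hash the object name to pick a
    placement group, then derive an ordered, stable set of OSDs from the
    pg id. Real CRUSH uses the rjenkins hash plus the cluster map; this
    only demonstrates that hashing the same object always gives the
    same result."""
    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    pg = h(object_name) % num_pgs
    osds = []
    attempt = 0
    while len(osds) < replicas:
        osd = h(f"{pg}:{attempt}") % num_osds
        if osd not in osds:  # skip collisions, much as CRUSH retries
            osds.append(osd)
        attempt += 1
    return pg, osds

# Same name in, same placement out -- every time, on every host:
assert toy_place("cat") == toy_place("cat")
print(toy_place("cat"))
```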
On Fri, Jan 11, 2019 at 12:20 AM Rom Freiman wrote:
>
> Hey,
> After upgrading to centos7.6, I started encountering the following kernel
> panic
>
> [17845.147263] XFS (rbd4): Unmounting Filesystem
> [17846.860221] rbd: rbd4: capacity 3221225472 features 0x1
> [17847.109887] XFS (rbd4): Mounting
same setup, you might be hitting the same
> bug.
Thanks for that Jason, I wasn't aware of that bug. I'm interested to
see the details.
>
> On Thu, Jan 10, 2019 at 6:46 PM Brad Hubbard wrote:
> >
> > On Fri, Jan 11, 2019 at 12:20 AM Rom Freiman wrote:
> > >
>
Haha, in the email thread he says CentOS but the bug is opened against RHEL :P
Is it worth recommending a fix in skb_can_coalesce() upstream so other
modules don't hit this?
On Fri, Jan 11, 2019 at 7:39 PM Ilya Dryomov wrote:
>
> On Fri, Jan 11, 2019 at 1:38 AM Brad Hubbard
On Fri, Jan 11, 2019 at 8:58 PM Rom Freiman wrote:
>
> Same kernel :)
Not exactly the point I had in mind, but sure ;)
>
>
> On Fri, Jan 11, 2019, 12:49 Brad Hubbard wrote:
>>
>> Haha, in the email thread he says CentOS but the bug is opened against RHEL
>>
Nautilus will make this easier.
https://github.com/ceph/ceph/pull/18096
On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous
> (12.2.8), we had a problem where OSDs would frequently get restarted while
>
Are you using filestore or bluestore on the OSDs? If filestore what is
the underlying filesystem?
You could try setting debug_osd and debug_filestore to 20 and see if
that gives some more info?
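For example, something along these lines for a single OSD (high debug
levels log heavily, so revert once you have captured the event):

```shell
# Raise logging on osd.0, reproduce the problem, then restore defaults.
ceph tell osd.0 injectargs '--debug_osd 20 --debug_filestore 20'
# ... reproduce the issue, collect /var/log/ceph/ceph-osd.0.log ...
ceph tell osd.0 injectargs '--debug_osd 1/5 --debug_filestore 1/5'
```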
On Wed, Sep 19, 2018 at 12:36 PM fatkun chan wrote:
>
>
> ceph version 12.2.5
On Thu, Mar 21, 2019 at 12:11 AM Glen Baars wrote:
>
> Hello Ceph Users,
>
>
>
> Does anyone know what the flag point ‘Started’ is? Is that ceph osd daemon
> waiting on the disk subsystem?
This is set by "mark_started()" and roughly marks the point when the pg
starts processing the op. Might want to
Actually, the lag is between "sub_op_committed" and "commit_sent". Is
there any pattern to these slow requests? Do they involve the same
osd, or set of osds?
On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard wrote:
>
> On Thu, Mar 21, 2019 at 3:20 PM Glen Baars
> wrote:
>
> Does anyone know what that section is waiting for?
Hi Glen,
These are documented, to some extent, here.
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
It looks like it may be taking a long time to communicate the commit
message back to the client? Are these sl
It would help to know what version you are running but, to begin with,
could you post the output of the following?
$ sudo ceph pg 10.2a query
$ sudo rados list-inconsistent-obj 10.2a --format=json-pretty
Also, have a read of
"last_epoch_clean": 20840,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "21395'11835365",
> "last_scrub_stamp": "20
Do a "ps auwwx" to see how a running monitor was started and use the
equivalent command to try to start the MON that won't start. "ceph-mon
--help" will show you what you need. Most important is to get the ID
portion right and to add "-d" to get it to run in the foreground and
log to stdout. HTH
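Concretely, that might look like this (the mon id "controller02" is just
an example; use whatever id your monitor was deployed with):

```shell
# See how the working monitors were started on another node.
ps auwwx | grep ceph-mon
# Then start the broken one by hand: -i gives the mon id, -d keeps it
# in the foreground and sends the log to stdout.
ceph-mon -i controller02 -d
```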
If you want to do containers at the same time, or transition some/all
to containers at some point in future maybe something based on
kubevirt [1] would be more futureproof?
[1] http://kubevirt.io/
CNV is an example,
https://www.redhat.com/en/resources/container-native-virtualization
On Sat, Apr
ed+inconsistent+peering, and the other peer is active+clean+inconsistent
Per the document I linked previously if a pg remains remapped you
likely have a problem with your configuration. Take a good look at
your crushmap, pg distribution, pool configuration, etc.
>
>
> On Wed, Mar 27, 2019 at 4:1
{
> "osd": "7",
> "status": "not queried"
> },
> {
> "osd": "8",
> "status": "already probed"
> },
>
ther OSDs appear to be ok, I see
> them up and in, why do you see something wrong?
>
> On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard wrote:
>>
>> Hammer is no longer supported.
>>
>> What's the status of osds 7 and 17?
>>
>> On Tue, Mar 26, 2019 at 8:56 A
https://bugzilla.redhat.com/show_bug.cgi?id=1662496
On Wed, Mar 27, 2019 at 5:00 AM Andrew J. Hutton
wrote:
>
> More or less followed the install instructions with modifications as
> needed; but I'm suspecting that either a dependency was missed in the
> F29 package or something else is up. I
+Jos Collin
On Thu, Mar 7, 2019 at 9:41 AM Milanov, Radoslav Nikiforov
wrote:
> Can someone elaborate on
>
>
>
> From http://tracker.ceph.com/issues/38122
>
>
>
> Which exactly package is missing?
>
> And why is this happening ? In Mimic all dependencies are resolved by yum?
>
> - Rado
>
>
>
You could try reading the data from this object and writing it back
using rados get, then rados put.
On Fri, Mar 8, 2019 at 3:32 AM Herbert Alexander Faleiros
wrote:
>
> On Thu, Mar 07, 2019 at 01:37:55PM -0300, Herbert Alexander Faleiros wrote:
> > Hi,
> >
> > # ceph health detail
> > HEALTH_ERR
On Tue, Mar 19, 2019 at 7:54 PM Zhenshi Zhou wrote:
>
> Hi,
>
> I mount cephfs on my client servers. Some of the servers mount without any
> error whereas others don't.
>
> The error:
> # ceph-fuse -n client.kvm -m ceph.somedomain.com:6789 /mnt/kvm -r /kvm -d
> 2019-03-19 17:03:29.136
On Fri, Mar 8, 2019 at 4:46 AM Samuel Taylor Liston wrote:
>
> Hello All,
> I have recently had 32 large map objects appear in my default.rgw.log
> pool. Running luminous 12.2.8.
>
> Not sure what to think about these.I’ve done a lot of reading
> about how when these
21 16:51:56.862447",
> "age": 376.527241,
> "duration": 1.331278,
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Brad Hubbard
> Sent: Thursday, 21 March 2019 1:43 PM
> To: Glen Baars
> Cc: cep
Try capturing another log with debug_ms turned up. 1 or 5 should be OK
to start with.
On Fri, Feb 8, 2019 at 8:37 PM Massimo Sgaravatto
wrote:
>
> Our Luminous ceph cluster have been worked without problems for a while, but
> in the last days we have been suffering from continuous slow
On Sun, Feb 10, 2019 at 1:56 AM Ruben Rodriguez wrote:
>
> Hi there,
>
> Running 12.2.11-1xenial on a machine with 6 SSD OSD with bluestore.
>
> Today we had two disks fail out of the controller, and after a reboot
> they both seemed to come back fine but ceph-osd was only able to start
> in one
>
> 2019-02-09 07:35:14.627462 7f99972cc700 1 -- 192.168.222.204:6804/4159520
> <== osd.5 192.168.222.202:6816/157436 2527
> osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735) v2
> 1050+0+123635 (1225076790 0 171428115) 0x5610f5128a00 con 0x5610fc5bf000
> 2019-02-0
A single OSD should be expendable and you should be able to just "zap"
it and recreate it. Was this not true in your case?
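The zap-and-recreate cycle is roughly as follows (osd id 5 and /dev/sdb
are placeholders; this assumes a bluestore OSD deployed with
ceph-volume, so adjust for your deployment method):

```shell
# Take the OSD out, stop it, remove it from the cluster maps,
# wipe the device, and redeploy it as a fresh OSD.
ceph osd out 5
systemctl stop ceph-osd@5
ceph osd purge 5 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --data /dev/sdb
```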
On Wed, Feb 13, 2019 at 1:27 AM Ruben Rodriguez wrote:
>
>
>
> On 2/9/19 5:40 PM, Brad Hubbard wrote:
> > On Sun, Feb 10, 2019 at 1:
rong/misconfigured with the new switch: we
> would try to replicate the problem, possibly without a ceph deployment ...
>
> Thanks again for your help !
>
> Cheers, Massimo
>
> On Sun, Feb 10, 2019 at 12:07 AM Brad Hubbard wrote:
>>
>> The log ends at
>>
>>
Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.
On Sat, Feb 2, 2019 at 11:52 AM Fyodor Ustinov wrote:
>
> Hi!
>
> Right now, after adding OSD:
>
> # ceph health detail
> HEALTH_ERR 74197563/199392333
On Tue, Apr 16, 2019 at 7:38 AM solarflow99 wrote:
>
> Then why doesn't this work?
>
> # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> osd.0: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> osd.1: osd_recovery_max_active = '4' (not observed, change may
puzzled why it doesn't show any change when I run this no matter
> what I set it to:
>
> # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> osd_recovery_max_active = 3
>
> in fact it doesn't matter if I use an OSD number that doesn't exist, same
> thing if I use c
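`--show-config` reports the configuration as computed for a new client
process (compiled-in defaults plus ceph.conf), not the running daemon's
live values, which is why a nonexistent OSD id gives the same output. To
see what a running daemon is actually using, ask it over its admin
socket, e.g.:

```shell
# Run on the node hosting the OSD; reads the daemon's runtime value.
ceph daemon osd.1 config get osd_recovery_max_active
```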
:15 libceph-common.so ->
> libceph-common.so.0
> -rwxr-xr-x. 1 root root 211853400 Apr 17 11:15 libceph-common.so.0
>
>
>
>
> Best,
> Can Zhang
>
> On Thu, Apr 18, 2019 at 7:00 AM Brad Hubbard wrote:
> >
> > On Wed, Apr 17, 2019 at 1:37 PM Can Zhang w
On Wed, Apr 17, 2019 at 1:37 PM Can Zhang wrote:
>
> Thanks for your suggestions.
>
> I tried to build libfio_ceph_objectstore.so, but it fails to load:
>
> ```
> $ LD_LIBRARY_PATH=./lib ./bin/fio --enghelp=libfio_ceph_objectstore.so
>
> fio: engine libfio_ceph_objectstore.so not loadable
> IO
> Notice the "U" and "V" from nm results.
>
>
>
>
> Best,
> Can Zhang
>
> On Thu, Apr 18, 2019 at 9:36 AM Brad Hubbard wrote:
> >
> > Does it define _ZTIN13PriorityCache8PriCacheE ? If it does, and all is
> > as you say, then it
relating to the clearing in mon, mgr, or osd logs.
> >
> > So, not entirely sure what fixed it, but it is resolved on its own.
> >
> > Thanks,
> >
> > Reed
> >
> > On Apr 30, 2019, at 8:01 PM, Brad Hubbard wrote:
> >
> > On Wed, May 1, 2019 at
On Wed, May 1, 2019 at 10:54 AM Brad Hubbard wrote:
>
> Which size is correct?
Sorry, accidental discharge =D
If the object info size is *incorrect* try forcing a write to the OI
with something like the following.
1. rados -p [name_of_pool_17] setomapval 10008536718.
tempora
Which size is correct?
On Tue, Apr 30, 2019 at 1:06 AM Reed Dier wrote:
>
> Hi list,
>
> Woke up this morning to two PG's reporting scrub errors, in a way that I
> haven't seen before.
>
> $ ceph versions
> {
> "mon": {
> "ceph version 13.2.5
If you can give me specific steps so I can reproduce this
from a freshly cloned tree I'd be happy to look further into it.
Good luck.
On Thu, Apr 18, 2019 at 7:00 PM Brad Hubbard wrote:
>
> Let me try to reproduce this on centos 7.5 with master and I'll let
> you know how I go.
>
>
I'd suggest creating a tracker similar to
http://tracker.ceph.com/issues/40554 which was created for the issue
in the thread you mentioned.
On Wed, Jul 3, 2019 at 12:29 AM Vandeir Eduardo
wrote:
>
> Hi,
>
> on client machines, when I use the command rbd, for example, rbd ls
> poolname, this
On Thu, Jun 27, 2019 at 8:58 PM nokia ceph wrote:
>
> Hi Team,
>
> We have a requirement to create multiple copies of an object and currently we
> are handling it in client side to write as separate objects and this causes
> huge network traffic between client and cluster.
> Is there
> application is responsible for any locking needed.
> -Greg
>
> On Tue, Jul 2, 2019 at 3:49 AM Brad Hubbard wrote:
> >
> > Yes, this should be possible using an object class which is also a
> > RADOS client (via the RADOS API). You'll still have some client
> >
>
> Best,
> Can Zhang
>
>
> On Fri, Apr 19, 2019 at 6:28 PM Brad Hubbard wrote:
> >
> > OK. So this works for me with master commit
> > bdaac2d619d603f53a16c07f9d7bd47751137c4c on Centos 7.5.1804.
> >
> > I cloned the repo and ran './install-deps.sh'
On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich wrote:
>
> This works, it just says that it *might* require a restart, but this
> particular option takes effect without a restart.
We've already looked at changing the wording once to make it more palatable.
http://tracker.ceph.com/issues/18424
>
>>> Thank you for your response, and we will check this video as well.
>>> Our requirement is that, while writing an object into the cluster, if we can
>>> provide the number of copies to be made, the network consumption between
>>> client and cluster will be only for one object write.
On Thu, Aug 15, 2019 at 2:09 AM Troy Ablan wrote:
>
> Paul,
>
> Thanks for the reply. All of these seemed to fail except for pulling
> the osdmap from the live cluster.
>
> -Troy
>
> -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path
> /var/lib/ceph/osd/ceph-45/ --file osdmap45
>
Could you create a tracker for this?
Also, if you can reproduce this could you gather a log with
debug_osd=20 ? That should show us the superblock it was trying to
decode as well as additional details.
On Mon, Aug 12, 2019 at 6:29 AM huxia...@horebdata.cn
wrote:
>
> Dear folks,
>
> I had an OSD
https://tracker.ceph.com/issues/41255 is probably reporting the same issue.
On Thu, Aug 22, 2019 at 6:31 PM Lars Täuber wrote:
>
> Hi there!
>
> We also experience this behaviour of our cluster while it is moving pgs.
>
> # ceph health detail
> HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced
https://tracker.ceph.com/issues/38724
On Fri, Aug 23, 2019 at 10:18 PM Paul Emmerich wrote:
>
> I've seen that before (but never on Nautilus), there's already an
> issue at tracker.ceph.com but I don't recall the id or title.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph
On Wed, Sep 4, 2019 at 9:42 PM Andras Pataki
wrote:
>
> Dear ceph users,
>
> After upgrading our ceph-fuse clients to 14.2.2, we've been seeing sporadic
> segfaults with not super revealing stack traces:
>
> in thread 7fff5a7fc700 thread_name:ceph-fuse
>
> ceph version 14.2.2
On Thu, Sep 12, 2019 at 1:52 AM Benjamin Tayehanpour
wrote:
>
> Greetings!
>
> I had an OSD down, so I ran ceph osd status and got this:
>
> [root@ceph1 ~]# ceph osd status
> Error EINVAL: Traceback (most recent call last):
> File "/usr/lib64/ceph/mgr/status/module.py", line 313, in
-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map
clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed
out after 150
You hit a suicide timeout, that's fatal. On line 80 the process kills
the thread based on the assumption it's hung.
src/common/HeartbeatMap.cc:
66
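The mechanism is easy to model: each worker thread periodically
"touches" its heartbeat handle, and anything that hasn't touched within
the suicide grace is presumed hung. A toy Python sketch of the idea
follows; it is not Ceph's actual code (the real grace in this log was
150 seconds, and the real code aborts the process rather than merely
reporting):

```python
import threading
import time

SUICIDE_GRACE = 0.2  # seconds; stand-in for the OSD's 150s suicide timeout

class HeartbeatMap:
    """Toy model of the pattern in src/common/HeartbeatMap.cc: worker
    threads periodically touch their heartbeat, and a watchdog treats
    any thread whose last touch is older than the suicide grace as
    hung."""

    def __init__(self):
        self._last_touch = {}
        self._lock = threading.Lock()

    def touch(self, thread_name):
        # Called by the worker itself to prove it is still making progress.
        with self._lock:
            self._last_touch[thread_name] = time.monotonic()

    def hung_threads(self, grace=SUICIDE_GRACE):
        # Called by the watchdog: anything untouched for > grace is hung.
        now = time.monotonic()
        with self._lock:
            return [name for name, last in self._last_touch.items()
                    if now - last > grace]

hb = HeartbeatMap()
hb.touch("OSD::osd_op_tp")      # the op thread checks in once...
time.sleep(SUICIDE_GRACE * 2)   # ...then blocks for longer than the grace
print(hb.hung_threads())        # prints ['OSD::osd_op_tp']
```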
Removed ceph-de...@vger.kernel.org and added d...@ceph.io
On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
>
> Hello everyone,
>
> Can you shed some light on the cause of the crash? Could a client
> request actually trigger it?
>
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30
On Wed, Oct 2, 2019 at 1:15 AM Mattia Belluco wrote:
>
> Hi Jake,
>
> I am curious to see if your problem is similar to ours (despite the fact
> we are still on Luminous).
>
> Could you post the output of:
>
> rados list-inconsistent-obj
>
> and
>
> rados list-inconsistent-snapset
Make sure
9 at 8:03 AM Sasha Litvak
> wrote:
>>
>> It was hardware indeed. Dell server reported a disk being reset with power
>> on. Checking the usual suspects i.e. controller firmware, controller event
>> log (if I can get one), drive firmware.
>> I will report more when I g
On Tue, Oct 1, 2019 at 10:43 PM Del Monaco, Andrea <
andrea.delmon...@atos.net> wrote:
> Hi list,
>
> After the nodes ran OOM and after reboot, we are not able to restart the
> ceph-osd@x services anymore. (Details about the setup at the end).
>
> I am trying to do this manually, so we can see
up
> ([27,30,38], p27) acting ([30,25], p30)
>
> I also checked the logs of all OSDs already done and got the same logs
> about this object :
> * osd.4, last time : 2019-10-10 16:15:20
> * osd.32, last time : 2019-10-14 01:54:56
> * osd.33, last time : 2019-10-11 06:24:01
>
I'd suggest you open a tracker under the Bluestore component so
someone can take a look. I'd also suggest you include a log with
'debug_bluestore=20' added to the COT command line.
On Thu, Nov 7, 2019 at 6:56 PM Eugene de Beste wrote:
>
> Hi, does anyone have any feedback for me regarding this?
Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.
Run a scrub on each of these pgs and then use the commands on the
following page to find out more information for each case.
https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/
Focus on the
On Tue, Oct 29, 2019 at 9:09 PM Jérémy Gardais
wrote:
>
> Thus spake Brad Hubbard (bhubb...@redhat.com) on mardi 29 octobre 2019 à
> 08:20:31:
> > Yes, try and get the pgs healthy, then you can just re-provision the down
> > OSDs.
> >
> > Run a scrub
On Tue, Sep 24, 2019 at 10:51 PM M Ranga Swami Reddy
wrote:
>
> Interestingly - "rados list-inconsistent-obj ${PG} --format=json" is not
> showing any inconsistent objects.
> And "rados list-missing-obj ${PG} --format=json" is also not showing any
> missing or unfound objects.
Complete a
Does pool 6 have min_size = 1 set?
https://tracker.ceph.com/issues/24994#note-5 would possibly be helpful
here, depending on what the output of the following command looks
like.
# rados list-inconsistent-obj [pgid] --format=json-pretty
On Thu, Oct 10, 2019 at 8:16 PM Kenneth Waegeman
wrote:
>
ashpspool stripe_width 0 application cephfs
This looked like something min_size 1 could cause, but I guess that's
not the cause here.
> so inconsistents is empty, which is weird, no?
Try scrubbing the pg just before running the command.
>
> Thanks again!
>
> K
>
>
> On 10/10/2019
On Fri, Oct 4, 2019 at 6:09 PM Marc Roos wrote:
>
> >
> >Try something like the following on each OSD that holds a copy of
> >rbd_data.1f114174b0dc51.0974 and see what output you get.
> >Note that you can drop the bluestore flag if they are not bluestore
> >osds and you will need