Re: [ceph-users] node and its OSDs down...

2016-12-08 Thread Brad Hubbard
ried about it... > > Thanks > Swami > > > On Thu, Dec 8, 2016 at 6:40 AM, Brad Hubbard wrote: > >> >> >> On Wed, Dec 7, 2016 at 9:11 PM, M Ranga Swami Reddy > > wrote: >> >>> That's right.. >>> But, my question was: when an OSD d

Re: [ceph-users] Pgs stuck on undersized+degraded+peered

2016-12-10 Thread Brad Hubbard
On Sun, Dec 11, 2016 at 5:22 AM, fridifree wrote: > The min size was on 3 changing to 1 solve the problem > thanks Please be aware of the previous posts about the dangers of setting min_size=1. > > On Dec 10, 2016 02:06, "Christian Wuerdig" > wrote: >> >> Hi, >> >> it's useful to generally prov

Re: [ceph-users] CephFS FAILED assert(dn->get_linkage()->is_null())

2016-12-10 Thread Brad Hubbard
On Sat, Dec 10, 2016 at 11:50 PM, Sean Redmond wrote: > Hi Goncarlo, > > With the output from "ceph tell mds.0 damage ls" we tracked the inodes of > two damaged directories using 'find /mnt/ceph/ -inum $inode', after > reviewing the paths involved we confirmed a backup was available for this > da

Re: [ceph-users] Looking for a definition for some undocumented variables

2016-12-12 Thread Brad Hubbard
On Tue, Dec 13, 2016 at 3:56 AM, Jake Young wrote: > Thanks John, > > To partially answer my own question: > > OPTION(osd_recovery_sleep, OPT_FLOAT, 0) // seconds to sleep between > recovery ops > > OPTION(osd_recovery_max_single_start, OPT_U64, 1) > > Funny, in the examples where I've seen osd_re

Re: [ceph-users] Ceph Import Error

2016-12-21 Thread Brad Hubbard
What output do you get from the following? $ strace -eopen ceph 2>&1|grep ceph_argparse On Thu, Dec 22, 2016 at 8:55 AM, Aakanksha Pudipeddi wrote: > Hi John, > > Thanks for your response. Here is what I am setting them to: > > I am installing all binaries in the folder: > ~/src/vanilla-ceph/cep

Re: [ceph-users] Ceph Import Error

2016-12-21 Thread Brad Hubbard
such file or directory) > open("/home/ssd/src/vanilla-ceph/ceph-install/lib/python2.7/site-packages/ceph_argparse.py", > O_RDONLY) = 3 > open("/home/ssd/src/vanilla-ceph/ceph-install/lib/python2.7/site-packages/ceph_argparse.pyc", > O_RDONLY) = -1 ENOENT (No such
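
For anyone hitting a similar import error: the strace above shows where the interpreter is looking (the missing .pyc by itself is harmless, Python falls back to the .py source). If ceph_argparse can't be found at all, a minimal sketch of the usual workaround, assuming the install prefix shown in this thread (paths are specific to that build):

$ # check where (or whether) Python resolves the module right now
$ python -c 'import ceph_argparse; print(ceph_argparse.__file__)'
$ # prepend the local install's site-packages so the ceph CLI can find it
$ export PYTHONPATH=/home/ssd/src/vanilla-ceph/ceph-install/lib/python2.7/site-packages:$PYTHONPATH
$ ceph --version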

Re: [ceph-users] Ceph pg active+clean+inconsistent

2016-12-23 Thread Brad Hubbard
Could you also try this? $ attr -l ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.0009__head_EED893F1__6 Take note of any of ceph._, ceph._@1, ceph._@2, etc. For me on my test cluster it looks like this. $ attr -l dev/osd1/current/0.3_head/benchmark\\udata\\urskikr.localdomain\\u16952\\uobjec

Re: [ceph-users] Recover VM Images from Dead Cluster

2016-12-24 Thread Brad Hubbard
On Sun, Dec 25, 2016 at 3:33 AM, w...@42on.com wrote: > > >> On 24 Dec 2016 at 17:20, L. Bader wrote: >> >> Do you have any references on this? >> >> I searched for something like this quite a lot and did not find anything... >> > > No, saw it somewhere on the ML I thi

Re: [ceph-users] osd removal problem

2016-12-29 Thread Brad Hubbard
This was dealt with on the ceph-devel mailing list. On Thu, Dec 29, 2016 at 11:08 PM, Łukasz Chrustek wrote: > Hi, > > As I wrote at first mail - I have already done that, but I'm afraid > to load it back - is there any chance, that something will go wrong ? > > Thanks for answer ! > >> Hi, > > >>

Re: [ceph-users] installation docs

2016-12-30 Thread Brad Hubbard
+ceph-devel On Fri, Dec 30, 2016 at 6:02 PM, Manuel Sopena Ballesteros wrote: > Hi, > > > > I just would like to point a couple of issues I have following the > INSTALLATION (QUICK) document. > > > > 1. The order to clean ceph deployment is: > > a. Ceph-deploy purge {ceph-node} [{ceph

Re: [ceph-users] linux kernel version for clients

2016-12-31 Thread Brad Hubbard

Re: [ceph-users] linux kernel version for clients

2017-01-01 Thread Brad Hubbard
H... my original email got eaten by the big bit bucket in the sky... What it said was you need to look at the ceph_features.h file for kernel/userspace to see the differences. This, for those that can access it, is the latest rhel7.3 version (I imagine the CentOS version would be similar, if

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Brad Hubbard
There is currently a thread about this very issue on the ceph-devel mailing list (check archives for "PG stuck unclean after rebalance-by-weight" in the last few days). Have a read of http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/ and try bumping choose_total_tries up t
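
The standard way to change a raw tunable like choose_total_tries is to round-trip the crush map through crushtool; a minimal sketch (file names are arbitrary):

$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt
$ # edit crush.txt: raise "tunable choose_total_tries 50" to e.g. 100
$ crushtool -c crush.txt -o crush.new
$ # optionally sanity-check the new map offline before injecting it
$ crushtool -i crush.new --test --show-bad-mappings
$ ceph osd setcrushmap -i crush.new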

Re: [ceph-users] Slow requests

2018-07-09 Thread Brad Hubbard
ernel exhibiting the problem. > > kind regards > > Ben > >> Brad Hubbard hat am 5. Juli 2018 um 01:16 geschrieben: >> >> >> On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber >> wrote: >> > Hi @all, >> > >> > im currently in testin

Re: [ceph-users] Jewel PG stuck inconsistent with 3 0-size objects

2018-07-16 Thread Brad Hubbard
Your issue is different since not only do the omap digests of all replicas not match the omap digest from the auth object info but they are all different to each other. What is min_size of pool 67 and what can you tell us about the events leading up to this? On Mon, Jul 16, 2018 at 7:06 PM, Matth

Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)

2018-07-17 Thread Brad Hubbard
On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote: > I was on 12.2.5 for a couple weeks and started randomly seeing > corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke > loose. I panicked and moved to Mimic, and when that didn't solve the > problem, only then did I start to

Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)

2018-07-18 Thread Brad Hubbard
On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan wrote: > > > On 07/17/2018 11:14 PM, Brad Hubbard wrote: >> >> On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote: >>> >>> I was on 12.2.5 for a couple weeks and started randomly seeing >>> corruption, m

Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)

2018-07-18 Thread Brad Hubbard
On Thu, Jul 19, 2018 at 12:47 PM, Troy Ablan wrote: > > > On 07/18/2018 06:37 PM, Brad Hubbard wrote: >> On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan wrote: >>> >>> >>> On 07/17/2018 11:14 PM, Brad Hubbard wrote: >>>> >>>> On Wed

Re: [ceph-users] Omap warning in 12.2.6

2018-07-19 Thread Brad Hubbard
Search the cluster log for 'Large omap object found' for more details. On Fri, Jul 20, 2018 at 5:13 AM, Brent Kennedy wrote: > I just upgraded our cluster to 12.2.6 and now I see this warning about 1 > large omap object. I looked and it seems this warning was just added in > 12.2.6. I found a f
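
A minimal sketch of tracking the warning down, assuming the default cluster log location; the pool and object names are placeholders to be filled in from the matching log line:

$ grep 'Large omap object found' /var/log/ceph/ceph.log
$ # the matching line names the pool, PG and object; count its omap keys
$ rados -p <pool> listomapkeys <object> | wc -l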

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-19 Thread Brad Hubbard
I've updated the tracker. On Thu, Jul 19, 2018 at 7:51 PM, Robert Sander wrote: > On 19.07.2018 11:15, Ronny Aasen wrote: > >> Did you upgrade from 12.2.5 or 12.2.6 ? > > Yes. > >> sounds like you hit the reason for the 12.2.7 release >> >> read : https://ceph.com/releases/12-2-7-luminous-release

Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Brad Hubbard
Ceph doesn't shut down systems, as in kill or reboot the box, if that's what you're saying? On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard wrote: > On Monday 23 July 2018 at 11:07 +0700, Konstantin Shalygin wrote: >> > I even have no fancy kernel or device, just real standard Debian. >> > Th

Re: [ceph-users] OMAP warning ( again )

2018-07-31 Thread Brad Hubbard
Search the cluster log for 'Large omap object found' for more details. On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy wrote: > Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning > message, then upgraded to 12.2.7 and the message went away. I just added > four OSDs to balance out

Re: [ceph-users] OMAP warning ( again )

2018-08-01 Thread Brad Hubbard
> "swift_versioning": "false", > "swift_ver_location": "", > "index_type": 0, > "mdsearch_config": [], > "reshard_status": 0, > "new_b

Re: [ceph-users] fyi: Luminous 12.2.7 pulled wrong osd disk, resulted in node down

2018-08-01 Thread Brad Hubbard
On Wed, Aug 1, 2018 at 10:38 PM, Marc Roos wrote: > > > Today we pulled the wrong disk from a ceph node. And that made the whole > node go down/be unresponsive. Even to a simple ping. I cannot find too > much about this in the log files. But I expect that the > /usr/bin/ceph-osd process caused a ke

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread Brad Hubbard
What is the status of the cluster with this osd down and out? On Thu, Aug 2, 2018 at 5:42 AM, J David wrote: > Hello all, > > On Luminous 12.2.7, during the course of recovering from a failed OSD, > one of the other OSDs started repeatedly crashing every few seconds > with an assertion failure: >

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread Brad Hubbard
If you don't already know why, you should investigate why your cluster could not recover after the loss of a single osd. Your solution seems valid given your description. On Thu, Aug 2, 2018 at 12:15 PM, J David wrote: > On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard wrote: >&g

Re: [ceph-users] Bluestore OSD Segfaults (12.2.5/12.2.7)

2018-08-07 Thread Brad Hubbard
Looks like https://tracker.ceph.com/issues/21826 which is a dup of https://tracker.ceph.com/issues/20557 On Wed, Aug 8, 2018 at 1:49 AM, Thomas White wrote: > Hi all, > > We have recently begun switching over to Bluestore on our Ceph cluster, > currently on 12.2.7. We first began encountering se

Re: [ceph-users] OSD had suicide timed out

2018-08-07 Thread Brad Hubbard
Try to work out why the other osds are saying this one is down. Is it because this osd is too busy to respond, or something else? debug_ms = 1 will show you some message debugging which may help. On Tue, Aug 7, 2018 at 10:34 PM, Josef Zelenka wrote: > To follow up, I did some further digging with
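
A sketch of turning that on without a restart, with a hypothetical OSD id; injectargs only lasts until the daemon restarts, so put "debug ms = 1" under [osd] in ceph.conf if it needs to persist:

$ ceph tell osd.<id> injectargs '--debug_ms 1'
$ # then watch the heartbeat/ping traffic in that OSD's log
$ tail -f /var/log/ceph/ceph-osd.<id>.log | grep osd_ping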

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
ealthy > 'OSD::peering_tp thread 0x7fe03f52f700' had suicide timed out after 150 > 0> 2018-08-08 09:14:00.970742 7fe03f52f700 -1 *** Caught signal > (Aborted) ** > > > Could it be that the suiciding OSDs are rejecting the ping somehow? I'm > quite confused

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
Do you see "internal heartbeat not healthy" messages in the log of the osd that suicides? On Wed, Aug 8, 2018 at 5:45 PM, Brad Hubbard wrote: > What is the load like on the osd host at the time and what does the > disk utilization look like? > > Also, what does the transact

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
'34485 mlcod 13572'34485 > active+clean] publish_stats_to_osd 13593:2966970 > 2018-08-08 10:45:33.022697 7effb95a4700 1 -- 10.12.125.1:6803/1319081 <== > osd.13 10.12.125.3:0/735946 22 osd_ping(ping e13589 stamp 2018-08-08 > 10:45:33.021217) v4 2004+0+0 (3639738084

Re: [ceph-users] [Jewel 10.2.11] OSD Segmentation fault

2018-08-13 Thread Brad Hubbard
Jewel is almost EOL. It looks similar to several related issues, one of which is http://tracker.ceph.com/issues/21826 On Mon, Aug 13, 2018 at 9:19 PM, Alexandru Cucu wrote: > Hi, > > Already tried zapping the disk. Unfortunately the same segfaults keep > me from adding the OSD back to the clust

Re: [ceph-users] what is Implicated osds

2018-08-20 Thread Brad Hubbard
On Tue, Aug 21, 2018 at 2:37 AM, Satish Patel wrote: > Folks, > > Today i found ceph -s is really slow and just hanging for minute or 2 > minute to give me output also same with "ceph osd tree" output, > command just hanging long time to give me output.. > > This is what i am seeing output, one OS

Re: [ceph-users] [RGWRados]librados: Objecter returned from getxattrs r=-36

2018-09-19 Thread Brad Hubbard
Are you using filestore or bluestore on the OSDs? If filestore what is the underlying filesystem? You could try setting debug_osd and debug_filestore to 20 and see if that gives some more info? On Wed, Sep 19, 2018 at 12:36 PM fatkun chan wrote: > > > ceph version 12.2.5 (cad919881333ac9227417158

Re: [ceph-users] PG inconsistent, "pg repair" not working

2018-09-25 Thread Brad Hubbard
On Tue, Sep 25, 2018 at 7:50 PM Sergey Malinin wrote: > > # rados list-inconsistent-obj 1.92 > {"epoch":519,"inconsistents":[]} It's likely the epoch has changed since the last scrub and you'll need to run another scrub to repopulate this data. >
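
In other words, something along these lines, using the PG id from the thread:

$ ceph pg deep-scrub 1.92
$ # wait for the scrub to finish (watch ceph -w, or the PG's last_deep_scrub_stamp), then re-query
$ rados list-inconsistent-obj 1.92 --format=json-pretty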

Re: [ceph-users] OSDs crashing

2018-09-25 Thread Brad Hubbard
On Tue, Sep 25, 2018 at 11:31 PM Josh Haft wrote: > > Hi cephers, > > I have a cluster of 7 storage nodes with 12 drives each and the OSD > processes are regularly crashing. All 84 have crashed at least once in > the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708, > kernel version 3.

Re: [ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent

2018-02-21 Thread Brad Hubbard
On Wed, Feb 21, 2018 at 6:40 PM, Yoann Moulin wrote: > Hello, > > I migrated my cluster from jewel to luminous 3 weeks ago (using ceph-ansible > playbook), a few days after, ceph status told me "PG_DAMAGED > Possible data damage: 1 pg inconsistent", I tried to repair the PG without > success, I

Re: [ceph-users] Slow requests troubleshooting in Luminous - details missing

2018-03-05 Thread Brad Hubbard
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote: > On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote: >> Blocked requests and slow requests are synonyms in ceph. They are 2 names >> for the exact same thing. >> >> >> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev >> wrote: >>> >>> On Thu,

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-06 Thread Brad Hubbard
On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata < mbald...@hsamiata.it> wrote: > Hi > > I monitor dmesg in each of the 3 nodes, no hardware issue reported. And > the problem happens with various different OSDs in different nodes, for me > it is clear it's not a hardware problem. > If

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-06 Thread Brad Hubbard
debug_osd that is... :) On Tue, Mar 6, 2018 at 7:10 PM, Brad Hubbard wrote: > > > On Tue, Mar 6, 2018 at 5:26 PM, Marco Baldini - H.S. Amiata < > mbald...@hsamiata.it> wrote: > >> Hi >> >> I monitor dmesg in each of the 3 nodes, no hardware issue re

Re: [ceph-users] pg inconsistent

2018-03-07 Thread Brad Hubbard
On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub wrote: > "ceph pg repair" leads to: > 5.7bd repair 2 errors, 0 fixed > > Only an empty list from: > rados list-inconsistent-obj 5.7bd --format=json-pretty > > Inspired by http://tracker.ceph.com/issues/12577 , I tried again with more > verbose logging a

Re: [ceph-users] /var/lib/ceph/osd/ceph-xxx/current/meta shows "Structure needs cleaning"

2018-03-08 Thread Brad Hubbard
On Thu, Mar 8, 2018 at 5:01 PM, 赵贺东 wrote: > Hi All, > > Every time after we activate osd, we got “Structure needs cleaning” in > /var/lib/ceph/osd/ceph-xxx/current/meta. > > > /var/lib/ceph/osd/ceph-xxx/current/meta > # ls -l > ls: reading directory .: Structure needs cleaning > total 0 > > Coul

Re: [ceph-users] /var/lib/ceph/osd/ceph-xxx/current/meta shows "Structure needs cleaning"

2018-03-08 Thread Brad Hubbard
On Thu, Mar 8, 2018 at 7:33 PM, 赵贺东 wrote: > Hi Brad, > > Thank you for your attention. > >> On 8 Mar 2018, at 4:47 PM, Brad Hubbard wrote: >> >> On Thu, Mar 8, 2018 at 5:01 PM, 赵贺东 wrote: >>> Hi All, >>> >>> Every time after we activate osd, we

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-08 Thread Brad Hubbard
On Fri, Mar 9, 2018 at 3:54 AM, Subhachandra Chandra wrote: > I noticed a similar crash too. Unfortunately, I did not get much info in the > logs. > > *** Caught signal (Segmentation fault) ** > > Mar 07 17:58:26 data7 ceph-osd-run.sh[796380]: in thread 7f63a0a97700 > thread_name:safe_timer > >

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
See the thread in this very ML titled "Ceph iSCSI is a prank?", last update thirteen days ago. If your questions are not answered by that thread let us know. Please also remember that CentOS is not the only platform that ceph runs on by a long shot and that not all distros lag as much as it (not

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
"NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this." Have you ever wondered what this means and why it's there? :) This is at least something you can try. it may provide useful information, it may not. This stack looks like it is either corrupted, or possibly not in

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
for me so I'll take a look at this tomorrow. Thanks! > > http://tracker.ceph.com/issues/23431 > > Maybe Oliver has something to add as well. > > > Dietmar > > > On 03/27/2018 11:37 AM, Brad Hubbard wrote: >> "NOTE: a copy of the executable, or `objdum

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
ught to be correct and it probably will be correct again in the near future and, if not, we can review and correct it as necessary. > There is something confused about what the documentation minimal > requirements, the dashboard suggest to be able to do, and what i read > around about modded Ceph for ot

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-27 Thread Brad Hubbard
On Tue, Mar 27, 2018 at 9:46 PM, Brad Hubbard wrote: > > > On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins wrote: > >> Hi Brad, >> >> that post was mine. I knew it quite well. >> > That Post was about confirm the fact that minimum requirements writte

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-28 Thread Brad Hubbard
On Wed, Mar 28, 2018 at 6:53 PM, Max Cuttins wrote: > Il 27/03/2018 13:46, Brad Hubbard ha scritto: > > > > On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins wrote: >> >> Hi Brad, >> >> that post was mine. I knew it quite well. >> >> That Post w

Re: [ceph-users] 1 mon unable to join the quorum

2018-03-28 Thread Brad Hubbard
Can you update with the result of the following commands from all of the MONs? # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek wrote: > Hello Ceph

Re: [ceph-users] 1 mon unable to join the quorum

2018-03-29 Thread Brad Hubbard
": 2, "name": "controller03", "addr": "172.18.8.7:6789\/0" } ] } } In the monmaps we are called 'controller02', not 'mon.controller02'. These names need to be identical. On Thu,

Re: [ceph-users] 1 mon unable to join the quorum

2018-03-30 Thread Brad Hubbard
I'm not sure I completely understand your "test". What exactly are you trying to achieve and what documentation are you following? On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque wrote: > Brad, > > Thanks for your answer > > On 30/03/2018 02:09, Brad Hubbard

Re: [ceph-users] 1 mon unable to join the quorum

2018-04-04 Thread Brad Hubbard
ewel/rados/operations/add-or-rm-mons/ > (with id controller02) > > The logs provided are when the controller02 was added with the manual > method. > > But the controller02 won't join the cluster > > Hope It helps understand > > > > On 31/03/2018 02:12, Bra

Re: [ceph-users] Open-sourcing GRNET's Ceph-related tooling

2018-05-11 Thread Brad Hubbard
+ceph-devel On Wed, May 9, 2018 at 10:00 PM, Nikos Kormpakis wrote: > Hello, > > I'm happy to announce that GRNET [1] is open-sourcing its Ceph-related > tooling on GitHub [2]. This repo includes multiple monitoring health > checks compatible with Luminous and tooling in order deploy quickly our

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Brad Hubbard
On Wed, May 16, 2018 at 6:16 PM, Uwe Sauter wrote: > Hi folks, > > I'm currently chewing on an issue regarding "slow requests are blocked". I'd > like to identify the OSD that is causing those events > once the cluster is back to HEALTH_OK (as I have no monitoring yet that would > get this info
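
For a short window after the event, each OSD still holds a ring buffer of its slowest completed ops that can be read over the admin socket; a sketch with a hypothetical OSD id (how far back it reaches is bounded by the osd_op_history_size and osd_op_history_duration settings):

$ ceph daemon osd.<id> dump_historic_ops
$ # each entry records arrival time, duration and the events the op passed through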

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-17 Thread Brad Hubbard
On Thu, May 17, 2018 at 4:16 PM, Uwe Sauter wrote: > Hi, > >>> I'm currently chewing on an issue regarding "slow requests are blocked". >>> I'd like to identify the OSD that is causing those events >>> once the cluster is back to HEALTH_OK (as I have no monitoring yet that >>> would get this info

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-18 Thread Brad Hubbard
On Thu, May 17, 2018 at 6:06 PM, Uwe Sauter wrote: > Brad, > > thanks for the bug report. This is exactly the problem I am having (log-wise). You don't give any indication what version you are running but see https://tracker.ceph.com/issues/23205 >>> >>> >>> the cluster is an Proxmo

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-19 Thread Brad Hubbard
On Sat, May 19, 2018 at 5:01 PM, Uwe Sauter wrote: > The mistery is that these blocked requests occur numerously when at > least > one of the 6 servers is booted with kernel 4.15.17, if all are running > 4.13.16 the number of blocked requests is infrequent and low. S

Re: [ceph-users] how to build libradosstriper

2018-05-29 Thread Brad Hubbard
On Wed, May 30, 2018 at 10:42 AM, Jialin Liu wrote: > Hi, > I'm trying to use the libradosstriper api, but having some trouble in > linking to lradosstriper. I copied only the `required' libraries from an > pre-installed ceph (10.2.10), and put them under my local directory > /rados_install/lib an

Re: [ceph-users] how to build libradosstriper

2018-05-29 Thread Brad Hubbard
exports that symbol so finding the library that exports the symbols in the error message should resolve the error. > > > I also found this thread: http://tracker.ceph.com/issues/14788 > which looks similar to the error I run into, and that thread mentioned the > version between

Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread Brad Hubbard
On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan wrote: > Hi, > > I keep getting inconsistent placement groups and every time its the > whiteout. > > > cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0 clones, > 1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9 > w

[ceph-users] Fwd: inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread Brad Hubbard
-- Forwarded message -- From: Brad Hubbard Date: Fri, Jun 1, 2018 at 9:24 PM Subject: Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts To: shrey chauhan Cc: ceph-users Too late for me today. If you send your reply to the list someone else may provide an answer

Re: [ceph-users] whiteouts mismatch

2018-06-05 Thread Brad Hubbard
On Tue, Jun 5, 2018 at 4:46 PM, shrey chauhan wrote: > I am consistently getting whiteout mismatches due to which pgs are going in > inconsistent state, and I am not able to figure out why is this happening? > though as it was explained before that whiteouts dont exist and its nothing, > its still

Re: [ceph-users] Installing iSCSI support

2018-06-11 Thread Brad Hubbard
I'm afraid the answer currently is http://tracker.ceph.com/issues/22143 On Mon, Jun 11, 2018 at 8:08 PM, Max Cuttins wrote: > > Really? :) > So in this huge-big-mailing list have never installed iSCSI and get these > errors before me. > Wow sounds like I'm a pioneer here. > > The installation gui

Re: [ceph-users] ceph pg dump

2018-06-14 Thread Brad Hubbard
Try this and pay careful attention to the IPs and ports in use. Then you can make sure there are no connectivity issues. # ceph -s --debug_ms 20 On Fri, Jun 15, 2018 at 3:31 AM, Ranjan Ghosh wrote: > Hi all, > > we have two small clusters (3 nodes each) called alpha and beta. One node > (alpha0/

Re: [ceph-users] Frequent slow requests

2018-06-14 Thread Brad Hubbard
Turn up debug logging, at least debug_osd 20, and search for the operation in the osd logs. On Thu, Jun 14, 2018 at 5:38 PM, Frank (lists) wrote: > Hi, > > On a small cluster (3 nodes) I frequently have slow requests. When dumping > the inflight ops from the hanging OSD, it seems it doesn't get a
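
A sketch of that workflow with a hypothetical OSD id and client; the client/reqid shown by the in-flight dump is the string to search for once logging is raised:

$ ceph daemon osd.<id> dump_ops_in_flight
$ ceph tell osd.<id> injectargs '--debug_osd 20'
$ # reproduce the hang, then follow the stuck op through the log
$ grep 'client.12345' /var/log/ceph/ceph-osd.<id>.log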

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-19 Thread Brad Hubbard
Can you post the output of a pg query? On Tue, Jun 19, 2018 at 11:44 PM, Andrei Mikhailovsky wrote: > A quick update on my issue. I have noticed that while I was trying to move > the problem object on osds, the file attributes got lost on one of the osds, > which is I guess why the error messages

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-21 Thread Brad Hubbard
problem getting command descriptions from pg.18.2 > > Cheers > > > > ----- Original Message - >> From: "Brad Hubbard" >> To: "andrei" >> Cc: "ceph-users" >> Sent: Wednesday, 20 June, 2018 00:02:07 >> Subject: Re: [cep

Re: [ceph-users] Ceph Mimic on CentOS 7.5 dependency issue (liboath)

2018-06-23 Thread Brad Hubbard
As Brian pointed out # yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm On Sun, Jun 24, 2018 at 2:46 PM, Michael Kuriger wrote: > CentOS 7.5 is pretty new. Have you tried CentOS 7.4? > > Mike Kuriger > Sr. Unix Systems Engineer > > > > -Original Mess

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-24 Thread Brad Hubbard
a43700 10 In get_auth_session_handler for > protocol 2 > 2018-06-22 10:47:27.678417 7f70eda45700 10 cephx client: build_authorizer for > service osd > 2018-06-22 10:47:27.678914 7f70eda45700 10 In get_auth_session_handler for > protocol 2 > 2018-06-22 10:47:27.679003 7f70eda45700 10

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-25 Thread Brad Hubbard
2240127b0 > 2018-06-25 10:59:12.112538 7fe244b28700 5 -- 192.168.168.201:0/3046734987 > shutdown_connections mark down 192.168.168.201:6789/0 0x7fe24017a420 > 2018-06-25 10:59:12.112543 7fe244b28700 5 -- 192.168.168.201:0/3046734987 > shutdown_connections mark down 192.168.168

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-26 Thread Brad Hubbard
caps: [osd] allow rwx > client.radosgw2.gateway > caps: [mgr] allow r > caps: [mon] allow rw > caps: [osd] allow rwx > client.ssdcs > caps: [mgr] allow r > caps: [mon] allow r > caps: [osd] allow class-read object_prefix r

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-27 Thread Brad Hubbard
uot;key" : "", >"oid" : ".dir.default.80018061.2", >"namespace" : "", >"snapid" : -2, >"max" : 0 > }, > "truncate_size" : 0, > &qu

Re: [ceph-users] fixing unrepairable inconsistent PG

2018-06-28 Thread Brad Hubbard
you provide from the time leading up to when the issue was first seen? > > Cheers > > Andrei > - Original Message - >> From: "Brad Hubbard" >> To: "Andrei Mikhailovsky" >> Cc: "ceph-users" >> Sent: Thursday, 28 June, 201

Re: [ceph-users] Slow requests

2018-07-04 Thread Brad Hubbard
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote: > Hi @all, > > im currently in testing for setup an production environment based on the > following OSD Nodes: > > CEPH Version: luminous 12.2.5 > > 5x OSD Nodes with following specs: > > - 8 Core Intel Xeon 2,0 GHZ > > - 96GB Ram > > - 10x 1,

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Brad Hubbard
Your current problem has nothing to do with clients and neither does choose_total_tries. Try setting just this value to 100 and see if your situation improves. Ultimately you need to take a good look at your cluster configuration and how your crush map is configured to deal with that configuratio

Re: [ceph-users] slow requests break performance

2017-01-11 Thread Brad Hubbard
On Thu, Jan 12, 2017 at 2:19 AM, Eugen Block wrote: > Hi, > > I simply grepped for "slow request" in ceph.log. What exactly do you mean by > "effective OSD"? > > If I have this log line: > 2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141 seconds > old, received at 2017-01-11 [...

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Brad Hubbard
or rw locks", I truncated the output. > Based on the message "waiting for subops from 9,15" I also dumped the > historic_ops for these two OSDs. > > Duration on OSD.9 > > "initiated_at": "2017-01-12 10:38:29.258221", >

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-13 Thread Brad Hubbard
Want to install debuginfo packages and use something like this to try and find out where it is spending most of its time? https://poormansprofiler.org/ Note that you may need to do multiple runs to get a "feel" for where it is spending most of its time. Also note that likely only one or two thread
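
If that link is unreachable: the technique is just repeated whole-process stack sampling with gdb. A minimal sketch against ceph-mgr; it assumes debug symbols are installed and accepts that each sample briefly pauses the daemon:

$ for i in $(seq 10); do gdb --batch -p $(pidof ceph-mgr) -ex 'thread apply all bt'; sleep 2; done > mgr-stacks.txt
$ # frames that recur across samples are where the CPU time is going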

Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Brad Hubbard
That looks like dmesg output from the libceph kernel module. Do you have the libceph kernel module loaded? If the answer to that question is "yes" the follow-up question is "Why?" as it is not required for a MON or OSD host. On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen wrote: > Yeah, all th
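
A quick way to check, sketched below; only unload the modules if nothing on the host is using kernel RBD maps or a kernel CephFS mount:

$ lsmod | grep ceph
$ # rbd and ceph (kernel CephFS) hold references to libceph, so remove them first if loaded
$ modprobe -r rbd
$ modprobe -r ceph
$ modprobe -r libceph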

Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Brad Hubbard
tep. Any > further info I can give to help? > > > > On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen > wrote: >> >> Sorry this email arrived out of order. I will do the modprobe -r test >> >> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard wrote: >>>

Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Brad Hubbard
On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard wrote: > Just making sure the list sees this for those that are following. > > On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen > wrote: >> Right, so yes libceph is loaded >> >> root@compound-7:~# l

Re: [ceph-users] OSD Repeated Failure

2017-02-10 Thread Brad Hubbard
On Sat, Feb 11, 2017 at 2:51 PM, Ashley Merrick wrote: > Hello, > > > > I have a particular OSD (53), which at random will crash with the OSD > process stopping. > > > > OS: Debian 8.x > > CEPH : ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) > > > > From the logs at the time of th

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-13 Thread Brad Hubbard
as 300% of that. I > have 3 AIO nodes, and only one of them seemed to be affected. > > On Sat, Jan 14, 2017 at 12:18 AM, Brad Hubbard wrote: >> >> Want to install debuginfo packages and use something like this to try >> and find out where it is spending most of its time? &

Re: [ceph-users] PG stuck peering after host reboot

2017-02-13 Thread Brad Hubbard
> On 11/02/2017, 00:40, "Brad Hubbard" wrote: > > On Thu, Feb 9, 2017 at 3:36 AM, wrote: > > Hi Corentin, > > > > I've tried that, the primary hangs when trying to injectargs so I set > the option in the config file and restarted all OSDs in

Re: [ceph-users] After upgrading from 0.94.9 to Jewel 10.2.5 on Ubuntu 14.04 OSDs fail to start with a crash dump

2017-02-13 Thread Brad Hubbard
Capture a log with debug_osd at 30 (yes, that's correct, 30) and see if that sheds more light on the issue. On Tue, Feb 14, 2017 at 6:53 AM, Alfredo Colangelo wrote: > Hi Ceph experts, > > after updating from ceph 0.94.9 to ceph 10.2.5 on Ubuntu 14.04, 2 out of 3 > osd processes are unable to sta
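
Since the OSDs crash during startup, injectargs is not an option; the setting has to be in ceph.conf before the daemon starts. A minimal sketch (the OSD id is a placeholder):

[osd]
debug osd = 30

$ # restart the failing OSD with its init system, then read its log
$ less /var/log/ceph/ceph-osd.<id>.log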

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-20 Thread Brad Hubbard
d go ahead and open one. >> >> Cheers, >> John >> >> > Thanks, >> > Muthu >> > >> > On 14 February 2017 at 03:13, Brad Hubbard wrote: >> >> >> >> Could one of the reporters open a tracker for this issue and at

Re: [ceph-users] Authentication error CEPH installation

2017-02-23 Thread Brad Hubbard
You need ceph.client.admin.keyring in /etc/ceph/ On Thu, Feb 23, 2017 at 8:13 PM, Chaitanya Ravuri wrote: > Hi Team, > > I have recently deployed a new CEPH cluster for OEL6 boxes for my testing. I > am getting below error on the admin host. not sure how can i fix it. > > 2017-02-23 02:13:04.1663
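
A minimal check-and-fix sketch, assuming the default cluster name and that some other node already has a working admin keyring (the host name is a placeholder):

$ ls -l /etc/ceph/ceph.client.admin.keyring
$ # copy it from a node that works (ceph-deploy gatherkeys is another option)
$ scp <working-node>:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
$ chmod 600 /etc/ceph/ceph.client.admin.keyring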

Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ wrote: > So I updated suse leap, and now I'm getting the following error from > ceph. I know I need to disable some features, but I'm not sure what > they are.. Looks like 14, 57, and 59, but I can't figure out what > they correspond to, nor ther

Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
"has_v3_rules": 0, > "has_v4_buckets": 0, > "require_feature_tunables5": 1, I suspect setting the above to 0 would resolve the issue with the client but there may be a reason why this is set? Where did those packages come from? > "has_v5_r

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
They're from the suse leap ceph team. They maintain ceph, and build >> up to date versions for suse leap. What I don't know is how to >> disable it. When I try, I get the following mess: >> >> aarcane@densetsu:/etc/target$ ceph --cluster rk

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Did you dump out the crushmap and look? On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ wrote: > insofar as I can tell, yes. Everything indicates that they are in effect. > > On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard wrote: >> Is your change reflected in the current cru
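
That is, confirm what the cluster is actually serving rather than what was intended, along these lines:

$ ceph osd crush show-tunables
$ # or decompile the full map and inspect the tunable lines directly
$ ceph osd getcrushmap -o /tmp/crush.bin
$ crushtool -d /tmp/crush.bin -o /tmp/crush.txt
$ grep ^tunable /tmp/crush.txt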

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
ruleset 0 > type replicated > min_size 1 > max_size 10 > step take default > step chooseleaf firstn 0 type host > step emit > } > > # end crush map > > On Thu, Feb 23, 2017 at 7:37 PM, Brad Hubbard wrote: >

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Kefu has just pointed out that this has the hallmarks of https://github.com/ceph/ceph/pull/13275 On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard wrote: > Hmm, > > What's interesting is the feature set reported by the servers has only > changed from > > e0106b84a846a42 > &

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
issues/18842 > > On Feb 23, 2017 21:06, "Brad Hubbard" wrote: >> >> Kefu has just pointed out that this has the hallmarks of >> https://github.com/ceph/ceph/pull/13275 >> >> On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard wrote: >> > Hmm, >

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-25 Thread Brad Hubbard
This fix is now merged into the kraken branch. On Sat, Feb 25, 2017 at 12:00 AM, David Disseldorp wrote: > Hi, > > On Thu, 23 Feb 2017 21:07:41 -0800, Schlacta, Christ wrote: > >> So hopefully when the suse ceph team get 11.2 released it should fix this, >> yes? > > Please raise a bug at bugzilla

Re: [ceph-users] pgs stuck inactive

2017-03-09 Thread Brad Hubbard
Can you explain more about what happened? The query shows progress is blocked by the following OSDs: "blocked_by": [ 14, 17, 51, 58, 63, 64,
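
For reference, the relevant part of the query can be pulled out like this (the PG id is a placeholder):

$ ceph pg <pgid> query > /tmp/pg.json
$ grep -A 10 '"blocked_by"' /tmp/pg.json
$ # then check whether the listed OSDs (14, 17, 51, ...) are up and in
$ ceph osd tree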

Re: [ceph-users] pgs stuck inactive

2017-03-10 Thread Brad Hubbard
"num_objects_omap": 0, > "num_objects_hit_set_archive": 0, > "num_bytes_hit_set_archive": 0 > }, > "up": [ > 28, >

Re: [ceph-users] pgs stuck inactive

2017-03-10 Thread Brad Hubbard
commands. > I got some extra info about the network problem. A faulty network device has > flooded the network eating up all the bandwidth so the OSDs were not able to > properly communicate with each other. This has lasted for almost 1 day. > > Thank you, > Laszlo > > > >
