Re: [ceph-users] Bug in OSD Maps

2017-07-30 Thread Stuart Harland
I know this thread has been silent for a while; however, for various reasons I have 
been forced to work specifically on this issue this weekend.

As it turns out, you were partly right: the fix is indeed to use 
ceph-objectstore-tool, but not to remove the PG in question; instead, the missing 
OSD map epoch is injected into the OSD's store. Once the OSD has the required 
epoch, it starts successfully and resumes downloading OSD maps through the 
normal mechanism.

As an example, for OSD id 123 on host storage1 with missing epoch 9876:

On a monitor:
  ceph osd getmap 9876 > e9876

Copy the file e9876 from the monitor to storage1 with scp (or any other mechanism).
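For example (the destination path here is arbitrary):

  scp e9876 storage1:/tmp/e9876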

Then forcibly inject the epoch into the stopped OSD (our cluster is named txc1, so 
adjust the cluster name and paths to suit your own setup):

  sudo ceph-objectstore-tool --cluster=txc1 --data-path 
/var/lib/ceph/osd/txc1-123 --journal-path /var/lib/ceph/osd/txc1-123/journal 
--op set-osdmap --file /path/to/e9876 --epoch 9876 --force
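
If you want to sanity-check the injection before restarting the OSD, the map can 
usually be read back with the tool's get-osdmap op and compared against the 
monitor's copy. A minimal sketch (scratch file path is arbitrary; get-osdmap should 
accept --epoch the same way set-osdmap does):

  sudo ceph-objectstore-tool --cluster=txc1 --data-path /var/lib/ceph/osd/txc1-123 --journal-path /var/lib/ceph/osd/txc1-123/journal --op get-osdmap --epoch 9876 --file /tmp/e9876.check
  md5sum /tmp/e9876.check /path/to/e9876   # the two checksums should match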

I wanted to share this nugget of information for posterity, as I cannot be the only 
person out there who has run across this, and there appears to be limited 
documentation on it (and what documentation there is for ceph-objectstore-tool is 
slightly inconsistent with the realities of its use). Thanks also to Wido for the 
poke in the right direction elsewhere, as he filled in the missing bits.

Regards,

Stuart 


 − Stuart Harland: 
Infrastructure Engineer
Email: s.harl...@livelinktechnology.net 

Tel: +44 (0) 207 183 1411



LiveLink Technology Ltd
McCormack House
56A East Street
Havant
PO9 1BS


> On 26 May 2017, at 22:53, Gregory Farnum  wrote:
> 
> Yeah, not sure. It might just be that the restarting is newly exposing old 
> issues, but I don't see how. I gather from skimming that ticket that it was a 
> disk state bug earlier on that was going undetected until Jewel, which is why 
> I was wondering about the upgrades.
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in OSD Maps

2017-05-29 Thread Vincent Godin
We had a similar problem a few months ago when migrating from Hammer to Jewel.
We ran into some old bugs (which had been declared closed on Hammer!). We had
some OSDs refusing to start because of a missing PG map, like yours, and some
others which were completely busy and started declaring valid OSDs lost, so the
cluster was flapping. These OSDs were located on certain hosts only, and those
hosts had run Giant and then Hammer before Jewel. Hosts which had never run
Giant were OK. So in effect, we had carried some old bugs from Giant over to
Jewel. The solution was to isolate, one by one, the hosts which had at some
point run Giant and to recreate them with a fresh Jewel install. That solved
the problem (but it took a lot of time). I hope this will help you ...
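
For anyone taking the same route, the standard drain-and-remove sequence for each
OSD on an affected host looks roughly like this (osd.123 is only an example id;
let recovery finish after the "out" before removing anything):

  ceph osd out 123                # start migrating data off the OSD
  # wait for recovery/backfill to complete, then stop the OSD daemon
  ceph osd crush remove osd.123   # drop it from the CRUSH map
  ceph auth del osd.123           # remove its cephx key
  ceph osd rm 123                 # remove the OSD id from the cluster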
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in OSD Maps

2017-05-26 Thread Gregory Farnum
On Fri, May 26, 2017 at 3:05 AM Stuart Harland <
s.harl...@livelinktechnology.net> wrote:

> Could you elaborate about what constitutes deleting the PG in this
> instance, is a simple `rm` of the directories with the PG number in current
> sufficient? or does it need some poking of anything else?
>

No, you need to look at how to use the ceph-objectstore-tool. Just removing
the directories will leave associated metadata behind in leveldb/rocksdb.
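
For reference, the usual pattern with that tool is to export the PG first and then
remove it, with the OSD stopped; a rough sketch, using the pgid and paths from the
crash log (11.3f5a on osd.1908, cluster txc1) purely as examples (on newer releases
the remove op may also require --force):

  sudo ceph-objectstore-tool --cluster=txc1 --data-path /var/lib/ceph/osd/txc1-1908 --journal-path /var/lib/ceph/osd/txc1-1908/journal --op export --pgid 11.3f5a --file /root/11.3f5a.export
  sudo ceph-objectstore-tool --cluster=txc1 --data-path /var/lib/ceph/osd/txc1-1908 --journal-path /var/lib/ceph/osd/txc1-1908/journal --op remove --pgid 11.3f5a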


>
> It is conceivable that there is a fault with the disks; they are known to be
> ‘faulty’ in the general sense that they suffer a cliff-edge performance issue.
> However, I’m somewhat confused about why this would suddenly happen in the
> way it has been detected.
>

Yeah, not sure. It might just be that the restarting is newly exposing old
issues, but I don't see how. I gather from skimming that ticket that it was
a disk state bug earlier on that was going undetected until Jewel, which is
why I was wondering about the upgrades.
-Greg


>
> We are past early life failures, most of these disks don’t appear to have
> any significant issues in their smart data to indicate that any write
> failures are occurring, and I haven’t seen this error once until a couple
> of weeks ago (we’ve been operating this cluster over 2 years now).
>
> The only versions I’m seeing running (just double checked) currently are
> 10.2.5,6 and 7. There was one node that had hammer running on it a while
> back, but it’s been running jewel for months now, so I doubt it’s related
> to that.
>
>
>
> On 26 May 2017, at 00:22, Gregory Farnum  wrote:
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Gregory Farnum
On Thu, May 25, 2017 at 8:39 AM Stuart Harland <
s.harl...@livelinktechnology.net> wrote:

> Has no-one any idea about this? If needed I can produce more information
> or diagnostics on request. I find it hard to believe that we are the only
> people experiencing this, and thus far we have lost about 40 OSDs to
> corruption due to this.
>
> Regards
>
> Stuart Harland
>
>
>
> On 24 May 2017, at 10:32, Stuart Harland 
> wrote:
>
> Hello
>
> I think I’m running into a bug that is described at
> http://tracker.ceph.com/issues/14213 for Hammer.
>
> However I’m running the latest version of Jewel 10.2.7, although I’m in
> the middle of upgrading the cluster (from 10.2.5). At first it was on a
> couple of nodes, but now it seems to be more pervasive.
>
> I have seen this issue with osd_map_cache_size set to 20 as well as 500,
> which I increased to try and compensate for it.
>
> My two questions, are
>
> 1) is this fixed, if so in which version.
>
> The only person who's reported this remotely recently was working on a
FreeBSD port. Other than the one tracker bug you found, errors like this
are usually the result of failing disks, buggy local filesystems, or
incorrect configuration (like turning off barriers).
I assume you didn't just upgrade from a pre-Jewel release that might have
been susceptible to that tracker bug.
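
A quick way to check both of those on a suspect node (the device name below is
only an example):

  # look for nobarrier / barrier=0 on the OSD filesystems
  grep /var/lib/ceph/osd /proc/mounts
  # check SMART health and error counters on the disk behind a crashing OSD
  sudo smartctl -a /dev/sdb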



> 2) is there a way to recover the damaged OSD metadata, as I really don’t
> want to keep having to rebuild large numbers of disks based on something
> arbitrary.
>
>
I saw somewhere (check the list archives?) that you may be able to get
around it by removing just the PG which is causing the crash, assuming it
has replicas elsewhere.

But more generally you want to figure out how this is happening. Either
you've got disk state which was previously broken and undetected (which, if
you've been running 10.2.5 on all your OSDs, I don't think is possible), or
you've experienced recent failures which are unlikely Ceph software bugs.
(They might be! But you'd be the only one to report them anywhere I can see.)
-Greg


>
>
> SEEK_HOLE is disabled via 'filestore seek data hole' config option
>-31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
>-30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>-29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
>-28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log
> #23079
>-27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0
> #23079
>
>-26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3
> #23078
>
>-25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>-24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
>-23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>-22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing
> committed_seq 20363681 to fs op_seq 20363902
>-21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry --
> not readable
>-20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry --
> not readable
>-19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay:
> end of journal, done.
>-18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>-17> 2017-05-24 10:23:10.298470 7f24035e2800  1
> filestore(/var/lib/ceph/osd/txc1-1908) upgrade
>-16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
>-15> 2017-05-24 10:23:10.300096 7f24035e2800  1 
> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
>-14> 2017-05-24 10:23:10.300384 7f24035e2800  1 
> cls/user/cls_user.cc:375: Loaded user class!
>-13> 2017-05-24 10:23:10.300617 7f24035e2800  0 
> cls/hello/cls_hello.cc:305: loading cls_hello
>-12> 2017-05-24 10:23:10.303748 7f24035e2800  1 
> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
>-11> 2017-05-24 10:23:10.304120 7f24035e2800  1 
> cls/version/cls_version.cc:228: Loaded version class!
>-10> 2017-05-24 10:23:10.304439 7f24035e2800  1 
> cls/log/cls_log.cc:317: Loaded log class!
> -9> 2017-05-24 10:23:10.307437 7f24035e2800  1 
> cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
> -8> 2017-05-24 10:23:10.307768 7f24035e2800  1 
> 

Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Stuart Harland
Has no-one any idea about this? If needed I can produce more information or 
diagnostics on request. I find it hard to believe that we are the only people 
experiencing this, and thus far we have lost about 40 OSDs to corruption due to 
this.

Regards 

Stuart Harland


> On 24 May 2017, at 10:32, Stuart Harland  
> wrote:
> 
> Hello
> 
> I think I’m running into a bug that is described at 
> http://tracker.ceph.com/issues/14213  
> for Hammer.
> 
> However I’m running the latest version of Jewel 10.2.7, although I’m in the 
> middle of upgrading the cluster (from 10.2.5). At first it was on a couple of 
> nodes, but now it seems to be more pervasive.
> 
> I have seen this issue with osd_map_cache_size set to 20 as well as 500, 
> which I increased to try and compensate for it.
> 
> My two questions, are 
> 
> 1) is this fixed, if so in which version.
> 2) is there a way to recover the damaged OSD metadata, as I really don’t want 
> to keep having to rebuild large numbers of disks based on something arbitrary.
> 
> 
> SEEK_HOLE is disabled via 'filestore seek data hole' config option
>-31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
>-30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>-29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
>-28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log 
> #23079
>-27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0 
> #23079
> 
>-26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3 
> #23078
> 
>-25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>-24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
>-23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>-22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing 
> committed_seq 20363681 to fs op_seq 20363902
>-21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry -- not 
> readable
>-20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry -- not 
> readable
>-19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay: 
> end of journal, done.
>-18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>-17> 2017-05-24 10:23:10.298470 7f24035e2800  1 
> filestore(/var/lib/ceph/osd/txc1-1908) upgrade
>-16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
>-15> 2017-05-24 10:23:10.300096 7f24035e2800  1  
> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
>-14> 2017-05-24 10:23:10.300384 7f24035e2800  1  
> cls/user/cls_user.cc:375: Loaded user class!
>-13> 2017-05-24 10:23:10.300617 7f24035e2800  0  
> cls/hello/cls_hello.cc:305: loading cls_hello
>-12> 2017-05-24 10:23:10.303748 7f24035e2800  1  
> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
>-11> 2017-05-24 10:23:10.304120 7f24035e2800  1  
> cls/version/cls_version.cc:228: Loaded version class!
>-10> 2017-05-24 10:23:10.304439 7f24035e2800  1  
> cls/log/cls_log.cc:317: Loaded log class!
> -9> 2017-05-24 10:23:10.307437 7f24035e2800  1  
> cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
> -8> 2017-05-24 10:23:10.307768 7f24035e2800  1  
> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
> -7> 2017-05-24 10:23:10.307927 7f24035e2800  0  
> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> -6> 2017-05-24 10:23:10.308086 7f24035e2800  1  
> cls/statelog/cls_statelog.cc:306: Loaded log class!
> -5> 2017-05-24 10:23:10.315241 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
> -4> 2017-05-24 10:23:10.315258 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
> -3> 2017-05-24 10:23:10.315267 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
> -2> 2017-05-24 10:23:10.441444 7f24035e2800  0 osd.1908 863035 load_pgs
> -1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: 
> have pgid 11.3f5a at epoch 863078, but missing map.  Crashing.
>  0> 2017-05-24 

[ceph-users] Bug in OSD Maps

2017-05-24 Thread Stuart Harland
Hello

I think I’m running into a bug that is described at 
http://tracker.ceph.com/issues/14213 for Hammer.

However I’m running the latest version of Jewel 10.2.7, although I’m in the 
middle of upgrading the cluster (from 10.2.5). At first it was on a couple of 
nodes, but now it seems to be more pervasive.

I have seen this issue with osd_map_cache_size set to 20 as well as 500, which 
I increased to try and compensate for it.
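
For reference, the cache size can be raised in ceph.conf and pushed to running
OSDs roughly like this (some options only take full effect after an OSD restart):

  # ceph.conf, [osd] section
  osd map cache size = 500

  # push to running daemons without a restart
  ceph tell osd.* injectargs '--osd-map-cache-size 500'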

My two questions are:

1) Is this fixed, and if so, in which version?
2) Is there a way to recover the damaged OSD metadata? I really don’t want to keep 
having to rebuild large numbers of disks based on something arbitrary.


SEEK_HOLE is disabled via 'filestore seek data hole' config option
   -31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
   -30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
   -29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
   -28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log 
#23079
   -27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0 #23079

   -26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3 #23078

   -25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
   -24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
   -23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
   -22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing 
committed_seq 20363681 to fs op_seq 20363902
   -21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry -- not 
readable
   -20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry -- not 
readable
   -19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay: end 
of journal, done.
   -18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
   -17> 2017-05-24 10:23:10.298470 7f24035e2800  1 
filestore(/var/lib/ceph/osd/txc1-1908) upgrade
   -16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
   -15> 2017-05-24 10:23:10.300096 7f24035e2800  1  
cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
   -14> 2017-05-24 10:23:10.300384 7f24035e2800  1  
cls/user/cls_user.cc:375: Loaded user class!
   -13> 2017-05-24 10:23:10.300617 7f24035e2800  0  
cls/hello/cls_hello.cc:305: loading cls_hello
   -12> 2017-05-24 10:23:10.303748 7f24035e2800  1  
cls/refcount/cls_refcount.cc:232: Loaded refcount class!
   -11> 2017-05-24 10:23:10.304120 7f24035e2800  1  
cls/version/cls_version.cc:228: Loaded version class!
   -10> 2017-05-24 10:23:10.304439 7f24035e2800  1  
cls/log/cls_log.cc:317: Loaded log class!
-9> 2017-05-24 10:23:10.307437 7f24035e2800  1  
cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
-8> 2017-05-24 10:23:10.307768 7f24035e2800  1  
cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
-7> 2017-05-24 10:23:10.307927 7f24035e2800  0  
cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-6> 2017-05-24 10:23:10.308086 7f24035e2800  1  
cls/statelog/cls_statelog.cc:306: Loaded log class!
    -5> 2017-05-24 10:23:10.315241 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
    -4> 2017-05-24 10:23:10.315258 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
    -3> 2017-05-24 10:23:10.315267 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
-2> 2017-05-24 10:23:10.441444 7f24035e2800  0 osd.1908 863035 load_pgs
-1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: 
have pgid 11.3f5a at epoch 863078, but missing map.  Crashing.
 0> 2017-05-24 10:23:10.444151 7f24035e2800 -1 osd/OSD.cc: In function 
'void OSD::load_pgs()' thread 7f24035e2800 time 2017-05-24 10:23:10.442617
osd/OSD.cc: 3189: FAILED assert(0 == "Missing map in load_pgs")

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0x55d1874be6db]
 2: (OSD::load_pgs()+0x1f9b) [0x55d186e6a26b]
 3: (OSD::init()+0x1f74) [0x55d186e7aec4]
 4: (main()+0x29d1) [0x55d186de1d71]
 5: (__libc_start_main()+0xf5) [0x7f24004fdf45]
 6: (()+0x356a47) [0x55d186e2aa47]
 NOTE: a copy of the executable, or