[ceph-users] can't start OSD

2017-11-10 Thread Rémi BUISSON

Hello,

I am having trouble restarting some OSDs that are down.

My cluster is running on Debian Stretch (with backported kernel 4.13.0) 
with Luminous (12.2.0).


An admin changed the fsid and restarted the OSDs on one machine. I 
don't know whether that is the cause of all of this, but my cluster is in 
HEALTH_ERR and some PGs are down or inactive. The correct configuration is 
back in place now, but some OSDs in my cluster (on other machines too) won't start.
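
One quick sanity check after an fsid mix-up like this is to compare the cluster 
fsid with what each OSD data directory has recorded; a minimal sketch, assuming 
default data paths and using osd.13 as an arbitrary example:

ceph fsid                                # cluster fsid as the monitors report it
grep fsid /etc/ceph/ceph.conf            # fsid configured on the node
cat /var/lib/ceph/osd/ceph-13/ceph_fsid  # cluster fsid recorded in that OSD's data directory

If the recorded ceph_fsid does not match the monitors' fsid, that OSD will 
typically refuse to start and join the cluster.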


Here is the health detail:

HEALTH_ERR 2282635/254779209 objects misplaced (0.896%); Reduced data 
availability: 3 pgs inactive, 1 pg down; Degraded data redundancy: 
2837613/254779209 objects degraded (1.114%), 93 pgs unclean, 70 pgs 
degraded, 64 pgs undersized; 4017 stuck requests are blocked > 4096 sec

OBJECT_MISPLACED 2282635/254779209 objects misplaced (0.896%)
PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 1 pg down
    pg 14.12a is down, acting [28,13,19]
    pg 14.15d is stuck inactive for 5344.345563, current state unknown, 
last acting []
    pg 14.1d7 is stuck inactive for 4306.284248, current state 
undersized+degraded+remapped+backfilling+peered, last acting [13]
PG_DEGRADED Degraded data redundancy: 2837613/254779209 objects degraded 
(1.114%), 93 pgs unclean, 70 pgs degraded, 64 pgs undersized
    pg 10.3 is stuck unclean for 5483.175862, current state 
active+remapped+backfill_wait, last acting [35,44,30]

    pg 10.1f is active+recovery_wait+degraded, acting [56,8,52]
    pg 14.0 is stuck undersized for 6003.911469, current state 
active+undersized+degraded+remapped+backfilling, last acting [13,42]
    pg 14.21 is stuck undersized for 437.855288, current state 
active+undersized+degraded+remapped+backfilling, last acting [40,59]
    pg 14.2b is stuck unclean for 123.787607, current state 
active+remapped+backfill_wait, last acting [62,30,24]
    pg 14.4a is stuck undersized for 723.893114, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [43,22]
    pg 14.56 is stuck unclean for 123.821351, current state 
active+remapped+backfill_wait, last acting [56,43,63]
    pg 14.1fe is stuck undersized for 123.800787, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [63,8]
    pg 14.20a is stuck unclean for 24341.489625, current state 
active+remapped+backfill_wait, last acting [20,28,37]
    pg 14.20b is stuck unclean for 24351.403819, current state 
active+remapped+backfill_wait, last acting [60,6,57]
    pg 14.21d is stuck unclean for 24345.292525, current state 
active+remapped+backfill_wait, last acting [59,62,10]
    pg 14.226 is stuck undersized for 363.681151, current state 
active+undersized+degraded+remapped+backfilling, last acting [44,19]
    pg 14.22c is stuck unclean for 123.793121, current state 
active+remapped+backfill_wait, last acting [16,40,9]
    pg 14.236 is stuck undersized for 163.374339, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [61,6]
    pg 14.240 is stuck undersized for 437.857887, current state 
active+undersized+degraded+remapped+backfilling, last acting [57,27]
    pg 14.24d is stuck undersized for 115.191726, current state 
active+undersized+degraded+remapped+backfilling, last acting [19,27]
    pg 14.268 is stuck undersized for 7932.097742, current state 
active+undersized+degraded+remapped+backfilling, last acting [12,58]
    pg 14.27d is stuck unclean for 7935.169818, current state 
active+remapped+backfilling, last acting [12,47,8]
    pg 14.290 is stuck undersized for 437.855071, current state 
active+undersized+degraded+remapped+backfilling, last acting [29,3]
    pg 14.2aa is stuck undersized for 114.181416, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [3,46]
    pg 14.2ac is stuck undersized for 123.821179, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [47,18]
    pg 14.2b9 is stuck undersized for 3704.234924, current state 
active+undersized+degraded+remapped+backfilling, last acting [13,38]
    pg 14.2c4 is stuck undersized for 123.824405, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [15,36]
    pg 14.2c5 is stuck undersized for 161.266102, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [63,44]
    pg 14.2e0 is stuck undersized for 438.862093, current state 
active+undersized+degraded+remapped+backfilling, last acting [9,21]
    pg 14.2eb is stuck undersized for 437.860653, current state 
active+undersized+degraded+remapped+backfilling, last acting [8,34]
    pg 14.2f8 is stuck undersized for 163.373209, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [61,28]
    pg 14.305 is stuck undersized for 723.892233, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [9,40]
    pg 14.320 is stuck unclean for 123.788128, current state 
active+remapped+backfill_wait, last acting [62,6,5]
    pg 14.322 is stuck undersized for 437.856055, current 
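
For the down and inactive PGs reported above, a usual first step is to query the 
affected PGs directly and to look at what the blocked requests are waiting on; a 
minimal sketch (the PG and OSD ids are taken from the health detail above as examples):

ceph pg 14.12a query                  # peering state and which OSDs the PG is waiting for
ceph pg dump_stuck inactive           # list all stuck inactive PGs in one go
ceph daemon osd.13 dump_blocked_ops   # run on the OSD's host: what the stuck requests are blocked on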

Re: [ceph-users] Can't Start OSD

2015-03-23 Thread Gregory Farnum
, current state stale+active+clean, last acting [1]
 pg 7.5d9 is stuck stale for 5954121.994181, current state stale+active+clean, last acting [1]
 pg 7.395 is stuck stale for 5954121.993989, current state stale+active+clean, last acting [1]
 pg 7.38e is stuck stale for 5954121.993988, current state stale+active+clean, last acting [1]
 pg 7.13a is stuck stale for 5954121.993766, current state stale+active+clean, last acting [1]
 pg 7.683 is stuck stale for 5954121.994255, current state stale+active+clean, last acting [1]
 pg 7.439 is stuck stale for 5954121.994079, current state stale+active+clean, last acting [1]

 It’s osd id=1 that’s problematic, but I should have a replica of the data 
 somewhere else?
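
 Whether another copy exists depends on that pool's replica count and on where the 
 PGs map now; a minimal sketch for checking (pool id 7 is taken from the PG ids 
 above, and the pool name is whatever lspools reports):

 ceph osd lspools                   # map pool id 7 to its name
 ceph osd pool get {poolname} size  # replica count for that pool ({poolname} is a placeholder)
 ceph pg map 7.5d9                  # which OSDs the PG maps to now

 If the pool size is 1, osd.1 held the only copy of those stale PGs.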

 Thanks!

 ~Noah

 On Mar 22, 2015, at 2:04 PM, Somnath Roy somnath@sandisk.com wrote:

 Are you seeing any error related to the disk (where OSD is mounted) in dmesg 
 ?
 Could be a leveldb corruption or ceph bug.
 Now, unfortunately, there is not enough logging in that portion of the code base
 to reveal exactly why we are not getting the infos object from leveldb :-(

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Noah Mehl
 Sent: Sunday, March 22, 2015 10:11 AM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Can't Start OSD

 In production for over a year, and no upgrades.

 Thanks!

 ~Noah

 On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:

 Noah,
 Is this fresh installation or after upgrade ?

 It seems related to omap (leveldb) stuff.

 Thanks & Regards
 Somnath
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Noah Mehl
 Sent: Sunday, March 22, 2015 9:34 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Can't Start OSD

 I have an OSD that’s failing to start.  I can’t make heads or tails of the 
 error (pasted below).

 Thanks!

 ~Noah

 2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4
 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid
 13483
 2015-03-22 16:32:39.269499 7f4da7fa0780  1
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:39.269509 7f4da7fa0780  1
 filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica
 fadvise' due to known issues with fadvise(DONTNEED) on xfs
 2015-03-22 16:32:39.450031 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported
 and appears to work
 2015-03-22 16:32:39.450069 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled
 via 'filestore fiemap' config option
 2015-03-22 16:32:39.450743 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:39.499753 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully
 supported (by glibc and kernel)
 2015-03-22 16:32:39.500078 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:40.765736 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD
 journal mode: btrfs not detected
 2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close
 /var/lib/ceph/osd/ceph-1/journal
 2015-03-22 16:32:41.066655 7f4da7fa0780  1
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:41.150578 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported
 and appears to work
 2015-03-22 16:32:41.150624 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled
 via 'filestore fiemap' config option
 2015-03-22 16:32:41.151359 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:41.225302 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully
 supported (by glibc and kernel)
 2015-03-22 16:32:41.225498 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:42.375558 7f4da7fa0780  0
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD
 journal mode: btrfs not detected
 2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open
 /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open
 /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function
 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t,
 ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22
 16:32:43.075101
 osd/PG.cc: 2270: FAILED assert

Re: [ceph-users] Can't Start OSD

2015-03-22 Thread Somnath Roy
Noah,
Is this fresh installation or after upgrade ?

It seems related to omap (leveldb) stuff.

Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah 
Mehl
Sent: Sunday, March 22, 2015 9:34 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start OSD

I have an OSD that’s failing to start.  I can’t make heads or tails of the 
error (pasted below).

Thanks!

~Noah

2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
2015-03-22 16:32:39.269499 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1) 
mount detected xfs
2015-03-22 16:32:39.269509 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1)  
disabling 'filestore replica fadvise' due to known issues with 
fadvise(DONTNEED) on xfs
2015-03-22 16:32:39.450031 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:39.450069 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:39.450743 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount did NOT detect btrfs
2015-03-22 16:32:39.499753 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:39.500078 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount found snaps 
2015-03-22 16:32:40.765736 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
/var/lib/ceph/osd/ceph-1/journal
2015-03-22 16:32:41.066655 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1) 
mount detected xfs
2015-03-22 16:32:41.150578 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:41.150624 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:41.151359 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount did NOT detect btrfs
2015-03-22 16:32:41.225302 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:41.225498 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount found snaps 
2015-03-22 16:32:42.375558 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static 
epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
osd/PG.cc: 2270: FAILED assert(values.size() == 1)

 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

--- begin dump of recent events ---
   -75 2015-03-22 16:32:39.259280 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perfcounters_dump hook 0x1ae4010
   -74 2015-03-22 16:32:39.259373 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command 1 hook 0x1ae4010
   -73 2015-03-22 16:32:39.259393 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perf dump hook 0x1ae4010
   -72 2015-03-22 16:32:39.259429 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perfcounters_schema hook 0x1ae4010
   -71 2015-03-22 16:32:39.259445 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command 2 hook 0x1ae4010
   -70 2015-03-22 16:32:39.259453 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perf schema hook 0x1ae4010
   -69 2015-03-22 16:32:39.259467 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config show hook 0x1ae4010
   -68 2015-03-22 16:32:39.259481 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config set hook 0x1ae4010
   -67 2015-03-22 16:32:39.259495 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config get hook 0x1ae4010
   -66 2015-03-22 16:32

Re: [ceph-users] Can't Start OSD

2015-03-22 Thread Noah Mehl
In production for over a year, and no upgrades.

Thanks!

~Noah

 On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:
 
 Noah,
 Is this fresh installation or after upgrade ?
 
 It seems related to omap (leveldb) stuff.
 
 Thanks & Regards
 Somnath
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah 
 Mehl
 Sent: Sunday, March 22, 2015 9:34 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Can't Start OSD
 
 I have an OSD that’s failing to start.  I can’t make heads or tails of the 
 error (pasted below).
 
 Thanks!
 
 ~Noah
 
 2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
 2015-03-22 16:32:39.269499 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:39.269509 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica fadvise' 
 due to known issues with fadvise(DONTNEED) on xfs
 2015-03-22 16:32:39.450031 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:39.450069 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:39.450743 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:39.499753 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:39.500078 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:40.765736 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
 /var/lib/ceph/osd/ceph-1/journal
 2015-03-22 16:32:41.066655 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:41.150578 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:41.150624 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:41.151359 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:41.225302 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:41.225498 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:42.375558 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static 
 epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
 osd/PG.cc: 2270: FAILED assert(values.size() == 1)
 
 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 --- begin dump of recent events ---
   -75 2015-03-22 16:32:39.259280 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perfcounters_dump hook 0x1ae4010
   -74 2015-03-22 16:32:39.259373 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command 1 hook 0x1ae4010
   -73 2015-03-22 16:32:39.259393 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perf dump hook 0x1ae4010
   -72 2015-03-22 16:32:39.259429 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perfcounters_schema hook 0x1ae4010
   -71 2015-03-22 16:32:39.259445 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command 2 hook 0x1ae4010
   -70 2015-03-22 16:32:39.259453 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perf schema hook 0x1ae4010
   -69 2015-03-22 16:32:39.259467 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command config show hook 0x1ae4010
   -68

[ceph-users] Can't Start OSD

2015-03-22 Thread Noah Mehl
I have an OSD that’s failing to start.  I can’t make heads or tails of the 
error (pasted below).

Thanks!

~Noah

2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
2015-03-22 16:32:39.269499 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1) 
mount detected xfs
2015-03-22 16:32:39.269509 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1)  
disabling 'filestore replica fadvise' due to known issues with 
fadvise(DONTNEED) on xfs
2015-03-22 16:32:39.450031 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:39.450069 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:39.450743 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount did NOT detect btrfs
2015-03-22 16:32:39.499753 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:39.500078 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount found snaps 
2015-03-22 16:32:40.765736 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
/var/lib/ceph/osd/ceph-1/journal
2015-03-22 16:32:41.066655 7f4da7fa0780  1 filestore(/var/lib/ceph/osd/ceph-1) 
mount detected xfs
2015-03-22 16:32:41.150578 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is supported and appears to work
2015-03-22 16:32:41.150624 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-22 16:32:41.151359 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount did NOT detect btrfs
2015-03-22 16:32:41.225302 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-22 16:32:41.225498 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount found snaps 
2015-03-22 16:32:42.375558 7f4da7fa0780  0 filestore(/var/lib/ceph/osd/ceph-1) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
/var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static 
epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
osd/PG.cc: 2270: FAILED assert(values.size() == 1)

 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

--- begin dump of recent events ---
   -75 2015-03-22 16:32:39.259280 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perfcounters_dump hook 0x1ae4010
   -74 2015-03-22 16:32:39.259373 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command 1 hook 0x1ae4010
   -73 2015-03-22 16:32:39.259393 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perf dump hook 0x1ae4010
   -72 2015-03-22 16:32:39.259429 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perfcounters_schema hook 0x1ae4010
   -71 2015-03-22 16:32:39.259445 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command 2 hook 0x1ae4010
   -70 2015-03-22 16:32:39.259453 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command perf schema hook 0x1ae4010
   -69 2015-03-22 16:32:39.259467 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config show hook 0x1ae4010
   -68 2015-03-22 16:32:39.259481 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config set hook 0x1ae4010
   -67 2015-03-22 16:32:39.259495 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command config get hook 0x1ae4010
   -66 2015-03-22 16:32:39.259505 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command log flush hook 0x1ae4010
   -65 2015-03-22 16:32:39.259519 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command log dump hook 0x1ae4010
   -64 2015-03-22 16:32:39.259536 7f4da7fa0780  5 asok(0x1aec1c0) 
register_command log reopen hook 0x1ae4010
   -63 2015-03-22 16:32:39.265116 

Re: [ceph-users] Can't Start OSD

2015-03-22 Thread Somnath Roy
Are you seeing any error related to the disk (where OSD is mounted) in dmesg ?
Could be a leveldb corruption or ceph bug.
Now, unfortunately, there is not enough logging in that portion of the code base to 
reveal exactly why we are not getting the infos object from leveldb :-(
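
A minimal sketch of that check (device names and mount points are examples; adjust 
them to wherever osd.1's filesystem actually lives):

dmesg | grep -iE 'error|xfs|sd[a-z]'       # kernel-level I/O or filesystem errors
mount | grep ceph-1                        # which device backs /var/lib/ceph/osd/ceph-1
smartctl -a /dev/sdX                       # SMART health of that device (sdX is a placeholder)
ls /var/lib/ceph/osd/ceph-1/current/omap/  # the leveldb directory the OSD reads its PG infos from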

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah 
Mehl
Sent: Sunday, March 22, 2015 10:11 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't Start OSD

In production for over a year, and no upgrades.

Thanks!

~Noah

 On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:
 
 Noah,
 Is this fresh installation or after upgrade ?
 
 It seems related to omap (leveldb) stuff.
 
 Thanks & Regards
 Somnath
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Noah 
 Mehl
 Sent: Sunday, March 22, 2015 9:34 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Can't Start OSD
 
 I have an OSD that’s failing to start.  I can’t make heads or tails of the 
 error (pasted below).
 
 Thanks!
 
 ~Noah
 
 2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
 2015-03-22 16:32:39.269499 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:39.269509 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica fadvise' 
 due to known issues with fadvise(DONTNEED) on xfs
 2015-03-22 16:32:39.450031 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:39.450069 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:39.450743 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:39.499753 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:39.500078 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:40.765736 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
 /var/lib/ceph/osd/ceph-1/journal
 2015-03-22 16:32:41.066655 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:41.150578 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:41.150624 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:41.151359 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:41.225302 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:41.225498 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:42.375558 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static 
 epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
 osd/PG.cc: 2270: FAILED assert(values.size() == 1)
 
 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 --- begin dump of recent events ---
   -75 2015-03-22 16:32:39.259280 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perfcounters_dump hook 0x1ae4010
   -74 2015-03-22 16:32:39.259373 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command 1 hook 0x1ae4010
   -73 2015-03-22 16:32:39.259393 7f4da7fa0780  5 asok

Re: [ceph-users] Can't Start OSD

2015-03-22 Thread Noah Mehl
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Can't Start OSD
 
 In production for over a year, and no upgrades.
 
 Thanks!
 
 ~Noah
 
 On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:
 
 Noah,
 Is this fresh installation or after upgrade ?
 
 It seems related to omap (leveldb) stuff.
 
 Thanks & Regards
 Somnath
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Noah Mehl
 Sent: Sunday, March 22, 2015 9:34 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Can't Start OSD
 
 I have an OSD that’s failing to start.  I can’t make heads or tails of the 
 error (pasted below).
 
 Thanks!
 
 ~Noah
 
 2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 13483
 2015-03-22 16:32:39.269499 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:39.269509 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica fadvise' 
 due to known issues with fadvise(DONTNEED) on xfs
 2015-03-22 16:32:39.450031 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:39.450069 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:39.450743 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:39.499753 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:39.500078 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:40.765736 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
 /var/lib/ceph/osd/ceph-1/journal
 2015-03-22 16:32:41.066655 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:41.150578 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported and 
 appears to work
 2015-03-22 16:32:41.150624 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2015-03-22 16:32:41.151359 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:41.225302 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully supported 
 (by glibc and kernel)
 2015-03-22 16:32:41.225498 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:42.375558 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 'static 
 epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 16:32:43.075101
 osd/PG.cc: 2270: FAILED assert(values.size() == 1)
 
 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 --- begin dump of recent events ---
  -75 2015-03-22 16:32:39.259280 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perfcounters_dump hook 0x1ae4010
  -74 2015-03-22 16:32:39.259373 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command 1 hook 0x1ae4010
  -73 2015-03-22 16:32:39.259393 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perf dump hook 0x1ae4010
  -72 2015-03-22 16:32:39.259429 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perfcounters_schema hook 0x1ae4010
  -71 2015-03-22 16:32:39.259445 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command 2 hook 0x1ae4010
  -70 2015-03-22 16:32:39.259453 7f4da7fa0780  5 asok(0x1aec1c0) 
 register_command perf schema hook 0x1ae4010
  -69 2015-03-22 16:32:39.259467 7f4da7fa0780  5 asok

Re: [ceph-users] Can't Start OSD

2015-03-22 Thread Somnath Roy
 for 5954121.994079, current state 
stale+active+clean, last acting [1]

It’s osd id=1 that’s problematic, but I should have a replica of the data 
somewhere else?

Thanks!

~Noah

 On Mar 22, 2015, at 2:04 PM, Somnath Roy somnath@sandisk.com wrote:
 
 Are you seeing any error related to the disk (where OSD is mounted) in dmesg ?
 Could be a leveldb corruption or ceph bug.
 Now, unfortunately, there is not enough logging in that portion of the code base to 
 reveal exactly why we are not getting the infos object from leveldb :-(
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Noah Mehl
 Sent: Sunday, March 22, 2015 10:11 AM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Can't Start OSD
 
 In production for over a year, and no upgrades.
 
 Thanks!
 
 ~Noah
 
 On Mar 22, 2015, at 1:01 PM, Somnath Roy somnath@sandisk.com wrote:
 
 Noah,
 Is this fresh installation or after upgrade ?
 
 It seems related to omap (leveldb) stuff.
 
 Thanks & Regards
 Somnath
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Noah Mehl
 Sent: Sunday, March 22, 2015 9:34 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Can't Start OSD
 
 I have an OSD that’s failing to start.  I can’t make heads or tails of the 
 error (pasted below).
 
 Thanks!
 
 ~Noah
 
 2015-03-22 16:32:39.265116 7f4da7fa0780  0 ceph version 0.67.4 
 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-osd, pid 
 13483
 2015-03-22 16:32:39.269499 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:39.269509 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica 
 fadvise' due to known issues with fadvise(DONTNEED) on xfs
 2015-03-22 16:32:39.450031 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported 
 and appears to work
 2015-03-22 16:32:39.450069 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled 
 via 'filestore fiemap' config option
 2015-03-22 16:32:39.450743 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:39.499753 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully 
 supported (by glibc and kernel)
 2015-03-22 16:32:39.500078 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:40.765736 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD 
 journal mode: btrfs not detected
 2015-03-22 16:32:40.777156 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block 
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.777278 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 2551: 5368709120 bytes, block 
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:40.778223 7f4da7fa0780  1 journal close 
 /var/lib/ceph/osd/ceph-1/journal
 2015-03-22 16:32:41.066655 7f4da7fa0780  1 
 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
 2015-03-22 16:32:41.150578 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is supported 
 and appears to work
 2015-03-22 16:32:41.150624 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount FIEMAP ioctl is disabled 
 via 'filestore fiemap' config option
 2015-03-22 16:32:41.151359 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount did NOT detect btrfs
 2015-03-22 16:32:41.225302 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount syncfs(2) syscall fully 
 supported (by glibc and kernel)
 2015-03-22 16:32:41.225498 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount found snaps 
 2015-03-22 16:32:42.375558 7f4da7fa0780  0 
 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD 
 journal mode: btrfs not detected
 2015-03-22 16:32:42.382958 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1429: 5368709120 bytes, block 
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:42.383187 7f4da7fa0780  1 journal _open 
 /var/lib/ceph/osd/ceph-1/journal fd 1481: 5368709120 bytes, block 
 size 4096 bytes, directio = 1, aio = 1
 2015-03-22 16:32:43.076434 7f4da7fa0780 -1 osd/PG.cc: In function 
 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::bufferlist*)' thread 7f4da7fa0780 time 2015-03-22 
 16:32:43.075101
 osd/PG.cc: 2270: FAILED assert(values.size() == 1)
 
 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t, 
 ceph::buffer::list*)+0x4d7) [0x70ebf7]
 2: (OSD::load_pgs()+0x14ce) [0x694efe]
 3: (OSD::init()+0x11be) [0x69cffe]
 4: (main()+0x1d09) [0x5c3509]
 5: (__libc_start_main()+0xed) [0x7f4da5bde76d]
 6: /usr/bin/ceph-osd() [0x5c6e1d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 --- begin dump of recent events ---
  -75 2015

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-28 Thread tuantb
 

Hi Craig Lewis, 

My pool has 300 TB of data; I can't recreate a new pool and then copy the data
over with ceph cp pool (it would take a very long time).
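
The pool copy referred to here is presumably the rados cppool workflow, which copies
object by object and is indeed impractically slow at this scale; a minimal sketch with
hypothetical pool names, just to make the reference concrete:

ceph osd pool create volumes-new 4096    # pg count is only an example
rados cppool volumes volumes-new         # object-by-object copy; snapshots are not carried over
ceph osd pool rename volumes volumes-old
ceph osd pool rename volumes-new volumes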

I upgraded Ceph to Giant (0.86), but the error is still there :((

I think my problem is the misplaced objects (0.320%):

# ceph pg 23.96 query
 num_objects_missing_on_primary: 0,
 num_objects_degraded: 0,
 num_objects_misplaced: 79,

 cluster xx-x-x-x
  health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck degraded;
         263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs undersized;
         recovery 308759/54799506 objects degraded (0.563%);
         175270/54799506 objects misplaced (0.320%); 1/130 in osds are down;
         flags noout,nodeep-scrub
  pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
        206 TB used, 245 TB / 452 TB avail
        308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
            14708 active+clean
               38 active+remapped
              225 active+undersized+degraded
  client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s

- Checking in ceph log: 

2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718
pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715
n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1
lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter
Started/ReplicaActive/RepNotRecovering

It then logs many failures (on many objects, e.g.
c03fe096/rbd_data.5348922ae8944a.306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793,
time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96
103718
[PushOp(c03fe096/rbd_data.5348922ae8944a.306b/head//24,
version: 103622'283374, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.306b/head//24@103622'283374,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:,
omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0083/head//24,
version: 103679'295624, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0083/head//24@103679'295624,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:, omap_complete:false))]) 

Thanks! 

--
Tuan
HaNoi-VietNam 

On 2014-10-28 01:35, Craig Lewis wrote: 

 My experience is that once you hit this bug, those PGs are gone. I tried 
 marking the primary OSD OUT, which caused this problem to move to the new 
 primary OSD. Luckily for me, my affected PGs were using replication state in 
 the secondary cluster. I ended up deleting the whole pool and recreating it. 
 
 Which pools are 7 and 23? It's possible that it's something that easy to 
 replace. 
 
 On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:
 
 Hi Craig, Thanks for replying.
 When i started that osd, Ceph Log from ceph -w warns pgs 7.9d8 23.596, 
 23.9c6, 23.63 can't recovery as pasted log.
 
 Those pgs are active+degraded state. 
 #ceph pg map 7.9d8 
 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When start 
 osd.21 then pg 7.9d8 and three remain pgs to changed to state 
 active+recovering) . osd.21 still down after following logs:



Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-27 Thread Craig Lewis
My experience is that once you hit this bug, those PGs are gone.  I tried
marking the primary OSD OUT, which caused this problem to move to the new
primary OSD.  Luckily for me, my affected PGs were using replication state
in the secondary cluster.  I ended up deleting the whole pool and
recreating it.

Which pools are 7 and 23?  It's possible that it's something that's easy to
replace.
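
Pool ids can be mapped to names and sizes with, for example:

ceph osd lspools   # shows "7 <name>,23 <name>,..." so you can see what those pools hold
ceph df            # per-pool object counts and usage, to judge how painful recreating them would be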



On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

  Hi Craig, Thanks for replying.
 When i started that osd, Ceph Log from ceph -w warns pgs 7.9d8 23.596,
 23.9c6, 23.63 can't recovery as pasted log.

 Those pgs are active+degraded state.
 #ceph pg map 7.9d8
 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]  (When start
 osd.21 then pg 7.9d8 and three remain pgs  to changed to state
 active+recovering) . osd.21 still down after following logs:



Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan

I am sending some related logs:
(osd.21 is not able to start)

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 
102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] 
lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) 
[40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 
crt=102843'11832157 lcod 102843'11832158 active+remapped] exit 
Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296


 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 
102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) 
[90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 
active] enter Started/ReplicaActive/RepNotRecovering


  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 osd.21 pg_epoch: 
102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) 
[90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 
active] enter Started/ReplicaActive/RepNotRecovering


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, Thanks for replying.
When i started that osd, Ceph Log from ceph -w warns pgs 7.9d8 
23.596, 23.9c6, 23.63 can't recovery as pasted log.


Those pgs are active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When 
start osd.21 then pg 7.9d8 and three remain pgs  to changed to state 
active+recovering) . osd.21 still down after following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds 
old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(*7.9d8 *102803 
[Push
Op(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 
102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_heade
r_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102
798'7794851, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complete
:true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_rec

overed_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds 
old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(*23.596* 
102803 [Pus
hOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, 
version: 102798'295732, data_included: [0~4194304], data_size: 
4194304, omap_head
er_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@1
02798'295732, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complet
e:true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_re

covered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds 
old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(*23.9c6* 
102803 [Pus
hOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, 
version: 102798'66056, data_included: [0~4194304], data_size: 4194304, 
omap_heade
r_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@10
2798'66056, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complete:
true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_reco

vered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included 
below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds 
old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(*23.63c* 
102803 [Pus
hOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, 
version: 102748'145637, data_included: [0~4194304], data_size: 
4194304, omap_head
er_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@1
02748'145637, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complet
e:true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_re

covered_to:, omap_complete:false))]) 

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan
My Ceph was hung, and osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8 
has 4 objects unfound and apparently lost.


After I restarted all the ceph-data nodes, I can't start osd.21; there are many 
logs about pg 6.9d8 such as:


 -440 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 
3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPus
h(6.9d8 102856 
[PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, 
version: 102853'7800592, data_included: [0~4194304], data_size:
 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, 
recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.
e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: 
{}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:41
94304, data_complete:true, omap_recovered_to:, omap_complete:true), 
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
data_comp

lete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are broken. What must I do, please?
Thanks!
--
Tuan
HaNoi-VietNam
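
To see exactly why osd.21 exits, one option is to run it in the foreground with
higher debug levels; a minimal sketch (the log path and debug levels are just the
usual defaults, adjust as needed):

ceph-osd -i 21 -f --debug-osd 20 --debug-filestore 20   # run in the foreground with verbose OSD/filestore logging
tail -f /var/log/ceph/ceph-osd.21.log                    # or watch the regular log while the init script starts it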


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I send some related bugs:
(osd.21 not be able started)

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*6.5e1*( v 102843'11832159 (102377'11822991,102843'11832159] 
lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 
102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 
pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 
active+remapped] *exit Started/ReplicaActive/RepNotRecovering* 
0.000170 1 0.000296


 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*2.23b*( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 
luod=0'0 crt=102839'91984 active] *enter 
Started/ReplicaActive/RepNotRecovering*


  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 *osd.21 *pg_epoch: 
102843 pg[*27.239(* v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 
luod=0'0 crt=102808'38419 active] *enter 
**Started/ReplicaActive/RepNotRecovering*


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, Thanks for replying.
When i started that osd, Ceph Log from ceph -w warns pgs 7.9d8 
23.596, 23.9c6, 23.63 can't recovery as pasted log.


Those pgs are active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]  (When 
start osd.21 then pg 7.9d8 and three remain pgs  to changed to state 
active+recovering) . osd.21 still down after following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 
seconds old, received at 2014-10-25 10:57:17.580013: 
MOSDPGPush(*7.9d8 *102803 [Push
Op(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, 
version: 102798'7794851, data_included: [0~4194304], data_size: 
4194304, omap_heade
r_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102
798'7794851, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complete
:true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_rec

overed_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 
seconds old, received at 2014-10-25 10:57:18.140156: 
MOSDPGPush(*23.596* 102803 [Pus
hOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, 
version: 102798'295732, data_included: [0~4194304], data_size: 
4194304, omap_head
er_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@1
02798'295732, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 
data_recovered_to:4194304, data_complet
e:true, omap_recovered_to:, omap_complete:true), before_progress: 
ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_re

covered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 
seconds old, received at 2014-10-25 10:57:17.555048: 
MOSDPGPush(*23.9c6* 102803 [Pus
hOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, 
version: 102798'66056, data_included: [0~4194304], data_size: 
4194304, omap_heade
r_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: 
ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@10
2798'66056, copy_subset: [0~4194304], clone_subset: {}), 
after_progress: ObjectRecoveryProgress(!first, 

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan

#ceph pg 6.9d8 query
...
  peer_info: [
{ peer: 49,
  pgid: 6.9d8,
  last_update: 102889'7801917,
  last_complete: 102889'7801917,
  log_tail: 102377'7792649,
  last_user_version: 7801879,
  last_backfill: MAX,
  purged_snaps: 
[1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5

e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,
18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8],
  history: { epoch_created: 164,
  last_epoch_started: 102888,
  last_epoch_clean: 102888,
  last_epoch_split: 0
  parent_split_bits: 0,
  last_scrub: 91654'7460936,
  last_scrub_stamp: 2014-10-10 10:36:25.433016,
  last_deep_scrub: 81667'5815892,
  last_deep_scrub_stamp: 2014-08-29 09:44:14.012219,
  last_clean_scrub_stamp: 2014-10-10 10:36:25.433016,
  log_size: 9229,
  ondisk_log_size: 9229,
  stats_invalid: 1,
  stat_sum: { num_bytes: 17870536192,
  num_objects: 4327,
  num_object_clones: 29,
  num_object_copies: 12981,
  num_objects_missing_on_primary: 4,
  num_objects_degraded: 4,
  num_objects_unfound: 0,
  num_objects_dirty: 1092,
  num_whiteouts: 0,
  num_read: 4820626,
  num_read_kb: 59073045,
  num_write: 12748709,
  num_write_kb: 181630845,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 135847,
  num_bytes_recovered: 562255538176,
  num_keys_recovered: 0,
  num_objects_omap: 0,
  num_objects_hit_set_archive: 0},
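
The num_objects_missing_on_primary / num_objects_degraded counters in the 
query above line up with the "4 objects unfound" error for pg 6.9d8 quoted 
below. A minimal sketch of how such objects are usually enumerated and, as 
a very last resort, given up on (list_missing and mark_unfound_lost are 
standard pg subcommands; reverting unfound objects discards their most 
recent writes, so it is only appropriate once no OSD holding a copy can be 
brought back):

#ceph pg 6.9d8 list_missing
#ceph pg 6.9d8 mark_unfound_lost revert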


On 10/25/2014 07:40 PM, Ta Ba Tuan wrote:
My Ceph cluster was hung, and osd.21 172.30.5.2:6870/8047 879 : [ERR] 
6.9d8 has 4 objects unfound and apparently lost.


After I restarted all the ceph-data nodes, I couldn't start osd.21 and saw 
many log entries about pg 6.9d8, such as:


 -440 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(*6.9d8* 102856 [PushOp(e8de59d8/*rbd_data.4d091f7304c844.e871/head//6*, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are corrupted. What should I do, please?
Thanks!
--
Tuan
HaNoi-VietNam


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I'm sending some related log entries
(osd.21 is not able to start):

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*6.5e1*( v 102843'11832159 
(102377'11822991,102843'11832159] lb 
c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 
102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 
pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 
active+remapped] *exit Started/ReplicaActive/RepNotRecovering* 
0.000170 1 0.000296


 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*2.23b*( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 
luod=0'0 crt=102839'91984 active] *enter 
Started/ReplicaActive/RepNotRecovering*


  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 *osd.21 *pg_epoch: 
102843 pg[*27.239(* v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 
luod=0'0 crt=102808'38419 active] *enter 
Started/ReplicaActive/RepNotRecovering*


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying.
When I started that OSD, the Ceph log from ceph -w warned that pgs 7.9d8, 
23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) - up [93,49] acting [93,49]  (when I 
start osd.21, pg 7.9d8 and the three remaining pgs change to state 

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Craig Lewis
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading of
the code, I believe the fix only prevents the issue from occurring.  It
doesn't work around or repair bad snapshots created on older versions of
Ceph.

Were any of the snapshots you're removing created on older versions of
Ceph?  If they were all created on Firefly, then you should open a new
tracker issue, and try to get some help on IRC or the developers mailing
list.
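
As a side note, the rbd_data.* prefixes in the crash log can be traced back 
to a specific RBD image through its block_name_prefix, which makes it 
easier to work out when the affected image and its snapshots were created. 
A minimal sketch, assuming the image lives in a pool named "rbd" and is 
called "myimage" (both names are illustrative; rbd ls, rbd info and rbd 
snap ls are standard rbd CLI commands):

#rbd -p rbd ls
#rbd -p rbd info myimage | grep block_name_prefix
#rbd -p rbd snap ls myimage

If block_name_prefix matches the prefix in the log (e.g. 
rbd_data.4d091f7304c844), that image's snapshots are the ones being trimmed 
when the OSD crashes.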


On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

 Dear everyone,

 I can't start osd.21 (log file attached).
 Some pgs can't be repaired. I'm using replica 3 for my data pool.
 It seems some objects in those pgs are bad.

 I tried to delete the data related to those objects, but osd.21 still
 won't start,
 and when I removed osd.21, other osds hit the same problem (e.g. osd.86 is
 down and won't start).

 Please guide me in debugging this! Thanks!

 --
 Tuan
 Ha Noi - VietNam










 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Ta Ba Tuan

Hi Craig, thanks for replying.
When I started that OSD, the Ceph log from ceph -w warned that pgs 7.9d8, 
23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) - up [93,49] acting [93,49] (when I start 
osd.21, pg 7.9d8 and the three remaining pgs change to state 
active+recovering). osd.21 is still down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(*7.9d8 *102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(*23.596* 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(*23.9c6* 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(*23.63c* 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:

It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading 
of the code, I believe the fix only prevents the issue from 
occurring.  It doesn't work around or repair bad snapshots created on 
older versions of Ceph.


Were any of the snapshots you're removing created on older versions 
of Ceph?  If they were all created on Firefly, then you should open a 
new tracker issue, and try to get some help on IRC or the developers 
mailing list.


On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn 
wrote:


Dear everyone,

I can't start osd.21 (log file attached).
Some pgs can't be repaired. I'm using replica 3 for my data pool.
It seems some objects in those pgs are bad.

I tried to delete the data related to those objects, but osd.21 still
won't start,
and when I removed osd.21, other osds hit the same problem (e.g. osd.86
is down and won't start).

Please guide me in debugging this! Thanks!

--
Tuan
Ha Noi - VietNam










___
ceph-users mailing list

[ceph-users] Can't start osd- one osd alway be down.

2014-10-23 Thread Ta Ba Tuan

Dear everyone,

I can't start osd.21 (log file attached).
Some pgs can't be repaired. I'm using replica 3 for my data pool.
It seems some objects in those pgs are bad.

I tried to delete the data related to those objects, but osd.21 still 
won't start,

and when I removed osd.21, other osds hit the same problem (e.g. osd.86 is down and won't start).

Please guide me in debugging this! Thanks!

--
Tuan
Ha Noi - VietNam









2014-10-24 11:10:53.036094 7f86c6fcb780  0 xfsfilestorebackend(/var/lib/ceph/osd/cloud-21) detect_feature: extsize is disabled by conf
2014-10-24 11:10:53.181392 7f86c6fcb780  0 filestore(/var/lib/ceph/osd/cloud-21) mount: WRITEAHEAD journal mode explicitly enabled in conf
2014-10-24 11:10:53.191499 7f86c6fcb780  1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-24 11:11:03.794632 7f86c6fcb780  1 journal _open /dev/sdi1 fd 24: 11998855168 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-10-24 11:11:03.845410 7f86c6fcb780  0 cls cls/hello/cls_hello.cc:271: loading cls_hello
2014-10-24 11:11:04.174302 7f86c6fcb780  0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for clients
2014-10-24 11:11:04.174360 7f86c6fcb780  0 osd.21 101773 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2014-10-24 11:11:04.174373 7f86c6fcb780  0 osd.21 101773 crush map has features 1107558400, adjusting msgr requires for osds
2014-10-24 11:11:04.174402 7f86c6fcb780  0 osd.21 101773 load_pgs
2014-10-24 11:11:22.986057 7f86c6fcb780  0 osd.21 101773 load_pgs opened 281 pgs
2014-10-24 11:11:23.039971 7f86b6d2e700  0 osd.21 101773 ignoring osdmap until we have initialized
2014-10-24 11:11:23.040818 7f86b6d2e700  0 osd.21 101773 ignoring osdmap until we have initialized
2014-10-24 11:11:23.276236 7f86c6fcb780  0 osd.21 101773 done with init, starting boot process
2014-10-24 11:12:44.346474 7f865ca3c700  0 -- 192.168.1.2:6840/28594  172.30.1.81:0/4234900213 pipe(0x23c15000 sd=66 :6840 s=0 pgs=0 cs=0 l=0 c=0x246f96e0).accept peer addr is really 172.30.1.81:0/4234900213 (socket is 172.30.1.81:47697/0)
2014-10-24 11:15:27.767594 7f86a2505700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f86a2505700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-osd() [0x9c830a]
 2: (()+0xfcb0) [0x7f86c6009cb0]
 3: (ReplicatedPG::trim_object(hobject_t const)+0x395) [0x8079e5]
 4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const)+0x44c) [0x82215c]
 5: (boost::statechart::simple_stateReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::listmpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, (boost::statechart::history_mode)0::react_impl(boost::statechart::event_base const, void const*)+0xc0) [0x867390]
 6: (boost::statechart::state_machineReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocatorvoid, boost::statechart::null_exception_translator::process_queued_events()+0xfb) [0x84d70b]
 7: (boost::statechart::state_machineReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocatorvoid, boost::statechart::null_exception_translator::process_event(boost::statechart::event_base const)+0x1e) [0x84d8de]
 8: (ReplicatedPG::snap_trimmer()+0x588) [0x7cc118]
 9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x675f14]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa9a366]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa9c380]
 12: (()+0x7e9a) [0x7f86c6001e9a]
 13: (clone()+0x6d) [0x7f86c4efb31d]

 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this.

--- begin dump of recent events ---
-1 2014-10-24 11:15:20.324218 7f869c4f9700  1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6853/4658 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0xe955e00 con 0x20af2b00
 - 2014-10-24 11:15:20.324268 7f869c4f9700  1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6862/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2de00 con 0x20af2840
 -9998 2014-10-24 11:15:20.324313 7f869c4f9700  1 -- 192.168.1.2:0/28594 -- 192.168.1.3:6863/22372 -- osd_ping(ping e101783 stamp 2014-10-24 11:15:20.322565) v2 -- ?+0 0x22e2d700 con 0x20af26e0


 -9713 2014-10-24 11:15:20.365153 7f86ae51d700  5 -- op tracker -- , seq: 18573, time: 2014-10-24 11:15:20.365153, event: done, request: osd_op(client.7869019.0:6944380 rbd_data.451e822ae8944a.0128 [set-alloc-hint object_size 4194304 write_size 4194304,write 479232~4096] 6.b4cc39f6 snapc 18ee=[18ee] ack+ondisk+write e101783) v4
 -9712 2014-10-24 11:15:20.365266 7f86ae51d700  5 -- op tracker -- , seq: 18576, time: 2014-10-24 11:15:20.365266, event: done, request: osd_sub_op_reply(client.7869019.0:6944380 6.9f6 b4cc39f6/rbd_data.451e822ae8944a.0128/head//6 [] ondisk, result = 0) v2

 -9711 2014-10-24 

Re: [ceph-users] Can't start OSD

2014-08-08 Thread Karan Singh
Try marking these OSDs in:

ceph osd in osd.12 osd.13 osd.14 osd.15

Then restart osd services 


- Karan Singh -
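
Note that "ceph osd in" only changes the CRUSH in/out state; the daemon 
still has to start and report up before its PGs can recover. A minimal 
sketch of the usual sequence on a sysvinit-based install, assuming the init 
script can actually find the OSDs in ceph.conf or /var/lib/ceph:

# ceph osd in osd.12 osd.13 osd.14 osd.15
# sudo service ceph start osd
# ceph osd tree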

On 08 Aug 2014, at 00:55, O'Reilly, Dan daniel.orei...@dish.com wrote:

 # idweight  type name   up/down reweight
 -1  7.2 root default
 -2  1.8 host tm1cldosdl01
 0   0.45osd.0   up  1
 1   0.45osd.1   up  1
 2   0.45osd.2   up  1
 3   0.45osd.3   up  1
 -3  1.8 host tm1cldosdl02
 4   0.45osd.4   up  1
 5   0.45osd.5   up  1
 6   0.45osd.6   up  1
 7   0.45osd.7   up  1
 -4  1.8 host tm1cldosdl03
 8   0.45osd.8   up  1
 9   0.45osd.9   up  1
 10  0.45osd.10  up  1
 11  0.45osd.11  up  1
 -5  1.8 host tm1cldosdl04
 12  0.45osd.12  down0
 13  0.45osd.13  down0
 14  0.45osd.14  down0
 15  0.45osd.15  down0
 [ceph@tm1cldosdl04 ~]$ sudo /etc/init.d/ceph start osd.12
 /etc/init.d/ceph: osd.12 not found (/etc/ceph/ceph.conf defines , 
 /var/lib/ceph defines )
  
 What am I missing?  Specifically, what would need to be in ceph.conf or 
 /var/lib/ceph?
  
 Dan O'Reilly
 UNIX Systems Administration
 9601 S. Meridian Blvd.
 Englewood, CO 80112
 720-514-6293
  
  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't start OSD

2014-08-08 Thread O'Reilly, Dan
Nope.  Nothing works.  This is VERY frustrating.

What happened:


-  I rebooted the box, simulating a system failure.

-  When the system came back up, ceph wasn't started, and the osd 
volumes weren't mounted.

-  I did a service ceph start osd and the ceph processes don't start

-  I did a ceph-deploy activate on the devices,  so they're mounted.  
service ceph start still doesn't start anything.

Right now:

# service ceph restart
=== osd.18 ===
=== osd.18 ===
Stopping Ceph osd.18 on tm1cldosdl04...done
=== osd.18 ===
create-or-move updated item name 'osd.18' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.18 on tm1cldosdl04...
starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 
/var/lib/ceph/osd/ceph-18/journal
=== osd.17 ===
=== osd.17 ===
Stopping Ceph osd.17 on tm1cldosdl04...done
=== osd.17 ===
create-or-move updated item name 'osd.17' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.17 on tm1cldosdl04...
starting osd.17 at :/0 osd_data /var/lib/ceph/osd/ceph-17 
/var/lib/ceph/osd/ceph-17/journal
=== osd.19 ===
=== osd.19 ===
Stopping Ceph osd.19 on tm1cldosdl04...done
=== osd.19 ===
create-or-move updated item name 'osd.19' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.19 on tm1cldosdl04...
starting osd.19 at :/0 osd_data /var/lib/ceph/osd/ceph-19 
/var/lib/ceph/osd/ceph-19/journal
=== osd.16 ===
=== osd.16 ===
Stopping Ceph osd.16 on tm1cldosdl04...done
=== osd.16 ===
create-or-move updated item name 'osd.16' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.16 on tm1cldosdl04...
starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 
/var/lib/ceph/osd/ceph-16/journal
[NEW:note: root@tm1cldosdl04 on parent: /root]
# ps -eaf|grep ceph
root  7528  6124  0 07:32 pts/000:00:00 grep ceph
[NEW:note: root@tm1cldosdl04 on parent: /root]
# ceph osd tree
# idweight  type name   up/down reweight
-1  9   root default
-2  1.8 host tm1cldosdl01
0   0.45osd.0   up  1
1   0.45osd.1   up  1
2   0.45osd.2   up  1
3   0.45osd.3   up  1
-3  1.8 host tm1cldosdl02
4   0.45osd.4   up  1
5   0.45osd.5   up  1
6   0.45osd.6   up  1
7   0.45osd.7   up  1
-4  1.8 host tm1cldosdl03
8   0.45osd.8   up  1
9   0.45osd.9   up  1
10  0.45osd.10  up  1
11  0.45osd.11  up  1
-5  3.6 host tm1cldosdl04
12  0.45osd.12  DNE
13  0.45osd.13  DNE
14  0.45osd.14  DNE
15  0.45osd.15  DNE
16  0.45osd.16  down0
17  0.45osd.17  down0
18  0.45osd.18  down0
19  0.45osd.19  down0

I'm missing something here.  I don't know if it's a config issue or what.  But 
the docs haven't helped me.
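
Since "service ceph start" only starts OSDs it can find locally, one thing 
worth checking after a reboot is whether the OSD data partitions actually 
got mounted back under /var/lib/ceph/osd/. A minimal sketch of that check 
(the device name is illustrative; ceph-disk is the tool that ceph-deploy 
drives under the hood):

# ceph-disk list
# mount | grep /var/lib/ceph/osd
# ceph-disk activate /dev/sdb1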

From: Karan Singh [mailto:karan.si...@csc.fi]
Sent: Friday, August 08, 2014 1:11 AM
To: O'Reilly, Dan
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't start OSD

Try marking these OSDs in:

ceph osd in osd.12 osd.13 osd.14 osd.15

Then restart osd services


- Karan Singh -

On 08 Aug 2014, at 00:55, O'Reilly, Dan 
daniel.orei...@dish.com wrote:


# idweight  type name   up/down reweight
-1  7.2 root default
-2  1.8 host tm1cldosdl01
0   0.45osd.0   up  1
1   0.45osd.1   up  1
2   0.45osd.2   up  1
3   0.45osd.3   up  1
-3  1.8 host tm1cldosdl02
4   0.45osd.4   up  1
5   0.45osd.5   up  1
6   0.45osd.6   up  1
7   0.45osd.7   up  1
-4  1.8 host tm1cldosdl03
8   0.45osd.8   up  1
9   0.45osd.9   up  1
10  0.45osd.10  up  1
11  0.45osd.11  up  1
-5  1.8 host tm1cldosdl04
12  0.45osd.12  down0
13  0.45osd.13  down0
14  0.45osd.14  down0
15  0.45osd.15  down0
[ceph@tm1cldosdl04 ~]$ sudo /etc/init.d/ceph start osd.12
/etc/init.d/ceph: osd.12 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph 
defines )

What am I missing?  Specifically, what would need to be in ceph.conf or 
/var

Re: [ceph-users] Can't start OSD

2014-08-08 Thread German Anders

How about the logs? Is something there?

ls /var/log/ceph/


German Anders

--- Original message ---
Subject: Re: [ceph-users] Can't start OSD
From: O'Reilly, Dan daniel.orei...@dish.com
To: Karan Singh karan.si...@csc.fi
Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com
Date: Friday, 08/08/2014 10:53



Nope.  Nothing works.  This is VERY frustrating.

What happened:

-  I rebooted the box, simulating a system failure.
-  When the system came back up, ceph wasn’t started, and 
the osd volumes weren’t mounted.
-  I did a “service ceph start osd” and the ceph processes 
don’t start
-  I did a “ceph-deploy activate” on the devices,  so 
they’re mounted.  “service ceph start” still doesn’t start 
anything.


Right now:

# service ceph restart
=== osd.18 ===
=== osd.18 ===
Stopping Ceph osd.18 on tm1cldosdl04...done
=== osd.18 ===
create-or-move updated item name 'osd.18' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map

Starting Ceph osd.18 on tm1cldosdl04...
starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 
/var/lib/ceph/osd/ceph-18/journal

=== osd.17 ===
=== osd.17 ===
Stopping Ceph osd.17 on tm1cldosdl04...done
=== osd.17 ===
create-or-move updated item name 'osd.17' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map

Starting Ceph osd.17 on tm1cldosdl04...
starting osd.17 at :/0 osd_data /var/lib/ceph/osd/ceph-17 
/var/lib/ceph/osd/ceph-17/journal

=== osd.19 ===
=== osd.19 ===
Stopping Ceph osd.19 on tm1cldosdl04...done
=== osd.19 ===
create-or-move updated item name 'osd.19' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map

Starting Ceph osd.19 on tm1cldosdl04...
starting osd.19 at :/0 osd_data /var/lib/ceph/osd/ceph-19 
/var/lib/ceph/osd/ceph-19/journal

=== osd.16 ===
=== osd.16 ===
Stopping Ceph osd.16 on tm1cldosdl04...done
=== osd.16 ===
create-or-move updated item name 'osd.16' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map

Starting Ceph osd.16 on tm1cldosdl04...
starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 
/var/lib/ceph/osd/ceph-16/journal

[NEW:note: root@tm1cldosdl04 on parent: /root]
# ps -eaf|grep ceph
root  7528  6124  0 07:32 pts/000:00:00 grep ceph
[NEW:note: root@tm1cldosdl04 on parent: /root]
# ceph osd tree
# idweight  type name   up/down reweight
-1  9   root default
-2  1.8 host tm1cldosdl01
0   0.45osd.0   up  1
1   0.45osd.1   up  1
2   0.45osd.2   up  1
3   0.45osd.3   up  1
-3  1.8 host tm1cldosdl02
4   0.45osd.4   up  1
5   0.45osd.5   up  1
6   0.45osd.6   up  1
7   0.45osd.7   up  1
-4  1.8 host tm1cldosdl03
8   0.45osd.8   up  1
9   0.45osd.9   up  1
10  0.45osd.10  up  1
11  0.45osd.11  up  1
-5  3.6 host tm1cldosdl04
12  0.45osd.12  DNE
13  0.45osd.13  DNE
14  0.45osd.14  DNE
15  0.45osd.15  DNE
16  0.45osd.16  down0
17  0.45osd.17  down0
18  0.45osd.18  down0
19  0.45osd.19  down0

I’m missing something here.  I don’t know if it’s a config issue 
or what.  But the docs haven’t helped me.




From: Karan Singh [mailto:karan.si...@csc.fi]
Sent: Friday, August 08, 2014 1:11 AM
To: O'Reilly, Dan
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't start OSD


Try marking these OSDs in:



ceph osd in osd.12 osd.13 osd.14 osd.15



Then restart osd services






- Karan Singh -



On 08 Aug 2014, at 00:55, O'Reilly, Dan daniel.orei...@dish.com 
wrote:






# idweight  type name   up/down reweight

-1  7.2 root default

-2  1.8 host tm1cldosdl01

0   0.45osd.0   up  1

1   0.45osd.1   up  1

2   0.45osd.2   up  1

3   0.45osd.3   up  1

-3  1.8 host tm1cldosdl02

4   0.45osd.4   up  1

5   0.45osd.5   up  1

6   0.45osd.6   up  1

7   0.45osd.7   up  1

-4  1.8 host tm1cldosdl03

8   0.45osd.8   up  1

9   0.45osd.9   up  1

10  0.45osd.10  up  1

11  0.45osd.11  up  1

-5  1.8 host tm1cldosdl04

12  0.45osd.12  down0

13  0.45

Re: [ceph-users] Can't start OSD

2014-08-08 Thread O'Reilly, Dan
I’m afraid I don’t know exactly how to interpret this, but after a reboot:

2014-08-08 08:48:44.616005 7f0c3b1447a0  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 2978
2014-08-08 08:48:44.635680 7f0c3b1447a0  0 filestore(/var/lib/ceph/osd/ceph-10) 
mount detected xfs (libxfs)
2014-08-08 08:48:44.635730 7f0c3b1447a0  1 filestore(/var/lib/ceph/osd/ceph-10) 
 disabling 'filestore replica fadvise' due to known issues with 
fadvise(DONTNEED) on xfs
2014-08-08 08:48:44.681911 7f0c3b1447a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
ioctl is supported and appears to work
2014-08-08 08:48:44.681959 7f0c3b1447a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
ioctl is disabled via 'filestore fiemap' config option
2014-08-08 08:48:44.748483 7f0c3b1447a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: 
syscall(SYS_syncfs, fd) fully supported
2014-08-08 08:48:44.748605 7f0c3b1447a0  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is 
supported
2014-08-08 08:48:44.889826 7f0c3b1447a0  0 filestore(/var/lib/ceph/osd/ceph-10) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-08-08 08:48:45.064198 7f0c3b1447a0 -1 filestore(/var/lib/ceph/osd/ceph-10) 
mount failed to open journal /var/lib/ceph/osd/ceph-10/journal: (2) No such 
file or directory
2014-08-08 08:48:45.074220 7f0c3b1447a0 -1  ** ERROR: error converting store 
/var/lib/ceph/osd/ceph-10: (2) No such file or directory
2014-08-08 08:49:19.957725 7f2c40c1a7a0  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 4707
2014-08-08 08:49:19.973896 7f2c40c1a7a0  0 filestore(/var/lib/ceph/osd/ceph-10) 
mount detected xfs (libxfs)
2014-08-08 08:49:19.973931 7f2c40c1a7a0  1 filestore(/var/lib/ceph/osd/ceph-10) 
 disabling 'filestore replica fadvise' due to known issues with 
fadvise(DONTNEED) on xfs
2014-08-08 08:49:20.016413 7f2c40c1a7a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
ioctl is supported and appears to work
2014-08-08 08:49:20.016444 7f2c40c1a7a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
ioctl is disabled via 'filestore fiemap' config option
2014-08-08 08:49:20.083052 7f2c40c1a7a0  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: 
syscall(SYS_syncfs, fd) fully supported
2014-08-08 08:49:20.083179 7f2c40c1a7a0  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is 
supported
2014-08-08 08:49:20.134213 7f2c40c1a7a0  0 filestore(/var/lib/ceph/osd/ceph-10) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-08-08 08:49:20.136710 7f2c40c1a7a0 -1 filestore(/var/lib/ceph/osd/ceph-10) 
mount failed to open journal /var/lib/ceph/osd/ceph-10/journal: (2) No such 
file or directory
2014-08-08 08:49:20.146797 7f2c40c1a7a0 -1  ** ERROR: error converting store 
/var/lib/ceph/osd/ceph-10: (2) No such file or directory
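
Both attempts fail the same way: the data directory mounts, but 
/var/lib/ceph/osd/ceph-10/journal cannot be opened, which usually means the 
journal symlink (or file) inside the data directory is missing or points at 
a device path that changed across the reboot. A minimal sketch of how that 
is sometimes checked and repaired (the partition UUID is purely 
illustrative, and --mkjournal discards whatever was in the old journal, so 
it is only safe if the journal was cleanly flushed or the OSD will be 
rebuilt from replicas anyway):

# ls -l /var/lib/ceph/osd/ceph-10/journal
# ln -sf /dev/disk/by-partuuid/0f3f94cd-d068-4d44-a3b1-0123456789ab /var/lib/ceph/osd/ceph-10/journal
# ceph-osd -i 10 --mkjournal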

From: German Anders [mailto:gand...@despegar.com]
Sent: Friday, August 08, 2014 8:23 AM
To: O'Reilly, Dan
Cc: Karan Singh; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Can't start OSD

How about the logs? Is something there?

ls /var/log/ceph/

German Anders

--- Original message ---
Subject: Re: [ceph-users] Can't start OSD
From: O'Reilly, Dan daniel.orei...@dish.com
To: Karan Singh karan.si...@csc.fi
Cc: ceph-users@lists.ceph.com
Date: Friday, 08/08/2014 10:53


Nope.  Nothing works.  This is VERY frustrating.

What happened:

-  I rebooted the box, simulating a system failure.
-  When the system came back up, ceph wasn’t started, and the osd 
volumes weren’t mounted.
-  I did a “service ceph start osd” and the ceph processes don’t start
-  I did a “ceph-deploy activate” on the devices,  so they’re mounted.  
“service ceph start” still doesn’t start anything.

Right now:

# service ceph restart
=== osd.18 ===
=== osd.18 ===
Stopping Ceph osd.18 on tm1cldosdl04...done
=== osd.18 ===
create-or-move updated item name 'osd.18' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.18 on tm1cldosdl04...
starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 
/var/lib/ceph/osd/ceph-18/journal
=== osd.17 ===
=== osd.17 ===
Stopping Ceph osd.17 on tm1cldosdl04...done
=== osd.17 ===
create-or-move updated item name 'osd.17' weight 0.45 at location 
{host=tm1cldosdl04,root=default} to crush map
Starting Ceph osd.17 on tm1cldosdl04...
starting osd.17 at :/0 osd_data /var/lib/ceph/osd/ceph-17 
/var/lib/ceph/osd/ceph-17/journal
=== osd.19 ===
=== osd.19 ===
Stopping Ceph osd.19 on tm1cldosdl04...done
=== osd.19 ===
create-or-move updated item name 'osd.19' weight 0.45 at location 
{host

Re: [ceph-users] Can't start OSD

2014-08-08 Thread Matt Harlum
Hi,

Can you run ls -lah /var/lib/ceph/osd/ceph-10/journal

It’s saying it can’t find the journal

Regards,
Matt 

On 9 Aug 2014, at 12:51 am, O'Reilly, Dan daniel.orei...@dish.com wrote:

 I’m afraid I don’t know exactly how to interpret this, but after a reboot:
  
 2014-08-08 08:48:44.616005 7f0c3b1447a0  0 ceph version 0.80.1 
 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 2978
 2014-08-08 08:48:44.635680 7f0c3b1447a0  0 
 filestore(/var/lib/ceph/osd/ceph-10) mount detected xfs (libxfs)
 2014-08-08 08:48:44.635730 7f0c3b1447a0  1 
 filestore(/var/lib/ceph/osd/ceph-10)  disabling 'filestore replica fadvise' 
 due to known issues with fadvise(DONTNEED) on xfs
 2014-08-08 08:48:44.681911 7f0c3b1447a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
 ioctl is supported and appears to work
 2014-08-08 08:48:44.681959 7f0c3b1447a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
 ioctl is disabled via 'filestore fiemap' config option
 2014-08-08 08:48:44.748483 7f0c3b1447a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: 
 syscall(SYS_syncfs, fd) fully supported
 2014-08-08 08:48:44.748605 7f0c3b1447a0  0 
 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is 
 supported
 2014-08-08 08:48:44.889826 7f0c3b1447a0  0 
 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: 
 checkpoint is not enabled
 2014-08-08 08:48:45.064198 7f0c3b1447a0 -1 
 filestore(/var/lib/ceph/osd/ceph-10) mount failed to open journal 
 /var/lib/ceph/osd/ceph-10/journal: (2) No such file or directory
 2014-08-08 08:48:45.074220 7f0c3b1447a0 -1  ** ERROR: error converting store 
 /var/lib/ceph/osd/ceph-10: (2) No such file or directory
 2014-08-08 08:49:19.957725 7f2c40c1a7a0  0 ceph version 0.80.1 
 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 4707
 2014-08-08 08:49:19.973896 7f2c40c1a7a0  0 
 filestore(/var/lib/ceph/osd/ceph-10) mount detected xfs (libxfs)
 2014-08-08 08:49:19.973931 7f2c40c1a7a0  1 
 filestore(/var/lib/ceph/osd/ceph-10)  disabling 'filestore replica fadvise' 
 due to known issues with fadvise(DONTNEED) on xfs
 2014-08-08 08:49:20.016413 7f2c40c1a7a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
 ioctl is supported and appears to work
 2014-08-08 08:49:20.016444 7f2c40c1a7a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP 
 ioctl is disabled via 'filestore fiemap' config option
 2014-08-08 08:49:20.083052 7f2c40c1a7a0  0 
 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: 
 syscall(SYS_syncfs, fd) fully supported
 2014-08-08 08:49:20.083179 7f2c40c1a7a0  0 
 xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_feature: extsize is 
 supported
 2014-08-08 08:49:20.134213 7f2c40c1a7a0  0 
 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: 
 checkpoint is not enabled
 2014-08-08 08:49:20.136710 7f2c40c1a7a0 -1 
 filestore(/var/lib/ceph/osd/ceph-10) mount failed to open journal 
 /var/lib/ceph/osd/ceph-10/journal: (2) No such file or directory
 2014-08-08 08:49:20.146797 7f2c40c1a7a0 -1  ** ERROR: error converting store 
 /var/lib/ceph/osd/ceph-10: (2) No such file or directory
  
 From: German Anders [mailto:gand...@despegar.com] 
 Sent: Friday, August 08, 2014 8:23 AM
 To: O'Reilly, Dan
 Cc: Karan Singh; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Can't start OSD
  
 How about the logs? Is something there?
 
 ls /var/log/ceph/
  
 German Anders
 
 --- Original message --- 
 Subject: Re: [ceph-users] Can't start OSD 
 From: O'Reilly, Dan daniel.orei...@dish.com 
 To: Karan Singh karan.si...@csc.fi 
 Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com 
 Date: Friday, 08/08/2014 10:53
 
 
 Nope.  Nothing works.  This is VERY frustrating.
  
 What happened:
  
 -  I rebooted the box, simulating a system failure.
 -  When the system came back up, ceph wasn’t started, and the osd 
 volumes weren’t mounted.
 -  I did a “service ceph start osd” and the ceph processes don’t start
 -  I did a “ceph-deploy activate” on the devices,  so they’re 
 mounted.  “service ceph start” still doesn’t start anything.
  
 Right now:
  
 # service ceph restart
 === osd.18 ===
 === osd.18 ===
 Stopping Ceph osd.18 on tm1cldosdl04...done
 === osd.18 ===
 create-or-move updated item name 'osd.18' weight 0.45 at location 
 {host=tm1cldosdl04,root=default} to crush map
 Starting Ceph osd.18 on tm1cldosdl04...
 starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 
 /var/lib/ceph/osd/ceph-18/journal
 === osd.17 ===
 === osd.17 ===
 Stopping Ceph osd.17 on tm1cldosdl04...done
 === osd.17 ===
 create-or-move updated item name 'osd.17' weight 0.45 at location 
 {host=tm1cldosdl04,root=default} to crush map
 Starting Ceph osd.17 on tm1cldosdl04...
 starting osd.17 at :/0 osd_data /var/lib/ceph

[ceph-users] Can't start OSD

2014-08-07 Thread O'Reilly, Dan
# idweight  type name   up/down reweight
-1  7.2 root default
-2  1.8 host tm1cldosdl01
0   0.45osd.0   up  1
1   0.45osd.1   up  1
2   0.45osd.2   up  1
3   0.45osd.3   up  1
-3  1.8 host tm1cldosdl02
4   0.45osd.4   up  1
5   0.45osd.5   up  1
6   0.45osd.6   up  1
7   0.45osd.7   up  1
-4  1.8 host tm1cldosdl03
8   0.45osd.8   up  1
9   0.45osd.9   up  1
10  0.45osd.10  up  1
11  0.45osd.11  up  1
-5  1.8 host tm1cldosdl04
12  0.45osd.12  down0
13  0.45osd.13  down0
14  0.45osd.14  down0
15  0.45osd.15  down0
[ceph@tm1cldosdl04 ~]$ sudo /etc/init.d/ceph start osd.12
/etc/init.d/ceph: osd.12 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph 
defines )

What am I missing?  Specifically, what would need to be in ceph.conf or 
/var/lib/ceph?

Dan O'Reilly
UNIX Systems Administration
9601 S. Meridian Blvd.
Englewood, CO 80112
720-514-6293
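
On the ceph.conf / /var/lib/ceph question above: with the sysvinit init 
script, an OSD is only started if it is either declared in ceph.conf or 
has an init marker in a data directory mounted under /var/lib/ceph/osd/. 
A minimal sketch of both variants, with the host name taken from the tree 
above and the device name illustrative (a starting point rather than the 
definitive fix):

[osd.12]
    host = tm1cldosdl04

or, for a ceph-disk / ceph-deploy prepared OSD, make sure the data 
partition is mounted and carries the init marker:

# mount /dev/sdb1 /var/lib/ceph/osd/ceph-12
# ls /var/lib/ceph/osd/ceph-12/sysvinit
# service ceph start osd.12

The "sysvinit" file is the marker that ceph-disk activate drops into the 
data directory so that "service ceph start" can discover the OSD without 
a ceph.conf entry.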


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com