Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-18 Thread Malcolm Haak
I think I might have found the issue

Something is wrong with my crush map.

I was just attempting to modify it 

microserver-1:~ #  ceph osd getcrushmap -o /tmp/cm
got crush map from osdmap epoch 3937
microserver-1:~ # crushtool -d /tmp/cm -o /tmp/cm.txt
microserver-1:~ # vim /tmp/cm.txt 
microserver-1:~ # crushtool -c /tmp/cm.txt -o /tmp/cm.new
microserver-1:~ # ceph osd setcrushmap -i /tmp/cm.new 
Error EINVAL: Failed to parse crushmap: buffer::end_of_buffer
microserver-1:~ # crushtool -c /tmp/cm.txt -o /tmp/cm.new
microserver-1:~ # ceph osd setcrushmap -i /tmp/cm.new 
Error EPERM: Failed to parse crushmap: error running crushmap through 
crushtool: (1) Operation not permitted

It's like something is missing from or broken in my crush map. This cluster has 
been around for at least two years and has been upgraded to each new version of 
Ceph. 
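
A sanity check I'm planning to try next (just a sketch, assuming the crushtool 
from this same build behaves) is to decompile and dry-run the freshly compiled 
map before injecting it again:

crushtool -d /tmp/cm.new -o /tmp/cm.new.txt          # does the new binary map decompile cleanly?
crushtool -i /tmp/cm.new --test --show-statistics    # dry-run placements against the compiled map

If either of those complains, the problem is in the map itself rather than in 
the setcrushmap step.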




-Original Message-
From: Malcolm Haak 
Sent: Wednesday, 18 March 2015 12:53 PM
To: Malcolm Haak; Joao Eduardo Luis; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Sorry to bump this one, but I have more hardware coming and I still cannot add 
another OSD to my cluster..

Does anybody have any clues?

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Malcolm Haak
Sent: Friday, 13 March 2015 10:05 AM
To: Joao Eduardo Luis; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Sorry about this,

I sent this at 1AM last night and went to bed, I didn't realise the log was far 
too long and the email had been blocked... 

I've reattached all the requested files and trimmed the body of the email. 

Thank you again for looking at this.

-Original Message-
From: Malcolm Haak
Sent: Friday, 13 March 2015 1:38 AM
To: 'Joao Eduardo Luis'; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Ok,

So, I've been doing things in the meantime and as such the osd is now 
requesting 3008 and 3009 instead of 2758/9. I've included the problem OSD's log 
file.

And attached all the osdmap's as requested.

Regards

Malcolm Haak

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Friday, 13 March 2015 1:02 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

On 03/12/2015 05:16 AM, Malcolm Haak wrote:
 Sorry about all the unrelated grep issues..

 So I've rebuilt and reinstalled and it's still broken.

 On the working node, even with the new packages, everything works.
 On the new broken node, I've added a mon and it works. But I still cannot 
 start an OSD on the new node.

 What else do you need from me? I'll get logs run any number of tests.

 I've got data in this cluster already, and it's full so I need to expand it, 
 I've already got hardware.

 Thanks in advance for even having a look

Sam mentioned to me on IRC that the next step would be to grab the offending 
osdmaps.  Easiest way for that will be to stop a monitor and run 
'ceph-monstore-tool' in order to obtain the full maps, and then use 
'ceph-kvstore-tool' to obtain incrementals.

Given the osd is crashing on version 2759, the following would be best:

(Assuming you have stopped a given monitor with id FOO, whose store is sitting 
at default path /var/lib/ceph/mon/ceph-FOO)

ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version
2758 --out /tmp/osdmap.full.2758

ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version
2759 --out /tmp/osdmap.full.2759

(please note the '--' between 'osdmap' and '--version', as that is required for 
the tool to do its thing)

and then

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2758 out 
/tmp/osdmap.inc.2758

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2759 out 
/tmp/osdmap.inc.2759
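
Once you have them, you can sanity-check the full maps locally as well -- 
assuming osdmaptool from the same build is in your path -- with something like:

osdmaptool --print /tmp/osdmap.full.2758
osdmaptool --print /tmp/osdmap.full.2759

which should dump the epoch, pools and osd entries each map contains.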

Cheers!

   -Joao




 -Original Message-
 From: Samuel Just [mailto:sj...@redhat.com]
 Sent: Wednesday, 11 March 2015 1:41 AM
 To: Malcolm Haak; jl...@redhat.com
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to 
 existing cluster

 Joao, it looks like map 2759 is causing trouble, how would he get the 
 full and incremental maps for that out of the mons?
 -Sam

 On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,

 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking.
 Same tarball built into rpms using rpmbuild on both nodes...
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.

 added the requested config here is the command line output

 microserver-1:/etc # /etc/init.d/ceph start osd.3 

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-12 Thread Malcolm Haak
Hi all,

So the init script issue is sorted: my grep binary was not working correctly.  
I've replaced it and everything seems to be fine. 

Which now has me wondering if the binaries I generated are any good... the bad 
grep might have caused issues with the build...

I'm going to recompile after some more sanity testing..

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Malcolm Haak
Sent: Wednesday, 11 March 2015 8:56 PM
To: Samuel Just; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

I ran ceph-osd via the command line...

It hasn't really given me much more to go on... well, except that it's hitting 
an early end of buffer for some reason.

Also I've hit another issue... 

The /etc/init.d/ceph script is not seeing my new mon (I decided to add more 
mons to see if it would help, since the mon map looks like it is the issue).

The script starts the mon fine. And the new mon (on the same host as this 
problem osd) appears to be good. 

The issue is when you do /etc/init.d/ceph status 

It tells you that mon.b is dead. It seems to be one of the greps that is failing.
Specifically, 
grep -qwe -i.$daemon_id /proc/\$pid/cmdline
returns 1.

What's odd is the same grep works on the other node for mon.a; it just doesn't 
work on this node for mon.b.

I'm wondering if there is something odd happening. 
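
To dig into it I've been reproducing the check by hand, roughly like this (just 
a sketch of what I read out of the init script; I'm assuming mon.b's pid file 
is /var/run/ceph/mon.b.pid):

pid=$(cat /var/run/ceph/mon.b.pid)
tr '\0' ' ' < /proc/$pid/cmdline ; echo         # what the daemon's cmdline actually contains
grep -qwe -i.b /proc/$pid/cmdline ; echo $?     # the init script's check: 0 = match, 1 = no match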

Anyway here is the output of the manual start of ceph-osd


# /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c 
/etc/ceph/ceph.conf --cluster ceph -f
starting osd.3 at :/0 osd_data /var/lib/ceph/osd/ceph-3 
/var/lib/ceph/osd/ceph-3/journal
2015-03-11 20:38:56.401205 7f04221e6880 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-03-11 20:38:56.418747 7f04221e6880 -1 osd.3 2757 log_to_monitors 
{default=true}
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7f041192a700
 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xac7cea]
 2: (()+0x10050) [0x7f04210f1050]
 3: (gsignal()+0x37) [0x7f041f5c40f7]
 4: (abort()+0x13a) [0x7f041f5c54ca]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f041fea9fe5]
 6: (()+0x63186) [0x7f041fea8186]
 7: (()+0x631b3) [0x7f041fea81b3]
 8: (()+0x633d2) [0x7f041fea83d2]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc2cea7]
 10: (OSDMap::decode_classic(ceph::buffer::list::iterator)+0x605) [0xb7b7b5]
 11: (OSDMap::decode(ceph::buffer::list::iterator)+0x8c) [0xb7bebc]
 12: (OSDMap::decode(ceph::buffer::list)+0x3f) [0xb7dfbf]
 13: (OSD::handle_osd_map(MOSDMap*)+0xd37) [0x6cd9a7]
 14: (OSD::_dispatch(Message*)+0x3eb) [0x6d0afb]
 15: (OSD::ms_dispatch(Message*)+0x257) [0x6d1007]
 16: (DispatchQueue::entry()+0x649) [0xc6fe09]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0xb9dd7d]
 18: (()+0x83a4) [0x7f04210e93a4]
 19: (clone()+0x6d) [0x7f041f673a4d]
2015-03-11 20:38:56.471624 7f041192a700 -1 *** Caught signal (Aborted) **
 in thread 7f041192a700

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xac7cea]
 2: (()+0x10050) [0x7f04210f1050]
 3: (gsignal()+0x37) [0x7f041f5c40f7]
 4: (abort()+0x13a) [0x7f041f5c54ca]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f041fea9fe5]
 6: (()+0x63186) [0x7f041fea8186]
 7: (()+0x631b3) [0x7f041fea81b3]
 8: (()+0x633d2) [0x7f041fea83d2]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc2cea7]
 10: (OSDMap::decode_classic(ceph::buffer::list::iterator)+0x605) [0xb7b7b5]
 11: (OSDMap::decode(ceph::buffer::list::iterator)+0x8c) [0xb7bebc]
 12: (OSDMap::decode(ceph::buffer::list)+0x3f) [0xb7dfbf]
 13: (OSD::handle_osd_map(MOSDMap*)+0xd37) [0x6cd9a7]
 14: (OSD::_dispatch(Message*)+0x3eb) [0x6d0afb]
 15: (OSD::ms_dispatch(Message*)+0x257) [0x6d1007]
 16: (DispatchQueue::entry()+0x649) [0xc6fe09]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0xb9dd7d]
 18: (()+0x83a4) [0x7f04210e93a4]
 19: (clone()+0x6d) [0x7f041f673a4d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

  -308 2015-03-11 20:38:56.401205 7f04221e6880 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
   -77 2015-03-11 20:38:56.418747 7f04221e6880 -1 osd.3 2757 log_to_monitors 
{default=true}
 0 2015-03-11 20:38:56.471624 7f041192a700 -1 *** Caught signal (Aborted) 
**
 in thread 7f041192a700

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-12 Thread Malcolm Haak
I've no idea if this helps. But I was looking in the meta file of osd.3 to see 
if things there made any sense.  I'm very much out of my depth.

To me this looks like a bug. Quite possibly a corner case, but a bug 
nonetheless.

Anyway, I've included my crush map and what look like the osdmap files out of 
the osd that won't start.

Cracking them open, it appears that the new osd.3 is not in the map at all, 
which might be correct, but I would have expected to see it in the layout. 

I've also added the current osdmap dump as well... 
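
For completeness, the quick checks from the live cluster side (nothing fancy, 
just a sketch) would be:

ceph osd tree                        # does osd.3 show up under a host with a weight?
ceph osd dump | grep -w 'osd\.3'     # is osd.3 present in the current osdmap at all?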


If I'm asking in the wrong place, please let me know. I don't want to be 
wasting people's time. 

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Malcolm Haak
Sent: Thursday, 12 March 2015 4:16 PM
To: Samuel Just; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Sorry about all the unrelated grep issues..

So I've rebuilt and reinstalled and it's still broken. 

On the working node, even with the new packages, everything works.
On the new broken node, I've added a mon and it works. But I still cannot start 
an OSD on the new node.

What else do you need from me? I'll get logs run any number of tests.

I've got data in this cluster already, and it's full so I need to expand it, 
I've already got hardware.

Thanks in advance for even having a look


-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: Wednesday, 11 March 2015 1:41 AM
To: Malcolm Haak; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,
 
 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking. 
 Same tarball built into rpms using rpmbuild on both nodes... 
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.
 
 added the requested config here is the command line output
 
 microserver-1:/etc # /etc/init.d/ceph start osd.3
 === osd.3 === 
 Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
 2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
 2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 
 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 
 0x7f05ac0290b0 con 0x7f05ac027c40
 2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
 my addr 192.168.0.250:0/1002795
 2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  
 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
 2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 
 con 0x7f05ac027c40
 2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029a50 con 0x7f05ac027c40
 2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029f20 con 0x7f05ac027c40
 2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
 2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
 2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-12 Thread Joao Eduardo Luis

On 03/12/2015 05:16 AM, Malcolm Haak wrote:

Sorry about all the unrelated grep issues..

So I've rebuilt and reinstalled and it's still broken.

On the working node, even with the new packages, everything works.
On the new broken node, I've added a mon and it works. But I still cannot start 
an OSD on the new node.

What else do you need from me? I'll get logs run any number of tests.

I've got data in this cluster already, and it's full so I need to expand it, 
I've already got hardware.

Thanks in advance for even having a look


Sam mentioned to me on IRC that the next step would be to grab the 
offending osdmaps.  Easiest way for that will be to stop a monitor and 
run 'ceph-monstore-tool' in order to obtain the full maps, and then use 
'ceph-kvstore-tool' to obtain incrementals.


Given the osd is crashing on version 2759, the following would be best:

(Assuming you have stopped a given monitor with id FOO, whose store is 
sitting at default path /var/lib/ceph/mon/ceph-FOO)


ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version 
2758 --out /tmp/osdmap.full.2758


ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version 
2759 --out /tmp/osdmap.full.2759


(please note the '--' between 'osdmap' and '--version', as that is 
required for the tool to do its thing)


and then

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2758 
out /tmp/osdmap.inc.2758


ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2759 
out /tmp/osdmap.inc.2759


Cheers!

  -Joao





-Original Message-
From: Samuel Just [mailto:sj...@redhat.com]
Sent: Wednesday, 11 March 2015 1:41 AM
To: Malcolm Haak; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:

Hi Samuel,

The sha1? I'm going to admit ignorance as to what you are looking for. They are 
all running the same release if that is what you are asking.
Same tarball built into rpms using rpmbuild on both nodes...
Only difference being that the other node has been upgraded and the problem 
node is fresh.

Added the requested config; here is the command-line output:

microserver-1:/etc # /etc/init.d/ceph start osd.3
=== osd.3 ===
Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 192.168.0.10:6789/0 
-- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f05ac0290b0 con 0x7f05ac027c40
2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
my addr 192.168.0.250:0/1002795
2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 0 0) 
0x7f05aab0 con 0x7f05ac027c40
2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  24+0+0 
(3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 con 
0x7f05ac027c40
2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
0x7f05ac029a50 con 0x7f05ac027c40
2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
0x7f05ac029f20 con 0x7f05ac027c40
2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 0 0) 
0x7f05aab0 con 0x7f05ac027c40
2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 0) 
0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3  5366+0+0 
(3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 0) 
0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3  5366+0+0 
(3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 0) 
0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.556748 7f05b2f72700  1

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-12 Thread Malcolm Haak
Sorry about this,

I sent this at 1AM last night and went to bed, I didn't realise the log was far 
too long and the email had been blocked... 

I've reattached all the requested files and trimmed the body of the email. 

Thank you again for looking at this.

-Original Message-
From: Malcolm Haak 
Sent: Friday, 13 March 2015 1:38 AM
To: 'Joao Eduardo Luis'; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Ok,

So, I've been doing things in the meantime and as such the osd is now 
requesting 3008 and 3009 instead of 2758/9
I've included the problem OSD's log file.

And attached all the osdmap's as requested.

Regards

Malcolm Haak

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Friday, 13 March 2015 1:02 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

On 03/12/2015 05:16 AM, Malcolm Haak wrote:
 Sorry about all the unrelated grep issues..

 So I've rebuilt and reinstalled and it's still broken.

 On the working node, even with the new packages, everything works.
 On the new broken node, I've added a mon and it works. But I still cannot 
 start an OSD on the new node.

 What else do you need from me? I'll get logs run any number of tests.

 I've got data in this cluster already, and it's full so I need to expand it, 
 I've already got hardware.

 Thanks in advance for even having a look

Sam mentioned to me on IRC that the next step would be to grab the 
offending osdmaps.  Easiest way for that will be to stop a monitor and 
run 'ceph-monstore-tool' in order to obtain the full maps, and then use 
'ceph-kvstore-tool' to obtain incrementals.

Given the osd is crashing on version 2759, the following would be best:

(Assuming you have stopped a given monitor with id FOO, whose store is 
sitting at default path /var/lib/ceph/mon/ceph-FOO)

ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version 
2758 --out /tmp/osdmap.full.2758

ceph-monstore-tool /var/lib/ceph/mon/ceph-FOO get osdmap -- --version 
2759 --out /tmp/osdmap.full.2759

(please note the '--' between 'osdmap' and '--version', as that is 
required for the tool to do its thing)

and then

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2758 
out /tmp/osdmap.inc.2758

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db get osdmap 2759 
out /tmp/osdmap.inc.2759

Cheers!

   -Joao




 -Original Message-
 From: Samuel Just [mailto:sj...@redhat.com]
 Sent: Wednesday, 11 March 2015 1:41 AM
 To: Malcolm Haak; jl...@redhat.com
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing 
 cluster

 Joao, it looks like map 2759 is causing trouble, how would he get the
 full and incremental maps for that out of the mons?
 -Sam

 On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,

 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking.
 Same tarball built into rpms using rpmbuild on both nodes...
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.

 added the requested config here is the command line output

 microserver-1:/etc # /etc/init.d/ceph start osd.3


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


ceph-osd.3.log
Description: ceph-osd.3.log


osdmap.full.3008
Description: osdmap.full.3008


osdmap.full.3009
Description: osdmap.full.3009


osdmap.inc.3008
Description: osdmap.inc.3008


osdmap.inc.3009
Description: osdmap.inc.3009
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-11 Thread Malcolm Haak
Sorry about all the unrelated grep issues..

So I've rebuilt and reinstalled and it's still broken. 

On the working node, even with the new packages, everything works.
On the new broken node, I've added a mon and it works. But I still cannot start 
an OSD on the new node.

What else do you need from me? I'll get logs run any number of tests.

I've got data in this cluster already, and it's full so I need to expand it, 
I've already got hardware.

Thanks in advance for even having a look


-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: Wednesday, 11 March 2015 1:41 AM
To: Malcolm Haak; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,
 
 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking. 
 Same tarball built into rpms using rpmbuild on both nodes... 
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.
 
 added the requested config here is the command line output
 
 microserver-1:/etc # /etc/init.d/ceph start osd.3
 === osd.3 === 
 Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
 2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
 2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 
 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 
 0x7f05ac0290b0 con 0x7f05ac027c40
 2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
 my addr 192.168.0.250:0/1002795
 2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  
 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
 2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 
 con 0x7f05ac027c40
 2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029a50 con 0x7f05ac027c40
 2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029f20 con 0x7f05ac027c40
 2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
 2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
 2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: get_command_descriptions} v 
 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.564968 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 9  mon_command_ack([{prefix: 
 get_command_descriptions}]=0  v0) v1  72+0+34995 (1092875540 0 
 1727986498) 0x7f05aa70 con 0x7f05ac027c40
 2015-03-11 01:00:13.770122 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: osd crush create-or-move, 
 args: [host=microserver-1, root=default], id: 3, weight: 1.81} v 0) 
 v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.772299 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 10  mon_command_ack([{prefix: osd crush 
 create-or-move, args: [host=microserver-1, root=default], id: 3, 
 weight: 1.81}]=0 create-or-move updated item name 'osd.3' weight 1.81 at 
 location {host=microserver-1,root=default

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-10 Thread Samuel Just
Can you reproduce this with

debug osd = 20
debug filestore = 20
debug ms = 1

on the crashing osd?  Also, what sha1 are the other osds and mons running?
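
Those can go under the osd's section in ceph.conf on that node, or be passed on 
the command line; a minimal sketch, assuming the usual config layout:

[osd.3]
    debug osd = 20
    debug filestore = 20
    debug ms = 1

For the sha1, 'ceph --version' (or 'ceph-osd --version') on each node prints 
the build sha1 in parentheses next to the release number.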
-Sam

- Original Message -
From: Malcolm Haak malc...@sgi.com
To: ceph-users@lists.ceph.com
Sent: Tuesday, March 10, 2015 3:28:26 AM
Subject: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Hi all,

I've just attempted to add a new node and OSD to an existing ceph cluster (it's 
a small one I use as a NAS at home, not like the big production ones I normally 
work on) and it seems to be throwing some odd errors...

Just looking for where to poke it next... 

Log is below,

It's a two-node cluster with 3 OSDs in node A and one OSD in the new node 
(it's going to have more eventually, and node one will be retired after node 
three gets added), and I've hit a weird snag.

I was running 0.80, but I ran into the 'Invalid Command' bug on the new node, so 
I opted to jump to the latest code, which already has the required patches. 

Please let me know what else you need..

This is the log content when attempting to start the new OSD:

2015-03-10 19:28:48.795318 7f0774108880  0 ceph version 0.93 
(bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-osd, pid 10810
2015-03-10 19:28:48.817803 7f0774108880  0 filestore(/var/lib/ceph/osd/ceph-3) 
backend xfs (magic 0x58465342)
2015-03-10 19:28:48.866862 7f0774108880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl 
is supported and appears to work
2015-03-10 19:28:48.866920 7f0774108880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option
2015-03-10 19:28:48.905069 7f0774108880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2015-03-10 19:28:48.905467 7f0774108880  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is 
supported and kernel 3.18.3-1-desktop >= 3.5
2015-03-10 19:28:49.077872 7f0774108880  0 filestore(/var/lib/ceph/osd/ceph-3) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-03-10 19:28:49.078321 7f0774108880 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2015-03-10 19:28:49.078328 7f0774108880  1 journal _open 
/var/lib/ceph/osd/ceph-3/journal fd 19: 1073741824 bytes, block size 4096 
bytes, directio = 1, aio = 0
2015-03-10 19:28:49.079721 7f0774108880  1 journal _open 
/var/lib/ceph/osd/ceph-3/journal fd 19: 1073741824 bytes, block size 4096 
bytes, directio = 1, aio = 0
2015-03-10 19:28:49.080948 7f0774108880  0 cls cls/hello/cls_hello.cc:271: 
loading cls_hello
2015-03-10 19:28:49.094194 7f0774108880  0 osd.3 2757 crush map has features 
33816576, adjusting msgr requires for clients
2015-03-10 19:28:49.094211 7f0774108880  0 osd.3 2757 crush map has features 
33816576 was 8705, adjusting msgr requires for mons
2015-03-10 19:28:49.094217 7f0774108880  0 osd.3 2757 crush map has features 
33816576, adjusting msgr requires for osds
2015-03-10 19:28:49.094235 7f0774108880  0 osd.3 2757 load_pgs
2015-03-10 19:28:49.094279 7f0774108880  0 osd.3 2757 load_pgs opened 0 pgs
2015-03-10 19:28:49.095121 7f0774108880 -1 osd.3 2757 log_to_monitors 
{default=true}
2015-03-10 19:28:49.134104 7f0774108880  0 osd.3 2757 done with init, starting 
boot process
2015-03-10 19:28:49.149994 7f076384c700 -1 *** Caught signal (Aborted) **
 in thread 7f076384c700

 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xac7cea]
 2: (()+0x10050) [0x7f0773013050]
 3: (gsignal()+0x37) [0x7f07714e60f7]
 4: (abort()+0x13a) [0x7f07714e74ca]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f0771dcbfe5]
 6: (()+0x63186) [0x7f0771dca186]
 7: (()+0x631b3) [0x7f0771dca1b3]
 8: (()+0x633d2) [0x7f0771dca3d2]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc2cea7]
 10: (OSDMap::decode_classic(ceph::buffer::list::iterator)+0x605) [0xb7b7b5]
 11: (OSDMap::decode(ceph::buffer::list::iterator)+0x8c) [0xb7bebc]
 12: (OSDMap::decode(ceph::buffer::list)+0x3f) [0xb7dfbf]
 13: (OSD::handle_osd_map(MOSDMap*)+0xd37) [0x6cd9a7]
 14: (OSD::_dispatch(Message*)+0x3eb) [0x6d0afb]
 15: (OSD::ms_dispatch(Message*)+0x257) [0x6d1007]
 16: (DispatchQueue::entry()+0x649) [0xc6fe09]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0xb9dd7d]
 18: (()+0x83a4) [0x7f077300b3a4]
 19: (clone()+0x6d) [0x7f0771595a4d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- begin dump of recent events ---
  -135 2015-03-10 19:28:48.790490 7f0774108880  5 asok(0x420) 
register_command perfcounters_dump hook 0x41b4030
  -134 2015-03-10 19:28:48.790565 7f0774108880  5 asok(0x420) 
register_command 1 hook 0x41b4030
  -133 2015-03-10 19:28:48.790571 7f0774108880  5 asok(0x420) 
register_command perf dump hook 0x41b4030
  -132 

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-10 Thread Malcolm Haak
Hi Samuel,

The sha1? I'm going to admit ignorance as to what you are looking for. They are 
all running the same release if that is what you are asking. 
Same tarball built into rpms using rpmbuild on both nodes... 
Only difference being that the other node has been upgraded and the problem 
node is fresh.

Added the requested config; here is the command-line output:

microserver-1:/etc # /etc/init.d/ceph start osd.3
=== osd.3 === 
Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 192.168.0.10:6789/0 
-- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f05ac0290b0 con 0x7f05ac027c40
2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
my addr 192.168.0.250:0/1002795
2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 0 0) 
0x7f05aab0 con 0x7f05ac027c40
2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  24+0+0 
(3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 con 
0x7f05ac027c40
2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
0x7f05ac029a50 con 0x7f05ac027c40
2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
0x7f05ac029f20 con 0x7f05ac027c40
2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 0 0) 
0x7f05aab0 con 0x7f05ac027c40
2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 
0) 0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3  
5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 
0) 0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3  
5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 (3648139960 0 
0) 0x7f05afb0 con 0x7f05ac027c40
2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_command({prefix: get_command_descriptions} v 0) 
v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
2015-03-11 01:00:13.564968 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 9  mon_command_ack([{prefix: 
get_command_descriptions}]=0  v0) v1  72+0+34995 (1092875540 0 
1727986498) 0x7f05aa70 con 0x7f05ac027c40
2015-03-11 01:00:13.770122 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
192.168.0.10:6789/0 -- mon_command({prefix: osd crush create-or-move, 
args: [host=microserver-1, root=default], id: 3, weight: 1.81} v 0) 
v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
2015-03-11 01:00:13.772299 7f05abfff700  1 -- 192.168.0.250:0/1002795 == mon.0 
192.168.0.10:6789/0 10  mon_command_ack([{prefix: osd crush 
create-or-move, args: [host=microserver-1, root=default], id: 3, 
weight: 1.81}]=0 create-or-move updated item name 'osd.3' weight 1.81 at 
location {host=microserver-1,root=default} to crush map v3277) v1  256+0+0 
(1191546821 0 0) 0x7f05a0001000 con 0x7f05ac027c40
create-or-move updated item name 'osd.3' weight 1.81 at location 
{host=microserver-1,root=default} to crush map
2015-03-11 01:00:13.776891 7f05b2f72700  1 -- 192.168.0.250:0/1002795 mark_down 
0x7f05ac027c40 -- 0x7f05ac0239a0
2015-03-11 01:00:13.777212 7f05b2f72700  1 -- 192.168.0.250:0/1002795 
mark_down_all
2015-03-11 01:00:13.778120 7f05b2f72700  1 -- 192.168.0.250:0/1002795 shutdown 
complete.
Starting Ceph osd.3 on microserver-1...
microserver-1:/etc #


Log file


2015-03-11 01:00:13.876152 7f41a1ba4880  0 ceph version 0.93 
(bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-osd, pid 2840
2015-03-11 01:00:13.877059 7f41a1ba4880  1 accepter.accepter.bind my_inst.addr 
is 0.0.0.0:6800/2840 need_addr=1
2015-03-11 01:00:13.877111 7f41a1ba4880  1 accepter.accepter.bind my_inst.addr 
is 0.0.0.0:6801/2840 need_addr=1
2015-03-11 01:00:13.877140 7f41a1ba4880  1 accepter.accepter.bind my_inst.addr 

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-10 Thread Samuel Just
Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,
 
 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking. 
 Same tarball built into rpms using rpmbuild on both nodes... 
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.
 
 added the requested config here is the command line output
 
 microserver-1:/etc # /etc/init.d/ceph start osd.3
 === osd.3 === 
 Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
 2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
 2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 
 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 
 0x7f05ac0290b0 con 0x7f05ac027c40
 2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
 my addr 192.168.0.250:0/1002795
 2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  
 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
 2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 
 con 0x7f05ac027c40
 2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029a50 con 0x7f05ac027c40
 2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029f20 con 0x7f05ac027c40
 2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
 2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
 2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: get_command_descriptions} v 
 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.564968 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 9  mon_command_ack([{prefix: 
 get_command_descriptions}]=0  v0) v1  72+0+34995 (1092875540 0 
 1727986498) 0x7f05aa70 con 0x7f05ac027c40
 2015-03-11 01:00:13.770122 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: osd crush create-or-move, 
 args: [host=microserver-1, root=default], id: 3, weight: 1.81} v 0) 
 v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.772299 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 10  mon_command_ack([{prefix: osd crush 
 create-or-move, args: [host=microserver-1, root=default], id: 3, 
 weight: 1.81}]=0 create-or-move updated item name 'osd.3' weight 1.81 at 
 location {host=microserver-1,root=default} to crush map v3277) v1  
 256+0+0 (1191546821 0 0) 0x7f05a0001000 con 0x7f05ac027c40
 create-or-move updated item name 'osd.3' weight 1.81 at location 
 {host=microserver-1,root=default} to crush map
 2015-03-11 01:00:13.776891 7f05b2f72700  1 -- 192.168.0.250:0/1002795 
 mark_down 0x7f05ac027c40 -- 0x7f05ac0239a0
 2015-03-11 01:00:13.777212 7f05b2f72700  1 -- 192.168.0.250:0/1002795 
 mark_down_all
 2015-03-11 01:00:13.778120 7f05b2f72700  1 -- 192.168.0.250:0/1002795 
 shutdown complete.
 Starting Ceph osd.3 on microserver-1...
 microserver-1:/etc #
 
 
 Log file
 
 
 2015-03-11 01:00:13.876152 7f41a1ba4880  0 ceph version 0.93 
 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-osd, pid 2840
 2015-03-11 01:00:13.877059