Re: [ceph-users] RBD snapshots cause disproportionate performance degradation

2015-11-17 Thread Haomai Wang
Yes, it's an expected case. Actually, if you use Hammer, you can enable
filestore_fiemap to use sparse copy, which is especially useful for rbd
snapshot copy. But keep in mind some old kernels are *broken* in
fiemap. CentOS 7 is the only distro I verified works fine with this feature.
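A minimal sketch of that setting, assuming it is applied in ceph.conf on the
OSD nodes and the OSDs are restarted afterwards (filestore_fiemap defaults
to false):

    [osd]
    filestore fiemap = true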


On Wed, Nov 18, 2015 at 12:25 PM, Will Bryant  wrote:
> Hi,
>
> We’ve been running an all-SSD Ceph cluster for a few months now and generally 
> are very happy with it.
>
> However, we’ve noticed that if we create a snapshot of an RBD device, then 
> writing to the RBD goes massively slower than before we took the snapshot.  
> Similarly, we get poor performance if we make a clone of that snapshot and 
> write to it.
>
> For example, using fio to run a 2-worker 4kb synchronous random write 
> benchmark, we normally get about 5000 IOPS to RBD on our test-sized cluster 
> (Intel 3710, 10G networking, Ubuntu 14.04).  But as soon as I take a 
> snapshot, this goes down to about 100 IOPS, and with high variability - at 
> times 0 IOPS, 60 IOPS, or 300 IOPS.
>
> I realise that after a snapshot, any write will trigger a copy of the block, 
> which by default would be 4 MB of data - to minimize this effect I’ve reduced 
> the RBD order to 18 ie. 256 KB blocks.
>
> But shouldn’t that effect only degrade it to the same performance as we get 
> on a completely new RBD image that has no snapshots and no data?  For us that 
> is more like 1000-1500 IOPS ie. still at least 10x better than the 
> performance we get after a snapshot is taken.
>
> Is there something particularly inefficient about the copy-on-write block 
> implementation that makes it much worse than writing to fresh blocks?  Note 
> that we get this performance drop even if the other data on the blocks are 
> cached in memory, and since we’re using fast SSDs, the time to read in the 
> rest of the 256 KB should be negligible.
>
> We’re currently using Hammer but we also tested with Infernalis and it didn’t 
> seem any better.
>
> Cheers,
> Will
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD snapshots cause disproportionate performance degradation

2015-11-17 Thread Will Bryant
Hi,

We’ve been running an all-SSD Ceph cluster for a few months now and generally 
are very happy with it.

However, we’ve noticed that if we create a snapshot of an RBD device, then 
writing to the RBD goes massively slower than before we took the snapshot.  
Similarly, we get poor performance if we make a clone of that snapshot and 
write to it.

For example, using fio to run a 2-worker 4kb synchronous random write 
benchmark, we normally get about 5000 IOPS to RBD on our test-sized cluster 
(Intel 3710, 10G networking, Ubuntu 14.04).  But as soon as I take a snapshot, 
this goes down to about 100 IOPS, and with high variability - at times 0 IOPS, 
60 IOPS, or 300 IOPS.
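For reference, a minimal fio invocation matching that description (a sketch;
the device path and runtime are assumptions):

    fio --name=rbd-randwrite --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --numjobs=2 --iodepth=1 --sync=1 --direct=1 --runtime=60 --time_based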

I realise that after a snapshot, any write will trigger a copy of the block, 
which by default would be 4 MB of data - to minimize this effect I’ve reduced 
the RBD order to 18 ie. 256 KB blocks.
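A sketch of the image and snapshot commands being described (pool and image
names are hypothetical):

    rbd create benchpool/test-image --size 10240 --order 18   # 256 KB objects
    rbd snap create benchpool/test-image@snap1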

But shouldn’t that effect only degrade it to the same performance as we get on 
a completely new RBD image that has no snapshots and no data?  For us that is 
more like 1000-1500 IOPS ie. still at least 10x better than the performance we 
get after a snapshot is taken.

Is there something particularly inefficient about the copy-on-write block 
implementation that makes it much worse than writing to fresh blocks?  Note 
that we get this performance drop even if the other data on the blocks are 
cached in memory, and since we’re using fast SSDs, the time to read in the rest 
of the 256 KB should be negligible.

We’re currently using Hammer but we also tested with Infernalis and it didn’t 
seem any better.

Cheers,
Will
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] about PG_Number

2015-11-17 Thread Vickie ch
By the way, here is a useful tool to calculate PG counts.
http://ceph.com/pgcalc/



Best wishes,
Mika


2015-11-18 11:46 GMT+08:00 Vickie ch :

> Hi wah peng,
> Hope you don't mind. Just for reference.
> An extreme case: if your Ceph cluster has 3 OSD disks on different OSD
> servers and you set the PG number to 10240 (just an example), that means
> all these PGs will be created on only 3 disks.
> Losing one OSD then also means losing a lot of PGs. It may bring some
> trouble for re-balance and recovery.
>
> On the other side, if you have many OSDs but only set pg = 8,
> that means some disks will never be used.
>
> Best wishes,
> Mika
>
>
> 2015-11-13 16:26 GMT+08:00 wah peng :
>
>> Why does data loss happen? Thanks.
>>
>> On 2015/11/13 星期五 16:13, Vickie ch wrote:
>>
>>> On the other side, a PG number that is too large with an OSD number that
>>> is too small has a chance of causing data loss.
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] about PG_Number

2015-11-17 Thread Vickie ch
Hi wah peng,
Hope you don't mind. Just for reference.
An extreme case: if your Ceph cluster has 3 OSD disks on different OSD
servers and you set the PG number to 10240 (just an example), that means
all these PGs will be created on only 3 disks.
Losing one OSD then also means losing a lot of PGs. It may bring some
trouble for re-balance and recovery.

On the other side, if you have many OSDs but only set pg = 8,
that means some disks will never be used.
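For reference, the pgcalc tool linked earlier in this digest implements
roughly this rule of thumb (the pool name and numbers below are hypothetical):

    # pg_num ~= (number of OSDs * 100) / replica count,
    # rounded up to the next power of two.
    # e.g. 3 OSDs with 3x replication: (3 * 100) / 3 = 100 -> 128
    ceph osd pool create mypool 128 128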

Best wishes,
Mika


2015-11-13 16:26 GMT+08:00 wah peng :

> Why does data loss happen? Thanks.
>
> On 2015/11/13 星期五 16:13, Vickie ch wrote:
>
>> On the other side, a PG number that is too large with an OSD number that
>> is too small has a chance of causing data loss.
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not create rbd image

2015-11-17 Thread Vickie ch
Hi,
Looks like your cluster has a warning message of "2 near full osd(s)".
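A quick way to confirm which OSDs are near full and how much space remains
(a sketch; note that writes can block entirely once an OSD crosses the full
ratio, 0.95 by default):

    ceph health detail   # lists the near-full OSDs and their utilization
    ceph df              # global and per-pool usage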

Maybe try to extend the OSDs first?




Best wishes,
Mika


2015-11-12 23:05 GMT+08:00 min fang :

> Hi cephers, I tried to use the following command to create an image, but
> unfortunately the command hung for a long time until I broke it with
> ctrl-z.
>
> rbd -p hello create img-003 --size 512
>
> so I checked the cluster status, and showed:
>
> cluster 0379cebd-b546-4954-b5d6-e13d08b7d2f1
>  health HEALTH_WARN
> 2 near full osd(s)
> too many PGs per OSD (320 > max 300)
>  monmap e2: 1 mons at {vl=192.168.90.253:6789/0}
> election epoch 1, quorum 0 vl
>  osdmap e37: 2 osds: 2 up, 2 in
>   pgmap v19544: 320 pgs, 3 pools, 12054 MB data, 3588 objects
> 657 GB used, 21867 MB / 714 GB avail
>  320 active+clean
>
> I did not see an error message here that could cause rbd create to hang.
>
> I opened the client log and see:
>
> 2015-11-12 22:52:44.687491 7f89eced9780 20 librbd: create 0x7fff8f7b7800
> name = img-003 size = 536870912 old_format = 1 features = 0 order = 22
> stripe_unit = 0 stripe_count = 0
> 2015-11-12 22:52:44.687653 7f89eced9780  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6800/5472 -- osd_op(client.34321.0:1 img-003.rbd
> [stat] 2.8a047315 ack+read+known_if_redirected e37) v5 -- ?+0 0x28513d0 con
> 0x285
> 2015-11-12 22:52:44.688928 7f89e066b700  1 -- 192.168.90.253:0/1006121
> <== osd.1 192.168.90.253:6800/5472 1  osd_op_reply(1 img-003.rbd
> [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  178+0+0
> (3550830125 0 0) 0x7f89cae0 con 0x285
> 2015-11-12 22:52:44.689090 7f89eced9780  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- osd_op(client.34321.0:2 rbd_id.img-003
> [stat] 2.638c75a8 ack+read+known_if_redirected e37) v5 -- ?+0 0x2858330 con
> 0x2856f50
> 2015-11-12 22:52:44.690425 7f89e0469700  1 -- 192.168.90.253:0/1006121
> <== osd.0 192.168.90.253:6801/5344 1  osd_op_reply(2 rbd_id.img-003
> [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  181+0+0
> (1202435393 0 0) 0x7f89b8000ae0 con 0x2856f50
> 2015-11-12 22:52:44.690494 7f89eced9780  2 librbd: adding rbd image to
> directory...
> 2015-11-12 22:52:44.690544 7f89eced9780  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- osd_op(client.34321.0:3 rbd_directory
> [tmapup 0~0] 2.30a98c1c ondisk+write+known_if_redirected e37) v5 -- ?+0
> 0x2858920 con 0x2856f50
> 2015-11-12 22:52:59.687447 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6789/0 -- mon_subscribe({monmap=3+,osdmap=38}) v2 --
> ?+0 0x7f89bab0 con 0x2843b90
> 2015-11-12 22:52:59.687472 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bf40
> con 0x2856f50
> 2015-11-12 22:52:59.687887 7f89e3873700  1 -- 192.168.90.253:0/1006121
> <== mon.0 192.168.90.253:6789/0 11  mon_subscribe_ack(300s) v1 
> 20+0+0 (2867606018 0 0) 0x7f89d8001160 con 0x2843b90
> 2015-11-12 22:53:04.687593 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:09.687731 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:14.687844 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:19.687978 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:24.688116 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:29.688253 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:34.688389 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:39.688512 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
> 2015-11-12 22:53:44.688636 7f89e4074700  1 -- 192.168.90.253:0/1006121
> --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0
> con 0x2856f50
>
>
> It looks to me like we just keep sending the "ping magic" messages, and the
> operation never completes.
>
> my ceph version is "ceph version 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43)"
>
> Can somebody help me with this? Or tell me what more debug
> information to collect for analysis.
>
> Thanks.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] rbd create => seg fault

2015-11-17 Thread Mark Kirkwood
On 13/11/15 21:21, Ilya Dryomov wrote:
> On Fri, Nov 13, 2015 at 5:25 AM, Mark Kirkwood
>  wrote:
>> When you do:
>>
>> $ rbd create
>>
> 
> This seg fault is not in any way tied to the kernel version.
> 
> The kernel isn't involved in creating rbd images or using them through
> librbd (or anything that uses librbd indirectly).  It's only if you try
> to use the kernel client (rbd map/unmap) that it starts to matter which
> kernel you are on.
> 

Arrg, you are indeed correct, I confused myself (ahem ...and others) there.

regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Recovery Delay Start

2015-11-17 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

We are having a lot of trouble with the SSD OSDs for our cache tier
when they reboot. It causes massive blocked I/O when booting the OSD
and the entire cluster I/O nearly stalls even when the OSD is only
down for 60 seconds.

I have noticed that when the OSD starts it uses massive amounts of
RAM; for the one-minute test it used almost 8 GB, another one earlier
this morning used 14 GB, some last night were in the 10GB range.
During this time the process is not using much CPU, but the disks are
very busy writing a good 120-250 MB/s and hundreds to low thousand
IOPs. Once the memory usage gets down to about 1.5 GB blocked I/O
starts clearing slowly. At first I thought this was due to preloading
jemalloc, but it also happens without it.

Looking through [1] I thought osd recovery delay start set to 60
seconds or longer would allow the OSD to come up, join the cluster, do
any housekeeping before being in and trying to service I/O requests.
However, setting the value to 60 does nothing, we see recovery
operations start less than 30 seconds after the monitor shows the boot
message. The osd log does not show any kind of delay either.
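The setting under discussion, as a ceph.conf sketch (per [1]):

    [osd]
    osd recovery delay start = 60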

Is there a bug here or am I understanding this option incorrectly?

What I'm looking for is something to delay any I/O until the peering
is completed, the PGs have been scanned, all of the house keeping is
done so that the only load on the OSD/disk is client/recovery I/O. I
don't want it to try to do both at the same time.

Once the OSD finally comes in and the blocked I/O clears, we can
manage backfilling and recovery without much impact to the cluster, it
is just the initial minutes of terror (dozens of blocked I/O > 500
seconds) that we can't figure out how to get rid of. I understand that
there will be some impact for recovery, but on our cluster that on
average does about 10K IOPs, we have less than 5K for 5 minutes (for a
single OSD that was down for 60 seconds). A host with two SSDs brought
our cluster to less than 2K IOPs for 15 minutes and took ten minutes
to get back to normal performance.

[1] http://docs.ceph.com/docs/v0.94/rados/configuration/osd-config-ref/#recovery

Thanks,
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWS9EvCRDmVDuy+mK58QAAzE0P/3jQt3RkDUetTyuu/E3v
wVwBtcxONs7RQHIEtamNk/eIoGsSS+PevsBK2hSvnIJWNZkhQ3U13HQQ7Hz1
awkVD3+nw72You09kC772MtAXOIcHDEQgzJHQGoxevLlJSRwIarzyMlkJqrP
g+WdAx+O3BjtOoPG+6SG1HMDqUjTw46yHkCC2iybjT9y7PBp6PZ8EN1GD+00
k2+FferROKg/VxKxwQmgWVlXIvnrSF2/bHuZeTOUybw7TWNt1q6ZSXr4ZZuY
1e0yUnj8lNMus3SC6Itdj9wBp6Ke1J4tdUZkWiTgMkK5Xykw6iAJCADPIrni
zck3SfI2XB8XXrNwvuEvuKyAleXAodPf/AbWQ9sfO88MoWFYZ3ibNbbIfAp4
SEKeZpipzxlvNCm/W2NiDD08jbcaYDqn6dj6fHSHvIelysRItLlojTXuAioZ
ORQ4JAxPnEfNCUtn/eAq46oVIjrmSPiHs2p2hMYjhANLNYz5tyAt/HNSHXzR
hnYH9y4TFIOyrB7JcAypkIKwiuGjmoMbR8RvF1hDEJRXAzj7rpePQ9FoNbU/
/uGIJlwSPEJ8UxK1TuqDJ13XvXfLbR+S1aPjt+Y5LOMYO0pFo4fsdtt8NakM
ayRsUysGM9n7hBRTWHV5zPg7MB3wjcFv2kaE5NZfwfMNbM2xXkdClYHJ53Zr
kbpr
=BJTH
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd create => seg fault

2015-11-17 Thread Artie Ziff
This was resolved when I found more versioned libs that were also out of
sync in the Python neighborhood: /usr/local/lib/python2.7/site-packages

This exercise motivated me to install multiple versions of Ceph in
versioned app directories, such as /usr/local/ceph-hammer and
/usr/local/ceph-infernalis with the intention of being able to switch back
and forth more rapidly. Almost got all the related Ceph programs properly
working but (IIRC) when creating pools there was a python lib called by
another python program could not find it's stuff. Perhaps, if I am not able
to sort it out then I will post a complete question about that in the
future.

Thanks everyone!
-az
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ms crc header: seeking info?

2015-11-17 Thread Artie Ziff
Thank you very much, Haomai, and others, too!

ms_crc_header was set consistently across nodes at all times.  :)
Root cause of my problems was the mismatched libraries being picked up when
shared libs load. Plain & simple. Next time, I should not allow errant
input from other "cooks in the kitchen", whom I perceived to be more expert
than me. Heh. Not you guys! I assure you!

Many thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD Caching Mode Question

2015-11-17 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

We are inserting an SSD tier into our very busy cluster and I have a
question regrading writeback and forward modes.

Write back is the "normal" mode for RBD with VMs. When we put the tier
in writeback mode we see objects are being promoted and once the ratio
is reached objects are evicted, this works as expected. When we place
the tier into forward mode, we don't see any objects being evicted to
the base tier when they are written to as described in the manual [1].
Is this a bug? We are running 0.94.5.
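For context, these are the mode switches being described (a sketch, assuming
a cache pool named 'cachepool'):

    ceph osd tier cache-mode cachepool writeback
    ceph osd tier cache-mode cachepool forward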

Now, I usually like things to work the way they are described in the
manual, however this "bug" is a bit advantageous for us. It appears
that we don't have enough IOPs in the SSD tier to handle the steady
state (we still have some more SSDs to add in, but it requires
shuffling hardware around). However, when we put the tier into forward
mode, the latency drops and we get much more performance from the Ceph
cluster. In writeback we seem to be capped at about 9K IOPs according
to ceph -w with spikes up to about 15K. However in forward mode we can
hit 65K IOPs and have a steady state near 30K IOPs. I'm linking two
graphs to show what I'm describing (for some reason the graphs seem to
be half of what is reported by ceph -w). [2][3]

Does the promote/evict logic really add that much latency? It seems
that overall the tier performance can be very good. We are using three
hit sets with 10 minutes per set and all three sets have to have a
read to promote it (we don't want to promote isolated reads). Does
someone have some suggestions for getting forward-like
performance in writeback?
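The hit-set configuration described above corresponds to pool settings along
these lines (a sketch, with the same hypothetical pool name):

    ceph osd pool set cachepool hit_set_count 3
    ceph osd pool set cachepool hit_set_period 600    # 10 minutes per set
    ceph osd pool set cachepool min_read_recency_for_promote 3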

We have 35 1 TB Micron M600 drives (26K single thread direct sync 4K
random writes, 43K two thread test, we are already aware of the
potential power loss issue so you don't need to bring that up) in 3x
replication. Our current hot set is about 4.5TB and only shifts by
about 30% over a week's time. We have cache_target_full_ratio set to
0.55 so that we leave a good part of the drive empty for performance.
Also about 90% of our reads are in 10% of the working set and 80% of
our writes are in about 20% of the working set.

[1] 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
[2] http://robert.leblancnet.us/files/performance.png
[3] http://robert.leblancnet.us/files/promote_evict.png


Thanks,
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWS8qbCRDmVDuy+mK58QAADboQAL0tl1ZArL1zPFBf5lYh
xuYQyaWsoaOgdPvlsFhciSrh3VmdTkT9R3O6MZ61VEauKUHmoipE39KejPj3
dQMKKHYc+6VF1MoNoQbeml63jC3DJGBDhPOd+bQ7RE8GBaKM71JaWvvG5bgW
xLAZ7F+37jpHkp/9syrnb0wMxOtZ0xq/iW8Kt3lvSz5Qx6XNx5r78+H9Zr28
OO4xFK8JNfa3JK7RbYU3VeUZCRhhIk/Enb8NdpA0a2cT1meTKfHMDKlOWmT4
qrWIfptWdtADveq6xY2Kj92dFXVnwNfFjoIl4PXTwZtZM1RyAc1gy3qMBADI
BOvn5jdw1PmVYHpY9NH57vpKhn+5o6+FvW95baE5OFJ52NthkVp87LuutnKV
RNyy/cWEe2/Dc9QZdj3eXKjEcL5MYgM+P21THO2e7QQwD6GXnJWnsSTwsQOm
Qs6RqyE9RgdpabdThRzxWIuT8TJmBrDOovEulzFpBN3ZG8bsOrS/5pTmgamI
c8FyddhFgYsPwjMKEDvEbHTIPHx1tZ9hL5fjAwZQeMMCV3LWojAK33a0a602
JfSBj1dhICaULUFQT9f9yhd8/maYNpWogHb/zb3wolegcVP0UckcVxNEUxIf
hxpTvFV93BzUupaprn03Oje2qbSdY++9lZbBfkVChodEprM5oejiT158WBYr
Z3Af
=FP/u
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SL6/Centos6 rebuild question

2015-11-17 Thread Goncalo Borges


Dear All...

I was able to build el6 rpms for infernalis, but they require GCC 4.8.3.
If you are interested in the details on how to do it, let me know.


There were two major issues:

- The build complains about a configuration parsing error in
ceph-9.2.0/selinux/ceph.te


   /usr/bin/checkmodule: loading policy configuration from tmp/ceph.tmp
   ceph.te":108:ERROR 'syntax error' at token 'fsadm_manage_pid' on
   line 15442:
   fsadm_manage_pid(ceph_t)

I've checked that if I comment out that line, the build goes on but, for the
time being, I just changed the ceph.spec to remove SELinux support.


- The build then fails with the error

   + /usr/lib/rpm/find-debuginfo.sh --strict-build-id 
/root/rpmbuild/BUILD/ceph-9.2.0
   extracting debug info from 
/root/rpmbuild/BUILDROOT/ceph-9.2.0-0.el6.x86_64/sbin/mount.ceph
   *** ERROR: No build ID note found in 
/root/rpmbuild/BUILDROOT/ceph-9.2.0-0.el6.x86_64/sbin/mount.ceph

   error: Bad exit status from /var/tmp/rpm-tmp.8iKDQ5 (%install)

   RPM build errors:
   Bad exit status from /var/tmp/rpm-tmp.8iKDQ5 (%install)

I worked around this by inserting

%define  debug_package %{nil}

in the ceph.spec to avoid building the debug packages, which I do not really
need.


Cheers
Goncalo


On 11/13/2015 08:27 PM, Goncalo Borges wrote:

Well... I misinterpreted the error. It is not systemd related but selinux 
related. I must be missing some selinux component. Will investigate further.



From: Goncalo Borges [goncalo.bor...@sydney.edu.au]
Sent: 13 November 2015 16:51
To: ceph-users@lists.ceph.com
Subject: SL6/Centos6 rebuild question

Dear Ceph Gurus...

I have tried to rebuild Ceph (9.2.0) in Centos6 with GCC 4.8 using the SRPM for 
Centos7.

I could easily start rebuilding Ceph after solving some dependency issues.
However, it fails right at the end with systemd related messages:
# rpmbuild --rebuild ceph-9.2.0-0.el7.src.rpm
(...)
build succeeded, 1 warning.
make[1]: Leaving directory `/root/rpmbuild/BUILD/ceph-9.2.0/man'
Making all in doc
make[1]: Entering directory `/root/rpmbuild/BUILD/ceph-9.2.0/doc'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/root/rpmbuild/BUILD/ceph-9.2.0/doc'
Making all in systemd
make[1]: Entering directory `/root/rpmbuild/BUILD/ceph-9.2.0/systemd'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/root/rpmbuild/BUILD/ceph-9.2.0/systemd'
Making all in selinux
make[1]: Entering directory `/root/rpmbuild/BUILD/ceph-9.2.0/selinux'
make -j1 -f /usr/share/selinux/devel/Makefile ceph.pp
cat: /selinux/mls: No such file or directory
make[2]: Entering directory `/root/rpmbuild/BUILD/ceph-9.2.0/selinux'
Compiling targeted ceph module
/usr/bin/checkmodule:  loading policy configuration from tmp/ceph.tmp
ceph.te":108:ERROR 'syntax error' at token 'fsadm_manage_pid' on line 15442:
fsadm_manage_pid(ceph_t)

/usr/bin/checkmodule:  error(s) encountered while parsing configuration
make[2]: *** [tmp/ceph.mod] Error 1
make[2]: Leaving directory `/root/rpmbuild/BUILD/ceph-9.2.0/selinux'
make[1]: *** [ceph.pp] Error 2
make[1]: Leaving directory `/root/rpmbuild/BUILD/ceph-9.2.0/selinux'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.3TtsUK (%build)

RPM build errors:
 Bad exit status from /var/tmp/rpm-tmp.3TtsUK (%build)
I remember that systemd support was introduced in the latest infernalis 
release, and I just wonder if that, somehow, breaks the backward compatibility 
with older systems.

Cheers
Goncalo




--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados_aio_cancel

2015-11-17 Thread Gregory Farnum
On Monday, November 16, 2015, min fang  wrote:

> Is this function used to detach the rx buffer and complete the IO back to
> the caller?  From the code, I think this function does not interact with the
> OSD or MON side, which means we just cancel the IO from the client side. Am I right?
>
> Thanks.
>

Right. If the messages haven't been sent over the network they get tossed
out, otherwise we ignore the reply. The RADOS protocol doesn't itself
support op cancellation.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD pool and SATA pool

2015-11-17 Thread Nick Fisk
I prefer to use the crush location hook functionality to call a script like
this, so OSDs are dynamically placed in the correct crush root on startup:

 

https://gist.github.com/wido/5d26d88366e28e25e23d
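A sketch of how such a hook is wired up in ceph.conf (the script path is
hypothetical; for each OSD the hook prints a location such as
"root=ssd host=node1-ssd"):

    [osd]
    osd crush location hook = /usr/local/bin/crush-location.sh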

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD pool and SATA pool

2015-11-17 Thread Michael Kuriger
Many thanks!


Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235


From: Sean Redmond <sean.redmo...@gmail.com>
Date: Tuesday, November 17, 2015 at 2:00 PM
To: Nikola Ciprich <nikola.cipr...@linuxbox.cz>
Cc: Michael Kuriger <mk7...@yp.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] SSD pool and SATA pool

Hi,

The below should help you:

http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

Thanks

On Tue, Nov 17, 2015 at 9:58 PM, Nikola Ciprich <nikola.cipr...@linuxbox.cz> wrote:
I'm not a Ceph expert, but I needed to use

osd crush update on start = false

in the [osd] config section.

BR

nik


On Tue, Nov 17, 2015 at 08:53:37PM +, Michael Kuriger wrote:
> Hey everybody,
> I have 10 servers, each with 2 SSD drives, and 8 SATA drives.  Is it possible 
> to create 2 pools, one made up of SSD and one made up of SATA?  I tried 
> manually editing the crush map to do it, but the configuration doesn’t seem 
> to persist reboots.  Any help would be very appreciated.
>
> Thanks!
>
> Mike
>
>

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD pool and SATA pool

2015-11-17 Thread Sean Redmond
Hi,

The below should help you:

http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

Thanks

On Tue, Nov 17, 2015 at 9:58 PM, Nikola Ciprich 
wrote:

> I'm not a Ceph expert, but I needed to use
>
> osd crush update on start = false
>
> in the [osd] config section.
>
> BR
>
> nik
>
>
> On Tue, Nov 17, 2015 at 08:53:37PM +, Michael Kuriger wrote:
> > Hey everybody,
> > I have 10 servers, each with 2 SSD drives, and 8 SATA drives.  Is it
> possible to create 2 pools, one made up of SSD and one made up of SATA?  I
> tried manually editing the crush map to do it, but the configuration
> doesn’t seem to persist reboots.  Any help would be very appreciated.
> >
> > Thanks!
> >
> > Mike
> >
> >
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD pool and SATA pool

2015-11-17 Thread Nikola Ciprich
I'm not a Ceph expert, but I needed to use

osd crush update on start = false

in the [osd] config section.

BR

nik


On Tue, Nov 17, 2015 at 08:53:37PM +, Michael Kuriger wrote:
> Hey everybody,
> I have 10 servers, each with 2 SSD drives, and 8 SATA drives.  Is it possible 
> to create 2 pools, one made up of SSD and one made up of SATA?  I tried 
> manually editing the crush map to do it, but the configuration doesn’t seem 
> to persist reboots.  Any help would be very appreciated.
> 
> Thanks!
> 
> Mike
> 
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD pool and SATA pool

2015-11-17 Thread Michael Kuriger
Hey everybody,
I have 10 servers, each with 2 SSD drives, and 8 SATA drives.  Is it possible 
to create 2 pools, one made up of SSD and one made up of SATA?  I tried 
manually editing the crush map to do it, but the configuration doesn’t seem to
persist across reboots.  Any help would be very appreciated.
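Following the approach in the blog post linked in the replies, a sketch of
the CRUSH-level setup (bucket, rule, and pool names are hypothetical, and the
ruleset id must match the created rule); combine it with the
"osd crush update on start = false" setting mentioned in the replies so the
placement persists across reboots:

    ceph osd crush add-bucket ssd root
    ceph osd crush add-bucket sata root
    ceph osd crush rule create-simple ssd-rule ssd host
    ceph osd crush rule create-simple sata-rule sata host
    ceph osd pool create ssd-pool 128 128
    ceph osd pool set ssd-pool crush_ruleset <ruleset-id-of-ssd-rule>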

Thanks!

Mike


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Performance output con Ceph IB with fio examples

2015-11-17 Thread German Anders
Hi cephers,

Is there anyone out there using Ceph (any version) with Infiniband FDR
topology network (both public and cluster), that could share some
performance results? To be more specific, running something like this on an
RBD volume mapped to an IB host:

# fio --rw=randread --bs=4m --numjobs=4 --iodepth=32 --runtime=22
--time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap
--group_reporting --exitall --name
dev-ceph-randread-4m-4thr-libaio-32iodepth-22sec
--filename=/mnt/rbdtest/test1

# fio --rw=randread --bs=1m --numjobs=4 --iodepth=32 --runtime=22
--time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap
--group_reporting --exitall --name
dev-ceph-randread-1m-4thr-libaio-32iodepth-22sec
--filename=/mnt/rbdtest/test2

# fio --rw=randwrite --bs=1m --numjobs=4 --iodepth=32 --runtime=22
--time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap
--group_reporting --exitall --name
dev-ceph-randwrite-1m-4thr-libaio-32iodepth-22sec
--filename=/mnt/rbdtest/test3

# fio --rw=randwrite --bs=4m --numjobs=4 --iodepth=32 --runtime=22
--time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1
--invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap
--group_reporting --exitall --name
dev-ceph-randwrite-4m-4thr-libaio-32iodepth-22sec
--filename=/mnt/rbdtest/test4

I will really appreciate the outputs.

Thanks in advance,

*German*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Jose Tavares
Hi guys ...

Thanks a lot for your support.
I discovered what happened.

I had 2 monitors, osnode01 and osnode02.
I tried to add a 3rd by using ceph-deploy.

ceph-deploy was using a key different from the one used by my monitor
cluster.

So, I added osnode08 to the monitor cluster and it did not become part of
the quorum.
I removed it, and removed osnode02. The monitor count should be an odd
number.

When I did that, my ceph stopped.
I readded osnode02 to the monitor cluster.
The thing is that I added it using a wrong key. I don't know why ceph-deploy
started using a different key.

As suggested by Joao Eduardo, by removing auth I could bring part of Ceph up.
After that I troubleshot this key problem, solved it, and now my whole
cluster is recovered and running fine ...

Thanks a lot.
Jose Tavares


On Tue, Nov 17, 2015 at 11:13 AM, Jose Tavares  wrote:

> Now I tried to inject the latest map I had.
> Also, I created a second monitor on osnode02, like I had before, using the
> same map.
> I started both monitors ...
>
> Logs from osnode01 show my content ... and then it started to show lines
> like
>
> 2015-11-17 10:56:26.515069 7fc73af67700  0 
> mon.osnode01@0(probing).data_health(1)
> update_stats avail 19% total 220 GB, used 178 GB, avail 43178 MB
>
> What does that mean?
> Attached are the logs.
>
> Thanks a lot.
> Jose Tavares
>
>
>
>
>
>
>
>
> On Tue, Nov 17, 2015 at 10:33 AM, Jose Tavares  wrote:
>
>>
>>
>> On Tue, Nov 17, 2015 at 7:27 AM, Joao Eduardo Luis  wrote:
>>
>>> On 11/17/2015 03:56 AM, Jose Tavares wrote:
>>> > The problem is that I think I don't have any good monitor anymore.
>>> > How do I know if the map I am trying is ok?
>>> >
>>> > I also saw in the logs that the primary mon was trying to contact a
>>> > removed mon at IP .112 .. So, I added .112 again ... and it didn't
>>> help.
>>> >
>>> > Attached are the logs of what is going on and some monmaps that I
>>> > capture that were from minutes before the cluster become inaccessible
>>> ..
>>> >
>>> > Should I try inject this monmaps in my primary mon to see if it can
>>> > recover the cluster?
>>> > Is it possible to see if this monmaps match my content?
>>>
>>> Without access to the actual store.db there's no way to ascertain if the
>>> store has any problems, and even then figuring out a potential
>>> corruption from just one monitor store.db would either be impossible or
>>> impractical.
>>>
>>
>> I posted my store.db in my previous answer ..
>>
>>
>>
>>>
>>> That said, from the log you attached it seems you only have issues with
>>> authentication: you have pgmaps from epoch 91923 through to 92589, you
>>> have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
>>> versions for the auth keys.
>>>
>>> Somehow, though, your monitors are unable to authenticate each other. No
>>> way to tell if that was corruption or user error.
>>>
>>> You should be able to get your monitors back to speaking terms again
>>> simply by disabling cephx temporarily. Then you can figure out whatever
>>> you need to figure out in terms of monitor keys.
>>>
>>> Just update your ceph.conf with 'auth supported = none' and restart the
>>> monitors. See how it goes from there.
>>>
>>
>> I tried your suggestion and it didn't make any change to the results .. :(
>>
>> Thanks a lot.
>> Jose Tavares
>>
>>
>>
>>> HTH
>>>
>>>   -Joao
>>>
>>>
>>>
>>> >
>>> > Thanks a lot.
>>> > Jose Tavares
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
>>> > mailto:nathan.har...@cfms.org.uk>> wrote:
>>> >
>>> > I had to go through a similar process when we had a disaster which
>>> > destroyed one of our monitors.   I followed the process here:
>>> > REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
>>> > 
>>> to
>>> > remove all but one monitor, which let me bring the cluster back up.
>>> >
>>> > As you are running an older version of Ceph than hammer, some of
>>> the
>>> > commands might differ (perhaps this might
>>> > help
>>> http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
>>> >
>>> >
>>> > --
>>> > *Nathan Harper*// IT Systems Architect
>>> >
>>> > *e: * nathan.har...@cfms.org.uk 
>>> > // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
>>> > www.cfms.org.uk  // Linkedin grey icon
>>> > scaled 
>>> > CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent
>>> //
>>> > Emersons Green // Bristol // BS16 7FR
>>> >
>>> > CFMS Services Ltd is registered in England and Wales No 05742022 -
>>> a
>>> > subsidiary of CFMS Ltd
>>> > CFMS Services Ltd registered office // Victoria House // 51
>>> Victoria
>>> > Street // Bristol // BS1 6AD
>>> >
>>> > On 16 November 2015 at 16:50, Jose Tavares >> > > wrote:
>>> >
>>> >

Re: [ceph-users] Problem with infernalis el7 package

2015-11-17 Thread Ken Dreyer
You're right stijn, I apologize that we did not bump the release
number in this case. That would have been the correct thing to do, but
our build system simply isn't set up to do that easily, and we wanted
to get a fix out as soon as possible.

- Ken

On Wed, Nov 11, 2015 at 1:34 AM, Stijn De Weirdt
 wrote:
> did you recreate new rpms with same version/release? it would be better
> to make new rpms with different release (e.g. 9.2.0-1). we have
> snapshotted mirrors and nginx caches between ceph yum repo and the nodes
> that install the rpms, so cleaning the cache locally will not help.
>
> stijn
>
> On 11/11/2015 01:06 AM, Ken Dreyer wrote:
>> On Mon, Nov 9, 2015 at 6:03 PM, Bob R  wrote:
>>> We've got two problems trying to update our cluster to infernalis-
>>
>>
>> This was our bad. As indicated in http://tracker.ceph.com/issues/13746
>> , Alfredo rebuilt CentOS 7 infernalis packages, re-signed them, and
>> re-uploaded them to the same location on download.ceph.com. Please
>> clear your yum cache (`yum makecache`) and try again.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can't stop ceph

2015-11-17 Thread Ken Dreyer
The version of the documentation you were browsing was for "argonaut",
which is very old, and predates Upstart integration. Here's the
version of the docs for firefly (0.80.z), the version that you're
using on Ubuntu:

http://docs.ceph.com/docs/firefly/rados/operations/operating/

This version has the command you're looking for regarding stopping all OSDs:

  sudo stop ceph-osd-all
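For individual daemons, the same docs describe per-id Upstart jobs, e.g. (a
sketch):

  sudo stop ceph-osd id=1
  sudo stop ceph-mon id=ceph2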

- Ken


On Mon, Nov 16, 2015 at 6:41 PM, Yonghua Peng  wrote:
> Thanks a lot, that works.
> Do you know how to stop all ceph-osd daemons via one command?
>
>
>
> On 2015/11/17 星期二 9:35, wd_hw...@wistron.com wrote:
>>
>> Hi,
>> You may try the following command
>> 'sudo stop ceph-mon id=ceph2'
>>
>> WD
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Yonghua Peng
>> Sent: Tuesday, November 17, 2015 9:34 AM
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] can't stop ceph
>>
>> Hello,
>>
>> My system is ubuntu 12.04, ceph 0.80.10 installed.
>> I followed the document here,
>> http://docs.ceph.com/docs/argonaut/init/
>>
>> But couldn't stop a mon daemon successfully.
>>
>> root@ceph2:~# ps -efw|grep ceph-
>> root  2763 1  0 Oct28 ?00:05:11 /usr/bin/ceph-mon
>> --cluster=ceph -i ceph2 -f
>> root  4299 1  0 Oct28 ?00:21:49 /usr/bin/ceph-osd
>> --cluster=ceph -i 0 -f
>> root  4703 1  0 Oct28 ?00:21:44 /usr/bin/ceph-osd
>> --cluster=ceph -i 1 -f
>> root 12353 1  0 Oct29 ?00:21:08 /usr/bin/ceph-osd
>> --cluster=ceph -i 2 -f
>> root 19143 17226  0 09:29 pts/400:00:00 grep --color=auto ceph-
>> root@ceph2:~#
>> root@ceph2:~# service ceph -v stop mon
>> root@ceph2:~# echo $?
>> 0
>> root@ceph2:~# ps -efw|grep ceph-
>> root  2763 1  0 Oct28 ?00:05:11 /usr/bin/ceph-mon
>> --cluster=ceph -i ceph2 -f
>> root  4299 1  0 Oct28 ?00:21:49 /usr/bin/ceph-osd
>> --cluster=ceph -i 0 -f
>> root  4703 1  0 Oct28 ?00:21:44 /usr/bin/ceph-osd
>> --cluster=ceph -i 1 -f
>> root 12353 1  0 Oct29 ?00:21:08 /usr/bin/ceph-osd
>> --cluster=ceph -i 2 -f
>> root 19184 17226  0 09:29 pts/400:00:00 grep --color=auto ceph-
>>
>>
>> Can you help? thanks.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ---
>> This email contains confidential or legally privileged information and is
>> for the sole use of its intended recipient.
>> Any unauthorized review, use, copying or distribution of this email or the
>> content of this email is strictly prohibited.
>> If you are not the intended recipient, you may reply to the sender and
>> should delete this e-mail immediately.
>>
>> ---
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bcache and Ceph Question

2015-11-17 Thread German Anders
Hi all,

Is there any way to use bcache in an already configured Ceph cluster? I have
both the OSD data and the journal on the same OSD disks, and I want to try
bcache in front of the OSD daemon and also move the journal onto the bcache
device, so for example I've got:

/dev/sdc  --> SSD disk
/dev/sdc1  --> 1st partition GPT for Journal (5G)
/dev/sdc2  --> 2nd partition GPT for Journal (5G)
/dev/sdc3  --> 3rd partition GPT for Journal (5G)
/dev/sdc4  --> partition GPT for bcache device (200G)

Then attach */dev/sdf1*, */dev/sdg1* and */dev/sdh1* (all *OSD daemons, 2TB
each in size*) to each of the bcache partitions on the */dev/sdc* device.
Is this possible, or will I need to 'format' all the devices in order to do
this kind of procedure? Any ideas? Or is there any better approach to this?
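For what it's worth, a sketch of the usual bcache setup under the layout
described above (note that make-bcache writes its own superblock, so the
backing devices normally have to be recreated, which speaks to the 'format'
question):

    make-bcache -C /dev/sdc4   # register the SSD partition as a cache set
    make-bcache -B /dev/sdf1   # register a backing device (repeat for sdg1, sdh1)
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach   # attach the cache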

Thanks in advance,

Regards,

*German*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Jose Tavares
Now I tried to inject the latest map I had.
Also, I created a second monitor on osnode02, like I had before, using the
same map.
I started both monitors ...

Logs from osnode01 show my content ... and then it started to show lines
like

2015-11-17 10:56:26.515069 7fc73af67700  0
mon.osnode01@0(probing).data_health(1)
update_stats avail 19% total 220 GB, used 178 GB, avail 43178 MB

What does that mean?
Attached are the logs.

Thanks a lot.
Jose Tavares








On Tue, Nov 17, 2015 at 10:33 AM, Jose Tavares  wrote:

>
>
> On Tue, Nov 17, 2015 at 7:27 AM, Joao Eduardo Luis  wrote:
>
>> On 11/17/2015 03:56 AM, Jose Tavares wrote:
>> > The problem is that I think I don't have any good monitor anymore.
>> > How do I know if the map I am trying is ok?
>> >
>> > I also saw in the logs that the primary mon was trying to contact a
>> > removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
>> >
>> > Attached are the logs of what is going on and some monmaps that I
>> > capture that were from minutes before the cluster become inaccessible ..
>> >
>> > Should I try inject this monmaps in my primary mon to see if it can
>> > recover the cluster?
>> > Is it possible to see if this monmaps match my content?
>>
>> Without access to the actual store.db there's no way to ascertain if the
>> store has any problems, and even then figuring out a potential
>> corruption from just one monitor store.db would either be impossible or
>> impractical.
>>
>
> I posted my store.db in my previous answer ..
>
>
>
>>
>> That said, from the log you attached it seems you only have issues with
>> authentication: you have pgmaps from epoch 91923 through to 92589, you
>> have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
>> versions for the auth keys.
>>
>> Somehow, though, your monitors are unable to authenticate each other. No
>> way to tell if that was corruption or user error.
>>
>> You should be able to get your monitors back to speaking terms again
>> simply by disabling cephx temporarily. Then you can figure out whatever
>> you need to figure out in terms of monitor keys.
>>
>> Just update your ceph.conf with 'auth supported = none' and restart the
>> monitors. See how it goes from there.
>>
>
> I tried your suggestion and it didn't make any change to the results .. :(
>
> Thanks a lot.
> Jose Tavares
>
>
>
>> HTH
>>
>>   -Joao
>>
>>
>>
>> >
>> > Thanks a lot.
>> > Jose Tavares
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
>> > mailto:nathan.har...@cfms.org.uk>> wrote:
>> >
>> > I had to go through a similar process when we had a disaster which
>> > destroyed one of our monitors.   I followed the process here:
>> > REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
>> > 
>> to
>> > remove all but one monitor, which let me bring the cluster back up.
>> >
>> > As you are running an older version of Ceph than hammer, some of the
>> > commands might differ (perhaps this might
>> > help
>> http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
>> >
>> >
>> > --
>> > *Nathan Harper*// IT Systems Architect
>> >
>> > *e: * nathan.har...@cfms.org.uk 
>> > // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
>> > www.cfms.org.uk  // Linkedin grey icon
>> > scaled 
>> > CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent //
>> > Emersons Green // Bristol // BS16 7FR
>> >
>> > CFMS Services Ltd is registered in England and Wales No 05742022 - a
>> > subsidiary of CFMS Ltd
>> > CFMS Services Ltd registered office // Victoria House // 51 Victoria
>> > Street // Bristol // BS1 6AD
>> >
>> > On 16 November 2015 at 16:50, Jose Tavares > > > wrote:
>> >
>> > Hi guys ...
>> > I need some help as my cluster seems to be corrupted.
>> >
>> > I saw here ..
>> >
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
>> > .. a msg from 2013 where Peter had a problem with his monitors.
>> >
>> > I had the same problem today when trying to add a new monitor,
>> > and than playing with monmap as the monitors were not entering
>> > the quorum. I'm using version 0.80.8.
>> >
>> > Right now my cluster won't start because of a corrupted monitor.
>> > Is it possible to remove all monitors and create just a new one
>> > without losing data? I have ~260GB of data with work from 2
>> weeks.
>> >
>> > What should I do? Do you recommend any specific procedure?
>> >
>> > Thanks a lot.
>> > Jose Tavares
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com 
>>

Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Joao Eduardo Luis
On 11/17/2015 12:27 PM, Jose Tavares wrote:
> My concern is about this log line 
> 
> 2015-11-17 10:11:16.143864 7f81e14aa700  0
> mon.osnode01@0(probing).data_health(0) update_stats avail 19% total 220
> GB, used 178 GB, avail 43194 MB
> 
> I use to have 7TB of available space with 263G of content replicated to
> ~800G .. 

That one line only refers to the disk backing the monitor, not the cluster.

  -Joao

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Jose Tavares
On Tue, Nov 17, 2015 at 7:27 AM, Joao Eduardo Luis  wrote:

> On 11/17/2015 03:56 AM, Jose Tavares wrote:
> > The problem is that I think I don't have any good monitor anymore.
> > How do I know if the map I am trying is ok?
> >
> > I also saw in the logs that the primary mon was trying to contact a
> > removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
> >
> > Attached are the logs of what is going on and some monmaps that I
> > capture that were from minutes before the cluster become inaccessible ..
> >
> > Should I try inject this monmaps in my primary mon to see if it can
> > recover the cluster?
> > Is it possible to see if this monmaps match my content?
>
> Without access to the actual store.db there's no way to ascertain if the
> store has any problems, and even then figuring out a potential
> corruption from just one monitor store.db would either be impossible or
> impractical.
>

I posted my store.db in my previous answer ..



>
> That said, from the log you attached it seems you only have issues with
> authentication: you have pgmaps from epoch 91923 through to 92589, you
> have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
> versions for the auth keys.
>
> Somehow, though, your monitors are unable to authenticate each other. No
> way to tell if that was corruption or user error.
>
> You should be able to get your monitors back to speaking terms again
> simply by disabling cephx temporarily. Then you can figure out whatever
> you need to figure out in terms of monitor keys.
>
> Just update your ceph.conf with 'auth supported = none' and restart the
> monitors. See how it goes from there.
>

I tried your suggestion and it didn't make any change to the results .. :(

Thanks a lot.
Jose Tavares



> HTH
>
>   -Joao
>
>
>
> >
> > Thanks a lot.
> > Jose Tavares
> >
> >
> >
> >
> >
> > On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
> > mailto:nathan.har...@cfms.org.uk>> wrote:
> >
> > I had to go through a similar process when we had a disaster which
> > destroyed one of our monitors.   I followed the process here:
> > REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
> > 
> to
> > remove all but one monitor, which let me bring the cluster back up.
> >
> > As you are running an older version of Ceph than hammer, some of the
> > commands might differ (perhaps this might
> > help
> http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
> >
> >
> > --
> > *Nathan Harper*// IT Systems Architect
> >
> > *e: * nathan.har...@cfms.org.uk 
> > // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
> > www.cfms.org.uk  // Linkedin grey icon
> > scaled 
> > CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent //
> > Emersons Green // Bristol // BS16 7FR
> >
> > CFMS Services Ltd is registered in England and Wales No 05742022 - a
> > subsidiary of CFMS Ltd
> > CFMS Services Ltd registered office // Victoria House // 51 Victoria
> > Street // Bristol // BS1 6AD
> >
> > On 16 November 2015 at 16:50, Jose Tavares  > > wrote:
> >
> > Hi guys ...
> > I need some help as my cluster seems to be corrupted.
> >
> > I saw here ..
> >
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
> > .. a msg from 2013 where Peter had a problem with his monitors.
> >
> > I had the same problem today when trying to add a new monitor,
> > and than playing with monmap as the monitors were not entering
> > the quorum. I'm using version 0.80.8.
> >
> > Right now my cluster won't start because of a corrupted monitor.
> > Is it possible to remove all monitors and create just a new one
> > without losing data? I have ~260GB of data with work from 2
> weeks.
> >
> > What should I do? Do you recommend any specific procedure?
> >
> > Thanks a lot.
> > Jose Tavares
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Jose Tavares
On Tue, Nov 17, 2015 at 6:32 AM, Wido den Hollander  wrote:

> On 11/17/2015 04:56 AM, Jose Tavares wrote:
> > The problem is that I think I don't have any good monitor anymore.
> > How do I know if the map I am trying is ok?
> >
>
> How do you mean there is no good monitor? Did you encounter a disk
> failure or something?
>

No.
Describing in detail what I did ...
After adding and removing monitors a few times to try to get them into the
quorum, I finished with just 1 mon and a monmap that still reflected 3
monitors. The only remaining monitor got stuck.

After that, I took the monmap, removed the monitors and inject the monmap
back, following this suggestion ..
http://docs.ceph.com/docs/v0.78/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

Mon was stopped by the time I committed the changes.
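For reference, that procedure boils down to something like the following (a
sketch using the node names from this thread; run with the surviving mon
stopped, and paths may differ):

    ceph-mon -i osnode01 --extract-monmap /tmp/monmap
    monmaptool /tmp/monmap --rm osnode02 --rm osnode08
    ceph-mon -i osnode01 --inject-monmap /tmp/monmap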




>
> > I also saw in the logs that the primary mon was trying to contact a
> > removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
> >
>
> "Added" again? You started that monitor?
>

Yes, I saw that after starting the monitor, logs pointed to .112, so I
started it again ..



>
> > Attached are the logs of what is going on and some monmaps that I
> > capture that were from minutes before the cluster become inaccessible ..
> >
>
> Isn't there a huge timedrift somewhere? Failing cephx authorization can
> also point at a huge timedrift on the clients and OSDs. Are you sure the
> time is correct?
>

The only time drift could be from the injected monmap from some minutes
before.



> > Should I try inject this monmaps in my primary mon to see if it can
> > recover the cluster?
> > Is it possible to see if this monmaps match my content?
> >
>
> The monmaps probably didn't change that much. But a good Monitor also
has the PGMaps, OSDMaps, etc. You need a lot more than just a monmap.
>
> But check the time first on those machines.
>

Times are ok ...

About store.db, I have the following ..

 osnode01:/var/lib/ceph/mon/ceph-osnode01 # ls -lR *
-rw-r--r-- 1 root root0 Nov 16 14:58 done
-rw-r--r-- 1 root root   77 Nov  3 17:43 keyring
-rw-r--r-- 1 root root0 Nov  3 17:43 systemd

store.db:
total 2560
-rw-r--r-- 1 root root 2105177 Nov 16 19:09 004629.ldb
-rw-r--r-- 1 root root  250057 Nov 16 19:09 004630.ldb
-rw-r--r-- 1 root root  215396 Nov 16 19:36 004632.ldb
-rw-r--r-- 1 root root 282 Nov 16 19:42 004637.ldb
-rw-r--r-- 1 root root   17428 Nov 16 19:54 004640.ldb
-rw-r--r-- 1 root root   0 Nov 17 10:21 004653.log
-rw-r--r-- 1 root root  16 Nov 17 10:21 CURRENT
-rw-r--r-- 1 root root   0 Nov  3 17:43 LOCK
-rw-r--r-- 1 root root 311 Nov 17 10:21 MANIFEST-004652
osnode01:/var/lib/ceph/mon/ceph-osnode01 #


My concern is about this log line 

2015-11-17 10:11:16.143864 7f81e14aa700  0
mon.osnode01@0(probing).data_health(0)
update_stats avail 19% total 220 GB, used 178 GB, avail 43194 MB

I use to have 7TB of available space with 263G of content replicated to
~800G ..

Thanks a lot ..
Jose Tavares






> Wido
>
> > Thanks a lot.
> > Jose Tavares
> >
> >
> >
> >
> >
> > On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
> > mailto:nathan.har...@cfms.org.uk>> wrote:
> >
> > I had to go through a similar process when we had a disaster which
> > destroyed one of our monitors.   I followed the process here:
> > REMOVING MONITORS FROM AN UNHEALTHY CLUSTER
> > 
> to
> > remove all but one monitor, which let me bring the cluster back up.
> >
> > As you are running an older version of Ceph than hammer, some of the
> > commands might differ (perhaps this might help:
> > http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
> >
> >
> > --
> > *Nathan Harper*// IT Systems Architect
> >
> > *e: * nathan.har...@cfms.org.uk
> > // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
> > www.cfms.org.uk
> > CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent //
> > Emersons Green // Bristol // BS16 7FR
> >
> "> CFMS Services Ltd is registered in England and Wales No 05742022 - a
> > subsidiary of CFMS Ltd
> > CFMS Services Ltd registered office // Victoria House // 51 Victoria
> > Street // Bristol // BS1 6AD
> >
> > On 16 November 2015 at 16:50, Jose Tavares wrote:
> >
> > Hi guys ...
> > I need some help as my cluster seems to be corrupted.
> >
> > I saw here ..
> >
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
> > .. a msg from 2013 where Peter had a problem with his monitors.
> >
> > I had the same problem today when trying to add a new monitor,
> > and then playing with monmap as the monitors were not entering
> > the quorum. I'm using version 0.80.8.

Re: [ceph-users] Cannot mount CephFS after irreversible OSD lost

2015-11-17 Thread John Spray
On Tue, Nov 17, 2015 at 12:17 PM, Mykola Dvornik
 wrote:
> Dear John,
>
> Thanks for such a prompt reply!
>
> Seems like something happens on the mon side, since there are no
> mount-specific requests logged on the mds side (see below).
> FYI, some hours ago I disabled auth completely, but it didn't help.
>
> The serialized metadata pool is 9.7G. I can try to compress it with 7z, then
> set up an rssh account for you to scp/rsync it.
>
> debug mds = 20
> debug mon = 20

Don't worry about the mon logs.  That MDS log snippet appears to be
from several minutes earlier than the client's attempt to mount.

In these cases it's generally simpler if you truncate all the logs,
then attempt the mount, then send all the logs in full rather than
snippets, so that we can be sure nothing is missing.

Please also get the client log (use the fuse client with --debug-client=20).
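
For example (a sketch; paths assume the default log locations, and the
mount point is just an example):

# truncate the existing logs
$ truncate -s 0 /var/log/ceph/ceph-mds.*.log
# attempt the mount with the fuse client at full client debug
$ ceph-fuse --debug-client=20 /mnt/cephfs
# then send /var/log/ceph/*.log in full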

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot mount CephFS after irreversible OSD lost

2015-11-17 Thread Mykola Dvornik
Dear John,

Thanks for such a prompt reply!

Seems like something happens on the mon side, since there are no
mount-specific requests logged on the mds side (see below).
FYI, some hours ago I disabled auth completely, but it didn't help.

The serialized metadata pool is 9.7G. I can try to compress it with 7z,
then set up an rssh account for you to scp/rsync it.

debug mds = 20
debug mon = 20

*grep CLI.ENT.IPA.DDR /var/log/ceph/ceph-mon.000-s-ragnarok.log*

2015-11-17 12:46:20.763049 7ffa90d11700 10 mon.000-s-ragnarok@0(leader) e1
ms_verify_authorizer xxx.xxx.xxx.xxx:0/137313644 client protocol 0
2015-11-17 12:46:20.763687 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch new session 0x5602b5178840 MonSession(unknown.0
xxx.xxx.xxx.xxx:0/137313644 is open)
2015-11-17 12:46:20.763699 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
caps
2015-11-17 12:46:20.763720 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 preprocess_query auth(proto 0 34 bytes epoch 0) from unknown.0
xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:46:20.763726 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 prep_auth() blob_size=34
2015-11-17 12:46:20.763738 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 AuthMonitor::assign_global_id m=auth(proto 0 34 bytes epoch 0)
mon=0/1 last_allocated=1614103 max_global_id=1624096
2015-11-17 12:46:20.763741 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 next_global_id should be 1614104
2015-11-17 12:46:20.763817 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535a480 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:20.764469 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch existing session 0x5602b5178840 for unknown.0
xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:46:20.764475 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
caps
2015-11-17 12:46:20.764492 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 preprocess_query auth(proto 2 32 bytes epoch 0) from unknown.0
xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:46:20.764497 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 prep_auth() blob_size=32
2015-11-17 12:46:20.764705 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535b680 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:20.765279 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch existing session 0x5602b5178840 for unknown.0
xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:46:20.765287 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
caps allow *
2015-11-17 12:46:20.765303 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 preprocess_query auth(proto 2 165 bytes epoch 0) from unknown.0
xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:46:20.765310 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader).auth
v5435 prep_auth() blob_size=165
2015-11-17 12:46:20.765532 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535a000 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:20.766113 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch existing session 0x5602b5178840 for unknown.0
xxx.xxx.xxx.xxx:0/137313644

*and then*

2015-11-17 12:48:20.767152 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader) e1
ms_handle_reset 0x5602b5913b80 xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:48:20.767167 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader) e1
reset/close on session unknown.0 xxx.xxx.xxx.xxx:0/137313644
2015-11-17 12:48:20.767173 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader) e1
remove_session 0x5602b5178840 unknown.0 xxx.xxx.xxx.xxx:0/137313644

*session-specific stuff*

2015-11-17 12:46:20.763817 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535a480 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:20.764705 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535b680 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:20.765532 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b535a000 auth_reply(proto 2 0 (0) Success)
v1
2015-11-17 12:46:21.995713 7ffa8b2e7700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b5278900 mdsbeacon(1614101/000-s-ragnarok
up:active seq 184 v9429) v4
2015-11-17 12:46:23.039318 7ffa8d109700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b5388800 pg_stats_ack(1 pgs tid 389) v1
2015-11-17 12:47:24.056767 7ffa8d109700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b5357400 pg_stats_ack(1 pgs tid 337) v1
2015-11-17 12:47:50.082888 7ffa8d109700  2 mon.000-s-ragnarok@0(leader) e1
send_reply 0x5602b5350920 0x5602b5cd6400 pg_stats_ack(2 pgs tid 263) v1

2015-11-17 12:46:20.763687 7ffa8b2e7700 10 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch new session 0x5602b5178840 MonSession(unknown.0
xxx.xxx.xxx.xxx:0/137313644 is open)
2015-11-17 12:46:20.764469 7ffa8b2e7700 20 mon.000-s-ragnarok@0(leader) e1
_ms_dispatch existing session 0x5602b5178840 for unknown.0
xxx.xxx.xxx.xxx:0/137313644
2

Re: [ceph-users] restart all nodes

2015-11-17 Thread Wido den Hollander


On 17-11-15 11:07, Patrik Plank wrote:
> Hi,
> 
> 
> maybe a trivial question :-||
> 
> I have to shut down all my ceph nodes.
> 
> What's the best way to do this?
> 
> Can I just shut down all nodes, or should I
> 
> first shut down the ceph processes?
> 

First, set the noout flag in the monitors:

$ ceph osd set noout

Afterwards, you can shut down the OSDs and then the monitors.

When booting again, start the monitors first and then the OSDs.
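
In command form, something like this (a sketch; how you stop and start
the daemons differs per distro and init system):

$ ceph osd set noout
# stop ceph on the OSD nodes, then on the monitor nodes, and shut down
# after booting: start the monitors first, then the OSDs, then:
$ ceph osd unset noout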

Wido

> 
> best regards
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot mount CephFS after irreversible OSD lost

2015-11-17 Thread John Spray
On Tue, Nov 17, 2015 at 10:08 AM, Mykola Dvornik
 wrote:
> However when I've brought the mds back online the CephFS cannot be mounted
> anymore complaining on the client side 'mount error 5 = Input/output error'.
> Since mds was running just fine without any suspicious messages in its log,
> I've decided that something happened to its journal and CephFS disaster
> recovery is needed. I've stopped the mds and tried to make a backup of the
> journal. Unfortunately, the tool crashed with the following output:

A journal corruption would be more likely to make the MDS fail to go
active.  It sounds like on your system the MDS is making it into an
active state, and staying up through a client mount, but the client
itself is failing to mount.  You should investigate the client mount
failure more closely to work out what's going wrong.

Run the MDS with "debug mds = 20" and a fuse client with "debug client
= 20", to gather evidence of how/why the client mount is actually
failing.
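
For example (a sketch; the daemon name is taken from your logs and the
mount point is just an example):

# raise mds debugging on the running daemon via the admin socket
$ ceph daemon mds.000-s-ragnarok config set debug_mds 20
# mount with the fuse client at full client debug
$ ceph-fuse --debug-client=20 /mnt/cephfs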

> cephfs-journal-tool journal export backup.bin
> journal is 1841503004303~12076
> *** buffer overflow detected ***: cephfs-journal-tool terminated
> === Backtrace: =
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7f175ef12a57]
> /lib64/libc.so.6(+0x10bc10)[0x7f175ef10c10]
> /lib64/libc.so.6(+0x10b119)[0x7f175ef10119]
> /lib64/libc.so.6(_IO_vfprintf+0x2f00)[0x7f175ee4f430]
> /lib64/libc.so.6(__vsprintf_chk+0x88)[0x7f175ef101a8]
> /lib64/libc.so.6(__sprintf_chk+0x7d)[0x7f175ef100fd]
> cephfs-journal-tool(_ZN6Dumper4dumpEPKc+0x630)[0x7f1763374720]
> cephfs-journal-tool(_ZN11JournalTool14journal_exportERKSsb+0x294)[0x7f1763357874]
> cephfs-journal-tool(_ZN11JournalTool12main_journalERSt6vectorIPKcSaIS2_EE+0x105)[0x7f17633580c5]
> cephfs-journal-tool(_ZN11JournalTool4mainERSt6vectorIPKcSaIS2_EE+0x56e)[0x7f17633514de]
> cephfs-journal-tool(main+0x1de)[0x7f1763350d4e]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f175ee26af5]
> cephfs-journal-tool(+0x1ccae9)[0x7f1763356ae9]
> ...
> -3> 2015-11-17 10:43:00.874529 7f174db4b700  1 --
> xxx.xxx.xxx.xxx:6802/3019233561 <== osd.9 xxx.xxx.xxx.xxx:6808/13662 1 
> osd_op_reply(4 200.0006b309 [stat] v0'0 uv0 ack = -2 ((2) No such file or
> directory)) v6  179+0+0 (2303160312 0 0) 0x7f1767c719c0 con
> 0x7f1767d194a0

Oops, that's a bug.  http://tracker.ceph.com/issues/13816

> ...
>
> So I've used rados tool to export the cephfs_metadata pool, and then
> proceeded with
>
> cephfs-journal-tool event recover_dentries summary
> cephfs-journal-tool journal reset
> cephfs-table-tool all reset session
> ceph fs reset home --yes-i-really-mean-it
>
> After this manipulation, the cephfs-journal-tool journal export backup.rec
> worked, but wrote 48 bytes at around 1.8TB offset!

That's expected behaviour -- the journal has been reset to nothing,
but the write position is still where it was (the export function
writes a sparse file with location based on the journal offset in
ceph).

> Then I've brought mds back online, but CephFS is still non-mountable.
> I've tried to flush the journal with:
>
> ceph daemon mds.000-s-ragnarok flush journal

Yep, that's not going to do anything because there's little or nothing
in the journal after its reset.

> No luck. Then I've stopped mds and relaunched with
>
> ceph-mds -i 000-s-ragnarok --journal_check 0 --debug_mds=10 --debug_ms=100
>
> It persistently outputs this snippet for a couple of hours:
>
> 7faf0bd58700  7 mds.0.cache trim max=10  cur=17
> 7faf0bd58700 10 mds.0.cache trim_client_leases
> 7faf0bd58700  2 mds.0.cache check_memory_usage total 256288, rss 19116, heap
> 48056, malloc 1791 mmap 0, baseline 48056, buffers 0, 0 / 19 inodes have
> caps, 0 caps, 0 caps per inode
> 7faf0bd58700 10 mds.0.log trim 1 / 30 segments, 8 / -1 events, 0 (0)
> expiring, 0 (0) expired
> 7faf0bd58700 10 mds.0.log _trim_expired_segments waiting for
> 1841488226436/1841503004303 to expire
> 7faf0bd58700 10 mds.0.server find_idle_sessions.  laggy until 0.00
> 7faf0bd58700 10 mds.0.locker scatter_tick
> 7faf0bd58700 10 mds.0.cache find_stale_fragment_freeze
> 7faf0bd58700 10 mds.0.snap check_osd_map - version unchanged
> 7faf0b557700 10 mds.beacon.000-s-ragnarok _send up:active seq 12

The key bit of MDS debug output is the part from where a client tries to mount.

> So it appears to me that even despite 'cephfs-journal-tool journal reset',
> the journal is not wiped and its corruption blocks CephFS from being
> mounted.

Nope, I don't think that's the right conclusion: we don't know what's
broken yet.

> At the moment I am running 'cephfs-data-scan scan_extents cephfs_data'. I
> guess it won't help me much to bring CephFS back online, but it might fix some
> corrupted metadata.

It could help (although it's the subsequent scan_inodes step that
actually updates the metadata pool), but only if the reason the
clients aren't mounting is because there is missing metadata for e.g.
the root inode that is causing the client mount failure.
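
If you do go down that path, the two passes would look something like
this (a sketch, using the data pool name from your message):

$ cephfs-data-scan scan_extents cephfs_data
$ cephfs-data-scan scan_inodes cephfs_data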

> So my question is how to identify wh

[ceph-users] restart all nodes

2015-11-17 Thread Patrik Plank
Hi,



maybe a trivial question :-||

I have to shut down all my ceph nodes.

What's the best way to do this?

Can I just shut down all nodes, or should I

first shut down the ceph processes?



best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cannot mount CephFS after irreversible OSD lost

2015-11-17 Thread Mykola Dvornik

Dear ceph experts,

I've built and am administrating a 12-OSD ceph cluster (spanning 3
nodes) with a replication count of 2. The ceph version is


ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

The cluster hosts two pools (data and metadata) that are exported over 
CephFS.


At some point the OSDs approached the 'full' state and one of them got
corrupted. The easiest solution was to remove the OSD, wipe it and re-add it.


It went fine and the cluster was recovering without issues. At the point
where only 39 degraded objects were left, another OSD got corrupted (its
peer, actually). I was not able to recover it, so I made the hard decision
to remove it, wipe it and re-add it to the cluster. Since no backups
had been made, some data corruption was expected.


To my surprise, when all OSDs got back online and the cluster started to
recover, only one incomplete PG was reported. I worked around it by
ssh'ing to the node that holds its primary OSD and exporting the
corrupted pg with 'ceph-objectstore-tool --op export', marking it
'complete' afterwards (sketched after the status output below). Once the
cluster recovered, I imported the pg's data back to its primary OSD. The
recovery then fully completed and at the moment 'ceph -s' gives me:


   cluster 7972d1e9-2843-41a3-a4e7-9889d9c75850
health HEALTH_WARN
   1 near full osd(s)
monmap e1: 1 mons at {000-s-ragnarok=xxx.xxx.xxx.xxx:6789/0}
   election epoch 1, quorum 0 000-s-ragnarok
mdsmap e9393: 1/1/0 up {0=000-s-ragnarok=up:active}
osdmap e185363: 12 osds: 12 up, 12 in
 pgmap v5599327: 1024 pgs, 2 pools, 7758 GB data, 22316 kobjects
   15804 GB used, 6540 GB / 22345 GB avail
   1020 active+clean
  4 active+clean+scrubbing+deep
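
The export/import workaround mentioned above was roughly the following
(a sketch; the pgid and OSD paths are placeholders, and the OSD daemon
was stopped first):

$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --journal-path /var/lib/ceph/osd/ceph-9/journal \
      --pgid 1.2f --op export --file /tmp/pg.1.2f.export
# ... and later, to put the data back on the primary OSD:
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --journal-path /var/lib/ceph/osd/ceph-9/journal \
      --op import --file /tmp/pg.1.2f.export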

However, when I brought the mds back online, CephFS could not be
mounted anymore, with the client side complaining 'mount error 5 =
Input/output error'. Since the mds was running just fine without any
suspicious messages in its log, I decided that something had happened to
its journal and that CephFS disaster recovery was needed. I stopped the
mds and tried to make a backup of the journal. Unfortunately, the tool
crashed with the following output:


cephfs-journal-tool journal export backup.bin
journal is 1841503004303~12076
*** buffer overflow detected ***: cephfs-journal-tool terminated
=== Backtrace: =
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f175ef12a57]
/lib64/libc.so.6(+0x10bc10)[0x7f175ef10c10]
/lib64/libc.so.6(+0x10b119)[0x7f175ef10119]
/lib64/libc.so.6(_IO_vfprintf+0x2f00)[0x7f175ee4f430]
/lib64/libc.so.6(__vsprintf_chk+0x88)[0x7f175ef101a8]
/lib64/libc.so.6(__sprintf_chk+0x7d)[0x7f175ef100fd]
cephfs-journal-tool(_ZN6Dumper4dumpEPKc+0x630)[0x7f1763374720]
cephfs-journal-tool(_ZN11JournalTool14journal_exportERKSsb+0x294)[0x7f1763357874]
cephfs-journal-tool(_ZN11JournalTool12main_journalERSt6vectorIPKcSaIS2_EE+0x105)[0x7f17633580c5]
cephfs-journal-tool(_ZN11JournalTool4mainERSt6vectorIPKcSaIS2_EE+0x56e)[0x7f17633514de]
cephfs-journal-tool(main+0x1de)[0x7f1763350d4e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f175ee26af5]
cephfs-journal-tool(+0x1ccae9)[0x7f1763356ae9]
...
-3> 2015-11-17 10:43:00.874529 7f174db4b700  1 -- 
xxx.xxx.xxx.xxx:6802/3019233561 <== osd.9 xxx.xxx.xxx.xxx:6808/13662 1 
 osd_op_reply(4 200.0006b309 [stat] v0'0 uv0 ack = -2 ((2) No such 
file or directory)) v6  179+0+0 (2303160312 0 0) 0x7f1767c719c0 con 
0x7f1767d194a0

...

So I've used rados tool to export the cephfs_metadata pool, and then 
proceeded with


cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
cephfs-table-tool all reset session
ceph fs reset home --yes-i-really-mean-it

After this manipulation, the cephfs-journal-tool journal export 
backup.rec worked, but wrote 48 bytes at around 1.8TB offset!


Then I've brought mds back online, but CephFS is still non-mountable.

I've tried to flush the journal with:

ceph daemon mds.000-s-ragnarok flush journal

No luck. Then I've stopped mds and relaunched with

ceph-mds -i 000-s-ragnarok --journal_check 0 --debug_mds=10 
--debug_ms=100


It persistently outputs this snippet for a couple of hours:

7faf0bd58700  7 mds.0.cache trim max=10  cur=17
7faf0bd58700 10 mds.0.cache trim_client_leases
7faf0bd58700  2 mds.0.cache check_memory_usage total 256288, rss 19116, 
heap 48056, malloc 1791 mmap 0, baseline 48056, buffers 0, 0 / 19 
inodes have caps, 0 caps, 0 caps per inode
7faf0bd58700 10 mds.0.log trim 1 / 30 segments, 8 / -1 events, 0 (0) 
expiring, 0 (0) expired
7faf0bd58700 10 mds.0.log _trim_expired_segments waiting for 
1841488226436/1841503004303 to expire

7faf0bd58700 10 mds.0.server find_idle_sessions.  laggy until 0.00
7faf0bd58700 10 mds.0.locker scatter_tick
7faf0bd58700 10 mds.0.cache find_stale_fragment_freeze
7faf0bd58700 10 mds.0.snap check_osd_map - version unchanged
7faf0b557700 10 mds.beacon.000-s-ragnarok _send up:active seq 12

So it appears to me that even despite 'cephfs-journal-tool journal
reset', the journal is not wiped and its corruption blocks CephFS from
being mounted.

Re: [ceph-users] Ceph Openstack deployment

2015-11-17 Thread Iban Cabrillo
Hi,
This is an old one, never used (I just deleted it)... I have checked that
ntp is running fine.

Binary            Host                               Zone  Status   State  Updated At
cinder-volume     cloudvolume01@iscsi-cloudvolume01  nova  enabled  :-)    2015-11-17 09:57:28
cinder-volume     cephvolume                         nova  enabled  :-)    2015-11-17 09:57:34
cinder-scheduler  cinder01                           nova  enabled  :-)    2015-11-17 09:57:28

Settings on the hypervisor:

 echo $CEPH_ARGS
--keyring /etc/ceph/ceph.client.cinder.keyring --id cinder

I now see the rbd volumes correctly (without passing any argument)

01:~ # rbd ls -p volumes
test
volume-6e1c86d5-efb6-469a-bbad-58b1011507bf
volume-7da08f12-fb0f-4269-931a-d528c1507fee

But the attach is still failing (xen be: qdisk-51760: error: Could not open
'volumes/volume-6e1c86d5-efb6-469a-bbad-58b1011507bf': No such file or
directory)
..

I am not an OpenStack expert. The actual setup has one cinder-scheduler
and two backends (lvm and Ceph). Should the complete cinder
configuration (the [rbd] and [lvm] sections) be present on all three
nodes? Right now each volume node carries only its own section. I see
that the enabled_backends option is commented out in the scheduler conf
and enabled in the volume confs, each with only its own backend: rbd for
cephvolume and lvm for cloudvolume. (See the sketch below.)
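
For reference, my understanding of the rbd part of a multi-backend
volume node config is something like this (a sketch, not my actual
files; the backend name is a placeholder and the secret uuid depends on
your libvirt setup):

[DEFAULT]
enabled_backends = rbd-1

[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = <libvirt secret uuid>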

And another question: should ceph and ceph-common be installed on the
cinder-scheduler too?

Kind Regards,

2015-11-16 17:47 GMT+01:00 M Ranga Swami Reddy :

> Both of the cinder-volume entries below should be in the enabled state:
> ==
> cinder-volume    cephvolume   nova   disabled   XXX   2015-10-02 08:33:06
> *cinder-volume   cephvolume   nova   enabled    :-)   2015-11-16 12:01:38*
> *===*
>
> *Check that the NTP server is up and running correctly.*
> *Only when both of the above cinder-volume services are in the enabled
> state will your cinder commands work.*
>
>
>
> On Mon, Nov 16, 2015 at 5:39 PM, Iban Cabrillo 
> wrote:
>
>> cephvolume:~ # cinder-manage service list (from cinder)
>> /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/base.py:20:
>> DeprecationWarning: The oslo namespace package is deprecated. Please use
>> oslo_config instead.
>>   from oslo.config import cfg
>> 2015-11-16 13:01:42.203 23787 DEBUG oslo_db.api
>> [req-b2aece98-8f3d-4a2c-b50b-449281d8aeed - - - - -] Loading backend
>> 'sqlalchemy' from 'cinder.db.sqlalchemy.api' _load_backend
>> /usr/lib/python2.7/dist-packages/oslo_db/api.py:214
>> 2015-11-16 13:01:42.428 23787 DEBUG oslo_db.sqlalchemy.session
>> [req-b2aece98-8f3d-4a2c-b50b-449281d8aeed - - - - -] MySQL server mode set
>> to
>> STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
>> _check_effective_sql_mode
>> /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/session.py:513
>> Binary            Host                               Zone  Status    State  Updated At
>> cinder-volume     cloudvolume01@iscsi-cloudvolume01  nova  enabled   :-)    2015-11-16 12:01:36
>> cinder-scheduler  cinder01                           nova  enabled   XXX    2015-10-05 18:44:25
>> cinder-scheduler  cloud01                            nova  enabled   XXX    2015-10-29 13:05:42
>> cinder-volume     cephvolume                         nova  disabled  XXX    2015-10-02 08:33:06
>> *cinder-volume    cephvolume                         nova  enabled   :-)    2015-11-16 12:01:38*  (this should be the right one)
>> cinder-volume     cloudvolume01@iscsi-cloudvolume01  nova  enabled   XXX    2015-10-01 14:50:32
>> cinder-scheduler  cinder01                           nova  enabled   :-)    2015-11-16 12:01:41
>>
>>
>> 2015-11-16 12:42 GMT+01:00 M Ranga Swami Reddy :
>>
>>> Hi,
>>> Can you share the output of below command:
>>>
>>> cinder-manage service list
>>>
>>>
>>> On Mon, Nov 16, 2015 at 4:45 PM, Iban Cabrillo 
>>> wrote:
>>>
 cloud:~ # cinder list

 +--+---+--+--+-+--+-+
 |  ID  |   Status  | Display Name |
 Size | Volume Type | Bootable | Attached to |

 +--+---+--+--+-+--+-+
 | 6e1c86d5-efb6-469a-bbad-58b1011507bf | available |  volumetest  |  5
   | rbd |  false   | |

 +--+---+--+--+-+--+-+
 cloud:~ # nova volume-attach 08f6fef5-7c98-445b-abfe-636c4c6fee89
 6e1c86d5-efb6-469a-bbad-58b1011507bf auto
 +--+--+
 | Property | Value|
 +--+--+
 | device   | /dev/xvdd|
 | id

Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Joao Eduardo Luis
On 11/17/2015 03:56 AM, Jose Tavares wrote:
> The problem is that I think I don't have any good monitor anymore.
> How do I know if the map I am trying is ok?
> 
> I also saw in the logs that the primary mon was trying to contact a
> removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
> 
> Attached are the logs of what is going on and some monmaps that I
> captured from minutes before the cluster became inaccessible ..
> 
> Should I try inject this monmaps in my primary mon to see if it can
> recover the cluster?
> Is it possible to see if this monmaps match my content?

Without access to the actual store.db there's no way to ascertain if the
store has any problems, and even then figuring out a potential
corruption from just one monitor store.db would either be impossible or
impractical.

That said, from the log you attached it seems you only have issues with
authentication: you have pgmaps from epoch 91923 through to 92589, you
have an mds map (epoch 38), osdmaps at least through epoch 307, and 40
versions for the auth keys.

Somehow, though, your monitors are unable to authenticate each other. No
way to tell if that was corruption or user error.

You should be able to get your monitors back to speaking terms again
simply by disabling cephx temporarily. Then you can figure out whatever
you need to figure out in terms of monitor keys.

Just update your ceph.conf with 'auth supported = none' and restart the
monitors. See how it goes from there.
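
I.e., something like this in the [global] section (a sketch; remember to
revert it once the monitors are healthy again):

[global]
auth supported = none

or, with the newer-style options, the equivalent:

auth cluster required = none
auth service required = none
auth client required = none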

HTH

  -Joao



> 
> Thanks a lot.
> Jose Tavares
> 
> 
> 
> 
> 
> On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
> <nathan.har...@cfms.org.uk> wrote:
> 
> I had to go through a similar process when we had a disaster which
> destroyed one of our monitors.   I followed the process here:
> REMOVING MONITORS FROM AN UNHEALTHY CLUSTER to
> remove all but one monitor, which let me bring the cluster back up.
> 
> As you are running an older version of Ceph than hammer, some of the
> commands might differ (perhaps this might
> help: http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
> 
> 
> -- 
> *Nathan Harper*// IT Systems Architect
> 
> *e: * nathan.har...@cfms.org.uk
> // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
> www.cfms.org.uk
> CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent //
> Emersons Green // Bristol // BS16 7FR
>  
> CFMS Services Ltd is registered in England and Wales No 05742022 - a
> subsidiary of CFMS Ltd
> CFMS Services Ltd registered office // Victoria House // 51 Victoria
> Street // Bristol // BS1 6AD
> 
> On 16 November 2015 at 16:50, Jose Tavares wrote:
> 
> Hi guys ...
> I need some help as my cluster seems to be corrupted.
> 
> I saw here .. 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
> .. a msg from 2013 where Peter had a problem with his monitors.
> 
> I had the same problem today when trying to add a new monitor,
> and then playing with monmap as the monitors were not entering
> the quorum. I'm using version 0.80.8.
> 
> Right now my cluster won't start because of a corrupted monitor.
> Is it possible to remove all monitors and create just a new one
> without losing data? I have ~260GB of data with work from 2 weeks.
> 
> What should I do? Do you recommend any specific procedure?
> 
> Thanks a lot.
> Jose Tavares
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] next ceph breizh camp

2015-11-17 Thread eric mourgaya
Hi,

The next Ceph Breizh camp will take place on the 26th of November at the
University of Nantes, beginning at 10:00 AM, at:

IGARUN (Institut de Géographie), meeting room (991/992), 1st floor,
Chemin de la Censive du Tertre, on the Tertre campus of Nantes.

You can enroll at:
http://doodle.com/poll/vtqum8wyk2dciqtf


have a good day,
-- 
Eric Mourgaya,


Let's respect the planet!
Let's fight mediocrity!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disaster recovery of monitor

2015-11-17 Thread Wido den Hollander
On 11/17/2015 04:56 AM, Jose Tavares wrote:
> The problem is that I think I don't have any good monitor anymore.
> How do I know if the map I am trying is ok?
> 

How do you mean there is no good monitor? Did you encounter a disk
failure or something?

> I also saw in the logs that the primary mon was trying to contact a
> removed mon at IP .112 .. So, I added .112 again ... and it didn't help.
> 

"Added" again? You started that monitor?

> Attached are the logs of what is going on and some monmaps that I
> captured from minutes before the cluster became inaccessible ..
> 

Isn't there a huge timedrift somewhere? Failing cephx authorization can
also point at a huge timedrift on the clients and OSDs. Are you sure the
time is correct?

> Should I try inject this monmaps in my primary mon to see if it can
> recover the cluster?
> Is it possible to see if this monmaps match my content?
> 

The monmaps probably didn't change that much. But a good Monitor also
has the PGMaps, OSDMaps, etc. You need a lot more than just a monmap.

But check the time first on those machines.
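
For example (a sketch; assumes ntpd is what keeps the clocks in sync):

$ date -u    # compare across all machines
$ ntpq -p    # check that ntpd is actually syncing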

Wido

> Thanks a lot.
> Jose Tavares
> 
> 
> 
> 
> 
> On Mon, Nov 16, 2015 at 10:48 PM, Nathan Harper
> <nathan.har...@cfms.org.uk> wrote:
> 
> I had to go through a similar process when we had a disaster which
> destroyed one of our monitors.   I followed the process here:
> REMOVING MONITORS FROM AN UNHEALTHY CLUSTER to
> remove all but one monitor, which let me bring the cluster back up.
> 
> As you are running an older version of Ceph than hammer, some of the
> commands might differ (perhaps this might
> help: http://docs.ceph.com/docs/v0.80/rados/operations/add-or-rm-mons/)
> 
> 
> -- 
> *Nathan Harper*// IT Systems Architect
> 
> *e: * nathan.har...@cfms.org.uk
> // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: *
> www.cfms.org.uk
> CFMS Services Ltd// Bristol & Bath Science Park // Dirac Crescent //
> Emersons Green // Bristol // BS16 7FR
>  
"> CFMS Services Ltd is registered in England and Wales No 05742022 - a
> subsidiary of CFMS Ltd
> CFMS Services Ltd registered office // Victoria House // 51 Victoria
> Street // Bristol // BS1 6AD
> 
> On 16 November 2015 at 16:50, Jose Tavares wrote:
> 
> Hi guys ...
> I need some help as my cluster seems to be corrupted.
> 
> I saw here .. 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01919.html
> .. a msg from 2013 where Peter had a problem with his monitors.
> 
> I had the same problem today when trying to add a new monitor,
> and then playing with monmap as the monitors were not entering
> the quorum. I'm using version 0.80.8.
> 
> Right now my cluster won't start because of a corrupted monitor.
> Is it possible to remove all monitors and create just a new one
> without losing data? I have ~260GB of data with work from 2 weeks.
> 
> What should I do? Do you recommend any specific procedure?
> 
> Thanks a lot.
> Jose Tavares
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com