Re: [ceph-users] Ceph Maintenance

2017-08-01 Thread Richard Hesketh
On 01/08/17 12:41, Osama Hasebou wrote:
> Hi,
> 
> What would be the best and most efficient way to handle maintenance on big 
> Ceph clusters?
> 
> Let's say that we have 3 copies of data, and one of the servers needs to be 
> maintained, and the maintenance might take 1-2 days due to unforeseen issues 
> that come up.
> 
> Setting the node to noout is a bit of a risk, since only 2 copies will be 
> active. In that case, what would be the proper way to take the node down so 
> the cluster rebalances, then perform the maintenance, and then bring the node 
> back online without rebalancing right away, so one can first check that it is 
> functioning properly as a server, and only reintroduce rebalancing once 
> everything looks good?
> 
> 
> Thank you.
> 
> Regards,
> Ossi

The recommended practice would be to use "ceph osd crush reweight" to set the 
crush weight on the OSDs that will be down to 0. The cluster will then 
rebalance, and once it's HEALTH_OK again, you can take those OSDs offline 
without losing any redundancy (though you will need to ensure you have enough 
spare space in what's left of the cluster that you don't push disk usage too 
high on your other nodes).
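
For example, if the node being serviced carries osd.10 through osd.12 (the IDs
here are just placeholders), the drain step would look roughly like this:

  ceph osd crush reweight osd.10 0
  ceph osd crush reweight osd.11 0
  ceph osd crush reweight osd.12 0
  ceph -w    # watch recovery until the cluster reports HEALTH_OK again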

When you're ready to bring them online again, make sure that you have 
"osd_crush_update_on_start = false" set in your ceph.conf so they don't 
potentially mess with their weights when they come back. Then they will be up 
but still at crush weight 0 so no data will be assigned to them. When you're 
happy everything's okay, use "ceph osd crush reweight" again to bring them back 
to their original weights. Lots of people like to do that in increments of 0.1 
weight at a time, so the recovery is staggered and doesn't impact your active 
I/O too much.
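
A rough sketch of that return path (the OSD ID and original weight are just 
examples):

  # in ceph.conf on the OSD node, before restarting the daemons
  [osd]
  osd_crush_update_on_start = false

  # then step the weight back up, waiting for recovery to settle between steps
  ceph osd crush reweight osd.10 0.1
  ceph osd crush reweight osd.10 0.2
  # ... and so on until the original weight (e.g. 1.82) is restored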

This assumes your crush layout is such that you can still have three replicas 
with one server missing.
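
A quick way to sanity-check that before you start (the pool name is just an 
example):

  ceph osd pool get rbd size    # replica count required by the pool
  ceph osd tree                 # count the hosts left under your failure domain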

Rich





Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
Hi Vasu,

Thank you, that is good to know!

I am running ceph version 10.2.3 and CentOS 7.2.1511 (Core) minimal.

Cheers,
Mike

On Tue, Nov 29, 2016 at 7:26 PM, Vasu Kulkarni  wrote:

> You can ignore that; it's a known issue: http://tracker.ceph.com/issues/15990
>
> Regardless, what version of Ceph are you running, and what are the details
> of the OS version you updated to?
>
> On Tue, Nov 29, 2016 at 7:12 PM, Mike Jacobacci  wrote:
>
>> Found some more info, but it's getting weird... All three OSD nodes show the
>> same unknown cluster message on all the OSD disks.  I don't know where it
>> came from; all the nodes were configured using ceph-deploy on the admin
>> node.  In any case, the OSDs seem to be up and running and the health is ok.
>>
>> No ceph-disk@ services are running on any of the OSD nodes, which I
>> didn't notice before, and each node was set up exactly the same, yet there are
>> different services listed under systemctl:
>>
>> OSD NODE 1:
>> Output in earlier email
>>
>> OSD NODE 2:
>>
>> ● ceph-disk@dev-sdb1.service
>> loaded failed failedCeph disk activation: /dev/sdb1
>>
>> ● ceph-disk@dev-sdb2.service
>> loaded failed failedCeph disk activation: /dev/sdb2
>>
>> ● ceph-disk@dev-sdb5.service
>> loaded failed failedCeph disk activation: /dev/sdb5
>>
>> ● ceph-disk@dev-sdc2.service
>> loaded failed failedCeph disk activation: /dev/sdc2
>>
>> ● ceph-disk@dev-sdc4.service
>> loaded failed failedCeph disk activation: /dev/sdc4
>>
>>
>> OSD NODE 3:
>>
>> ● ceph-disk@dev-sdb1.service
>> loaded failed failedCeph disk activation: /dev/sdb1
>>
>> ● ceph-disk@dev-sdb3.service
>> loaded failed failedCeph disk activation: /dev/sdb3
>>
>> ● ceph-disk@dev-sdb4.service
>> loaded failed failedCeph disk activation: /dev/sdb4
>>
>> ● ceph-disk@dev-sdb5.service
>> loaded failed failedCeph disk activation: /dev/sdb5
>>
>> ● ceph-disk@dev-sdc2.service
>> loaded failed failedCeph disk activation: /dev/sdc2
>>
>> ● ceph-disk@dev-sdc3.service
>> loaded failed failedCeph disk activation: /dev/sdc3
>>
>> ● ceph-disk@dev-sdc4.service
>> loaded failed failedCeph disk activation: /dev/sdc4
>>
>> From my understanding, the disks have already been activated... Should
>> these services even be running or enabled?
>>
>> Mike
>>
>>
>>
>> On Tue, Nov 29, 2016 at 6:33 PM, Mike Jacobacci  wrote:
>>
>>> Sorry about that... Here is the output of ceph-disk list:
>>>
>>> ceph-disk list
>>> /dev/dm-0 other, xfs, mounted on /
>>> /dev/dm-1 swap, swap
>>> /dev/dm-2 other, xfs, mounted on /home
>>> /dev/sda :
>>>  /dev/sda2 other, LVM2_member
>>>  /dev/sda1 other, xfs, mounted on /boot
>>> /dev/sdb :
>>>  /dev/sdb1 ceph journal
>>>  /dev/sdb2 ceph journal
>>>  /dev/sdb3 ceph journal
>>>  /dev/sdb4 ceph journal
>>>  /dev/sdb5 ceph journal
>>> /dev/sdc :
>>>  /dev/sdc1 ceph journal
>>>  /dev/sdc2 ceph journal
>>>  /dev/sdc3 ceph journal
>>>  /dev/sdc4 ceph journal
>>>  /dev/sdc5 ceph journal
>>> /dev/sdd :
>>>  /dev/sdd1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.0
>>> /dev/sde :
>>>  /dev/sde1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.1
>>> /dev/sdf :
>>>  /dev/sdf1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.2
>>> /dev/sdg :
>>>  /dev/sdg1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.3
>>> /dev/sdh :
>>>  /dev/sdh1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.4
>>> /dev/sdi :
>>>  /dev/sdi1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.5
>>> /dev/sdj :
>>>  /dev/sdj1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.6
>>> /dev/sdk :
>>>  /dev/sdk1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.7
>>> /dev/sdl :
>>>  /dev/sdl1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.8
>>> /dev/sdm :
>>>  /dev/sdm1 ceph data, active, unknown cluster
>>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.9
>>>
>>>
>>>
>>> On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci 
>>> wrote:
>>>
 I forgot to add:


 On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci 
 wrote:

> So it looks like the journal partition is mounted:
>
> ls -lah /var/lib/ceph/osd/ceph-0/journal
> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
> -> /dev/sdb1
>
> Here is the output of journalctl -xe when I try to start the
> ceph-disk@dev-sdb1 service:
>
> sh[17481]: mount_activate: Failed to activate
> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
> sh[17481]: command_check_call: Running command: /bin/umount --
> /var/lib/ceph/tmp/mnt.m9ek7W

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Vasu Kulkarni
You can ignore that; it's a known issue: http://tracker.ceph.com/issues/15990

Regardless, what version of Ceph are you running, and what are the details of
the OS version you updated to?

On Tue, Nov 29, 2016 at 7:12 PM, Mike Jacobacci  wrote:

> Found some more info, but it's getting weird... All three OSD nodes show the
> same unknown cluster message on all the OSD disks.  I don't know where it
> came from; all the nodes were configured using ceph-deploy on the admin
> node.  In any case, the OSDs seem to be up and running and the health is ok.
>
> No ceph-disk@ services are running on any of the OSD nodes, which I didn't
> notice before, and each node was set up exactly the same, yet there are
> different services listed under systemctl:
>
> OSD NODE 1:
> Output in earlier email
>
> OSD NODE 2:
>
> ● ceph-disk@dev-sdb1.service
>   loaded failed failedCeph disk activation: /dev/sdb1
>
> ● ceph-disk@dev-sdb2.service
>   loaded failed failedCeph disk activation: /dev/sdb2
>
> ● ceph-disk@dev-sdb5.service
>   loaded failed failedCeph disk activation: /dev/sdb5
>
> ● ceph-disk@dev-sdc2.service
>   loaded failed failedCeph disk activation: /dev/sdc2
>
> ● ceph-disk@dev-sdc4.service
>   loaded failed failedCeph disk activation: /dev/sdc4
>
>
> OSD NODE 3:
>
> ● ceph-disk@dev-sdb1.service
>   loaded failed failedCeph disk activation: /dev/sdb1
>
> ● ceph-disk@dev-sdb3.service
>   loaded failed failedCeph disk activation: /dev/sdb3
>
> ● ceph-disk@dev-sdb4.service
>   loaded failed failedCeph disk activation: /dev/sdb4
>
> ● ceph-disk@dev-sdb5.service
>   loaded failed failedCeph disk activation: /dev/sdb5
>
> ● ceph-disk@dev-sdc2.service
>   loaded failed failedCeph disk activation: /dev/sdc2
>
> ● ceph-disk@dev-sdc3.service
>   loaded failed failedCeph disk activation: /dev/sdc3
>
> ● ceph-disk@dev-sdc4.service
>   loaded failed failedCeph disk activation: /dev/sdc4
>
> From my understanding, the disks have already been activated... Should
> these services even be running or enabled?
>
> Mike
>
>
>
> On Tue, Nov 29, 2016 at 6:33 PM, Mike Jacobacci  wrote:
>
>> Sorry about that... Here is the output of ceph-disk list:
>>
>> ceph-disk list
>> /dev/dm-0 other, xfs, mounted on /
>> /dev/dm-1 swap, swap
>> /dev/dm-2 other, xfs, mounted on /home
>> /dev/sda :
>>  /dev/sda2 other, LVM2_member
>>  /dev/sda1 other, xfs, mounted on /boot
>> /dev/sdb :
>>  /dev/sdb1 ceph journal
>>  /dev/sdb2 ceph journal
>>  /dev/sdb3 ceph journal
>>  /dev/sdb4 ceph journal
>>  /dev/sdb5 ceph journal
>> /dev/sdc :
>>  /dev/sdc1 ceph journal
>>  /dev/sdc2 ceph journal
>>  /dev/sdc3 ceph journal
>>  /dev/sdc4 ceph journal
>>  /dev/sdc5 ceph journal
>> /dev/sdd :
>>  /dev/sdd1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.0
>> /dev/sde :
>>  /dev/sde1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.1
>> /dev/sdf :
>>  /dev/sdf1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.2
>> /dev/sdg :
>>  /dev/sdg1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.3
>> /dev/sdh :
>>  /dev/sdh1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.4
>> /dev/sdi :
>>  /dev/sdi1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.5
>> /dev/sdj :
>>  /dev/sdj1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.6
>> /dev/sdk :
>>  /dev/sdk1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.7
>> /dev/sdl :
>>  /dev/sdl1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.8
>> /dev/sdm :
>>  /dev/sdm1 ceph data, active, unknown cluster
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.9
>>
>>
>>
>> On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci  wrote:
>>
>>> I forgot to add:
>>>
>>>
>>> On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci 
>>> wrote:
>>>
 So it looks like the journal partition is mounted:

 ls -lah /var/lib/ceph/osd/ceph-0/journal
 lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
 -> /dev/sdb1

 Here is the output of journalctl -xe when I try to start the
 ceph-disk@dev-sdb1 service:

 sh[17481]: mount_activate: Failed to activate
 sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
 sh[17481]: command_check_call: Running command: /bin/umount --
 /var/lib/ceph/tmp/mnt.m9ek7W
 sh[17481]: Traceback (most recent call last):
 sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
 sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
 'ceph-disk')()
 sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
 line 5011, in run
 sh[17481]: main(sys.argv[1:])
 sh[17481]: File 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
Found some more info, but it's getting weird... All three OSD nodes show the
same unknown cluster message on all the OSD disks.  I don't know where it
came from; all the nodes were configured using ceph-deploy on the admin
node.  In any case, the OSDs seem to be up and running and the health is ok.

No ceph-disk@ services are running on any of the OSD nodes, which I didn't
notice before, and each node was set up exactly the same, yet there are
different services listed under systemctl:

OSD NODE 1:
Output in earlier email

OSD NODE 2:

● ceph-disk@dev-sdb1.service
loaded failed failedCeph disk activation: /dev/sdb1

● ceph-disk@dev-sdb2.service
loaded failed failedCeph disk activation: /dev/sdb2

● ceph-disk@dev-sdb5.service
loaded failed failedCeph disk activation: /dev/sdb5

● ceph-disk@dev-sdc2.service
loaded failed failedCeph disk activation: /dev/sdc2

● ceph-disk@dev-sdc4.service
loaded failed failedCeph disk activation: /dev/sdc4


OSD NODE 3:

● ceph-disk@dev-sdb1.service
loaded failed failedCeph disk activation: /dev/sdb1

● ceph-disk@dev-sdb3.service
loaded failed failedCeph disk activation: /dev/sdb3

● ceph-disk@dev-sdb4.service
loaded failed failedCeph disk activation: /dev/sdb4

● ceph-disk@dev-sdb5.service
loaded failed failedCeph disk activation: /dev/sdb5

● ceph-disk@dev-sdc2.service
loaded failed failedCeph disk activation: /dev/sdc2

● ceph-disk@dev-sdc3.service
loaded failed failedCeph disk activation: /dev/sdc3

● ceph-disk@dev-sdc4.service
loaded failed failedCeph disk activation: /dev/sdc4

From my understanding, the disks have already been activated... Should
these services even be running or enabled?

Mike



On Tue, Nov 29, 2016 at 6:33 PM, Mike Jacobacci  wrote:

> Sorry about that... Here is the output of ceph-disk list:
>
> ceph-disk list
> /dev/dm-0 other, xfs, mounted on /
> /dev/dm-1 swap, swap
> /dev/dm-2 other, xfs, mounted on /home
> /dev/sda :
>  /dev/sda2 other, LVM2_member
>  /dev/sda1 other, xfs, mounted on /boot
> /dev/sdb :
>  /dev/sdb1 ceph journal
>  /dev/sdb2 ceph journal
>  /dev/sdb3 ceph journal
>  /dev/sdb4 ceph journal
>  /dev/sdb5 ceph journal
> /dev/sdc :
>  /dev/sdc1 ceph journal
>  /dev/sdc2 ceph journal
>  /dev/sdc3 ceph journal
>  /dev/sdc4 ceph journal
>  /dev/sdc5 ceph journal
> /dev/sdd :
>  /dev/sdd1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.0
> /dev/sde :
>  /dev/sde1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.1
> /dev/sdf :
>  /dev/sdf1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.2
> /dev/sdg :
>  /dev/sdg1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.3
> /dev/sdh :
>  /dev/sdh1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.4
> /dev/sdi :
>  /dev/sdi1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.5
> /dev/sdj :
>  /dev/sdj1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.6
> /dev/sdk :
>  /dev/sdk1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.7
> /dev/sdl :
>  /dev/sdl1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.8
> /dev/sdm :
>  /dev/sdm1 ceph data, active, unknown cluster 
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9,
> osd.9
>
>
>
> On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci  wrote:
>
>> I forgot to add:
>>
>>
>> On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci  wrote:
>>
>>> So it looks like the journal partition is mounted:
>>>
>>> ls -lah /var/lib/ceph/osd/ceph-0/journal
>>> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
>>> -> /dev/sdb1
>>>
>>> Here is the output of journalctl -xe when I try to start the
>>> ceph-disk@dev-sdb1 service:
>>>
>>> sh[17481]: mount_activate: Failed to activate
>>> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
>>> sh[17481]: command_check_call: Running command: /bin/umount --
>>> /var/lib/ceph/tmp/mnt.m9ek7W
>>> sh[17481]: Traceback (most recent call last):
>>> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
>>> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
>>> 'ceph-disk')()
>>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>>> line 5011, in run
>>> sh[17481]: main(sys.argv[1:])
>>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>>> line 4962, in main
>>> sh[17481]: args.func(args)
>>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>>> line 4720, in 
>>> sh[17481]: func=lambda args: main_activate_space(name, args),
>>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>>> line 3739, in main_activate_space
>>> sh[17481]: reactivate=args.reactivate,
>>> sh[17481]: File 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
Sorry about that... Here is the output of ceph-disk list:

ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/dm-2 other, xfs, mounted on /home
/dev/sda :
 /dev/sda2 other, LVM2_member
 /dev/sda1 other, xfs, mounted on /boot
/dev/sdb :
 /dev/sdb1 ceph journal
 /dev/sdb2 ceph journal
 /dev/sdb3 ceph journal
 /dev/sdb4 ceph journal
 /dev/sdb5 ceph journal
/dev/sdc :
 /dev/sdc1 ceph journal
 /dev/sdc2 ceph journal
 /dev/sdc3 ceph journal
 /dev/sdc4 ceph journal
 /dev/sdc5 ceph journal
/dev/sdd :
 /dev/sdd1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.0
/dev/sde :
 /dev/sde1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.1
/dev/sdf :
 /dev/sdf1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.2
/dev/sdg :
 /dev/sdg1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.3
/dev/sdh :
 /dev/sdh1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.4
/dev/sdi :
 /dev/sdi1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.5
/dev/sdj :
 /dev/sdj1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.6
/dev/sdk :
 /dev/sdk1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.7
/dev/sdl :
 /dev/sdl1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.8
/dev/sdm :
 /dev/sdm1 ceph data, active, unknown cluster
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.9



On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci  wrote:

> I forgot to add:
>
>
> On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci  wrote:
>
>> So it looks like the journal partition is mounted:
>>
>> ls -lah /var/lib/ceph/osd/ceph-0/journal
>> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
>> -> /dev/sdb1
>>
>> Here is the output of journalctl -xe when I try to start the
>> ceph-disk@dev-sdb1 service:
>>
>> sh[17481]: mount_activate: Failed to activate
>> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
>> sh[17481]: command_check_call: Running command: /bin/umount --
>> /var/lib/ceph/tmp/mnt.m9ek7W
>> sh[17481]: Traceback (most recent call last):
>> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
>> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
>> 'ceph-disk')()
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 5011, in run
>> sh[17481]: main(sys.argv[1:])
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 4962, in main
>> sh[17481]: args.func(args)
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 4720, in 
>> sh[17481]: func=lambda args: main_activate_space(name, args),
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 3739, in main_activate_space
>> sh[17481]: reactivate=args.reactivate,
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 3073, in mount_activate
>> sh[17481]: (osd_id, cluster) = activate(path, activate_key_template, init)
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 3220, in activate
>> sh[17481]: ' with fsid %s' % ceph_fsid)
>> sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in
>> /etc/ceph with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>> sh[17481]: Traceback (most recent call last):
>> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
>> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
>> 'ceph-disk')()
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 5011, in run
>> sh[17481]: main(sys.argv[1:])
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 4962, in main
>> sh[17481]: args.func(args)
>> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
>> line 4399, in main_trigger
>> sh[17481]: raise Error('return code ' + str(ret))
>> sh[17481]: ceph_disk.main.Error: Error: return code 1
>> systemd[1]: ceph-disk@dev-sdb1.service: main process exited,
>> code=exited, status=1/FAILURE
>> systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.
>>
>> I don't understand this error:
>> ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid
>> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>>
>> My fsid in ceph.conf is:
>> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>>
>> I don't know why the fsid would change or be different. I thought I had a
>> basic cluster setup; I don't understand what's going wrong.
>>
>> Mike
>>
>> On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci  wrote:
>>
>>> Hi John,
>>>
>>> Thanks I wasn't sure if something happened to the journal partitions or
>>> not.
>>>
>>> Right now, the ceph-osd.0-9 services are back up and the cluster health
>>> is good, but none of the ceph-disk@dev-sd* services are running.   How
>>> can I get the Journal partitions mounted 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
I forgot to add:


On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci  wrote:

> So it looks like the journal partition is mounted:
>
> ls -lah /var/lib/ceph/osd/ceph-0/journal
> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
> -> /dev/sdb1
>
> Here is the output of journalctl -xe when I try to start the
> ceph-disk@dev-sdb1 service:
>
> sh[17481]: mount_activate: Failed to activate
> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
> sh[17481]: command_check_call: Running command: /bin/umount --
> /var/lib/ceph/tmp/mnt.m9ek7W
> sh[17481]: Traceback (most recent call last):
> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
> 'ceph-disk')()
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 5011, in run
> sh[17481]: main(sys.argv[1:])
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4962, in main
> sh[17481]: args.func(args)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4720, in 
> sh[17481]: func=lambda args: main_activate_space(name, args),
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3739, in main_activate_space
> sh[17481]: reactivate=args.reactivate,
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3073, in mount_activate
> sh[17481]: (osd_id, cluster) = activate(path, activate_key_template, init)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3220, in activate
> sh[17481]: ' with fsid %s' % ceph_fsid)
> sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph
> with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
> sh[17481]: Traceback (most recent call last):
> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
> 'ceph-disk')()
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 5011, in run
> sh[17481]: main(sys.argv[1:])
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4962, in main
> sh[17481]: args.func(args)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4399, in main_trigger
> sh[17481]: raise Error('return code ' + str(ret))
> sh[17481]: ceph_disk.main.Error: Error: return code 1
> systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited,
> status=1/FAILURE
> systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.
>
> I don't understand this error:
> ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>
> My fsid in ceph.conf is:
> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>
> I don't know why the fsid would change or be different. I thought I had a
> basic cluster setup; I don't understand what's going wrong.
>
> Mike
>
> On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci  wrote:
>
>> Hi John,
>>
>> Thanks I wasn't sure if something happened to the journal partitions or
>> not.
>>
>> Right now, the ceph-osd.0-9 services are back up and the cluster health
>> is good, but none of the ceph-disk@dev-sd* services are running.   How
>> can I get the Journal partitions mounted again?
>>
>> Cheers,
>> Mike
>>
>> On Tue, Nov 29, 2016 at 4:30 PM, John Petrini 
>> wrote:
>>
>>> Also, don't run sgdisk again; that's just for creating the journal
>>> partitions. ceph-disk is a service used for prepping disks, only the OSD
>>> services need to be running as far as I know. Are the ceph-osd@x.
>>> services running now that you've mounted the disks?
>>>
>>>
>>> On Tue, Nov 29, 2016 at 7:27 PM, John Petrini 
>>> 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
So it looks like the journal partition is mounted:

ls -lah /var/lib/ceph/osd/ceph-0/journal
lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal ->
/dev/sdb1

Here is the output of journalctl -xe when I try to start the
ceph-disk@dev-sdb1 service:

sh[17481]: mount_activate: Failed to activate
sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
sh[17481]: command_check_call: Running command: /bin/umount --
/var/lib/ceph/tmp/mnt.m9ek7W
sh[17481]: Traceback (most recent call last):
sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
'ceph-disk')()
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
5011, in run
sh[17481]: main(sys.argv[1:])
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
4962, in main
sh[17481]: args.func(args)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
4720, in 
sh[17481]: func=lambda args: main_activate_space(name, args),
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
3739, in main_activate_space
sh[17481]: reactivate=args.reactivate,
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
3073, in mount_activate
sh[17481]: (osd_id, cluster) = activate(path, activate_key_template, init)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
3220, in activate
sh[17481]: ' with fsid %s' % ceph_fsid)
sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph
with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
sh[17481]: Traceback (most recent call last):
sh[17481]: File "/usr/sbin/ceph-disk", line 9, in 
sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
'ceph-disk')()
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
5011, in run
sh[17481]: main(sys.argv[1:])
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
4962, in main
sh[17481]: args.func(args)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line
4399, in main_trigger
sh[17481]: raise Error('return code ' + str(ret))
sh[17481]: ceph_disk.main.Error: Error: return code 1
systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited,
status=1/FAILURE
systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.

I don't understand this error:
ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid
e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9

My fsid in ceph.conf is:
fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8

I don't know why the fsid would change or be different. I thought I had a
basic cluster setup; I don't understand what's going wrong.
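
(For reference, a quick way to compare the two fsids, assuming the default OSD
data path shown above; osd.0 is just one example:

  cat /var/lib/ceph/osd/ceph-0/ceph_fsid   # cluster fsid the OSD was prepared with
  grep fsid /etc/ceph/ceph.conf            # cluster fsid the node's config expects

If they differ, ceph-disk activation will refuse the disk with exactly this
'No cluster conf found ... with fsid' error.)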

Mike

On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci  wrote:

> Hi John,
>
> Thanks I wasn't sure if something happened to the journal partitions or
> not.
>
> Right now, the ceph-osd.0-9 services are back up and the cluster health is
> good, but none of the ceph-disk@dev-sd* services are running.   How can I
> get the Journal partitions mounted again?
>
> Cheers,
> Mike
>
> On Tue, Nov 29, 2016 at 4:30 PM, John Petrini 
> wrote:
>
>> Also, don't run sgdisk again; that's just for creating the journal
>> partitions. ceph-disk is a service used for prepping disks, only the OSD
>> services need to be running as far as I know. Are the ceph-osd@x.
>> services running now that you've mounted the disks?
>>
>>
>> On Tue, Nov 29, 2016 at 7:27 PM, John Petrini 
>> wrote:
>>
>>> What command are you using to start your OSD's?
>>>

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
Hi John,

Thanks, I wasn't sure if something happened to the journal partitions or
not.

Right now, the ceph-osd.0-9 services are back up and the cluster health is
good, but none of the ceph-disk@dev-sd* services are running.   How can I
get the Journal partitions mounted again?

Cheers,
Mike

On Tue, Nov 29, 2016 at 4:30 PM, John Petrini  wrote:

> Also, don't run sgdisk again; that's just for creating the journal
> partitions. ceph-disk is a service used for prepping disks, only the OSD
> services need to be running as far as I know. Are the ceph-osd@x.
> services running now that you've mounted the disks?
>
>
> On Tue, Nov 29, 2016 at 7:27 PM, John Petrini 
> wrote:
>
>> What command are you using to start your OSD's?
>>
>>
>> On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci  wrote:
>>
>>> I was able to bring the OSDs up by looking at my other OSD node, which
>>> is the exact same hardware/disks, and finding out which disks map.  But I
>>> still can't bring up any of the ceph-disk@dev-sd* services... When
>>> I first installed the cluster and got the OSDs up, I had to run the
>>> following:
>>>
>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>
>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>
>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>
>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>
>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>
>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>
>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>
>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>
>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>
>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>
>>>
>>> Do i need to run that again?
>>>
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond 
>>> wrote:
>>>
 Normally they mount based upon the gpt label, if it's not working you
 can mount the disk under /mnt and then cat the file called whoami to find
 out the osd number

 On 29 Nov 2016 23:56, "Mike Jacobacci"  wrote:

> OK I am in some trouble now and would love some help!  After updating
> none of the OSDs on the node will come back up:
>
> ● ceph-disk@dev-sdb1.service
>  loaded failed failedCeph disk activation: /dev/sdb1
> ● ceph-disk@dev-sdb2.service
>  loaded failed failedCeph disk activation: /dev/sdb2
> ● 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
Also, don't run sgdisk again; that's just for creating the journal
partitions. ceph-disk is a service used for prepping disks; only the OSD
services need to be running as far as I know. Are the ceph-osd@x services
running now that you've mounted the disks?
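
A quick way to check both of those (the partition and OSD numbers below are
just the examples used earlier in the thread):

  sgdisk -i 1 /dev/sdb          # the "Partition GUID code" should still show
                                # the journal type GUID set with sgdisk -t earlier
  systemctl status ceph-osd@0   # repeat for each OSD id on the node
  ceph osd tree                 # the OSDs should be reported as "up"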


On Tue, Nov 29, 2016 at 7:27 PM, John Petrini  wrote:

> What command are you using to start your OSD's?
>
>
> On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci  wrote:
>
>> I was able to bring the OSDs up by looking at my other OSD node, which is
>> the exact same hardware/disks, and finding out which disks map.  But I still
>> can't bring up any of the ceph-disk@dev-sd* services... When I
>> first installed the cluster and got the OSDs up, I had to run the
>> following:
>>
>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>
>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>
>>
>> Do i need to run that again?
>>
>>
>> Cheers,
>>
>> Mike
>>
>> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond 
>> wrote:
>>
>>> Normally they mount based upon the gpt label, if it's not working you
>>> can mount the disk under /mnt and then cat the file called whoami to find
>>> out the osd number
>>>
>>> On 29 Nov 2016 23:56, "Mike Jacobacci"  wrote:
>>>
 OK I am in some trouble now and would love some help!  After updating
 none of the OSDs on the node will come back up:

 ● ceph-disk@dev-sdb1.service
loaded failed failedCeph disk activation: /dev/sdb1
 ● ceph-disk@dev-sdb2.service
loaded failed failedCeph disk activation: /dev/sdb2
 ● ceph-disk@dev-sdb3.service
loaded failed failedCeph disk activation: /dev/sdb3
 ● ceph-disk@dev-sdb4.service
loaded failed failedCeph disk activation: /dev/sdb4
 ● ceph-disk@dev-sdb5.service
loaded failed failedCeph disk activation: /dev/sdb5
 ● ceph-disk@dev-sdc1.service
loaded failed failedCeph disk activation: /dev/sdc1
 ● ceph-disk@dev-sdc2.service
loaded failed failedCeph disk activation: /dev/sdc2
 ● ceph-disk@dev-sdc3.service
loaded failed 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread John Petrini
What command are you using to start your OSD's?


On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci  wrote:

> I was able to bring the OSDs up by looking at my other OSD node, which is
> the exact same hardware/disks, and finding out which disks map.  But I still
> can't bring up any of the ceph-disk@dev-sd* services... When I first
> installed the cluster and got the OSDs up, I had to run the following:
>
> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>
> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>
>
> Do i need to run that again?
>
>
> Cheers,
>
> Mike
>
> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond 
> wrote:
>
>> Normally they mount based upon the gpt label, if it's not working you can
>> mount the disk under /mnt and then cat the file called whoami to find out
>> the osd number
>>
>> On 29 Nov 2016 23:56, "Mike Jacobacci"  wrote:
>>
>>> OK I am in some trouble now and would love some help!  After updating
>>> none of the OSDs on the node will come back up:
>>>
>>> ● ceph-disk@dev-sdb1.service
>>>loaded failed failedCeph disk activation: /dev/sdb1
>>> ● ceph-disk@dev-sdb2.service
>>>loaded failed failedCeph disk activation: /dev/sdb2
>>> ● ceph-disk@dev-sdb3.service
>>>loaded failed failedCeph disk activation: /dev/sdb3
>>> ● ceph-disk@dev-sdb4.service
>>>loaded failed failedCeph disk activation: /dev/sdb4
>>> ● ceph-disk@dev-sdb5.service
>>>loaded failed failedCeph disk activation: /dev/sdb5
>>> ● ceph-disk@dev-sdc1.service
>>>loaded failed failedCeph disk activation: /dev/sdc1
>>> ● ceph-disk@dev-sdc2.service
>>>loaded failed failedCeph disk activation: /dev/sdc2
>>> ● ceph-disk@dev-sdc3.service
>>>loaded failed failedCeph disk activation: /dev/sdc3
>>> ● ceph-disk@dev-sdc4.service
>>>loaded failed failedCeph disk activation: /dev/sdc4
>>> ● ceph-disk@dev-sdc5.service
>>>loaded failed failedCeph disk activation: /dev/sdc5
>>> ● ceph-disk@dev-sdd1.service
>>>loaded failed failedCeph disk activation: /dev/sdd1
>>> ● ceph-disk@dev-sde1.service
>>>loaded failed failedCeph disk activation: /dev/sde1
>>> ● ceph-disk@dev-sdf1.service
>>>loaded failed failedCeph disk activation: /dev/sdf1
>>> ● ceph-disk@dev-sdg1.service
>>>loaded failed failedCeph disk activation: /dev/sdg1
>>> ● ceph-disk@dev-sdh1.service
>>>loaded failed failedCeph disk activation: /dev/sdh1
>>> ● ceph-disk@dev-sdi1.service
>>>loaded failed failedCeph disk activation: /dev/sdi1
>>> ● ceph-disk@dev-sdj1.service
>>>loaded failed failedCeph disk activation: /dev/sdj1
>>> ● ceph-disk@dev-sdk1.service
>>>loaded failed failedCeph disk activation: /dev/sdk1
>>> ● ceph-disk@dev-sdl1.service
>>>loaded failed failedCeph disk activation: /dev/sdl1
>>> ● ceph-disk@dev-sdm1.service
>>>loaded failed failedCeph disk activation: /dev/sdm1
>>> ● ceph-osd@0.service
>>>loaded failed failedCeph object storage daemon
>>> ● ceph-osd@1.service
>>>loaded failed failedCeph object storage daemon
>>> ● ceph-osd@2.service
>>>loaded failed failedCeph object storage daemon
>>> ● ceph-osd@3.service
>>>loaded failed failedCeph object storage daemon
>>> ● ceph-osd@4.service
>>>loaded 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
I was able to bring the OSDs up by looking at my other OSD node, which is
the exact same hardware/disks, and finding out which disks map.  But I still
can't bring up any of the ceph-disk@dev-sd* services... When I first
installed the cluster and got the OSDs up, I had to run the following:

# sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

# sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc

# sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc

# sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc

# sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc

# sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc


Do i need to run that again?


Cheers,

Mike

On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond 
wrote:

> Normally they mount based upon the gpt label, if it's not working you can
> mount the disk under /mnt and then cat the file called whoami to find out
> the osd number
>
> On 29 Nov 2016 23:56, "Mike Jacobacci"  wrote:
>
>> OK I am in some trouble now and would love some help!  After updating
>> none of the OSDs on the node will come back up:
>>
>> ● ceph-disk@dev-sdb1.service
>>  loaded failed failedCeph disk activation: /dev/sdb1
>> ● ceph-disk@dev-sdb2.service
>>  loaded failed failedCeph disk activation: /dev/sdb2
>> ● ceph-disk@dev-sdb3.service
>>  loaded failed failedCeph disk activation: /dev/sdb3
>> ● ceph-disk@dev-sdb4.service
>>  loaded failed failedCeph disk activation: /dev/sdb4
>> ● ceph-disk@dev-sdb5.service
>>  loaded failed failedCeph disk activation: /dev/sdb5
>> ● ceph-disk@dev-sdc1.service
>>  loaded failed failedCeph disk activation: /dev/sdc1
>> ● ceph-disk@dev-sdc2.service
>>  loaded failed failedCeph disk activation: /dev/sdc2
>> ● ceph-disk@dev-sdc3.service
>>  loaded failed failedCeph disk activation: /dev/sdc3
>> ● ceph-disk@dev-sdc4.service
>>  loaded failed failedCeph disk activation: /dev/sdc4
>> ● ceph-disk@dev-sdc5.service
>>  loaded failed failedCeph disk activation: /dev/sdc5
>> ● ceph-disk@dev-sdd1.service
>>  loaded failed failedCeph disk activation: /dev/sdd1
>> ● ceph-disk@dev-sde1.service
>>  loaded failed failedCeph disk activation: /dev/sde1
>> ● ceph-disk@dev-sdf1.service
>>  loaded failed failedCeph disk activation: /dev/sdf1
>> ● ceph-disk@dev-sdg1.service
>>  loaded failed failedCeph disk activation: /dev/sdg1
>> ● ceph-disk@dev-sdh1.service
>>  loaded failed failedCeph disk activation: /dev/sdh1
>> ● ceph-disk@dev-sdi1.service
>>  loaded failed failedCeph disk activation: /dev/sdi1
>> ● ceph-disk@dev-sdj1.service
>>  loaded failed failedCeph disk activation: /dev/sdj1
>> ● ceph-disk@dev-sdk1.service
>>  loaded failed failedCeph disk activation: /dev/sdk1
>> ● ceph-disk@dev-sdl1.service
>>  loaded failed failedCeph disk activation: /dev/sdl1
>> ● ceph-disk@dev-sdm1.service
>>  loaded failed failedCeph disk activation: /dev/sdm1
>> ● ceph-osd@0.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@1.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@2.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@3.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@4.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@5.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@6.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@7.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@8.service
>>  loaded failed failedCeph object storage daemon
>> ● ceph-osd@9.service
>>  loaded failed failedCeph object storage daemon
>>
>> I did some searching and saw that the issue is that the disks aren't
>> mounting... My question is how can I mount them correctly again (note sdb
>> and sdc are ssd for cache)? I am not sure which disk maps to ceph-osd@0
>> and so on.  Also, can I add them to /etc/fstab to work around?
>>
>> Cheers,
>> Mike
>>
>> On Tue, Nov 29, 2016 at 10:41 AM, Mike Jacobacci 
>> wrote:
>>
>>> Hello,
>>>
>>> I would like to install OS updates on the ceph cluster and activate a
>>> second 10gb port on the OSD nodes, so I wanted to verify the correct steps
>>> to perform maintenance on the cluster.  We are only using rbd to back our
>>> xenserver vm's at this point, and our cluster consists of 3 OSD nodes, 3
>>> Mon nodes and 1 admin node...  So would this be the correct steps:
>>>
>>> 1. Shut down VM's?
>>> 2. run "ceph osd set noout" on admin node
>>> 3. install updates on each monitoring node and reboot one at a time.
>>> 4. install updates on 

Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread Mike Jacobacci
OK, I am in some trouble now and would love some help!  After updating, none
of the OSDs on the node will come back up:

● ceph-disk@dev-sdb1.service
 loaded failed failedCeph disk activation: /dev/sdb1
● ceph-disk@dev-sdb2.service
 loaded failed failedCeph disk activation: /dev/sdb2
● ceph-disk@dev-sdb3.service
 loaded failed failedCeph disk activation: /dev/sdb3
● ceph-disk@dev-sdb4.service
 loaded failed failedCeph disk activation: /dev/sdb4
● ceph-disk@dev-sdb5.service
 loaded failed failedCeph disk activation: /dev/sdb5
● ceph-disk@dev-sdc1.service
 loaded failed failedCeph disk activation: /dev/sdc1
● ceph-disk@dev-sdc2.service
 loaded failed failedCeph disk activation: /dev/sdc2
● ceph-disk@dev-sdc3.service
 loaded failed failedCeph disk activation: /dev/sdc3
● ceph-disk@dev-sdc4.service
 loaded failed failedCeph disk activation: /dev/sdc4
● ceph-disk@dev-sdc5.service
 loaded failed failedCeph disk activation: /dev/sdc5
● ceph-disk@dev-sdd1.service
 loaded failed failedCeph disk activation: /dev/sdd1
● ceph-disk@dev-sde1.service
 loaded failed failedCeph disk activation: /dev/sde1
● ceph-disk@dev-sdf1.service
 loaded failed failedCeph disk activation: /dev/sdf1
● ceph-disk@dev-sdg1.service
 loaded failed failedCeph disk activation: /dev/sdg1
● ceph-disk@dev-sdh1.service
 loaded failed failedCeph disk activation: /dev/sdh1
● ceph-disk@dev-sdi1.service
 loaded failed failedCeph disk activation: /dev/sdi1
● ceph-disk@dev-sdj1.service
 loaded failed failedCeph disk activation: /dev/sdj1
● ceph-disk@dev-sdk1.service
 loaded failed failedCeph disk activation: /dev/sdk1
● ceph-disk@dev-sdl1.service
 loaded failed failedCeph disk activation: /dev/sdl1
● ceph-disk@dev-sdm1.service
 loaded failed failedCeph disk activation: /dev/sdm1
● ceph-osd@0.service
 loaded failed failedCeph object storage daemon
● ceph-osd@1.service
 loaded failed failedCeph object storage daemon
● ceph-osd@2.service
 loaded failed failedCeph object storage daemon
● ceph-osd@3.service
 loaded failed failedCeph object storage daemon
● ceph-osd@4.service
 loaded failed failedCeph object storage daemon
● ceph-osd@5.service
 loaded failed failedCeph object storage daemon
● ceph-osd@6.service
 loaded failed failedCeph object storage daemon
● ceph-osd@7.service
 loaded failed failedCeph object storage daemon
● ceph-osd@8.service
 loaded failed failedCeph object storage daemon
● ceph-osd@9.service
 loaded failed failedCeph object storage daemon

I did some searching and saw that the issue is that the disks aren't
mounting... My question is: how can I mount them correctly again (note sdb
and sdc are SSDs used for cache)? I am not sure which disk maps to ceph-osd@0 and
so on.  Also, can I add them to /etc/fstab as a workaround?
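
As Sean Redmond's reply elsewhere in the thread suggests, the mapping can be
recovered by temporarily mounting each data partition and reading its whoami
file; a minimal sketch (the device name is just an example):

  mount /dev/sdd1 /mnt
  cat /mnt/whoami    # prints the OSD id stored on that data partition, e.g. 0
  umount /mnt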

Cheers,
Mike

On Tue, Nov 29, 2016 at 10:41 AM, Mike Jacobacci  wrote:

> Hello,
>
> I would like to install OS updates on the ceph cluster and activate a
> second 10gb port on the OSD nodes, so I wanted to verify the correct steps
> to perform maintenance on the cluster.  We are only using rbd to back our
> xenserver vm's at this point, and our cluster consists of 3 OSD nodes, 3
> Mon nodes and 1 admin node...  So would this be the correct steps:
>
> 1. Shut down VM's?
> 2. run "ceph osd set noout" on admin node
> 3. install updates on each monitoring node and reboot one at a time.
> 4. install updates on OSD nodes and activate second 10gb port, reboot one
> OSD node at a time
> 5. once all nodes back up, run "ceph osd unset noout"
> 6. bring VM's back online
>
> Does this sound correct?
>
>
> Cheers,
> Mike
>
>


Re: [ceph-users] Ceph Maintenance

2016-11-29 Thread David Turner
Everything is correct except for shutting down the VMs; there is no need for 
downtime during this upgrade.  As long as your cluster comes back to HEALTH_OK 
(or just shows that the noout flag is set and nothing else), you are 
free to move on to the next node.
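
A minimal sketch of that rolling flow (run the flag commands from any node with
an admin keyring):

  ceph osd set noout        # once, before starting
  # then, for each node in turn: install updates, reboot, and before moving on:
  ceph -s                   # wait for HEALTH_OK (or only the noout warning)
  ceph osd unset noout      # after the last node is back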







From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Mike 
Jacobacci [mi...@flowjo.com]
Sent: Tuesday, November 29, 2016 11:41 AM
To: ceph-users
Subject: [ceph-users] Ceph Maintenance

Hello,

I would like to install OS updates on the ceph cluster and activate a second 
10gb port on the OSD nodes, so I wanted to verify the correct steps to perform 
maintenance on the cluster.  We are only using rbd to back our xenserver vm's 
at this point, and our cluster consists of 3 OSD nodes, 3 Mon nodes and 1 admin 
node...  So would this be the correct steps:

1. Shut down VM's?
2. run "ceph osd set noout" on admin node
3. install updates on each monitoring node and reboot one at a time.
4. install updates on OSD nodes and activate second 10gb port, reboot one OSD 
node at a time
5. once all nodes back up, run "ceph osd unset noout"
6. bring VM's back online

Does this sound correct?


Cheers,
Mike
