Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-19 Thread Simon Leinen
cephmailinglist  writes:
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> [...]

> [...] Also at that time one of our pools got a lot of extra data,
> those files were stored with root permissions since we had not
> restarted the Ceph daemons yet, and the 'find' in step e found so many
> files that xargs (the shell) could not handle them (too many arguments).

I've always found it disappointing that xargs behaves like this on many
GNU/Linux distributions.  I always thought xargs's main purpose in life
was to know how many arguments can safely be passed to a process...

Anyway, you should be able to limit the number of arguments per
invocation by adding something like "-n 100" to the xargs command line.
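
For illustration, a sketch of the adjusted pipeline (uid 64045 and a batch
size of 100 are just the values already mentioned in this thread, not a
recommendation):

# Cap the number of files handed to each chown invocation
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 -n 100 chown ceph:ceph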

Thanks for sharing your upgrade experiences!
-- 
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-14 Thread George Mihaiescu
Hi,

We initially upgraded from Hammer to Jewel while keeping the ownership
unchanged, by adding "setuser match path =
/var/lib/ceph/$type/$cluster-$id" to ceph.conf.


Later, we used the following steps to change from running as root to
running as ceph.

On the storage nodes, we ran the following command, which doesn't actually
change any permissions (everything is still owned by root at this point)
but warms the filesystem cache (based on
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006013.html
):

find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1
chown -R root:root

Set noout:
ceph osd set noout

On the storage node:
Edit "/etc/ceph/ceph.conf" and comment out the line "setuser match path =
/var/lib/ceph/$type/$cluster-$id"
stop ceph-osd-all
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | xargs -P12 -n1
chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/
start ceph-osd-all

Check that all the Ceph OSD processes are running:
ps aux | grep ceph | egrep -v grep

Unset "noout":
ceph osd unset noout

Wait till ceph is healthy again and continue with the next storage node.
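
Pulled together, a rough per-node sketch of the above might look like this
(hypothetical and untested as a script; it assumes Upstart-managed OSDs on
Ubuntu 14.04 as described in this thread, and that the "setuser match path"
line has already been commented out in /etc/ceph/ceph.conf):

#!/bin/bash
set -e

# Pre-pass while the OSDs still run as root: ownership stays root:root,
# but all the inode metadata ends up in the cache.
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | \
    xargs -P12 -n1 chown -R root:root

ceph osd set noout      # keep the cluster from rebalancing during the restart

stop ceph-osd-all

# The real ownership change; mostly cached metadata now, so much faster.
find /var/lib/ceph/osd -maxdepth 1 -mindepth 1 -print | \
    xargs -P12 -n1 chown -R ceph:ceph
chown -R ceph:ceph /var/lib/ceph/

start ceph-osd-all
ps aux | grep '[c]eph'  # confirm the ceph-osd processes are back

ceph osd unset noout    # then wait for HEALTH_OK before the next node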

The OSDs were only down for about 2 minutes, because we ran the find command
beforehand and used xargs with 12 parallel processes, so recovery time was
quick as well.

We have more than 850 OSDs and the entire process went pretty smoothly,
doing one storage server at a time.



On Tue, Mar 14, 2017 at 3:27 AM, Richard Arends 
wrote:

> On 03/13/2017 02:02 PM, Christoph Adomeit wrote:
>
> Christoph,
>
> Thanks for the detailed upgrade report.
>>
>> We have another scenario: we have already upgraded to Jewel 10.2.6, but
>> we are still running all our monitors and osd daemons as root using the
>> setuser match path directive.
>>
>> What would be the recommended way to have all daemons running as the
>> ceph:ceph user?
>>
>> Could we chown -R the monitor and osd data directories under
>> /var/lib/ceph one by one while keeping the service up?
>>
>
> Yes. To minimize the downtime, you can do the chown twice: once before
> restarting the daemons, while they are still running with root permissions.
> Then stop the daemons, do the chown again, but this time only on the changed
> files (find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 chown ceph:ceph),
> and start the Ceph daemons with setuser and setgroup set to ceph.
>
>
>
> --
> With regards,
>
> Richard Arends.
> Snow BV / http://snow.nl
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-14 Thread Richard Arends

On 03/13/2017 02:02 PM, Christoph Adomeit wrote:

Christoph,


Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but
we are still running all our monitors and osd daemons as root using the
setuser match path directive.

What would be the recommended way to have all daemons running as the
ceph:ceph user?

Could we chown -R the monitor and osd data directories under /var/lib/ceph
one by one while keeping the service up?


Yes. To minimize the downtime, you can do the chown twice: once before
restarting the daemons, while they are still running with root
permissions. Then stop the daemons, do the chown again, but this time only
on the changed files (find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0
chown ceph:ceph), and start the Ceph daemons with setuser and setgroup
set to ceph.
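
As a rough sketch of that two-pass approach (64045 is the uid of the ceph
user created by the Debian/Ubuntu packages, and the service commands are the
Upstart ones used elsewhere in this thread; both will differ on other setups):

# Pass 1: the daemons keep running as root, so this pass can take as long
# as it needs without causing downtime.
chown -R ceph:ceph /var/lib/ceph/

# Pass 2: stop the daemons, fix only what was created or changed since
# pass 1, then start them again with setuser/setgroup set to ceph.
stop ceph-all
find /var/lib/ceph/ ! -uid 64045 -print0 | xargs -0 chown ceph:ceph
start ceph-all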




--
With regards,

Richard Arends.
Snow BV / http://snow.nl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-14 Thread Richard Arends

On 03/12/2017 07:54 PM, Florian Haas wrote:

Florian,



For others following this thread who still have the hammer→jewel upgrade
ahead: there is a ceph.conf option you can use here; no need to fiddle
with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

Ah, I did not know about this option. Good tip!


--
With regards,

Richard Arends.
Snow BV / http://snow.nl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Christoph Adomeit
Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to Jewel 10.2.6, but
we are still running all our monitors and osd daemons as root using the
setuser match path directive.

What would be the recommended way to have all daemons running as the
ceph:ceph user?

Could we chown -R the monitor and osd data directories under /var/lib/ceph
one by one while keeping the service up?

Thanks
  Christoph

On Sat, Mar 11, 2017 at 12:21:38PM +0100, cephmailingl...@mosibi.nl wrote:
> Hello list,
> 
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this
> email we want to share our experiences.
> 
-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Piotr Dałek

On 03/13/2017 11:07 AM, Dan van der Ster wrote:

On Sat, Mar 11, 2017 at 12:21 PM,  wrote:


The next and biggest problem we encountered had to do with the CRC errors on
the OSD map. On every map update, the OSDs that were not upgraded yet got that
CRC error and asked the monitor for a full OSD map instead of just a delta
update. At first we did not understand what exactly happened: we ran the
upgrade per node using a script, and in that script we watch the state of the
cluster and, when the cluster is healthy again, upgrade the next host. Every
time we started the script (skipping the already upgraded hosts) the first
host(s) upgraded without issues and then we got blocked I/O on the cluster. The
blocked I/O went away within a minute or two (not measured). After investigation
we found out that the blocked I/O happened when nodes were asking the monitor
for a (full) OSD map, and that briefly resulted in a fully saturated network
link on our monitor.



Thanks for the detailed upgrade report. I wanted to zoom in on this
CRC/fullmap issue because it could be quite disruptive for us when we
upgrade from hammer to jewel.

I've read various reports that the foolproof way to avoid the full
map DoS would be to upgrade all OSDs to Jewel before the mons.
Did anyone have success with that workaround? I'm cc'ing Bryan because
he knows this issue very well.


With https://github.com/ceph/ceph/pull/13131 merged into 10.2.6, this issue 
shouldn't be a problem (at least we don't see it anymore).


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Dan van der Ster
On Sat, Mar 11, 2017 at 12:21 PM,  wrote:
>
> The next and biggest problem we encountered had to do with the CRC errors on
> the OSD map. On every map update, the OSDs that were not upgraded yet got
> that CRC error and asked the monitor for a full OSD map instead of just a
> delta update. At first we did not understand what exactly happened: we ran
> the upgrade per node using a script, and in that script we watch the state of
> the cluster and, when the cluster is healthy again, upgrade the next host.
> Every time we started the script (skipping the already upgraded hosts) the
> first host(s) upgraded without issues and then we got blocked I/O on the
> cluster. The blocked I/O went away within a minute or two (not measured). After
> investigation we found out that the blocked I/O happened when nodes were
> asking the monitor for a (full) OSD map, and that briefly resulted in a fully
> saturated network link on our monitor.


Thanks for the detailed upgrade report. I wanted to zoom in on this
CRC/fullmap issue because it could be quite disruptive for us when we
upgrade from hammer to jewel.

I've read various reports that the foolproof way to avoid the full
map DoS would be to upgrade all OSDs to Jewel before the mons.
Did anyone have success with that workaround? I'm cc'ing Bryan because
he knows this issue very well.

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Christian Balzer

Hello,

On Sun, 12 Mar 2017 19:54:10 +0100 Florian Haas wrote:

> On Sat, Mar 11, 2017 at 12:21 PM,  wrote:
> > The upgrade of our biggest cluster, nr 4, did not go without
> > problems. Since we were expecting a lot of "failed to encode map
> > e with expected crc" messages, we disabled clog to monitors
> > with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our
> > monitors would not choke on those messages. The upgrade of the
> > monitors went as expected, without any problem; the problems
> > started when we started the upgrade of the OSDs. In the upgrade
> > procedure, we had to change the ownership of the files from root to
> > the user ceph, and that process was taking so long on our cluster that
> > completing the upgrade would take more than a week. We decided to
> > keep the permissions as they were for now, so in the upstart init
> > script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup
> > ceph' to '--setuser root --setgroup root' and fix that OSD by OSD
> > after the upgrade was completely done
> 
> For others following this thread who still have the hammer→jewel upgrade
> ahead: there is a ceph.conf option you can use here; no need to fiddle
> with the upstart scripts.
> 
> setuser match path = /var/lib/ceph/$type/$cluster-$id
>

Yes, I was thinking about mentioning this, too.
Alas, in my experience with a wonky test cluster this failed for the MDS,
maybe because of an odd name, maybe because nobody ever tested it.
MONs and OSDs were fine.
 
> What this will do is it will check which user owns files in the
> respective directories, and then start your Ceph daemons under the
> appropriate user and group IDs. In other words, if you enable this and
> you upgrade from Hammer to Jewel, and your files are still owned by
> root, your daemons will also continue to run as root:root (as they did in
> hammer). Then, you can stop your OSDs, run the recursive chown, and
> restart the OSDs one-by-one. When they come back up, they will just
> automatically switch to running as ceph:ceph.
> 
Though if you have external journals and didn't use ceph-deploy, you're
boned with the whole ceph:ceph approach.
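
(For illustration only, a hedged sketch of the extra step in that case: the
OSD journal symlinks point at raw partitions that also have to be owned by
ceph, e.g. along these lines, and without matching udev rules or partition
type GUIDs that ownership will not survive a reboot.)

# Hypothetical: chown the partitions the OSD journal symlinks point at
for j in /var/lib/ceph/osd/ceph-*/journal; do
    chown ceph:ceph "$(readlink -f "$j")"
done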

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Christian Balzer

Hello,

On Sun, 12 Mar 2017 19:52:12 +1000 Brad Hubbard wrote:

> On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune  
> wrote:
> > Hi,
> >
> > thanks for that report! Glad to hear a mostly happy report. I’m still on the
> > fence … ;)
> >
> > I have had reports that Qemu (librbd connections) will require
> > updates/restarts before upgrading. What was your experience on that side?
> > Did you upgrade the clients? Did you start using any of the new RBD
> > features, like fast diff?  
> 
> You don't need to restart qemu-kvm instances *before* upgrading but
> you do need to restart or migrate them *after* updating. The updated
> binaries are only loaded into the qemu process address space at
> start-up so to load the newly installed binaries (libraries) you need
> to restart or do a migration to an upgraded host.
> 

Well, the OP wrote about live migration problems, but those were not in the
qemu part of things; they were libvirt/Openstack related.

To wit, I did upgrade a test cluster from Hammer to Jewel and live
migration under ganeti worked fine.

I've also not seen any problems on other instances that have not been
restarted since, nor would I expect that an upgrade from one stable version
to the next should EVER require such a step (at least immediately).

Christian

> >
> > What’s your experience with load/performance after the upgrade? Found any
> > new issues that indicate shifted hotspots?
> >
> > Cheers and thanks again,
> > Christian
> >
> > On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
> >
> > Hello list,
> >
> > A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this
> > email we want to share our experiences.
> >
> >
> > We have four clusters:
> >
> > 1) Test cluster for all the fun things, completely virtual.
> >
> > 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal
> >
> > 3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage
> >
> > 4) Main cluster (used for our custom software stack and openstack): 5
> > monitors and 1917 OSDs. 8 PB storage
> >
> >
> > All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
> > packages from ceph.com. On every cluster we upgraded the monitors first and
> > after that, the OSDs. Our backup cluster is the only cluster that also
> > serves S3 via the RadosGW and that service is upgraded at the same time as
> > the OSDs in that cluster. The upgrade of clusters 1, 2 and 3 went without
> > any problem, just an apt-get upgrade on every component. We did see the
> > message "failed to encode map e with expected crc", but that
> > message disappeared when all the OSDs were upgraded.
> >
> > The upgrade of our biggest cluster, nr 4, did not go without problems. Since
> > we were expecting a lot of "failed to encode map e with expected
> > crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs
> > -- --clog_to_monitors=false' so our monitors would not choke on those
> > messages. The upgrade of the monitors went as expected, without any
> > problem; the problems started when we started the upgrade of the OSDs. In
> > the upgrade procedure, we had to change the ownership of the files from root
> > to the user ceph, and that process was taking so long on our cluster that
> > completing the upgrade would take more than a week. We decided to keep the
> > permissions as they were for now, so in the upstart init script
> > /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to
> > '--setuser root --setgroup root' and fix that OSD by OSD after the upgrade
> > was completely done
> >
> > On cluster 3 (backup) we could change the permissions in a shorter time with
> > the following procedure:
> >
> > a) apt-get -y install ceph-common
> > b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do
> > echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> > c) (wait for all the chown's to complete)
> > d) stop ceph-all
> > e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> > f) start ceph-all
> >
> > This procedure did not work on our main (4) cluster because the load on the
> > OSDs became 100% in step b and that resulted in blocked I/O on some virtual
> > instances in the Openstack cluster. Also, at that time one of our pools got a
> > lot of extra data; those files were stored with root permissions since we
> > had not restarted the Ceph daemons yet, and the 'find' in step e found so many
> > files that xargs (the shell) could not handle them (too many arguments). At
> > that point we decided to keep the permissions on root during the upgrade phase.
> >
> > The next and biggest problem we encountered had to do with the CRC errors on
> > the OSD map. On every map update, the OSDs that were not upgraded yet got
> > that CRC error and asked the monitor for a full OSD map instead of just a
> > delta update. At first we did not understand what exactly happened: we ran
> > the upgrade 

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Florian Haas
On Sat, Mar 11, 2017 at 12:21 PM,  wrote:
> The upgrade of our biggest cluster, nr 4, did not go without
> problems. Since we were expecting a lot of "failed to encode map
> e with expected crc" messages, we disabled clog to monitors
> with 'ceph tell osd.* injectargs -- --clog_to_monitors=false' so our
> monitors would not choke on those messages. The upgrade of the
> monitors went as expected, without any problem; the problems
> started when we started the upgrade of the OSDs. In the upgrade
> procedure, we had to change the ownership of the files from root to
> the user ceph, and that process was taking so long on our cluster that
> completing the upgrade would take more than a week. We decided to
> keep the permissions as they were for now, so in the upstart init
> script /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup
> ceph' to '--setuser root --setgroup root' and fix that OSD by OSD
> after the upgrade was completely done

For others following this thread who still have the hammer→jewel upgrade
ahead: there is a ceph.conf option you can use here; no need to fiddle
with the upstart scripts.

setuser match path = /var/lib/ceph/$type/$cluster-$id

What this will do is it will check which user owns files in the
respective directories, and then start your Ceph daemons under the
appropriate user and group IDs. In other words, if you enable this and
you upgrade from Hammer to Jewel, and your files are still owned by
root, your daemons will also continue to run as root:root (as they did in
hammer). Then, you can stop your OSDs, run the recursive chown, and
restart the OSDs one-by-one. When they come back up, they will just
automatically switch to running as ceph:ceph.
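
For reference, a minimal sketch of how that can look in ceph.conf during the
transition (placing it under [global] is just one option; put it wherever you
keep your daemon settings):

[global]
# Run each daemon as whatever user owns its data directory, i.e. keep
# running as root until the recursive chown has been done.
setuser match path = /var/lib/ceph/$type/$cluster-$id

Once a daemon's directory is owned by ceph:ceph, the next restart picks up
the ceph user automatically, and the option can eventually be dropped again.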

Cheers,
Florian



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Brad Hubbard
On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune  wrote:
> Hi,
>
> thanks for that report! Glad to hear a mostly happy report. I’m still on the
> fence … ;)
>
> I have had reports that Qemu (librbd connections) will require
> updates/restarts before upgrading. What was your experience on that side?
> Did you upgrade the clients? Did you start using any of the new RBD
> features, like fast diff?

You don't need to restart qemu-kvm instances *before* upgrading but
you do need to restart or migrate them *after* updating. The updated
binaries are only loaded into the qemu process address space at
start-up so to load the newly installed binaries (libraries) you need
to restart or do a migration to an upgraded host.
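
For example, one hedged way to spot guests still running against the old
library (an illustration, not an official check): after the package upgrade
the old librbd file is deleted but stays mapped, so it shows up as
"(deleted)" in the process maps.

# List PIDs of processes still mapping a deleted (pre-upgrade) librbd
grep -l 'librbd.*(deleted)' /proc/[0-9]*/maps 2>/dev/null | cut -d/ -f3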

>
> What’s your experience with load/performance after the upgrade? Found any
> new issues that indicate shifted hotspots?
>
> Cheers and thanks again,
> Christian
>
> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this
> email we want to share our experiences.
>
>
> We have four clusters:
>
> 1) Test cluster for all the fun things, completely virtual.
>
> 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal
>
> 3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage
>
> 4) Main cluster (used for our custom software stack and openstack): 5
> monitors and 1917 OSDs. 8 PB storage
>
>
> All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
> packages from ceph.com. On every cluster we upgraded the monitors first and
> after that, the OSDs. Our backup cluster is the only cluster that also
> serves S3 via the RadosGW and that service is upgraded at the same time as
> the OSDs in that cluster. The upgrade of clusters 1, 2 and 3 went without
> any problem, just an apt-get upgrade on every component. We did see the
> message "failed to encode map e with expected crc", but that
> message disappeared when all the OSDs were upgraded.
>
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since
> we were expecting a lot of "failed to encode map e with expected
> crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs
> -- --clog_to_monitors=false' so our monitors would not choke on those
> messages. The upgrade of the monitors went as expected, without any
> problem; the problems started when we started the upgrade of the OSDs. In
> the upgrade procedure, we had to change the ownership of the files from root
> to the user ceph, and that process was taking so long on our cluster that
> completing the upgrade would take more than a week. We decided to keep the
> permissions as they were for now, so in the upstart init script
> /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to
> '--setuser root --setgroup root' and fix that OSD by OSD after the upgrade
> was completely done
>
> On cluster 3 (backup) we could change the permissions in a shorter time with
> the following procedure:
>
> a) apt-get -y install ceph-common
> b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do
> echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> c) (wait for all the chown's to complete)
> d) stop ceph-all
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> f) start ceph-all
>
> This procedure did not work on our main (4) cluster because the load on the
> OSDs became 100% in step b and that resulted in blocked I/O on some virtual
> instances in the Openstack cluster. Also, at that time one of our pools got a
> lot of extra data; those files were stored with root permissions since we
> had not restarted the Ceph daemons yet, and the 'find' in step e found so many
> files that xargs (the shell) could not handle them (too many arguments). At
> that point we decided to keep the permissions on root during the upgrade phase.
>
> The next and biggest problem we encountered had to do with the CRC errors on
> the OSD map. On every map update, the OSDs that were not upgraded yet got
> that CRC error and asked the monitor for a full OSD map instead of just a
> delta update. At first we did not understand what exactly happened: we ran
> the upgrade per node using a script, and in that script we watch the state of
> the cluster and, when the cluster is healthy again, upgrade the next host.
> Every time we started the script (skipping the already upgraded hosts) the
> first host(s) upgraded without issues and then we got blocked I/O on the
> cluster. The blocked I/O went away within a minute or two (not measured).
> After investigation we found out that the blocked I/O happened when nodes
> were asking the monitor for a (full) OSD map, and that briefly resulted in a
> fully saturated network link on our monitor.
>
> In the next graph the statistics for one of our Ceph monitors is shown. Our
> hosts are equipped with 10 gbit/s NIC's and every time at 

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread cephmailinglist

On 03/11/2017 09:49 PM, Udo Lembke wrote:

Hi Udo,

Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown
ceph:ceph {} \;" would do a better job?!


We did exactly that (and also tried other combinations) and that is a 
workaround for the 'argument too long' problem, but then it would call 
an exec for every file it finds. All those forks took forever... :)
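
(For what it's worth, a possible middle ground: terminating -exec with "+"
instead of "\;" makes find batch many files per chown invocation, much like
xargs, so it avoids both the per-file fork and the argument-length limit.)

# One chown per batch of files instead of one fork per file
find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +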



--
With regards,

Richard Arends.
Snow BV / http://snow.nl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread cephmailinglist

On 03/11/2017 09:36 PM, Christian Theune wrote:

Hello,

I have had reports that Qemu (librbd connections) will require 
updates/restarts before upgrading. What was your experience on that 
side? Did you upgrade the clients? Did you start using any of the new 
RBD features, like fast diff?


We have two types of clients: 1) Openstack hosts and components like
Cinder, and 2) clients that use librbd (from Java and C). We combine Ceph
and Openstack on the same host, meaning that when we upgraded Ceph for
the OSDs, the libraries for Openstack were updated at the same time. The
other type of clients were already using the Jewel libraries and
binaries for some time. We did not change anything on the clients, so
we are not using the newly introduced features (yet).


What’s your experience with load/performance after the upgrade? Found 
any new issues that indicate shifted hotspots?


We did not see any difference.

--
With regards,

Richard Arends.
Snow BV / http://snow.nl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-11 Thread Udo Lembke
Hi,

thanks for the useful info.


On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote:
>
> Hello list,
>
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with
> this email we want to share our experiences.
>
> ...
>
>
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> ... the 'find' in step e found so many files that xargs (the shell)
> could not handle them (too many arguments). At that point we decided to
> keep the permissions on root during the upgrade phase.
>
>
Perhaps a "find /var/lib/ceph/ ! -uid 64045 -exec chown
ceph:ceph {} \;" would do a better job?!

Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-11 Thread Christian Theune
Hi,

thanks for that report! Glad to hear it was a mostly happy experience. I'm
still on the fence … ;)

I have had reports that Qemu (librbd connections) will require updates/restarts 
before upgrading. What was your experience on that side? Did you upgrade the 
clients? Did you start using any of the new RBD features, like fast diff?

What’s your experience with load/performance after the upgrade? Found any new 
issues that indicate shifted hotspots?

Cheers and thanks again,
Christian

> On Mar 11, 2017, at 12:21 PM, cephmailingl...@mosibi.nl wrote:
> 
> Hello list,
> 
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this 
> email we want to share our experiences.
> 
> We have four clusters:
> 
> 1) Test cluster for all the fun things, completely virtual.
> 
> 2) Test cluster for Openstack: 3 monitors and 9 OSDs, all baremetal
> 
> 3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage
> 
> 4) Main cluster (used for our custom software stack and openstack): 5 
> monitors and 1917 OSDs. 8 PB storage
> 
> 
> All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph packages 
> from ceph.com. On every cluster we upgraded the monitors first and after 
> that, the OSDs. Our backup cluster is the only cluster that also serves S3 
> via the RadosGW and that service is upgraded at the same time as the OSDs in 
> that cluster. The upgrade of clusters 1, 2 and 3 went without any problem, 
> just an apt-get upgrade on every component. We did see the message "failed
> to encode map e with expected crc", but that message disappeared
> when all the OSDs were upgraded.
> The upgrade of our biggest cluster, nr 4, did not go without problems. Since
> we were expecting a lot of "failed to encode map e with expected
> crc" messages, we disabled clog to monitors with 'ceph tell osd.* injectargs
> -- --clog_to_monitors=false' so our monitors would not choke on those
> messages. The upgrade of the monitors went as expected, without any
> problem; the problems started when we started the upgrade of the OSDs. In the
> upgrade procedure, we had to change the ownership of the files from root to
> the user ceph, and that process was taking so long on our cluster that
> completing the upgrade would take more than a week. We decided to keep the
> permissions as they were for now, so in the upstart init script
> /etc/init/ceph-osd.conf, we changed '--setuser ceph --setgroup ceph' to
> '--setuser root --setgroup root' and fix that OSD by OSD after the upgrade
> was completely done
> 
> On cluster 3 (backup) we could change the permissions in a shorter time with 
> the following procedure:
> 
> a) apt-get -y install ceph-common
> b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P; do 
> echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
> c) (wait for all the chown's to complete)
> d) stop ceph-all
> e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0  chown ceph:ceph
> f) start ceph-all
> 
> This procedure did not work on our main (4) cluster because the load on the 
> OSDs became 100% in step b and that resulted in blocked I/O on some virtual 
> instances in the Openstack cluster. Also, at that time one of our pools got a
> lot of extra data; those files were stored with root permissions since we
> had not restarted the Ceph daemons yet, and the 'find' in step e found so many
> files that xargs (the shell) could not handle them (too many arguments). At
> that point we decided to keep the permissions on root during the upgrade phase.
> 
> The next and biggest problem we encountered had to do with the CRC errors on
> the OSD map. On every map update, the OSDs that were not upgraded yet got
> that CRC error and asked the monitor for a full OSD map instead of just a
> delta update. At first we did not understand what exactly happened: we ran
> the upgrade per node using a script, and in that script we watch the state of
> the cluster and, when the cluster is healthy again, upgrade the next host.
> Every time we started the script (skipping the already upgraded hosts) the
> first host(s) upgraded without issues and then we got blocked I/O on the
> cluster. The blocked I/O went away within a minute or two (not measured). After
> investigation we found out that the blocked I/O happened when nodes were
> asking the monitor for a (full) OSD map, and that briefly resulted in a fully
> saturated network link on our monitor.
> 
> In the next graph the statistics for one of our Ceph monitors are shown. Our
> hosts are equipped with 10 gbit/s NICs and every time at the highest peaks,
> the problems occurred. We could work around this problem by waiting four
> minutes between every host, and after that time (14:20) we did not have any
> issues anymore. Of course the number of not-yet-upgraded OSDs decreased, so the
> number of full OSD map requests also got smaller over time.
> 
> 
> 
> 
> The day after the upgrade we had issues with