[ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Hi,

I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
6.5) complains:
-  internal error missing backend for pool type 8

Is it possible that libvirt 0.10.2 (as shipped with CentOS 6.5) was not
compiled with RBD support?
I can't find how to check this...
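
One way I can think of to check, as a rough sketch (assuming the stock
CentOS binary path; no librbd in the output would suggest the storage
driver was built without RBD support):

ldd /usr/sbin/libvirtd | grep -i rbd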

I'm able to use qemu-img to create rbd images etc...

Here is cloudstack-agent DEBUG output, all seems fine...

<pool type='rbd'>
<name>1e119e4c-20d1-3fbc-a525-a5771944046d</name>
<uuid>1e119e4c-20d1-3fbc-a525-a5771944046d</uuid>
<source>
<host name='10.44.253.10' port='6789'/>
<name>cloudstack</name>
<auth username='cloudstack' type='ceph'>
<secret uuid='1e119e4c-20d1-3fbc-a525-a5771944046d'/>
</auth>
</source>
</pool>

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thank you very much Wido,
any suggestion on compiling libvirt with RBD support (I already found a
way), or perhaps a prebuilt package that you would recommend?

Best


On 28 April 2014 13:25, Wido den Hollander w...@42on.com wrote:

 On 04/28/2014 12:49 PM, Andrija Panic wrote:

 Hi,

 I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
 6.5) complains:
 -  internal error missing backend for pool type 8

 Is it possible that libvirt 0.10.2 (as shipped with CentOS 6.5) was not
 compiled with RBD support?
 I can't find how to check this...


 No, it's probably not compiled with RBD storage pool support.

 As far as I know CentOS doesn't compile libvirt with that support yet.


  I'm able to use qemu-img to create rbd images etc...

 Here is cloudstack-agent DEBUG output, all seems fine...

 <pool type='rbd'>
 <name>1e119e4c-20d1-3fbc-a525-a5771944046d</name>
 <uuid>1e119e4c-20d1-3fbc-a525-a5771944046d</uuid>
 <source>
 <host name='10.44.253.10' port='6789'/>


 I recommend creating a Round Robin DNS record which points to all your
 monitors.
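
 For example, with a record like ceph-mon.example.com (the name is just an
 illustration), the host line would become:

 <host name='ceph-mon.example.com' port='6789'/>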

 <name>cloudstack</name>
 <auth username='cloudstack' type='ceph'>
 <secret uuid='1e119e4c-20d1-3fbc-a525-a5771944046d'/>
 </auth>
 </source>
 </pool>

 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thanks Dan :)


On 28 April 2014 15:02, Dan van der Ster daniel.vanders...@cern.ch wrote:


 On 28/04/14 14:54, Wido den Hollander wrote:

 On 04/28/2014 02:15 PM, Andrija Panic wrote:

 Thank you very much Wido,
 any suggestion on compiling libvirt with RBD support (I already found a
 way), or perhaps a prebuilt package that you would recommend?


 No special suggestions, just make sure you use at least Ceph 0.67.7

 I'm not aware of any pre-built packages for CentOS.


 Look for qemu-kvm-rhev ... el6 ...
 That's the Red Hat-built version of KVM which supports RBD.

 Cheers, Dan




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Dan, is this maybe just RBD support for the KVM package (I already have
RBD-enabled qemu, qemu-img etc. from the ceph.com site)?
Do I need just libvirt with RBD support?

Thanks


On 28 April 2014 15:05, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Dan :)


 On 28 April 2014 15:02, Dan van der Ster daniel.vanders...@cern.ch wrote:


 On 28/04/14 14:54, Wido den Hollander wrote:

 On 04/28/2014 02:15 PM, Andrija Panic wrote:

  Thank you very much Wido,
  any suggestion on compiling libvirt with RBD support (I already found a
  way), or perhaps a prebuilt package that you would recommend?


 No special suggestions, just make sure you use at least Ceph 0.67.7

  I'm not aware of any pre-built packages for CentOS.


  Look for qemu-kvm-rhev ... el6 ...
  That's the Red Hat-built version of KVM which supports RBD.

 Cheers, Dan




 --

 Andrija Panić
 --
   http://admintweets.com
 --




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD not starting at boot time

2014-04-30 Thread Andrija Panic
Hi,

I was wondering why the OSDs would not start at boot time; this happens on 1
server (2 OSDs).

If I check with chkconfig ceph --list, I can see that it should start;
that is, the MON on this server does start, but the OSDs do not.

I can normally start them with: service ceph start osd.X

This is CentOS 6.5, and CEPH 0.72.2 deployed with the ceph-deploy tool.

I did not forget the ceph osd activate... for sure.
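
One more thing I plan to check - my guess, based on how the sysvinit script
discovers daemons (it starts the daemons defined for this host in ceph.conf,
while ceph-deploy/ceph-disk-prepared OSDs may instead rely on udev to
activate them at boot):

grep '\[osd' /etc/ceph/ceph.conf   # are the OSDs defined here at all?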

Thanks
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Migrate system VMs from local storage to CEPH

2014-05-02 Thread Andrija Panic
Hi.

I was wondering what would be the correct way to migrate system VMs
(storage, console, VR) from local storage to CEPH.

I'm on CS 4.2.1 and will be soon updating to 4.3...

Is it enough to just change the global setting system.vm.use.local.storage =
true to FALSE, and then destroy the system VMs (CloudStack will recreate them
in 1-2 minutes)?
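
(Via cloudmonkey I assume that would be something like:
update configuration name=system.vm.use.local.storage value=false
followed by a management server restart, if I have the API name right.)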

Also, how do I make sure that the system VMs will NOT end up on NFS storage?

Thanks for any input...

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Thank you very much Wido, that's exactly what I was looking for.
Thanks


On 4 May 2014 18:30, Wido den Hollander w...@42on.com wrote:

 On 05/02/2014 04:06 PM, Andrija Panic wrote:

 Hi.

 I was wondering what would be the correct way to migrate system VMs
 (storage, console, VR) from local storage to CEPH.

 I'm on CS 4.2.1 and will be soon updating to 4.3...

 Is it enough to just change the global setting system.vm.use.local.storage =
 true to FALSE, and then destroy the system VMs (CloudStack will recreate
 them in 1-2 minutes)?


 Yes, that would be sufficient. CloudStack will then deploy the SSVMs on
 your RBD storage.


  Also, how do I make sure that the system VMs will NOT end up on NFS storage?


 Make use of the tagging. Tag the RBD pools with 'rbd' and change the
 Service Offering for the SSVMs so that they require 'rbd' as a storage tag.
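
 Roughly like this with cloudmonkey (parameters from memory, so verify them
 against your CloudStack version):

 create serviceoffering name=SSVM-RBD displaytext=SSVM-RBD issystem=true
 systemvmtype=secondarystoragevm cpunumber=1 cpuspeed=500 memory=512
 storagetype=shared tags=rbd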

  Thanks for any input...

 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Will try creating the tag inside the CS database, since GUI/cloudmonkey
editing of an existing offering is NOT possible...
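
Probably something like this directly in MySQL (the table and column names
are my assumption from browsing the 4.2 schema, so take it with a grain of
salt):

UPDATE cloud.disk_offering SET tags='rbd'
 WHERE name='System Offering For Software Router';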



On 5 May 2014 16:04, Brian Rak b...@gameservers.com wrote:

  This would be a better question for the Cloudstack community.


 On 5/2/2014 10:06 AM, Andrija Panic wrote:

 Hi.

  I was wondering what would be the correct way to migrate system VMs
 (storage, console, VR) from local storage to CEPH.

  I'm on CS 4.2.1 and will be soon updating to 4.3...

  Is it enough to just change the global setting system.vm.use.local.storage =
 true to FALSE, and then destroy the system VMs (CloudStack will recreate them
 in 1-2 minutes)?

  Also, how do I make sure that the system VMs will NOT end up on NFS storage?

  Thanks for any input...

  --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Hi Wido,

thanks again for inputs.

Everything is fine, except for the Software Router - it doesn't seem to get
created on CEPH, no matter what I try.

I created a new offering for the CPVM and SSVM and used the guide here:
https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html-single/Admin_Guide/index.html#sys-offering-sysvm
to start using these new system offerings, and it is all fine. I did the
same for the Software Router, but it keeps using the original system
offering, instead of the one I created.

CS keeps creating the VR on NFS storage, chosen randomly among the 3 NFS
storage nodes...

Any suggestions, please?

Thanks,
Andrija


On 5 May 2014 16:11, Andrija Panic andrija.pa...@gmail.com wrote:

 Will try creating the tag inside the CS database, since GUI/cloudmonkey
 editing of an existing offering is NOT possible...



 On 5 May 2014 16:04, Brian Rak b...@gameservers.com wrote:

  This would be a better question for the Cloudstack community.


 On 5/2/2014 10:06 AM, Andrija Panic wrote:

 Hi.

  I was wondering what would be the correct way to migrate system VMs
 (storage, console, VR) from local storage to CEPH.

  I'm on CS 4.2.1 and will be soon updating to 4.3...

  Is it enough to just change the global setting system.vm.use.local.storage
 = true to FALSE, and then destroy the system VMs (CloudStack will recreate
 them in 1-2 minutes)?

  Also, how do I make sure that the system VMs will NOT end up on NFS storage?

  Thanks for any input...

  --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić
 --
   http://admintweets.com
 --




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replace journals disk

2014-05-06 Thread Andrija Panic
If you have a dedicated disk for the journal that you want to replace,
consider (this may not be optimal, but it crosses my mind...) stopping the
OSD (if that is possible), maybe with noout etc., then dd the old disk to
the new one, and just resize the file system and partitions if needed...

I guess there is a more elegant way than these manual steps...
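
Maybe something along these lines would be the cleaner path (an untested
sketch, with X standing for the OSD id):

ceph osd set noout
service ceph stop osd.X
ceph-osd -i X --flush-journal   # drain pending journal entries to the data disk
# repoint the journal symlink in /var/lib/ceph/osd/ceph-X/ at the new
# device or partition, then:
ceph-osd -i X --mkjournal       # initialize the new journal
service ceph start osd.X
ceph osd unset noout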

Cheers


On 6 May 2014 12:52, Gandalf Corvotempesta
gandalf.corvotempe...@gmail.com wrote:

 2014-05-06 12:39 GMT+02:00 Andrija Panic andrija.pa...@gmail.com:
  Good question - I'm also interested. Do you want to move the journal to a
  dedicated disk/partition, i.e. on an SSD, or just replace a (failed) disk
  with a new/bigger one?

 I would like to replace the disk with a bigger one (in fact, my new
 disk is smaller, but this should not change the workflow)




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-06 Thread Andrija Panic
I apologize, I posted to the wrong mailing list; too many emails these days :)
@Wido, yes I did check and there is a separate offering, but you can't change
it the same way you change it for the CPVM and SSVM...
Will post to the CS mailing list, sorry for this...



On 6 May 2014 17:52, Wido den Hollander w...@42on.com wrote:

 On 05/05/2014 11:40 PM, Andrija Panic wrote:

 Hi Wido,

 thanks again for inputs.

 Everything is fine, except for the Software Router - it doesn't seem to
 get created on CEPH, no matter what I try.


 There is a separate offering for the VR, have you checked that?

 But this is more something for the CloudStack users list as it's not
 related to Ceph.

 Wido

  I created a new offering for the CPVM and SSVM and used the guide here:
 https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/
 4.2.0/html-single/Admin_Guide/index.html#sys-offering-sysvm
 to start using these new system offerings, and it is all fine. I did the
 same for the Software Router, but it keeps using the original system
 offering, instead of the one I created.

 CS keeps creating the VR on NFS storage, chosen randomly among the 3 NFS
 storage nodes...

 Any suggestion, please ?

 Thanks,
 Andrija


 On 5 May 2014 16:11, Andrija Panic andrija.pa...@gmail.com
 mailto:andrija.pa...@gmail.com wrote:

 Will try creating the tag inside the CS database, since GUI/cloudmonkey
 editing of an existing offering is NOT possible...



 On 5 May 2014 16:04, Brian Rak b...@gameservers.com
 mailto:b...@gameservers.com wrote:

 This would be a better question for the Cloudstack community.


 On 5/2/2014 10:06 AM, Andrija Panic wrote:

 Hi.

 I was wondering what would be the correct way to migrate system
 VMs (storage, console, VR) from local storage to CEPH.

 I'm on CS 4.2.1 and will be soon updating to 4.3...

 Is it enough to just change the global
 setting system.vm.use.local.storage = true to FALSE, and then
 destroy the system VMs (CloudStack will recreate them in 1-2 minutes)?

 Also, how do I make sure that the system VMs will NOT end up on NFS
 storage?

 Thanks for any input...

 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com  mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić
 --
 http://admintweets.com
 --




 --

 Andrija Panić
 --
 http://admintweets.com
 --



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS over CEPH - best practice

2014-05-07 Thread Andrija Panic
Mapping an RBD image to 2 or more servers is the same as having a shared
storage device (SAN), so from there on you could do any clustering you want,
based on what Wido said...
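
The single-server variant Wido described would look roughly like this (the
pool/image names and export path are just examples):

rbd map nfs/nfs01          # kernel RBD map, shows up as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /export/vmimages
echo '/export/vmimages *(rw,no_root_squash,sync)' >> /etc/exports
exportfs -ra

For active/backup you would fail the mapping and mount over to the second
server; never mount the same image on both at once.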



On 7 May 2014 12:43, Andrei Mikhailovsky and...@arhont.com wrote:


 Wido, would this work if I were to run nfs over two or more servers with
 virtual IP?

 I can see what you've suggested working in a one server setup. What about
 if you want to have two nfs servers in an active/backup or active/active
 setup?

 Thanks

 Andrei


 --
 *From: *Wido den Hollander w...@42on.com
 *To: *ceph-users@lists.ceph.com
 *Sent: *Wednesday, 7 May, 2014 11:15:39 AM
 *Subject: *Re: [ceph-users] NFS over CEPH - best practice

 On 05/07/2014 11:46 AM, Andrei Mikhailovsky wrote:
  Hello guys,
 
  I would like to offer NFS service to the XenServer and VMWare
  hypervisors for storing vm images. I am currently running ceph rbd with
  kvm, which is working reasonably well.
 
  What would be the best way of running NFS services over CEPH, so that
  the XenServer and VMWare's vm disk images are stored in ceph storage
  over NFS?
 

 Use kernel RBD, put XFS on it and re-export that with NFS? Would that be
 something that works?

 I'd however suggest that you use a recent kernel so that you have a new
 version of krbd. For example Ubuntu 14.04 LTS.

  Many thanks
 
  Andrei
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] qemu-img break cloudstack snapshot

2014-05-10 Thread Andrija Panic
Hi,

just to share my issue with the qemu-img provided by CEPH (Red Hat caused
the problem, not CEPH):

the newest qemu-img - qemu-img-0.12.1.2-2.415.el6.3ceph.x86_64.rpm - was built
from RHEL 6.5 source code, where Red Hat removed the -s parameter, so
snapshotting in CloudStack up to 4.2.1 does not work; I guess there are
also problems with OpenStack...
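
For illustration, the kind of call that now fails is roughly this (the
snapshot name and paths are made up):

qemu-img convert -f raw -O qcow2 -s mysnap \
    rbd:cloudstack/volume-uuid /mnt/secondary/backup.qcow2
# the 2.415.el6.3ceph build errors out here because -s was dropped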

The older CEPH RPM for qemu-img that I have works fine (I suppose it was
built from RHEL 6.4 source):
qemu-img-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm

Raised a ticket, although this is not a problem caused by CEPH, but by
Red Hat.
The ticket was raised in the hope that CEPH's developers will provide an
older qemu-img that works fine (the one that I have), or possibly compile a
new one based on RHEL 6.4 source.
http://tracker.ceph.com/issues/8329

Best,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client: centos6.4 no rbd.ko

2014-05-14 Thread Andrija Panic
Try a 3.x kernel from the ELRepo repo... works for me, CloudStack/Ceph...

Sent from Google Nexus 4
On May 14, 2014 11:56 AM, maoqi1982 maoqi1...@126.com wrote:

 Hi list
 our Ceph (0.72) cluster uses Ubuntu 12.04 and is OK. The client server runs
 OpenStack on CentOS 6.4 final; the kernel is
 kernel-2.6.32-358.123.2.openstack.el6.x86_64.
 The problem is that this kernel does not include rbd.ko/ceph.ko. Can anyone
 help me add rbd.ko and ceph.ko to
 kernel-2.6.32-358.123.2.openstack.el6.x86_64,
 or suggest another way short of upgrading the kernel?

 thanks.



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi,

I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much data,
network also fine:
Ceph 0.72.2.

When I issue the ceph status command, I randomly get HEALTH_OK, and
immediately afterwards, repeating the command, I get HEALTH_WARN.

Example given below - these commands were issued less than 1 sec apart.
There are NO occurrences of the word warn in the logs (grep -ir warn
/var/log/ceph) on any of the servers...
I get false alerts with my status monitoring script for this reason...

Any help would be greatly appreciated.

Thanks,

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_OK
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 17331 kB/s rd, 113 kB/s wr, 176 op/s

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_WARN
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 28383 kB/s rd, 566 kB/s wr, 321 op/s

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_OK
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 21632 kB/s rd, 49354 B/s wr, 283 op/s

-- 

Andrija Panić
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi Christian,

that seems true, thanks.

But again, there are only occurrences in the GZ log files (which were
logrotated), not in the current log files:
Example:

[root@cs2 ~]# grep -ir WRN /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches

Thanks,
Andrija


On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote:


 Hello,

 On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

  Hi,
 
  I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much data,
  network also fine:
  Ceph 0.72.2.

  When I issue the ceph status command, I randomly get HEALTH_OK, and
  immediately afterwards, repeating the command, I get HEALTH_WARN.

  Example given below - these commands were issued less than 1 sec apart.
  There are NO occurrences of the word warn in the logs (grep -ir warn
  /var/log/ceph) on any of the servers...
  I get false alerts with my status monitoring script for this reason...
 
 If I recall correctly, the logs will show INF, WRN and ERR, so grep for
 WRN.

 Regards,

 Christian

  Any help would be greatly appreciated.
 
  Thanks,
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_WARN
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
 


 --
 Christian Balzer        Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi,

thanks for that, but it is not a space issue:

The OSD drives are only 12% full,
and the /var drive on which the MON lives is over 70% full only on the CS3
server, but I have increased the alert thresholds in ceph.conf (mon data
avail warn = 15, mon data avail crit = 5), and since I increased them those
alerts are gone (anyway, the alerts for /var being over 70% full can
normally be seen in the logs and in the ceph -w output).

Here I get no normal/visible warning in either the logs or the ceph -w output...

Thanks,
Andrija




On 17 June 2014 11:00, Stanislav Yanchev s.yanc...@maxtelecom.bg wrote:

 Try grep on cs1 and cs3; it could be a disk space issue.





 Regards,

 *Stanislav Yanchev*
 Core System Administrator


 Mobile: +359 882 549 441
 s.yanc...@maxtelecom.bg
 www.maxtelecom.bg


 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *Andrija Panic
 *Sent:* Tuesday, June 17, 2014 11:57 AM
 *To:* Christian Balzer
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] Cluster status reported wrongly as
 HEALTH_WARN



 Hi Christian,



 that seems true, thanks.



  But again, there are only occurrences in the GZ log files (which were
  logrotated), not in the current log files:

 Example:



 [root@cs2 ~]# grep -ir WRN /var/log/ceph/

 Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140614.gz matches

 Binary file /var/log/ceph/ceph.log-20140611.gz matches

 Binary file /var/log/ceph/ceph.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140613.gz matches



 Thanks,

 Andrija



 On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote:


 Hello,


 On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

  Hi,
 
  I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much data,
  network also fine:
  Ceph 0.72.2.

  When I issue the ceph status command, I randomly get HEALTH_OK, and
  immediately afterwards, repeating the command, I get HEALTH_WARN.

  Example given below - these commands were issued less than 1 sec apart.
  There are NO occurrences of the word warn in the logs (grep -ir warn
  /var/log/ceph) on any of the servers...
  I get false alerts with my status monitoring script for this reason...
 

 If I recall correctly, the logs will show INF, WRN and ERR, so grep for
 WRN.

 Regards,

 Christian


  Any help would be greatly appreciated.
 
  Thanks,
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_WARN
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
 


 --

 Christian Balzer        Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/





 --



 Andrija Panić

 --

   http://admintweets.com

 --


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Hi Gregory,

indeed - I still have warnings about 20% free space on the CS3 server, where
the MON lives... strange that I don't get these warnings with prolonged ceph
-w output...
[root@cs2 ~]# ceph health detail
HEALTH_WARN
mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!

I don't understand how it is possible to still get warnings - I have the
following in each ceph.conf file, under the global section:

mon data avail warn = 15
mon data avail crit = 5

I found this settings on ceph mailing list...

Thanks a lot,
Andrija


On 17 June 2014 19:22, Gregory Farnum g...@inktank.com wrote:

 Try running ceph health detail on each of the monitors. Your disk space
 thresholds probably aren't configured correctly or something.
 -Greg

 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi,

 thanks for that, but it is not a space issue:

 The OSD drives are only 12% full,
 and the /var drive on which the MON lives is over 70% full only on the CS3
 server, but I have increased the alert thresholds in ceph.conf (mon data
 avail warn = 15, mon data avail crit = 5), and since I increased them those
 alerts are gone (anyway, the alerts for /var being over 70% full can
 normally be seen in the logs and in the ceph -w output).

 Here I get no normal/visible warning in either the logs or the ceph -w output...

 Thanks,
 Andrija




 On 17 June 2014 11:00, Stanislav Yanchev s.yanc...@maxtelecom.bg wrote:

 Try grep on cs1 and cs3; it could be a disk space issue.





 Regards,

 *Stanislav Yanchev*
 Core System Administrator


 Mobile: +359 882 549 441
 s.yanc...@maxtelecom.bg
 www.maxtelecom.bg


 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
 Behalf Of *Andrija Panic
 *Sent:* Tuesday, June 17, 2014 11:57 AM
 *To:* Christian Balzer
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] Cluster status reported wrongly as
 HEALTH_WARN



 Hi Christian,



 that seems true, thanks.



 But again, there are only occurrences in the GZ log files (which were
 logrotated), not in the current log files:

 Example:



 [root@cs2 ~]# grep -ir WRN /var/log/ceph/

 Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140614.gz matches

 Binary file /var/log/ceph/ceph.log-20140611.gz matches

 Binary file /var/log/ceph/ceph.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140613.gz matches



 Thanks,

 Andrija



 On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote:


 Hello,


 On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

  Hi,
 
  I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much
 data,
  network also fine:
  Ceph 0.72.2.

  When I issue the ceph status command, I randomly get HEALTH_OK, and
  immediately afterwards, repeating the command, I get HEALTH_WARN.

  Example given below - these commands were issued less than 1 sec apart.
  There are NO occurrences of the word warn in the logs (grep -ir warn
  /var/log/ceph) on any of the servers...
  I get false alerts with my status monitoring script for this reason...
 

 If I recall correctly, the logs will show INF, WRN and ERR, so grep for
 WRN.

 Regards,

 Christian


  Any help would be greatly appreciated.
 
  Thanks,
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_WARN
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
 


 --

 Christian Balzer        Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/





 --



 Andrija Panić

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
As stupid as it could be...
After lowering the mon data avail warn threshold from 20% to 15%, it seems I
forgot to restart the MON service on this one node...
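
(For anyone else hitting this: the value can apparently also be injected at
runtime, so a restart is not even needed; something like:
ceph tell mon.cs3 injectargs '--mon-data-avail-warn 15')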

I apologize for bugging you, and thanks again everybody.

Andrija


On 18 June 2014 09:49, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Gregory,

 indeed - I still have warnings about 20% free space on the CS3 server, where
 the MON lives... strange that I don't get these warnings with prolonged ceph
 -w output...
 [root@cs2 ~]# ceph health detail
 HEALTH_WARN
 mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
 space!

 I don't understand how it is possible to still get warnings - I have the
 following in each ceph.conf file, under the global section:

 mon data avail warn = 15
 mon data avail crit = 5

 I found this settings on ceph mailing list...

 Thanks a lot,
 Andrija


 On 17 June 2014 19:22, Gregory Farnum g...@inktank.com wrote:

 Try running ceph health detail on each of the monitors. Your disk space
 thresholds probably aren't configured correctly or something.
 -Greg

 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi,

 thanks for that, but it is not a space issue:

 The OSD drives are only 12% full,
 and the /var drive on which the MON lives is over 70% full only on the CS3
 server, but I have increased the alert thresholds in ceph.conf (mon data
 avail warn = 15, mon data avail crit = 5), and since I increased them those
 alerts are gone (anyway, the alerts for /var being over 70% full can
 normally be seen in the logs and in the ceph -w output).

 Here I get no normal/visible warning in either the logs or the ceph -w output...

 Thanks,
 Andrija




 On 17 June 2014 11:00, Stanislav Yanchev s.yanc...@maxtelecom.bg
 wrote:

 Try grep on cs1 and cs3; it could be a disk space issue.





 Regards,

 *Stanislav Yanchev*
 Core System Administrator


 Mobile: +359 882 549 441
 s.yanc...@maxtelecom.bg
 www.maxtelecom.bg


 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
 Behalf Of *Andrija Panic
 *Sent:* Tuesday, June 17, 2014 11:57 AM
 *To:* Christian Balzer
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] Cluster status reported wrongly as
 HEALTH_WARN



 Hi Christian,



 that seems true, thanks.



 But again, there are only occurrences in the GZ log files (which were
 logrotated), not in the current log files:

 Example:



 [root@cs2 ~]# grep -ir WRN /var/log/ceph/

 Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140614.gz matches

 Binary file /var/log/ceph/ceph.log-20140611.gz matches

 Binary file /var/log/ceph/ceph.log-20140612.gz matches

 Binary file /var/log/ceph/ceph.log-20140613.gz matches



 Thanks,

 Andrija



 On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote:


 Hello,


 On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:

  Hi,
 
  I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much
 data,
  network also fine:
  Ceph 0.72.2.

  When I issue the ceph status command, I randomly get HEALTH_OK, and
  immediately afterwards, repeating the command, I get HEALTH_WARN.

  Example given below - these commands were issued less than 1 sec apart.
  There are NO occurrences of the word warn in the logs (grep -ir warn
  /var/log/ceph) on any of the servers...
  I get false alerts with my status monitoring script for this reason...
 

 If I recall correctly, the logs will show INF, WRN and ERR, so grep for
 WRN.

 Regards,

 Christian


  Any help would be greatly appreciated.
 
  Thanks,
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_WARN
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576 GB used, 19732 GB / 22309 GB avail
   448 active+clean
client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
 
  [root@cs3 ~]# ceph status
  cluster cab20370-bf6a-4589-8010-8d5fc8682eab
   health HEALTH_OK
   monmap e2: 3 mons at
 
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
  election epoch 122, quorum 0,1,2 cs1,cs2,cs3
   osdmap e890: 6 osds: 6 up, 6 in
pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
  2576

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Thanks Greg, seems like I'm going to upgrade soon...

Thanks again,
Andrija


On 18 June 2014 14:06, Gregory Farnum g...@inktank.com wrote:

 The lack of warnings in ceph -w for this issue is a bug in Emperor.
 It's resolved in Firefly.
 -Greg

 On Wed, Jun 18, 2014 at 3:49 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
 
  Hi Gregory,
 
  indeed - I still have warnings about 20% free space on the CS3 server, where
 the MON lives... strange that I don't get these warnings with prolonged ceph
 -w output...
  [root@cs2 ~]# ceph health detail
  HEALTH_WARN
  mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
 space!
 
  I don't understand how it is possible to still get warnings - I have the
 following in each ceph.conf file, under the global section:
 
  mon data avail warn = 15
  mon data avail crit = 5
 
  I found this settings on ceph mailing list...
 
  Thanks a lot,
  Andrija
 
 
  On 17 June 2014 19:22, Gregory Farnum g...@inktank.com wrote:
 
  Try running ceph health detail on each of the monitors. Your disk
 space thresholds probably aren't configured correctly or something.
  -Greg
 
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
 
  Hi,
 
  thanks for that, but it is not a space issue:

  The OSD drives are only 12% full,
  and the /var drive on which the MON lives is over 70% full only on the CS3
 server, but I have increased the alert thresholds in ceph.conf (mon data
 avail warn = 15, mon data avail crit = 5), and since I increased them those
 alerts are gone (anyway, the alerts for /var being over 70% full can
 normally be seen in the logs and in the ceph -w output).

  Here I get no normal/visible warning in either the logs or the ceph -w
 output...
 
  Thanks,
  Andrija
 
 
 
 
  On 17 June 2014 11:00, Stanislav Yanchev s.yanc...@maxtelecom.bg
 wrote:
 
  Try grep on cs1 and cs3; it could be a disk space issue.
 
 
 
 
 
  Regards,
 
  Stanislav Yanchev
  Core System Administrator
 
 
 
  Mobile: +359 882 549 441
  s.yanc...@maxtelecom.bg
  www.maxtelecom.bg
 
 
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
 Behalf Of Andrija Panic
  Sent: Tuesday, June 17, 2014 11:57 AM
  To: Christian Balzer
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Cluster status reported wrongly as
 HEALTH_WARN
 
 
 
  Hi Christian,
 
 
 
  that seems true, thanks.
 
 
 
  But again, there are only occurrences in the GZ log files (which were
  logrotated), not in the current log files:
 
  Example:
 
 
 
  [root@cs2 ~]# grep -ir WRN /var/log/ceph/
 
  Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
 
  Binary file /var/log/ceph/ceph.log-20140614.gz matches
 
  Binary file /var/log/ceph/ceph.log-20140611.gz matches
 
  Binary file /var/log/ceph/ceph.log-20140612.gz matches
 
  Binary file /var/log/ceph/ceph.log-20140613.gz matches
 
 
 
  Thanks,
 
  Andrija
 
 
 
  On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote:
 
 
  Hello,
 
 
  On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
 
   Hi,
  
   I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much
 data,
   network also fine:
   Ceph 0.72.2.

   When I issue the ceph status command, I randomly get HEALTH_OK, and
   immediately afterwards, repeating the command, I get HEALTH_WARN.

   Example given below - these commands were issued less than 1 sec apart.
   There are NO occurrences of the word warn in the logs (grep -ir warn
   /var/log/ceph) on any of the servers...
   I get false alerts with my status monitoring script for this reason...
  
 
  If I recall correctly, the logs will show INF, WRN and ERR, so grep
 for
  WRN.
 
  Regards,
 
  Christian
 
 
   Any help would be greatly appreciated.
  
   Thanks,
  
   [root@cs3 ~]# ceph status
   cluster cab20370-bf6a-4589-8010-8d5fc8682eab
health HEALTH_OK
monmap e2: 3 mons at
  
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
   election epoch 122, quorum 0,1,2 cs1,cs2,cs3
osdmap e890: 6 osds: 6 up, 6 in
 pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
   2576 GB used, 19732 GB / 22309 GB avail
448 active+clean
 client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
  
   [root@cs3 ~]# ceph status
   cluster cab20370-bf6a-4589-8010-8d5fc8682eab
health HEALTH_WARN
monmap e2: 3 mons at
  
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
   election epoch 122, quorum 0,1,2 cs1,cs2,cs3
osdmap e890: 6 osds: 6 up, 6 in
 pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
   2576 GB used, 19732 GB / 22309 GB avail
448 active+clean
 client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
  
   [root@cs3 ~]# ceph status
   cluster cab20370-bf6a-4589-8010-8d5fc8682eab
health HEALTH_OK
monmap e2: 3 mons at
  
 {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx

[ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-02 Thread Andrija Panic
Hi,

I have an existing CEPH cluster of 3 nodes, version 0.72.2.

I'm in the process of installing CEPH on a 4th node, but now the CEPH version
is 0.80.1.

Will this cause problems, running mixed CEPH versions?

I intend to upgrade CEPH on the existing 3 nodes anyway.
Recommended steps?

Thanks

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Hi Wido, thanks for the answers - I have a MON and OSDs on each host: server1
has a mon + 2 OSDs, same for server2 and server3.

Any proposed upgrade path, or do I just start with 1 server and move along to
the others?

Thanks again.
Andrija


On 2 July 2014 16:34, Wido den Hollander w...@42on.com wrote:

 On 07/02/2014 04:08 PM, Andrija Panic wrote:

 Hi,

 I have an existing CEPH cluster of 3 nodes, version 0.72.2.

 I'm in the process of installing CEPH on a 4th node, but now the CEPH
 version is 0.80.1.

 Will this cause problems, running mixed CEPH versions?


 No, but the recommendation is not to have this running for a very long
 period. Try to upgrade all nodes to the same version within a reasonable
 amount of time.


 I intend to upgrade CEPH on the existing 3 nodes anyway.
 Recommended steps?


 Always upgrade the monitors first! Then to the OSDs one by one.

  Thanks

 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Thanks a lot Wido, will do...
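
So roughly (my own sketch of the steps you outlined):

yum update ceph              # upgrade the packages
service ceph restart mon     # on the mon leader first, then the other two
ceph health                  # wait for HEALTH_OK
service ceph restart osd.0   # then each OSD in turn, waiting in between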

Andrija


On 3 July 2014 13:12, Wido den Hollander w...@42on.com wrote:

 On 07/03/2014 10:59 AM, Andrija Panic wrote:

 Hi Wido, thanks for the answers - I have a MON and OSDs on each host:
 server1 has a mon + 2 OSDs, same for server2 and server3.

 Any proposed upgrade path, or do I just start with 1 server and move along
 to the others?


 Upgrade the packages, but don't restart the daemons yet, then:

 1. Restart the mon leader
 2. Restart the two other mons
 3. Restart all the OSDs one by one

 I suggest that you wait for the cluster to become fully healthy again
 before restarting the next OSD.

 Wido

  Thanks again.
 Andrija


 On 2 July 2014 16:34, Wido den Hollander w...@42on.com
 mailto:w...@42on.com wrote:

 On 07/02/2014 04:08 PM, Andrija Panic wrote:

 Hi,

 I have existing CEPH cluster of 3 nodes, versions 0.72.2

 I'm in a process of installing CEPH on 4th node, but now CEPH
 version is
 0.80.1

 Will this make problems running mixed CEPH versions ?


 No, but the recommendation is not to have this running for a very
 long period. Try to upgrade all nodes to the same version within a
 reasonable amount of time.


  I intend to upgrade CEPH on the existing 3 nodes anyway.
  Recommended steps?


 Always upgrade the monitors first! Then to the OSDs one by one.

 Thanks

 --

 Andrija Panić


 _
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902 tel:%2B31%20%280%2920%20700%209902
 Skype: contact42on
 _
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić



 --
 Wido den Hollander
 Ceph consultant and trainer
 42on B.V.


 Phone: +31 (0)20 700 9902
 Skype: contact42on




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Wido,
one final question:
since I compiled libvirt 1.2.3 using ceph-devel 0.72, do I need to
recompile libvirt again now with ceph-devel 0.80?

Perhaps not a smart question, but I need to make sure I don't screw something up...
Thanks for your time,
Andrija


On 3 July 2014 14:27, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks a lot Wido, will do...

 Andrija


 On 3 July 2014 13:12, Wido den Hollander w...@42on.com wrote:

 On 07/03/2014 10:59 AM, Andrija Panic wrote:

 Hi Wido, thanks for the answers - I have a MON and OSDs on each host:
 server1 has a mon + 2 OSDs, same for server2 and server3.

 Any proposed upgrade path, or do I just start with 1 server and move along to
 the others?


 Upgrade the packages, but don't restart the daemons yet, then:

 1. Restart the mon leader
 2. Restart the two other mons
 3. Restart all the OSDs one by one

 I suggest that you wait for the cluster to become fully healthy again
 before restarting the next OSD.

 Wido

  Thanks again.
 Andrija


 On 2 July 2014 16:34, Wido den Hollander w...@42on.com
 mailto:w...@42on.com wrote:

 On 07/02/2014 04:08 PM, Andrija Panic wrote:

 Hi,

 I have existing CEPH cluster of 3 nodes, versions 0.72.2

 I'm in a process of installing CEPH on 4th node, but now CEPH
 version is
 0.80.1

 Will this make problems running mixed CEPH versions ?


 No, but the recommendation is not to have this running for a very
 long period. Try to upgrade all nodes to the same version within a
 reasonable amount of time.


  I intend to upgrade CEPH on the existing 3 nodes anyway.
  Recommended steps?


 Always upgrade the monitors first! Then to the OSDs one by one.

 Thanks

 --

 Andrija Panić


 _
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902 tel:%2B31%20%280%2920%20700%209902
 Skype: contact42on
 _
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić



 --
 Wido den Hollander
 Ceph consultant and trainer
 42on B.V.


 Phone: +31 (0)20 700 9902
 Skype: contact42on




 --

 Andrija Panić
 --
   http://admintweets.com
 --




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Thanks again a lot.


On 3 July 2014 15:20, Wido den Hollander w...@42on.com wrote:

 On 07/03/2014 03:07 PM, Andrija Panic wrote:

 Wido,
 one final question:
  since I compiled libvirt 1.2.3 using ceph-devel 0.72, do I need to
  recompile libvirt again now with ceph-devel 0.80?

  Perhaps not a smart question, but I need to make sure I don't screw
  something up...


 No, no need to. The librados API didn't change in case you are using RBD
 storage pool support.

 Otherwise it just talks to Qemu and that talks to librbd/librados.

 Wido

  Thanks for your time,
 Andrija


 On 3 July 2014 14:27, Andrija Panic andrija.pa...@gmail.com
 mailto:andrija.pa...@gmail.com wrote:

 Thanks a lot Wido, will do...

 Andrija


 On 3 July 2014 13:12, Wido den Hollander w...@42on.com
 mailto:w...@42on.com wrote:

 On 07/03/2014 10:59 AM, Andrija Panic wrote:

 Hi Wido, thanks for answers - I have mons and OSD on each
 host...
 server1: mon + 2 OSDs, same for server2 and server3.

 Any Proposed upgrade path, or just start with 1 server and
 move along to
 others ?


 Upgrade the packages, but don't restart the daemons yet, then:

 1. Restart the mon leader
 2. Restart the two other mons
 3. Restart all the OSDs one by one

 I suggest that you wait for the cluster to become fully healthy
 again before restarting the next OSD.

 Wido

 Thanks again.
 Andrija


 On 2 July 2014 16:34, Wido den Hollander w...@42on.com
 mailto:w...@42on.com
 mailto:w...@42on.com mailto:w...@42on.com wrote:

  On 07/02/2014 04:08 PM, Andrija Panic wrote:

  Hi,

  I have existing CEPH cluster of 3 nodes, versions
 0.72.2

  I'm in a process of installing CEPH on 4th node,
 but now CEPH
  version is
  0.80.1

  Will this make problems running mixed CEPH versions ?


  No, but the recommendation is not to have this running
 for a very
  long period. Try to upgrade all nodes to the same
 version within a
  reasonable amount of time.


   I intend to upgrade CEPH on the existing 3 nodes anyway.
   Recommended steps?


  Always upgrade the monitors first! Then to the OSDs one
 by one.

  Thanks

  --

  Andrija Panić


  ___

  ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 mailto:ceph-us...@lists.ceph.__com
 mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph._
 ___com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com



 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



  --
  Wido den Hollander
  42on B.V.
  Ceph trainer and consultant

  Phone: +31 (0)20 700 9902
 tel:%2B31%20%280%2920%20700%209902
 tel:%2B31%20%280%2920%20700%__209902
  Skype: contact42on
  ___

  ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 mailto:ceph-us...@lists.ceph.__com
 mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph._
 ___com
 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com



 http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić



 --
 Wido den Hollander
 Ceph consultant and trainer
 42on B.V.


 Phone: +31 (0)20 700 9902 tel:%2B31%20%280%2920%20700%209902

 Skype: contact42on




 --

 Andrija Panić
 --
 http://admintweets.com
 --




 --

 Andrija Panić
 --
 http://admintweets.com
 --



 --
 Wido den Hollander
 Ceph consultant and trainer
 42on B.V.

 Phone: +31 (0)20 700 9902
 Skype: contact42on




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users

[ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-12 Thread Andrija Panic
Hi,

Sorry to bother you, but I have an urgent situation: I upgraded CEPH from 0.72
to 0.80 (CentOS 6.5), and now none of my CloudStack HOSTS can connect.

I did a basic yum update ceph on the first MON leader, and all CEPH
services on that HOST were restarted - I did the same on the other CEPH nodes
(I have 1 MON + 2 OSDs per physical host). Then I set the variables to
optimal with ceph osd crush tunables optimal, and after some rebalancing,
ceph shows HEALTH_OK.

Also, I can create new images with qemu-img -f rbd rbd:/cloudstack

Libvirt 1.2.3 was compiled while ceph was 0.72, but I got instructions from
Wido that I don't need to recompile now with ceph 0.80...

Libvirt logs:

libvirt: Storage Driver error : Storage pool not found: no storage pool
with matching uuid ‡ÎhyšJŠ~`a*×

Note that there is a strange uuid there - not sure what is happening?

Did I forget to do something after CEPH upgrade ?


Any help will be VERY much appreciated...
Andrija
-- 

Andrija Panić
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-13 Thread Andrija Panic
Hi Mark,
actually, CEPH is running fine, and I have deployed a NEW host (freshly
compiled libvirt with ceph 0.80 devel, and a newer kernel) - and it works...
so I'm migrating some VMs to this new host...

I have 3 physical hosts, each running a MON and 2 OSDs; on all 3 the
CloudStack/libvirt connection doesn't work...

Any suggestion on whether I need to recompile libvirt? I got info from Wido
that libvirt does NOT need to be recompiled.


Best


On 13 July 2014 08:35, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 On 13/07/14 17:07, Andrija Panic wrote:

 Hi,

 Sorry to bother you, but I have an urgent situation: I upgraded CEPH from
 0.72 to 0.80 (CentOS 6.5), and now none of my CloudStack HOSTS can connect.

 I did a basic yum update ceph on the first MON leader, and all CEPH
 services on that HOST were restarted - did the same on the other CEPH
 nodes (I have 1 MON + 2 OSDs per physical host), then I set the variables
 to optimal with ceph osd crush tunables optimal, and after some
 rebalancing, ceph shows HEALTH_OK.

 Also, I can create new images with qemu-img -f rbd rbd:/cloudstack

 Libvirt 1.2.3 was compiled while ceph was 0.72, but I got instructions
 from Wido that I don't need to REcompile now with ceph 0.80...

 Libvirt logs:

 libvirt: Storage Driver error : Storage pool not found: no storage pool
 with matching uuid ‡ÎhyšJŠ~`a*×

 Note there are some strange uuid - not sure what is happening ?

 Did I forget to do something after CEPH upgrade ?


 Have you got any ceph logs to examine on the host running libvirt? When I
 try to connect a v0.72 client to v0.81 cluster I get:

 2014-07-13 18:21:23.860898 7fc3bd2ca700  0 -- 192.168.122.41:0/1002012 
 192.168.122.21:6789/0 pipe(0x7fc3c00241f0 sd=3 :49451 s=1 pgs=0 cs=0 l=1
 c=0x7fc3c0024450).connect protocol feature mismatch, my f  peer
 5f missing 50

 Regards

 Mark




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-13 Thread Andrija Panic
Hi Mark,

update:

after restarting libvirtd and cloudstack-agent and the management server God
knows how many times - it WORKS now !

Not sure what is happening here, but it works again... I know for sure it
was not the CEPH cluster, since it was fine and accessible via qemu-img, etc...

Thanks Mark for your time for my issue...
Best.
Andrija




On 13 July 2014 10:20, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 On 13/07/14 19:15, Mark Kirkwood wrote:

 On 13/07/14 18:38, Andrija Panic wrote:


  Any suggestion on whether I need to recompile libvirt ? I got info from
 Wido that libvirt does NOT need to be recompiled


 Thinking about this a bit more - Wido *may* have meant:

 - *libvirt* does not need to be rebuilt
 - ...but you need to get/build a later ceph client i.e - 0.80

 Of course depending on how your libvirt build was set up (e.g static
 linkage), this *might* have meant you needed to rebuild it too.
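
 One quick way to check is to see what the binary is dynamically linked
 against, e.g. (just a sketch - the path may differ on your build):

 ldd /usr/sbin/libvirtd | grep -E 'librados|librbd'

 If it was linked statically, nothing will show up there even though the
 support is compiled in.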

 Regards

 Mark




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-13 Thread Andrija Panic
Hi,

after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush
tunables optimal and after only a few minutes I added 2 more OSDs to
the CEPH cluster...

So these 2 changes were more or less done at the same time - rebalancing
because of tunables optimal, and rebalancing because of adding new OSDs...

Result - all VMs living on CEPH storage have gone mad, effectively no disk
access - blocked, so to speak.

Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long...

Did I do wrong by causing 2 rebalances to happen at the same time ?
Is this behaviour normal, to cause great load on all VMs because they are
unable to access CEPH storage effectively ?

Thanks for any input...
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-13 Thread Andrija Panic
Hi Wido,

you said previously:
  Upgrade the packages, but don't restart the daemons yet, then:
  1. Restart the mon leader
  2. Restart the two other mons
  3. Restart all the OSDs one by one

But in reality (yum update or by using ceph-deploy install nodename) -
the package manager does restart ALL ceph services on that node on its
own...
So, I have upgraded - the MON leader and 2 OSDs on this 1st upgraded host were
restarted, followed by doing the same on the other 2 servers (1 MON peon and 2
OSDs per host).

Is this perhaps a package (RPM) bug - restarting daemons automatically ?
Since it makes sense to have all MONs updated first, and then the OSDs (and
perhaps after that the MDS if using it...)

Upgraded to 0.80.3 release btw.
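
For reference, the manual order I was aiming for looks roughly like this
(a sketch, assuming the sysvinit scripts on CentOS; the osd id is an example):

service ceph restart mon       # on the MON leader first, then the peons
service ceph restart osd.0     # then each OSD, one by one
ceph health                    # wait for HEALTH_OK before the next OSD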

Thanks for your help again.
Andrija



On 3 July 2014 15:21, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks again a lot.


 On 3 July 2014 15:20, Wido den Hollander w...@42on.com wrote:

 On 07/03/2014 03:07 PM, Andrija Panic wrote:

 Wido,
 one final question:
since I compiled libvirt 1.2.3 using ceph-devel 0.72 - do I need to
recompile libvirt again now with ceph-devel 0.80 ?

Perhaps not a smart question, but I need to make sure I don't screw
something up...


 No, no need to. The librados API didn't change in case you are using RBD
 storage pool support.

 Otherwise it just talks to Qemu and that talks to librbd/librados.

 Wido

  Thanks for your time,
 Andrija


 On 3 July 2014 14:27, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks a lot Wido, will do...

 Andrija


 On 3 July 2014 13:12, Wido den Hollander w...@42on.com wrote:

 On 07/03/2014 10:59 AM, Andrija Panic wrote:

 Hi Wido, thanks for answers - I have mons and OSD on each
 host...
 server1: mon + 2 OSDs, same for server2 and server3.

 Any Proposed upgrade path, or just start with 1 server and
 move along to
 others ?


 Upgrade the packages, but don't restart the daemons yet, then:

 1. Restart the mon leader
 2. Restart the two other mons
 3. Restart all the OSDs one by one

 I suggest that you wait for the cluster to become fully healthy
 again before restarting the next OSD.

 Wido

 Thanks again.
 Andrija


 On 2 July 2014 16:34, Wido den Hollander w...@42on.com wrote:

  On 07/02/2014 04:08 PM, Andrija Panic wrote:

  Hi,

  I have an existing CEPH cluster of 3 nodes, version 0.72.2

  I'm in the process of installing CEPH on a 4th node, but now the CEPH
  version is 0.80.1

  Will this cause problems, running mixed CEPH versions ?


  No, but the recommendation is not to have this running
 for a very
  long period. Try to upgrade all nodes to the same
 version within a
  reasonable amount of time.


  I intend to upgrade CEPH on the existing 3 nodes anyway ?
  Recommended steps ?


  Always upgrade the monitors first! Then to the OSDs one
 by one.

  Thanks

  --

  Andrija Panić


  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



  --
  Wido den Hollander
  42on B.V.
  Ceph trainer and consultant

  Phone: +31 (0)20 700 9902
  Skype: contact42on
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Hi Andrei, nice to meet you again ;)

Thanks for sharing this info with me - I thought it was my mistake,
introducing new OSD components at the same time - I thought that since it's
rebalancing anyway, let's add those new OSDs so it all rebalances at once, so
I don't have to cause 2 data rebalances. During a normal OSD restart and data
rebalancing (I did not set osd noout etc...) I did have somewhat lower VM
performance, but everything was UP and fine.

Also, around 30% of the data moved during my upgrade/tunables change...
although the documents say 10%, as you said.

I did not lose any data, but finding all VMs that use CEPH as storage is
somewhat of a PITA...

So, any CEPH developers' input would be greatly appreciated...

Thanks again for such detailed info,
Andrija





On 14 July 2014 10:52, Andrei Mikhailovsky and...@arhont.com wrote:

 Hi Andrija,

 I've got at least two more stories of similar nature. One is my friend
 running a ceph cluster and one is from me. Both of our clusters are pretty
 small. My cluster has only two osd servers with 8 osds each, 3 mons. I have
 an ssd journal per 4 osds. My friend has a cluster of 3 mons and 3 osd
 servers with 4 osds each and an ssd per 4 osds as well. Both clusters are
 connected with 40gbit/s IP over Infiniband links.

 We had the same issue while upgrading to firefly. However, we did not add
 any new disks, we just ran the ceph osd crush tunables optimal command
 following the upgrade.

 Both of our clusters were down as far as the virtual machines are
 concerned. All vms have crashed because of the lack of IO. It was a bit
 problematic, taking into account that ceph is typically so great at staying
 alive during failures and upgrades. So, there seems to be a problem with
 the upgrade. I wish devs would have added a big note in red letters that if
 you run this command it will likely affect your cluster performance and
 most likely all your vms will die. So, please shutdown your vms if you do
 not want to have data loss.

 I've changed the default values to reduce the load during recovery and
 also to tune a few things performance wise. My settings were:

 osd recovery max chunk = 8388608

 osd recovery op priority = 2

 osd max backfills = 1

 osd recovery max active = 1

 osd recovery threads = 1

 osd disk threads = 2

 filestore max sync interval = 10

 filestore op threads = 20

 filestore_flusher = false
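
 (In case it is useful to anyone: the backfill/recovery ones can also be
 applied at runtime, without restarting the OSDs, with something along the
 lines of the following - double-check the syntax on your version:

 ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
 )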

 However, this didn't help much and I've noticed that shortly after running
 the tunables command my guest vms' iowait quickly jumped to 50% and then
 to 99% a minute after. This happened on all vms at once. During the
 recovery phase I ran the rbd -p poolname ls -l command several times
 and it took between 20-40 minutes to complete. It typically takes less than
 2 seconds when the cluster is not in recovery mode.

 My mate's cluster had the same tunables apart from the last three. He had
 exactly the same behaviour.

 One other thing that I've noticed is that somewhere in the docs I've read
 that running the tunables optimal command should move no more than 10% of
 your data. However, in both of our cases our status was just over 30%
 degraded and it took a good part of 9 hours to complete the data
 reshuffling.


 Any comments from the ceph team or other ceph gurus on:

 1. What have we done wrong in our upgrade  process
 2. What options should we have used to keep our vms alive


 Cheers

 Andrei




 --
 *From: *Andrija Panic andrija.pa...@gmail.com
 *To: *ceph-users@lists.ceph.com
 *Sent: *Sunday, 13 July, 2014 9:54:17 PM
 *Subject: *[ceph-users] ceph osd crush tunables optimal AND add new OSD
 at thesame time


 Hi,

 after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush
 tunables optimal and after only a few minutes I added 2 more OSDs to
 the CEPH cluster...

 So these 2 changes were more or less done at the same time - rebalancing
 because of tunables optimal, and rebalancing because of adding new OSDs...

 Result - all VMs living on CEPH storage have gone mad, effectively no disk
 access - blocked, so to speak.

 Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long...

 Did I do wrong by causing 2 rebalances to happen at the same time ?
 Is this behaviour normal, to cause great load on all VMs because they are
 unable to access CEPH storage effectively ?

 Thanks for any input...
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Perhaps here: http://ceph.com/releases/v0-80-firefly-released/
Thanks


On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage


 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush
  tunables optimal and after only a few minutes I added 2 more OSDs to
  the CEPH cluster...

  So these 2 changes were more or less done at the same time - rebalancing
  because of tunables optimal, and rebalancing because of adding new OSDs...

  Result - all VMs living on CEPH storage have gone mad, effectively no disk
  access - blocked, so to speak.

  Since this rebalancing took 5h-6h, I had a bunch of VMs down for that
  long...

  Did I do wrong by causing 2 rebalances to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage effectively ?
 
  Thanks for any input...
  --
 
  Andrija Panić
 
 




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Udo, I had all VMs completely non-operational - so don't set optimal for
now...
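
If you just want to see the current values before/after, I believe this
dumps them (available on firefly at least):

ceph osd crush show-tunables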


On 14 July 2014 20:48, Udo Lembke ulem...@polarzone.de wrote:

 Hi,
 which values are all changed with ceph osd crush tunables optimal?

 Is it perhaps possible to change some parameters on the weekends before the
 upgrade is run, to have more time?
 (depends on whether the parameters are available in 0.72...).

 The warning says it can take days... we have a cluster with 5
 storage nodes and 12 4TB osd disks each (60 osds), replica 2. The cluster
 is 60% filled.
 Network connection 10Gb.
 Would tunables optimal take one, two or more days in such a configuration?

 Udo

 On 14.07.2014 18:18, Sage Weil wrote:
  I've added some additional notes/warnings to the upgrade and release
  notes:
 
 
 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
 
  If there is somewhere else where you think a warning flag would be
 useful,
  let me know!
 
  Generally speaking, we want to be able to cope with huge data rebalances
  without interrupting service.  It's an ongoing process of improving the
  recovery vs client prioritization, though, and removing sources of
  overhead related to rebalancing... and it's clearly not perfect yet. :/
 
  sage
 
 
 

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-15 Thread Andrija Panic
Hi Sage,

since this problem is tunables-related, do we need to expect the same behavior
or not when we do regular data rebalancing caused by adding new/removing
OSDs? I guess not, but I would like your confirmation.
I'm already on optimal tunables, but I'm afraid to test this by i.e.
shutting down 1 OSD.

Thanks,
Andrija


On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage


 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush
  tunables optimal and after only a few minutes I added 2 more OSDs to
  the CEPH cluster...

  So these 2 changes were more or less done at the same time - rebalancing
  because of tunables optimal, and rebalancing because of adding new OSDs...

  Result - all VMs living on CEPH storage have gone mad, effectively no disk
  access - blocked, so to speak.

  Since this rebalancing took 5h-6h, I had a bunch of VMs down for that
  long...

  Did I do wrong by causing 2 rebalances to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage effectively ?
 
  Thanks for any input...
  --
 
  Andrija Panić
 
 




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.4 Firefly released

2014-07-16 Thread Andrija Panic
Hi Sage,

can anyone confirm whether there is still a bug in the RPMs that does an
automatic CEPH service restart after updating packages ?

We are instructed to first update/restart MONs, and after that the OSDs - but
that is impossible if we have MON+OSDs on the same host... since ceph is
automatically restarted by YUM/RPM, but NOT automatically restarted on
Ubuntu/Debian (as reported by some other list member...)

Thanks


On 16 July 2014 01:45, Sage Weil s...@inktank.com wrote:

 This Firefly point release fixes a potential data corruption problem
 when ceph-osd daemons run on top of XFS and service Firefly librbd
 clients.  A recently added allocation hint that RBD utilizes triggers
 an XFS bug on some kernels (Linux 3.2, and likely others) that leads
 to data corruption and deep-scrub errors (and inconsistent PGs).  This
 release avoids the situation by disabling the allocation hint until we
 can validate which kernels are affected and/or are known to be safe to
 use the hint on.

 We recommend that all v0.80.x Firefly users urgently upgrade,
 especially if they are using RBD.

 Notable Changes
 ---

 * osd: disable XFS extsize hint by default (#8830, Samuel Just)
 * rgw: fix extra data pool default name (Yehuda Sadeh)

 For more detailed information, see:

   http://ceph.com/docs/master/_downloads/v0.80.4.txt

 Getting Ceph
 

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.80.4.tar.gz
 * For packages, see http://ceph.com/docs/master/install/get-packages
 * For ceph-deploy, see
 http://ceph.com/docs/master/install/install-ceph-deploy

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Andrija Panic
For me, 3 nodes, 1 MON + 2x2TB OSDs on each node... no mds used...
I went through the pain of waiting for the data rebalancing and now I'm on
optimal tunables...
Cheers


On 16 July 2014 14:29, Andrei Mikhailovsky and...@arhont.com wrote:

 Quenten,

 We've got two monitors sitting on the osd servers and one on a different
 server.

 Andrei

 --
 Andrei Mikhailovsky
 Director
 Arhont Information Security

 Web: http://www.arhont.com
 http://www.wi-foo.com
 Tel: +44 (0)870 4431337
 Fax: +44 (0)208 429 3111
 PGP: Key ID - 0x2B3438DE
 PGP: Server - keyserver.pgp.com

 DISCLAIMER

 The information contained in this email is intended only for the use of
 the person(s) to whom it is addressed and may be confidential or contain
 legally privileged information. If you are not the intended recipient you
 are hereby notified that any perusal, use, distribution, copying or
 disclosure is strictly prohibited. If you have received this email in error
 please immediately advise us by return email at and...@arhont.com and
 delete and purge the email and any attachments without making a copy.


 --
 *From: *Quenten Grasso qgra...@onq.com.au
 *To: *Andrija Panic andrija.pa...@gmail.com, Sage Weil 
 sw...@redhat.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Wednesday, 16 July, 2014 1:20:19 PM

 *Subject: *Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time

 Hi Sage, Andrija & List



 I have seen the tuneables issue on our cluster when I upgraded to firefly.



 I ended up going back to legacy settings after about an hour, as my cluster
 is 55 3TB OSD's over 5 nodes and it decided it needed to move around 32%
 of our data. After an hour all of our vm's were frozen, so I had to
 revert the change back to legacy settings, wait about the same time
 again until our cluster had recovered, and then reboot our vms. (wasn't really
 expecting that one from the patch notes)



 Also, our CPU usage went through the roof on our nodes. Do you perchance
 have your metadata servers co-located on your osd nodes as we do?
  I've been thinking about trying to move these to dedicated nodes, as it may
 resolve our issues.



 Regards,

 Quenten



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *Andrija Panic
 *Sent:* Tuesday, 15 July 2014 8:38 PM
 *To:* Sage Weil
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time



 Hi Sage,



 since this problem is tunables-related, do we need to expect the same behavior
 or not when we do regular data rebalancing caused by adding new/removing
 OSDs? I guess not, but I would like your confirmation.

 I'm already on optimal tunables, but I'm afraid to test this by i.e.
 shutting down 1 OSD.



 Thanks,
 Andrija



 On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage



 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush
  tunables optimal and after only a few minutes I added 2 more OSDs to
  the CEPH cluster...

  So these 2 changes were more or less done at the same time - rebalancing
  because of tunables optimal, and rebalancing because of adding new OSDs...

  Result - all VMs living on CEPH storage have gone mad, effectively no disk
  access - blocked, so to speak.

  Since this rebalancing took 5h-6h, I had a bunch of VMs down for that
  long...

  Did I do wrong by causing 2 rebalances to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage effectively ?
 
  Thanks for any input...
  --
 

  Andrija Panić
 
 





 --



 Andrija Panić

 --

   http://admintweets.com

 --

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hi,

we just had some new clients come on board, and have suffered a very big
degradation in CEPH performance for some reason (we are using CloudStack).

I'm wondering if there is a way to monitor OP/s or similar usage per
connected client, so we can isolate the heavy client ?

Also, what is the general best practice for monitoring these kinds of changes
in CEPH ? I'm talking about R/W or OP/s changes or similar...

Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer
some precise OP/s per ceph image at least...
Will check CloudStack then...

Thx


On 8 August 2014 13:53, Wido den Hollander w...@42on.com wrote:

 On 08/08/2014 01:51 PM, Andrija Panic wrote:

 Hi,

 we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

 I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?


 This is not very easy to do with Ceph, but CloudStack keeps track of this
 in the usage database.

 With newer versions of CloudStack you can also limit the IOps of Instances
 to prevent such situations.

  Also, what is the general best practice for monitoring these kinds of changes
 in CEPH ? I'm talking about R/W or OP/s changes or similar...

 Thanks,
 --

 Andrija Panić



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hm, true...
One final question, I might be a noob...
13923 B/s rd, 4744 kB/s wr, 1172 op/s
what does this op/s represent - is it classic IOps (4k reads/writes) or
something else ? how much is too much :)  - I'm familiar with SATA/SSD IO/s
specs/tests, etc, but not sure what CEPH means by op/s - could not find
anything with google...

Thanks again Wido.
Andrija


On 8 August 2014 14:07, Wido den Hollander w...@42on.com wrote:

 On 08/08/2014 02:02 PM, Andrija Panic wrote:

 Thanks Wido, yes I'm aware of CloudStack in that sense, but would prefer
 some precise OP/s per ceph Image at least...
 Will check CloudStack then...


 Ceph doesn't really know that since RBD is just a layer on top of RADOS.
 In the end the CloudStack hypervisors are doing I/O towards RADOS objects,
 so giving exact stats of how many IOps you are seeing per image is hard to
 figure out.

 The hypervisor knows this best since it sees all the I/O going through.

 Wido

  Thx


 On 8 August 2014 13:53, Wido den Hollander w...@42on.com wrote:

 On 08/08/2014 01:51 PM, Andrija Panic wrote:

 Hi,

 we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

 I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?


 This is not very easy to do with Ceph, but CloudStack keeps track of
 this in the usage database.

 With never versions of CloudStack you can also limit the IOps of
 Instances to prevent such situations.

 Also, what is the general best practice for monitoring these kinds of
 changes in CEPH ? I'm talking about R/W or OP/s changes or similar...

 Thanks,
 --

 Andrija Panić



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić
 --
 http://admintweets.com
 --



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hi Dan,

thank you very much for the script, will check it out... no throttling so
far, but I guess it will have to be done...

This seems to read only gzipped logs? And since it's read-only, I guess it is
safe to run it on the production cluster now... ?
The script will also check multiple OSDs as far as I can understand,
not just osd.0 as given in the script comment ?
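
Btw, to enable the filestore logging without restarts, I plan to use
something like this (assuming injectargs accepts it on 0.80):

ceph tell osd.* injectargs '--debug-filestore 10'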

Thanks a lot.
Andrija




On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

  Hi,
 Here’s what we do to identify our top RBD users.

  First, enable log level 10 for the filestore so you can see all the IOs
 coming from the VMs. Then use a script like this (used on a dumpling
 cluster):


 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

  to summarize the osd logs and identify the top clients.

  Then it's just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who is
 responsible for a new peak in overall ops — and daily-granular statistics
 from the above script tends to suffice.

  BTW, do you throttle your clients? We found that it's absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.

  Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


  On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote:

  Hi,

  we just had some new clients, and have suffered very big degradation in
 CEPH performance for some reasons (we are using CloudStack).

  I'm wondering if there is way to monitor OP/s or similar usage by client
 connected, so we can isolate the heavy client ?

  Also, what is the general best practice to monitor these kind of changes
 in CEPH ? I'm talking about R/W or OP/s change or similar...

  Thanks,
 --

 Andrija Panić

   ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks again, and btw, besides being Friday I'm also on vacation - so double
the joy of troubleshooting performance problems :)))

Thx :)


On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

  Hi,

  On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote:

  Hi Dan,

  thank you very much for the script, will check it out... no throttling so
 far, but I guess it will have to be done...

  This seems to read only gzipped logs?


  Well it’s pretty simple, and it zcat’s each input file. So yes, only gz
 files in the current script. But you can change that pretty trivially ;)

  and since it's read-only, I guess it is safe to run it on the production
 cluster now… ?


  I personally don’t do anything new on a Friday just before leaving ;)

  But it's just grepping the log files, so start with one, then two, then...

   The script will also check multiple OSDs as far as I can
 understand, not just osd.0 as given in the script comment ?


  Yup, what I do is gather all of the OSD logs for a single day in a
 single directory (in CephFS ;), then run that script on all of the OSDs. It
 takes awhile, but it will give you the overall daily totals for the whole
 cluster.

  If you are only trying to find the top users, then it is sufficient to
 check a subset of OSDs, since by their nature the client IOs are spread
 across most/all OSDs.

  Cheers, Dan

  Thanks a lot.
 Andrija




 On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

 Hi,
 Here’s what we do to identify our top RBD users.

  First, enable log level 10 for the filestore so you can see all the IOs
 coming from the VMs. Then use a script like this (used on a dumpling
 cluster):


 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

  to summarize the osd logs and identify the top clients.

  Then its just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who is
 responsible for a new peak in overall ops — and daily-granular statistics
 from the above script tends to suffice.

  BTW, do you throttle your clients? We found that its absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.

  Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


   On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com
 wrote:

Hi,

  we just had some new clients, and have suffered very big degradation in
 CEPH performance for some reasons (we are using CloudStack).

  I'm wondering if there is way to monitor OP/s or similar usage by
 client connected, so we can isolate the heavy client ?

  Also, what is the general best practice to monitor these kind of
 changes in CEPH ? I'm talking about R/W or OP/s change or similar...

  Thanks,
 --

 Andrija Panić

___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





  --

 Andrija Panić
 --
   http://admintweets.com
 --





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Will do so definitely, thanks Wido and Dan...
Cheers guys


On 8 August 2014 16:13, Wido den Hollander w...@42on.com wrote:

 On 08/08/2014 03:44 PM, Dan Van Der Ster wrote:

 Hi,
 Here’s what we do to identify our top RBD users.

 First, enable log level 10 for the filestore so you can see all the IOs
 coming from the VMs. Then use a script like this (used on a dumpling
 cluster):

 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

 to summarize the osd logs and identify the top clients.

 Then its just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who
 is responsible for a new peak in overall ops — and daily-granular
 statistics from the above script tends to suffice.

 BTW, do you throttle your clients? We found that its absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.


 +1

 I'd strongly advise to set I/O limits for Instances. I've had multiple
 occasions where a runaway script inside a VM was hammering on the
 underlying storage killing all I/O.

 Not only with Ceph, but over the many years I've worked with storage. I/O
 == expensive

 CloudStack supports I/O limiting, so I recommend you set a limit. Set it
 to 750 write IOps for example. That way one Instance can't kill the whole
 cluster, but it still has enough I/O to run. (usually).

 Wido


 Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


 On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote:

  Hi,

 we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

 I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?

 Also, what is the general best practice for monitoring these kinds of
 changes in CEPH ? I'm talking about R/W or OP/s changes or similar...

 Thanks,
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
Hi Dan,

the script provided seems to not work on my ceph cluster :(
This is ceph version 0.80.3

I get empty results, on both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
.
.
.




On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

  Hi,

  On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote:

  Hi Dan,

  thank you very much for the script, will check it out... no throttling so
 far, but I guess it will have to be done...

  This seems to read only gzipped logs?


  Well it’s pretty simple, and it zcat’s each input file. So yes, only gz
 files in the current script. But you can change that pretty trivially ;)

  and since it's read-only, I guess it is safe to run it on the production
 cluster now… ?


  I personally don’t do anything new on a Friday just before leaving ;)

  But it's just grepping the log files, so start with one, then two, then...

   The script will also check multiple OSDs as far as I can
 understand, not just osd.0 as given in the script comment ?


  Yup, what I do is gather all of the OSD logs for a single day in a
 single directory (in CephFS ;), then run that script on all of the OSDs. It
 takes awhile, but it will give you the overall daily totals for the whole
 cluster.

  If you are only trying to find the top users, then it is sufficient to
 check a subset of OSDs, since by their nature the client IOs are spread
 across most/all OSDs.

  Cheers, Dan

  Thanks a lot.
 Andrija




 On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

 Hi,
 Here’s what we do to identify our top RBD users.

  First, enable log level 10 for the filestore so you can see all the IOs
 coming from the VMs. Then use a script like this (used on a dumpling
 cluster):


 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

  to summarize the osd logs and identify the top clients.

  Then its just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who is
 responsible for a new peak in overall ops — and daily-granular statistics
 from the above script tends to suffice.

  BTW, do you throttle your clients? We found that its absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.

  Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


   On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com
 wrote:

Hi,

  we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

  I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?

  Also, what is the general best practice for monitoring these kinds of
 changes in CEPH ? I'm talking about R/W or OP/s changes or similar...

  Thanks,
 --

 Andrija Panić

___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





  --

 Andrija Panić
 --
   http://admintweets.com
 --





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
I apologize, clicked the Send button too fast...

Anyway, I can see there are lines like this in the log file:
2014-08-11 12:43:25.477693 7f022d257700 10
filestore(/var/lib/ceph/osd/ceph-0) write
3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3
3641344~4608 = 4608
Not sure if I can do anything to fix this... ?

Thanks,
Andrija



On 11 August 2014 12:46, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Dan,

 the script provided seems to not work on my ceph cluster :(
 This is ceph version 0.80.3

 I get empty results, on both debug level 10 and the maximum level of 20...

 [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
 Writes per OSD:
 Writes per pool:
 Writes per PG:
 Writes per RBD:
 Writes per object:
 Writes per length:
 .
 .
 .




 On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

  Hi,

  On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote:

  Hi Dan,

  thank you very much for the script, will check it out... no throttling
 so far, but I guess it will have to be done...

  This seems to read only gzipped logs?


  Well it’s pretty simple, and it zcat’s each input file. So yes, only
 gz files in the current script. But you can change that pretty trivially ;)

  and since it's read-only, I guess it is safe to run it on the production
 cluster now… ?


  I personally don’t do anything new on a Friday just before leaving ;)

  But it's just grepping the log files, so start with one, then two,
 then...

   The script will also check multiple OSDs as far as I can
 understand, not just osd.0 as given in the script comment ?


  Yup, what I do is gather all of the OSD logs for a single day in a
 single directory (in CephFS ;), then run that script on all of the OSDs. It
 takes awhile, but it will give you the overall daily totals for the whole
 cluster.

  If you are only trying to find the top users, then it is sufficient to
 check a subset of OSDs, since by their nature the client IOs are spread
 across most/all OSDs.

  Cheers, Dan

  Thanks a lot.
 Andrija




 On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

 Hi,
 Here’s what we do to identify our top RBD users.

  First, enable log level 10 for the filestore so you can see all the
 IOs coming from the VMs. Then use a script like this (used on a dumpling
 cluster):


 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

  to summarize the osd logs and identify the top clients.

  Then its just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who is
 responsible for a new peak in overall ops — and daily-granular statistics
 from the above script tends to suffice.

  BTW, do you throttle your clients? We found that its absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.

  Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


   On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com
 wrote:

Hi,

  we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

  I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?

  Also, what is the general best practice for monitoring these kinds of
 changes in CEPH ? I'm talking about R/W or OP/s changes or similar...

  Thanks,
 --

 Andrija Panić

___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





  --

 Andrija Panić
 --
   http://admintweets.com
 --





 --

 Andrija Panić
 --
   http://admintweets.com
 --




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
That's better :D

Thanks a lot, now I will be able to troubleshoot my problem :)

Thanks Dan,
Andrija


On 11 August 2014 13:21, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

  Hi,
 I changed the script to be a bit more flexible with the osd path. Give
 this a try again:
 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
 Cheers, Dan

 -- Dan van der Ster || Data  Storage Services || CERN IT Department --


  On 11 Aug 2014, at 12:48, Andrija Panic andrija.pa...@gmail.com wrote:

  I apologize, clicked the Send button too fast...

  Anyway, I can see there are lines like this in the log file:
 2014-08-11 12:43:25.477693 7f022d257700 10
 filestore(/var/lib/ceph/osd/ceph-0) write
 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3
 3641344~4608 = 4608
  Not sure if I can do anything to fix this... ?

  Thanks,
 Andrija



 On 11 August 2014 12:46, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Dan,

  the script provided seems to not work on my ceph cluster :(
 This is ceph version 0.80.3

  I get empty results, on both debug level 10 and the maximum level of
 20...

  [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
 Writes per OSD:
 Writes per pool:
  Writes per PG:
  Writes per RBD:
  Writes per object:
  Writes per length:
  .
  .
 .




 On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

 Hi,

  On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com
 wrote:

  Hi Dan,

  thank you very much for the script, will check it out... no throttling
 so far, but I guess it will have to be done...

  This seems to read only gzipped logs?


  Well it’s pretty simple, and it zcat’s each input file. So yes, only
 gz files in the current script. But you can change that pretty trivially ;)

  and since it's read-only, I guess it is safe to run it on the production
 cluster now… ?


  I personally don’t do anything new on a Friday just before leaving ;)

  But it's just grepping the log files, so start with one, then two,
 then...

   The script will also check multiple OSDs as far as I can
 understand, not just osd.0 as given in the script comment ?


  Yup, what I do is gather all of the OSD logs for a single day in a
 single directory (in CephFS ;), then run that script on all of the OSDs. It
 takes awhile, but it will give you the overall daily totals for the whole
 cluster.

  If you are only trying to find the top users, then it is sufficient to
 check a subset of OSDs, since by their nature the client IOs are spread
 across most/all OSDs.

  Cheers, Dan

  Thanks a lot.
 Andrija




 On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch
 wrote:

 Hi,
 Here’s what we do to identify our top RBD users.

  First, enable log level 10 for the filestore so you can see all the
 IOs coming from the VMs. Then use a script like this (used on a dumpling
 cluster):


 https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

  to summarize the osd logs and identify the top clients.

  Then its just a matter of scripting to figure out the ops/sec per
 volume, but for us at least the main use-case has been to identify who is
 responsible for a new peak in overall ops — and daily-granular statistics
 from the above script tends to suffice.

  BTW, do you throttle your clients? We found that its absolutely
 necessary, since without a throttle just a few active VMs can eat up the
 entire iops capacity of the cluster.

  Cheers, Dan

 -- Dan van der Ster || Data & Storage Services || CERN IT Department --


   On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com
 wrote:

Hi,

  we just had some new clients come on board, and have suffered a very big
 degradation in CEPH performance for some reason (we are using CloudStack).

  I'm wondering if there is a way to monitor OP/s or similar usage per
 connected client, so we can isolate the heavy client ?

  Also, what is the general best practice for monitoring these kinds of
 changes in CEPH ? I'm talking about R/W or OP/s changes or similar...

  Thanks,
 --

 Andrija Panić

___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





  --

 Andrija Panić
 --
   http://admintweets.com
 --





  --

 Andrija Panić
 --
   http://admintweets.com
 --




  --

 Andrija Panić
 --
   http://admintweets.com
 --





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple OSDs per host strategy ?

2013-10-16 Thread Andrija Panic
Hi,

I have 2 x 2TB disks in 3 servers, so a total of 6 disks... I have deployed
a total of 6 OSDs.
ie:
host1 = osd.0 and osd.1
host2 = osd.2 and osd.3
host4 = osd.4 and osd.5

Now, since I will have a total of 3 replicas (original + 2 replicas), I want
my replica placement to be such that I don't end up having 2 replicas on 1
host (e.g. replicas on osd0 and osd1, both on host1, plus a replica on osd2).
I want all 3 replicas spread across different hosts...

I know this is to be done via crush maps, but I'm not sure if it would be
better to have 2 pools: 1 pool on osd0,2,4 and another pool on osd1,3,5.

If possible, I would want only 1 pool, spread across all 6 OSDs, but with
data placement such that I don't end up having 2 replicas on 1 host... not
sure if this is possible at all...

Is that possible, or maybe I should go for RAID0 in each server (2 x 2TB =
4TB for osd0) or maybe JBOD (1 volume, so 1 OSD per host) ?

Any suggestions about best practice ?

Regards,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple OSDs per host strategy ?

2013-10-16 Thread Andrija Panic
well, nice one :)

*step chooseleaf firstn 0 type host* - it is part of the default crush map
(3 hosts, 2 OSDs per host)

It means: write 3 replicas (in my case) to 3 hosts... and randomly select an
OSD from each host ?
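
For context, the full default rule from a decompiled crush map looks roughly
like this (a sketch - rule name and numbers may differ on my cluster):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}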

I already read all the docs...and still not sure how to proceed...


On 16 October 2013 23:27, Mike Dawson mike.daw...@cloudapt.com wrote:

 Andrija,

 You can use a single pool and the proper CRUSH rule


 step chooseleaf firstn 0 type host


 to accomplish your goal.

 http://ceph.com/docs/master/rados/operations/crush-map/


 Cheers,
 Mike Dawson



 On 10/16/2013 5:16 PM, Andrija Panic wrote:

 Hi,

 I have 2 x 2TB disks in 3 servers, so a total of 6 disks... I have
 deployed a total of 6 OSDs.
 ie:
 host1 = osd.0 and osd.1
 host2 = osd.2 and osd.3
 host4 = osd.4 and osd.5

 Now, since I will have a total of 3 replicas (original + 2 replicas), I
 want my replica placement to be such that I don't end up having 2
 replicas on 1 host (e.g. replicas on osd0 and osd1, both on host1, plus a
 replica on osd2). I want all 3 replicas spread across different hosts...

 I know this is to be done via crush maps, but I'm not sure if it would
 be better to have 2 pools: 1 pool on osd0,2,4 and another pool on
 osd1,3,5.

 If possible, I would want only 1 pool, spread across all 6 OSDs, but
 with data placement such that I don't end up having 2 replicas on 1
 host... not sure if this is possible at all...

 Is that possible, or maybe I should go for RAID0 in each server (2 x 2TB
 = 4TB for osd0) or maybe JBOD (1 volume, so 1 OSD per host) ?

 Any suggestions about best practice ?

 Regards,

 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD read-ahead not working in 0.87.1

2015-03-18 Thread Andrija Panic
Actually, good question - is RBD caching possible at all with Windows
guests, if they are using the latest VirtIO drivers ?
Linux caching (write caching, writeback) is working fine with newer virtio
drivers...
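
For reference, this is roughly how the cache mode is selected on our side in
the libvirt domain XML (a sketch - monitor hosts, auth and the image name are
placeholders/omitted; the cache attribute is what turns on the librbd cache):

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='cloudstack/vm-disk-1'/>
  <target dev='vda' bus='virtio'/>
</disk>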

Thanks

On 18 March 2015 at 10:39, Alexandre DERUMIER aderum...@odiso.com wrote:

 Hi,

 I don't know how rbd read-ahead works,

 but with qemu virtio-scsi, you can have read merge requests (for sequential
 reads), so it does bigger ops to the ceph cluster and improves throughput.
 virtio-blk merge requests will be supported in the coming qemu 2.3.


 (I'm not sure whether the virtio-win drivers support these features)


 - Mail original -
 De: Stephen Taylor stephen.tay...@storagecraft.com
 À: ceph-users ceph-us...@ceph.com
 Envoyé: Mardi 17 Mars 2015 21:22:59
 Objet: Re: [ceph-users] RBD read-ahead not working in 0.87.1



 Never mind. After digging through the history on Github it looks like the
 docs are wrong. The code for the RBD read-ahead feature appears in 0.88,
 not 0.86, which explains why I can’t get it to work in 0.87.1.



 Steve




 From: Stephen Taylor
 Sent: Tuesday, March 17, 2015 11:32 AM
 To: 'ceph-us...@ceph.com'
 Subject: RBD read-ahead not working in 0.87.1




 Hello, fellow Ceph users,



 I’m trying to utilize RBD read-ahead settings with 0.87.1 (documented as
 new in 0.86) to convince the Windows boot loader to boot a Windows RBD in a
 reasonable amount of time using QEMU on Ubuntu 14.04.2. Below is the output
 of “ceph -w” during the Windows VM boot process. During the boot loader
 phase it’s almost a perfect correspondence of kB/s rd and op/s, which I
 interpret as the boot loader doing LOTS of non-cached, 1kB reads. This is
 what the [client] section of my ceph.conf looks like:



 [client]

 rbd_cache = true

 rbd_cache_size = 268435456

 rbd_cache_max_dirty = 201326592

 rbd_cache_target_dirty = 134217728

 rbd_readahead_trigger_requests = 1

 rbd_readahead_max_bytes = 524288

 rbd_readahead_disable_after_bytes = 0

 rbd_cache_writethrough_until_flush = true

 admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok



 Some of those values are not what I would use in production. This is just
 a test environment to try to prove that the RBD read-ahead caching works as
 I expect.



 Another interesting note is that “sudo ceph daemon /var/run/ceph/<admin
 socket> config show | grep rbd_readahead” yields nothing. The “config show”
 lists all of the config settings with the values I expect, but the
 rbd_readahead_* settings are absent. I have tried all kinds of different
 values in my ceph.conf file with the same result.



 The reason I’m convinced that read-ahead caching is my problem here is
 that I can mount my RBD via rbd-fuse and use the same QEMU command with the
 -drive parameter changed to use the rbd-fuse mount as a raw file instead of
 direct librbd, and the same Windows VM boots in a fraction of the time with
 much lower op/s numbers in the Ceph status output. I assume this is due to
 the Linux page cache helping me out with the rbd-fuse mount.



 Are the RBD read-ahead settings simply not working? That’s what it looks
 like, but I figure I must be doing something wrong. Thanks for any help.



 Steve Taylor



 2015-03-17 09:50:19.209721 mon.0 [INF] pgmap v20871: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 3 B/s rd,
 0 op/s

 2015-03-17 09:50:24.199327 mon.0 [INF] pgmap v20872: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 7 B/s rd,
 0 op/s

 2015-03-17 10:02:03.471846 mon.0 [INF] pgmap v20873: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 1 B/s rd,
 0 op/s

 2015-03-17 10:02:05.739547 mon.0 [INF] pgmap v20874: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 754 B/s
 rd, 0 op/s

 2015-03-17 10:02:08.008245 mon.0 [INF] pgmap v20875: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 144 kB/s
 rd, 156 op/s

 2015-03-17 10:02:09.286862 mon.0 [INF] pgmap v20876: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 130 kB/s
 rd, 147 op/s

 2015-03-17 10:02:10.543695 mon.0 [INF] pgmap v20877: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 614 kB/s
 rd, 614 op/s

 2015-03-17 10:02:11.832906 mon.0 [INF] pgmap v20878: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 828 kB/s
 rd, 828 op/s

 2015-03-17 10:02:12.998471 mon.0 [INF] pgmap v20879: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 387 kB/s
 rd, 387 op/s

 2015-03-17 10:02:14.378462 mon.0 [INF] pgmap v20880: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 76889 B/s
 rd, 75 op/s

 2015-03-17 10:02:15.656530 mon.0 [INF] pgmap v20881: 8192 pgs: 8192
 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 73924 B/s
 rd, 72 op/s

 2015-03-17 10:02:16.935335 mon.0 [INF] pgmap 

Re: [ceph-users] Doesn't Support Qcow2 Disk images

2015-03-12 Thread Andrija Panic
Ceph stores RBD images in RAW format - should be all fine... so the VM will be
using that RAW format (see the conversion sketch below).
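
For anyone who needs to move an existing QCOW2 image into Ceph, a minimal
sketch (file names and pool name are placeholders; the direct-to-RBD variant
assumes qemu-img was built with RBD support):

qemu-img convert -f qcow2 -O raw vm-disk.qcow2 vm-disk.raw
rbd import vm-disk.raw rbd/vm-disk
# or convert straight into the pool, skipping the intermediate file:
qemu-img convert -f qcow2 -O raw vm-disk.qcow2 rbd:rbd/vm-disk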

On 12 March 2015 at 09:03, Azad Aliyar azad.ali...@sparksupport.com wrote:

 Community please explain the 2nd warning on this page:

 http://ceph.com/docs/master/rbd/rbd-openstack/

 Important Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
 Thus if you want to boot virtual machines in Ceph (ephemeral backend or
 boot from volume), the Glance image format must be RAW.


 --
Warm Regards,
 Azad Aliyar
 Linux Server Engineer
 Email: azad.ali...@sparksupport.com | Skype: spark.azad
 http://www.sparksupport.com | http://www.sparkmycloud.com
 3rd Floor, Leela Infopark, Phase-2, Kakanad, Kochi-30, Kerala, India
 Phone: +91 484 6561696 | Mobile: 91-8129270421

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Hi Robert,

it seems I have not listened well to your advice - I set the osd to out
instead of stopping it - and now, instead of the earlier ~3% of degraded
objects, there is 0.000% degraded and around 6% misplaced - rebalancing is
happening again, but this is a small percentage...

Do you know if later, when I remove this OSD from the crush map, no more data
will be rebalanced (as per the official CEPH documentation) - since the
already misplaced objects are getting distributed away to all other nodes ?

(after service ceph stop osd.0 there was 2.45% degraded data - but no
backfilling was happening for some reason... it just stayed degraded... so
this is the reason why I started the OSD back up and then set it to out...)

Thanks

On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Robert,

 I already have this stuff set. Ceph is 0.87.0 now...

 Thanks, will schedule this for the weekend. 10G network and 36 OSDs - it
 should move the data in less than 8h; per my last experience it was around
 8h, but some 1G OSDs were included...

 Thx!

 On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I think
 there was some priority work done in firefly to help make backfills
 lower priority. I think it has gotten better in later versions.

 On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Thank you Robert - I'm wondering whether removing a total of 7 OSDs from
  the crush map will cause more than 37% of the data to be moved (80% or
  whatever)

  I'm also wondering if the throttling that I applied is fine or not - I
  will introduce the osd_recovery_delay_start 10sec as Irek said.

  I'm just wondering how much the performance impact will be, because:
  - when stopping an OSD, the impact while backfilling was more or less
  fine - I can live with this
  - when I removed an OSD from the crush map - during the first 1h or so the
  impact was tremendous, and later on during the recovery process the impact
  was much less but still
  noticeable...
 
  Thanks for the tip of course !
  Andrija
 
  On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:
 
  I would be inclined to shut down both OSDs in a node, let the cluster
  recover. Once it is recovered, shut down the next two, let it recover.
  Repeat until all the OSDs are taken out of the cluster. Then I would
  set nobackfill and norecover. Then remove the hosts/disks from the
  CRUSH then unset nobackfill and norecover.
 
  That should give you a few small changes (when you shut down OSDs) and
  then one big one to get everything in the final place. If you are
  still adding new nodes, when nobackfill and norecover is set, you can
  add them in so that the one big relocate fills the new drives too.
 
  On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com
 
  wrote:
   Thx Irek. Number of replicas is 3.
  
   I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
   decommissioned), which is further connected to a new 10G
 switch/network
   with
   3 servers on it with 12 OSDs each.
   I'm decommissioning old 3 nodes on 1G network...
  
   So you suggest removing whole node with 2 OSDs manually from crush
 map?
   Per my knowledge, ceph never places 2 replicas on 1 node, all 3
 replicas
   were originally been distributed over all 3 nodes. So anyway It
 could be
   safe to remove 2 OSDs at once together with the node itself...since
   replica
   count is 3...
   ?
  
   Thx again for your time
  
   On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:
  
   Once you have only three nodes in the cluster.
   I recommend you add new nodes to the cluster, and then delete the
 old.
  
   2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:
  
   You have a number of replication?
  
   2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com
 :
  
   Hi Irek,
  
   yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
   degraded and moved/recovered.
   When I after that removed it from Crush map ceph osd crush rm
 id,
   that's when the stuff with 37% happened.
  
   And thanks Irek for help - could you kindly just let me know of
 the
   prefered steps when removing whole node?
   Do you mean I first stop all OSDs again, or just remove each OSD
 from
   crush map, or perhaps, just decompile cursh map, delete the node
   completely,
   compile back in, and let it heal/recover ?
  
   Do you think this would result in less data missplaces and moved
   arround
   ?
  
   Sorry for bugging you, I really appreaciate your help.
  
   Thanks
  
   On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com
 wrote:
  
   A large percentage of the rebuild of the cluster map (But low

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Thanks a lot Robert.

I have actually already tried the following:

a) set one OSD to out (6% of data misplaced, CEPH recovered fine), stop the
OSD, remove the OSD from the crush map (again 36% of data misplaced !!!) -
then inserted the OSD back into the crush map - and those 36% misplaced
objects disappeared, of course - I've undone the crush remove...
so damage undone - the OSD is just out and the cluster healthy again.


b) set norecover, nobackfill, and then:
- Remove one OSD from crush (the running OSD, not the one from point a)
- only 18% of data misplaced !!! (no recovery was happening though, because
of norecover, nobackfill)
- Removed another OSD from the same node - a total of only 20% of objects
misplaced (with 2 OSDs on the same node removed from the crush map)
- So these 2 OSDs were still running UP and IN, and I just removed them
from the crush map, per the advice to avoid calculating the Crush map twice - from:
http://image.slidesharecdn.com/scalingcephatcern-140311134847-phpapp01/95/scaling-ceph-at-cern-ceph-day-frankfurt-19-638.jpg?cb=1394564547
- And I added back this 2 OSD to crush map, this was just a test...

So the algorithm is very funny in some aspects... but it's all pseudo-random
stuff, so I kind of understand...

I will share my findings during the rest of the OSD demotion, after I demote
them...

Thanks for your detailed inputs !
Andrija


On 5 March 2015 at 22:51, Robert LeBlanc rob...@leblancnet.us wrote:

 Setting an OSD out will start the rebalance with the degraded object
 count. The OSD is still alive and can participate in the relocation of the
 objects. This is preferable so that you don't happen to get less the
 min_size because a disk fails during the rebalance then I/O stops on the
 cluster.

 Because CRUSH is an algorithm, anything that changes algorithm will cause
 a change in the output (location). When you set/fail an OSD, it changes the
 CRUSH, but the host and weight of the host are still in effect. When you
 remove the host or change the weight of the host (by removing a single
 OSD), it makes a change to the algorithm which will also cause some changes
 in how it computes the locations.

 Disclaimer - I have not tried this

 It may be possible to minimize the data movement by doing the following:

1. set norecover and nobackfill on the cluster
2. Set the OSDs to be removed to out
3. Adjust the weight of the hosts in the CRUSH (if removing all OSDs
for the host, set it to zero)
4. If you have new OSDs to add, add them into the cluster now
5. Once all OSDs changes have been entered, unset norecover and
nobackfill
6. This will migrate the data off the old OSDs and onto the new OSDs
in one swoop.
7. Once the data migration is complete, set norecover and nobackfill
on the cluster again.
8. Remove the old OSDs
9. Unset norecover and nobackfill

 The theory is that by setting the host weights to 0, removing the
 OSDs/hosts later should minimize the data movement afterwards because the
 algorithm should have already dropped it out as a candidate for placement.

 If this works right, then you basically queue up a bunch of small changes,
 do one data movement, always keep all copies of your objects online and
 minimize the impact of the data movement by leveraging both your old and
 new hardware at the same time.

 If you try this, please report back on your experience. I'm might try it
 in my lab, but I'm really busy at the moment so I don't know if I'll get to
 it real soon.
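
 For anyone wanting to try it, a rough shell sketch of the steps above
 (untested, as per the disclaimer; OSD IDs are placeholders):

 ceph osd set norecover
 ceph osd set nobackfill
 for id in 0 1; do
     ceph osd out $id
     ceph osd crush reweight osd.$id 0
 done
 # add any new OSDs here, then:
 ceph osd unset norecover
 ceph osd unset nobackfill
 # ... wait for the one big migration to finish (watch ceph -s) ...
 ceph osd set norecover
 ceph osd set nobackfill
 for id in 0 1; do
     ceph osd crush remove osd.$id
     ceph auth del osd.$id
     ceph osd rm $id
 done
 ceph osd unset norecover
 ceph osd unset nobackfill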

 On Thu, Mar 5, 2015 at 12:53 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi Robert,

 it seems I have not listened well to your advice - I set the osd to out
 instead of stopping it - and now, instead of the earlier ~3% of degraded
 objects, there is 0.000% degraded and around 6% misplaced - rebalancing is
 happening again, but this is a small percentage...

 Do you know if later, when I remove this OSD from the crush map, no more data
 will be rebalanced (as per the official CEPH documentation) - since the
 already misplaced objects are getting distributed away to all other nodes ?

 (after service ceph stop osd.0 there was 2.45% degraded data - but no
 backfilling was happening for some reason... it just stayed degraded... so
 this is the reason why I started the OSD back up and then set it to out...)

 Thanks

 On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Robert,

 I already have this stuff set. Ceph is 0.87.0 now...

 Thanks, will schedule this for the weekend. 10G network and 36 OSDs - it
 should move the data in less than 8h; per my last experience it was around
 8h, but some 1G OSDs were included...

 Thx!

 On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I

[ceph-users] [rbd cache experience - given]

2015-03-07 Thread Andrija Panic
Hi there,

just wanted to share some benchmark experience with RBD caching, which I
have just (partially) implemented. These are not nicely formatted results,
just raw numbers to understand the difference

 *INFRASTRUCTURE:
- 3 hosts with:  12 x 4TB drives, 6 Journals on 1 SSD, 6 journals on
second SSD
- 10GB NICs on both Compute and Storage nodes
- 10GB dedicated replication/private CEPH network
- Libvirt 1.2.3
- Qemu 0.12.1.2
- qemu drive-cache=none (set by CloudStack)

*** CEPH SETTINGS (ceph.conf on KVM hosts):
[client]
rbd cache = true
rbd cache size = 67108864 # (64MB)
rbd cache max dirty = 50331648 # (48MB)
rbd cache target dirty = 33554432 # (32MB)
rbd cache max dirty age = 2
rbd cache writethrough until flush = true # For safety reasons


 *NUMBERS (CentOS 6.6 VM - FIO/sysbench tools):

Random write, 16k IO size (yes I know, this is not true IOPS, because true
IOPS is measured at 4K size - but it is good enough for comparison):

Random write, NO RBD cache: 170 IOPS 
Random write, RBD cache 64MB:  6500 IOPS.

Sequential writes improved from ~ 40 MB/s to 800 MB/s

Will check latency also...and let you know
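
For reference, a fio invocation along these lines reproduces the 16k
random-write test described above (all parameters are illustrative - adjust
the target device and runtime to your VM):

fio --name=randwrite-16k --rw=randwrite --bs=16k --ioengine=libaio \
    --direct=1 --iodepth=32 --runtime=60 --filename=/dev/vdb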

*** IMPORTANT:
Make sure to have the latest VirtIO drivers, because:
- CentOS 6.6, Kernel 2.6.32.x - *RBD caching does not work* (the 2.6.32 VirtIO
driver does not send flushes properly)
- CentOS 6.6, Kernel 3.10 Elrepo - *RBD caching works fine* (the new VirtIO
drivers send flushes fine)

I don't know about Windows yet, but will give you before and after numbers
very soon.

Best,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Andrija Panic
Georgios,

you need to have a deployment server and cd into the folder that you
originally used while deploying CEPH - in this folder you should already have
ceph.conf, the client.admin keyring and other stuff - which is required to
connect to the cluster... and provision new MONs or OSDs, etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to
create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy from a
folder that is NOT the original one...
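
A minimal sketch of what that looks like (the directory path is a
placeholder - it is wherever ceph-deploy new was originally run):

cd ~/my-cluster           # contains ceph.conf and the mon/admin keyrings
ceph-deploy mon add jin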


On 13 March 2015 at 23:03, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 Not a firewall problem!! Firewall is disabled ...

 Loic, I've tried mon create because of this: http://ceph.com/docs/v0.80.5/
 start/quick-ceph-deploy/#adding-monitors


 Should I first create and then add?? What is the proper order??? Should I
 do it from the already existing monitor node or can I run it from the new
 one?

 If I try add from the beginning I am getting this:

 ceph_deploy.conf][DEBUG ] found configuration file at:
 /home/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy mon add
 jin
 [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to
 create a new cluster



 Regards,


 George



  Hi,

 I think ceph-deploy mon add (instead of create) is what you should be
 using.

 Cheers

 On 13/03/2015 22:25, Georgios Dimitrakakis wrote:

 On an already available cluster I've tried to add a new monitor!

 I have used ceph-deploy mon create {NODE}

 where {NODE}=the name of the node

 and then I restarted the /etc/init.d/ceph service with a success at the
 node
 where it showed that the monitor is running like:

 # /etc/init.d/ceph restart
 === mon.jin ===
 === mon.jin ===
 Stopping Ceph mon.jin on jin...kill 36388...done
 === mon.jin ===
 Starting Ceph mon.jin on jin...
 Starting ceph-create-keys on jin...



 But checking the quorum it doesn't show the newly added monitor!

 Plus ceph mon stat gives out only 1 monitor!!!

 # ceph mon stat
 e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu


 Any ideas on what have I done wrong???


 Regards,

 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Andrija Panic
Check firewall - I hit this issue over and over again...
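
For example, on CentOS 6 with iptables, something along these lines on each
node (a sketch - 6789 for the monitors and 6800-7300 for the OSD range were
the defaults of that era):

iptables -I INPUT -p tcp --dport 6789 -j ACCEPT        # ceph-mon
iptables -I INPUT -p tcp --dport 6800:7300 -j ACCEPT   # ceph daemons
service iptables save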

On 13 March 2015 at 22:25, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 On an already available cluster I've tried to add a new monitor!

 I have used ceph-deploy mon create {NODE}

 where {NODE}=the name of the node

 and then I restarted the /etc/init.d/ceph service with a success at the
 node
 where it showed that the monitor is running like:

 # /etc/init.d/ceph restart
 === mon.jin ===
 === mon.jin ===
 Stopping Ceph mon.jin on jin...kill 36388...done
 === mon.jin ===
 Starting Ceph mon.jin on jin...
 Starting ceph-create-keys on jin...



 But checking the quorum it doesn't show the newly added monitor!

 Plus ceph mon stat gives out only 1 monitor!!!

 # ceph mon stat
 e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu


 Any ideas on what have I done wrong???


 Regards,

 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Public Network Meaning

2015-03-14 Thread Andrija Panic
The public network is client-to-OSD traffic - and if you have NOT explicitly
defined a cluster network, then OSD-to-OSD replication also takes place over
the same network.

Otherwise, you can define public and cluster (private) networks - so OSD
replication will happen over dedicated NICs (the cluster network) and thus
speed things up.

If e.g. the replica count on a pool is 3, that means each 1GB of data written
to some particular OSD will generate another 2 x 1GB of replica writes
(3 copies in total)... - which ideally will take place over separate NICs to
speed things up...

On 14 March 2015 at 17:43, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:


 Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring data?

 I have two nodes with two IP addresses each. One for internal network
 192.168.1.0/24
 and one external 15.12.6.*

 I see the following in my logs:

 osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094
 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463
 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921
 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208
 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610
 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948
 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720
 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354


 I 've managed to add a second node and during rebalancing I see that data
 is transfered through
 the internal 192.* but the external link is also saturated!

 What is being transferred from that?


 Any help much appreciated!

 Regards,

 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [SPAM] Changing pg_num = RBD VM down !

2015-03-14 Thread Andrija Panic
Changing the PG number causes a LOT of data rebalancing (in my case it was
80%), which I learned the hard way...
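
For reference, raising pg_num in small increments would look roughly like
this (pool name and step values are placeholders):

ceph osd pool set rbd pg_num 576
ceph osd pool set rbd pgp_num 576
# wait for HEALTH_OK in ceph -s, then repeat with the next step
# (640, 704, ...) until the target is reached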

On 14 March 2015 at 18:49, Gabri Mate mailingl...@modernbiztonsag.org
wrote:

 I had the same issue a few days ago. I was increasing the pg_num of one
 pool from 512 to 1024 and all the VMs in that pool stopped. I came to
 the conclusion that doubling the pg_num caused such a high load in ceph
 that the VMs were blocked. The next time I will test with small
 increments.


 On 12:38 Sat 14 Mar , Florent B wrote:
  Hi all,
 
  I have a Giant cluster in production.
 
  Today one of my RBD pools had the too few pgs warning. So I changed
  pg_num  pgp_num.
 
  And at this moment, some of the VM stored on this pool were stopped (on
  some hosts, not all, it depends, no logic)
 
  All was running fine for months...
 
  Have you ever seen this ?
  What could have caused this ?
 
  Thank you.
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Public Network Meaning

2015-03-14 Thread Andrija Panic
This is how I did it; then restart each OSD one by one, but monitor with
ceph -s - when ceph is healthy, proceed with the next OSD restart...
Make sure the networks are fine on the physical nodes, and that you can ping
between them...

[global]
x
x
x
x
x
x

#
### REPLICATION NETWORK ON SEPARATE 10G NICs

# replication network
cluster network = 10.44.251.0/24

# public/client network
public network = 10.44.253.0/16

#

[mon.xx]
mon_addr = x.x.x.x:6789
host = xx

[mon.yy]
mon_addr = x.x.x.x:6789
host = yy

[mon.zz]
mon_addr = x.x.x.x:6789
host = zz

On 14 March 2015 at 19:14, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 I thought that it was easy but apparently it's not!

 I have the following in my conf file


 mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
 public_network = 192.168.1.0/24
 mon_initial_members = fu,rai,jin


 but still the 15.12.6.21 link is being saturated

 Any ideas why???

 Should I put cluster network as well??

 Should I put each OSD in the CONF file???


 Regards,


 George





  Andrija,

 thanks a lot for the useful info!

 I would also like to thank Kingrat at the IRC channel for his
 useful advice!


 I was under the wrong impression that public is the one used for RADOS.

 So I thought that public=external=internet and therefore I used that
 one in my conf.

 I understand now that I should have specified as CEPH's public
 network what I call
 internal - the one over which all machines talk
 directly to each other.


 Thanks you all for the feedback!


 Regards,


 George


  Public network is clients-to-OSD traffic - and if you have NOT
 explicitely defined cluster network, than also OSD-to-OSD replication
 takes place over same network.

 Otherwise, you can define public and cluster(private) network - so OSD
 replication will happen over dedicated NICs (cluster network) and thus
 speed up.

 If i.e. replica count on pool is 3, that means, each 1GB of data
 writen to some particualr OSD, will generate 3 x 1GB of more writes,
 to the replicas... - which ideally will take place over separate NICs
 to speed up things...

 On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:

  Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring data?

 I have two nodes with two IP addresses each. One for internal
 network 192.168.1.0/24 [1]
 and one external 15.12.6.*

 I see the following in my logs:

 osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2]
 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3]
 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4]
 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5]
 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6]
 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 [7]
 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 [8]
 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 [9]

 I've managed to add a second node, and during rebalancing I see that data
 is transferred through the internal 192.* network, but the external link is
 also saturated!

 What is being transferred from that?

 Any help much appreciated!

 Regards,

 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [10]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]


 --

 Andrija Panić

 Links:
 --
 [1] http://192.168.1.0/24
 [2] http://15.12.6.21:6826/33094
 [3] http://15.12.6.21:6817/32463
 [4] http://15.12.6.21:6843/34921
 [5] http://15.12.6.21:6838/34208
 [6] http://15.12.6.21:6831/33610
 [7] http://15.12.6.21:6858/35948
 [8] http://15.12.6.21:6871/36720
 [9] http://15.12.6.21:6855/35354
 [10] mailto:ceph-users@lists.ceph.com
 [11] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 [12] mailto:gior...@acmac.uoc.gr


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Andrija Panic
Georgios,

no need to put ANYTHING if you don't plan to split client-to-OSD vs
OSD-to-OSD replication onto 2 different network cards/networks - for
performance reasons.

if you have only 1 network - simply DONT configure networks at all inside
your CEPH.conf file...

if you have 2 x 1G cards in servers, then you may use first 1G for client
traffic, and second 1G for OSD-to-OSD replication...

best

On 14 March 2015 at 19:33, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 Andrija,

 Thanks for you help!

 In my case I just have one 192.* network, so should I put that for both?

 Besides monitors do I have to list OSDs as well?

 Thanks again!

 Best,

 George

  This is how I did it, and then retart each OSD one by one, but
 monritor with ceph -s, when ceph is healthy, proceed with next OSD
 restart...
 Make sure the networks are fine on physical nodes, that you can ping
 in between...

 [global]
 x
 x
 x
 x
 x
 x

 #
 ### REPLICATION NETWORK ON SEPARATE 10G NICs

 # replication network
 cluster network = 10.44.251.0/24 [29]

 # public/client network
 public network = 10.44.253.0/16 [30]

 #

 [mon.xx]
 mon_addr = x.x.x.x:6789
 host = xx

 [mon.yy]
 mon_addr = x.x.x.x:6789
 host = yy

 [mon.zz]
 mon_addr = x.x.x.x:6789
 host = zz

 On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:

  I thought that it was easy but apparently its not!

 I have the following in my conf file

 mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
 public_network = 192.168.1.0/24 [26]
 mon_initial_members = fu,rai,jin

 but still the 15.12.6.21 link is being saturated

 Any ideas why???

 Should I put cluster network as well??

 Should I put each OSD in the CONF file???

 Regards,

 George

  Andrija,

 thanks a lot for the useful info!

 I would also like to thank Kingrat at the IRC channel for his
 useful advice!

 I was under the wrong impression that public is the one used for
 RADOS.

 So I thought that public=external=internet and therefore I used
 that
 one in my conf.

 I understand now that I should have specified in CEPH Publics
 Network what I call
 internal and which is the one that all machines are talking
 directly to each other.

 Thanks you all for the feedback!

 Regards,

 George

  Public network is clients-to-OSD traffic - and if you have NOT
 explicitely defined cluster network, than also OSD-to-OSD
 replication
 takes place over same network.

 Otherwise, you can define public and cluster(private) network -
 so OSD
 replication will happen over dedicated NICs (cluster network)
 and thus
 speed up.

 If i.e. replica count on pool is 3, that means, each 1GB of
 data
 writen to some particualr OSD, will generate 3 x 1GB of more
 writes,
 to the replicas... - which ideally will take place over
 separate NICs
 to speed up things...

 On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:

  Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring
 data?

 I have two nodes with two IP addresses each. One for internal
 network 192.168.1.0/24 [1] [1]
 and one external 15.12.6.*

 I see the following in my logs:

 osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2] [2]
 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3] [3]
 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4] [4]
 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5] [5]
 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6] [6]
 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 [7] [7]
 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 [8] [8]
 osd.8 is down since epoch 2196, last address

Re: [ceph-users] {Disarmed} Re: {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Andrija Panic
In that case - yes... put everything on 1 card - or, if both cards are 1G (or
the same speed for that matter...), then you might want to block all external
traffic except e.g. SSH and WEB, but allow ALL traffic between all CEPH
OSDs... so you can still use that network for public/client traffic - not
sure how you connect to/use CEPH - from the internet ??? or do you have some
more VMs/servers/clients on the 192.* network... ?



On 14 March 2015 at 19:38, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 Andrija,

 I have two cards!

 One on 15.12.* and one on 192.*

 Obviously the 15.12.* is the external network (real public IP address e.g
 used to access the node via SSH)

 That's why I am telling that my public network for CEPH is the 192. and
 should I use the cluster network for that as well?

 Best,

 George


  Georgios,

 no need to put ANYTHING if you dont plan to split client-to-OSD vs
 OSD-OSD-replication on 2 different Network Cards/Networks - for
 pefromance reasons.

 if you have only 1 network - simply DONT configure networks at all
 inside your CEPH.conf file...

 if you have 2 x 1G cards in servers, then you may use first 1G for
 client traffic, and second 1G for OSD-to-OSD replication...

 best

 On 14 March 2015 at 19:33, Georgios Dimitrakakis  wrote:

  Andrija,

 Thanks for you help!

 In my case I just have one 192.* network, so should I put that for
 both?

 Besides monitors do I have to list OSDs as well?

 Thanks again!

 Best,

 George

  This is how I did it, and then retart each OSD one by one, but
 monritor with ceph -s, when ceph is healthy, proceed with next
 OSD
 restart...
 Make sure the networks are fine on physical nodes, that you can
 ping
 in between...

 [global]
 x
 x
 x
 x
 x
 x

 #
 ### REPLICATION NETWORK ON SEPARATE 10G NICs

 # replication network
 cluster network = 10.44.251.0/24 [29] [29]

 # public/client network
 public network = 10.44.253.0/16 [30] [30]

 #

 [mon.xx]
 mon_addr = x.x.x.x:6789
 host = xx

 [mon.yy]
 mon_addr = x.x.x.x:6789
 host = yy

 [mon.zz]
 mon_addr = x.x.x.x:6789
 host = zz

 On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:

  I thought that it was easy but apparently its not!

 I have the following in my conf file

 mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
 public_network = 192.168.1.0/24 [26] [26]
 mon_initial_members = fu,rai,jin

 but still the 15.12.6.21 link is being saturated

 Any ideas why???

 Should I put cluster network as well??

 Should I put each OSD in the CONF file???

 Regards,

 George

  Andrija,

 thanks a lot for the useful info!

 I would also like to thank Kingrat at the IRC channel for
 his
 useful advice!

 I was under the wrong impression that public is the one used
 for
 RADOS.

 So I thought that public=external=internet and therefore I
 used
 that
 one in my conf.

 I understand now that I should have specified in CEPH Publics
 Network what I call
 internal and which is the one that all machines are talking
 directly to each other.

 Thanks you all for the feedback!

 Regards,

 George

  Public network is clients-to-OSD traffic - and if you have
 NOT
 explicitely defined cluster network, than also OSD-to-OSD
 replication
 takes place over same network.

 Otherwise, you can define public and cluster(private)
 network -
 so OSD
 replication will happen over dedicated NICs (cluster
 network)
 and thus
 speed up.

 If i.e. replica count on pool is 3, that means, each 1GB of
 data
 writen to some particualr OSD, will generate 3 x 1GB of
 more
 writes,
 to the replicas... - which ideally will take place over
 separate NICs
 to speed up things...

 On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:

  Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring
 data?

 I have two nodes with two IP addresses each. One for
 internal
 network 192.168.1.0/24 [1] [1] [1]
 and one external 15.12.6.*

 I see the following in my logs:

 osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2] [2] [2]
 osd.1 is down since epoch 2206, last address

Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Thanks Wido - I will do that.

On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com wrote:



 On 13-03-15 09:42, Andrija Panic wrote:
  Hi all,
 
  I have set nodeep-scrub and noscrub while I had small/slow hardware for
  the cluster.
  It has been off for a while now.
 
  Now we are upgraded with hardware/networking/SSDs and I would like to
  activate - or unset these flags.
 
  Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
  was wondering what is the best way to unset flags - meaning if I just
  unset the flags, should I expect that the SCRUB will start all of the
  sudden on all disks - or is there way to let the SCRUB do drives one by
  one...
 

 So, I *think* that unsetting these flags will trigger a big scrub, since
 all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

 You can verify this with:

 $ ceph pg <pgid> query

 A solution would be to scrub each PG manually first in a timely fashion.

 $ ceph pg scrub <pgid>

 That way you set the timestamps and slowly scrub each PG.

 When that's done, unset the flags.

 Wido

  In other words - should I expect a BIG performance impact or not ?
 
  Any experience is very appreciated...
 
  Thanks,
 
  --
 
  Andrija Panić
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Hi all,

I have set nodeep-scrub and noscrub while I had small/slow hardware for the
cluster.
It has been off for a while now.

Now we are upgraded with hardware/networking/SSDs and I would like to
activate - or unset these flags.

Since I now have 3 servers with 12 OSDs each (SSD-based journals) - I was
wondering what is the best way to unset the flags - meaning if I just unset
the flags, should I expect that the SCRUB will start all of a sudden on all
disks - or is there a way to let the SCRUB do the drives one by one...

In other words - should I expect a BIG performance impact or not ?

Any experience is very appreciated...

Thanks,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Nice - so I just realized I need to manually scrub 1216 placement groups :)


On 13 March 2015 at 10:16, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Wido - I will do that.

 On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com wrote:



 On 13-03-15 09:42, Andrija Panic wrote:
  Hi all,
 
  I have set nodeep-scrub and noscrub while I had small/slow hardware for
  the cluster.
  It has been off for a while now.
 
  Now we are upgraded with hardware/networking/SSDs and I would like to
  activate - or unset these flags.
 
  Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
  was wondering what is the best way to unset flags - meaning if I just
  unset the flags, should I expect that the SCRUB will start all of the
  sudden on all disks - or is there way to let the SCRUB do drives one by
  one...
 

 So, I *think* that unsetting these flags will trigger a big scrub, since
 all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

 You can verify this with:

 $ ceph pg <pgid> query

 A solution would be to scrub each PG manually first in a timely fashion.

 $ ceph pg scrub <pgid>

 That way you set the timestamps and slowly scrub each PG.

 When that's done, unset the flags.

 Wido

  In other words - should I expect a BIG performance impact or not ?
 
  Any experience is very appreciated...
 
  Thanks,
 
  --
 
  Andrija Panić
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Interesting... thx for that Henrik.

BTW, my placement groups are around 1800 objects each (ceph pg dump) - meaning
a max of ~7GB of data per PG at the moment,

a regular scrub just took 5-10 sec to finish. A deep scrub would, I guess, take
some minutes for sure

What about deep scrub - the timestamp is still from some months ago, but the
regular scrub timestamp is fresh now...?

I don't see max deep scrub settings - or are these settings applied in
general to both kinds of scrubs ?

Thanks



On 13 March 2015 at 12:22, Henrik Korkuc li...@kirneh.eu wrote:

  I think that there will be no big scrub, as there are limits of maximum
 scrubs at a time.
 http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

 If we take osd max scrubs which is 1 by default, then you will not get
 more than 1 scrub per OSD.

 I couldn't quickly find if there are cluster wide limits.


 On 3/13/15 10:46, Wido den Hollander wrote:


 On 13-03-15 09:42, Andrija Panic wrote:

  Hi all,

 I have set nodeep-scrub and noscrub while I had small/slow hardware for
 the cluster.
 It has been off for a while now.

 Now we are upgraded with hardware/networking/SSDs and I would like to
 activate - or unset these flags.

 Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
 was wondering what is the best way to unset flags - meaning if I just
 unset the flags, should I expect that the SCRUB will start all of the
 sudden on all disks - or is there way to let the SCRUB do drives one by
 one...


  So, I *think* that unsetting these flags will trigger a big scrub, since
 all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

 You can verify this with:

 $ ceph pg <pgid> query

 A solution would be to scrub each PG manually first in a timely fashion.

 $ ceph pg scrub <pgid>

 That way you set the timestamps and slowly scrub each PG.

 When that's done, unset the flags.

 Wido


  In other words - should I expect a BIG performance impact or not ?

 Any experience is very appreciated...

 Thanks,

 --

 Andrija Panić


 ___
 ceph-users mailing 
 listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___
 ceph-users mailing 
 listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Hm... nice. Thx guys


On 13 March 2015 at 12:33, Henrik Korkuc li...@kirneh.eu wrote:

  I think settings apply to both kinds of scrubs


 On 3/13/15 13:31, Andrija Panic wrote:

 Interesting... thx for that Henrik.

  BTW, my placement groups are around 1800 objects each (ceph pg dump) -
 meaning a max of ~7GB of data per PG at the moment,

  a regular scrub just took 5-10 sec to finish. A deep scrub would, I guess,
 take some minutes for sure

  What about deep scrub - the timestamp is still from some months ago, but
 the regular scrub timestamp is fresh now...?

  I don't see max deep scrub settings - or are these settings applied in
 general to both kinds of scrubs ?

  Thanks



 On 13 March 2015 at 12:22, Henrik Korkuc li...@kirneh.eu wrote:

  I think that there will be no big scrub, as there are limits of maximum
 scrubs at a time.
 http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

 If we take osd max scrubs which is 1 by default, then you will not get
 more than 1 scrub per OSD.

 I couldn't quickly find if there are cluster wide limits.


 On 3/13/15 10:46, Wido den Hollander wrote:

 On 13-03-15 09:42, Andrija Panic wrote:

  Hi all,

 I have set nodeep-scrub and noscrub while I had small/slow hardware for
 the cluster.
 It has been off for a while now.

 Now we are upgraded with hardware/networking/SSDs and I would like to
 activate - or unset these flags.

 Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
 was wondering what is the best way to unset flags - meaning if I just
 unset the flags, should I expect that the SCRUB will start all of the
 sudden on all disks - or is there way to let the SCRUB do drives one by
 one...


  So, I *think* that unsetting these flags will trigger a big scrub, since
 all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

 You can verify this with:

 $ ceph pg <pgid> query

 A solution would be to scrub each PG manually first in a timely fashion.

 $ ceph pg scrub <pgid>

 That way you set the timestamps and slowly scrub each PG.

 When that's done, unset the flags.

 Wido


  In other words - should I expect a BIG performance impact or not ?

 Any experience is very appreciated...

 Thanks,

 --

 Andrija Panić


 ___
 ceph-users mailing 
 listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___
 ceph-users mailing 
 listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




  --

 Andrija Panić





-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Will do, of course :)

Thx Wido for the quick help, as always !
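
A minimal sketch of the scrub loop Wido describes below (the sleep interval
is arbitrary - tune it to your cluster):

ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' |
while read pgid; do
    ceph pg scrub "$pgid"
    sleep 60
done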

On 13 March 2015 at 12:04, Wido den Hollander w...@42on.com wrote:



 On 13-03-15 12:00, Andrija Panic wrote:
  Nice - so I just realized I need to manually scrub 1216 placements
 groups :)
 

 With manual I meant using a script.

 Loop through 'ceph pg dump', get the PGid, issue a scrub, sleep for X
 seconds and issue the next scrub.

 Wido

 
  On 13 March 2015 at 10:16, Andrija Panic andrija.pa...@gmail.com
  mailto:andrija.pa...@gmail.com wrote:
 
  Thanks Wido - I will do that.
 
  On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com
  mailto:w...@42on.com wrote:
 
 
 
  On 13-03-15 09:42, Andrija Panic wrote:
   Hi all,
  
   I have set nodeep-scrub and noscrub while I had small/slow
 hardware for
   the cluster.
   It has been off for a while now.
  
   Now we are upgraded with hardware/networking/SSDs and I would
 like to
   activate - or unset these flags.
  
   Since I now have 3 servers with 12 OSDs each (SSD based
 Journals) - I
   was wondering what is the best way to unset flags - meaning if
 I just
   unset the flags, should I expect that the SCRUB will start all
 of the
   sudden on all disks - or is there way to let the SCRUB do
 drives one by
   one...
  
 
  So, I *think* that unsetting these flags will trigger a big
  scrub, since
  all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
 
  You can verify this with:
 
  $ ceph pg <pgid> query
 
  A solution would be to scrub each PG manually first in a timely
  fashion.
 
  $ ceph pg scrub <pgid>
 
  That way you set the timestamps and slowly scrub each PG.
 
  When that's done, unset the flags.
 
  Wido
 
   In other words - should I expect a BIG performance impact
 or not ?
  
   Any experience is very appreciated...
  
   Thanks,
  
   --
  
   Andrija Panić
  
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
  --
 
  Andrija Panić
 
 
 
 
  --
 
  Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thanks Irek.

Does this mean that after peering for each PG there will be a delay of
10 sec - meaning that every once in a while I will have 10 sec of the cluster
NOT being stressed/overloaded, then the recovery takes place for that PG,
then for another 10 sec the cluster is fine, and then it is stressed again ?

I'm trying to understand the process before actually doing stuff (the config
reference is there on ceph.com but I don't fully understand the process)

Thanks,
Andrija
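
For reference, this value can also be injected at runtime without restarting
the OSDs (10 is just the example value from the quoted message below):

ceph tell osd.* injectargs '--osd_recovery_delay_start 10'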

On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused
 over 37% of the data to rebalance - let's say this is fine (this happened
 when I removed it from the Crush Map).

 I'm wondering - I had previously set some throttling mechanism, but during
 the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and
 VMs were completely unusable; then for the last 4h of the recovery the rate
 went down to, say, 100-200 MB/s, and during this the VM performance was
 still pretty impacted, but at least I could work more or less

 So my question: is this behaviour expected, and is the throttling here
 working as expected? During the first 1h almost no throttling seemed to be
 applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while
 the last 4h seemed pretty fine (although still a lot of impact in general)

 I changed these throttling settings on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My journals are on SSDs (12 OSDs per server, of which 6 journals are on one
 SSD and 6 journals on another SSD) - I have 3 of these hosts.

 Any thoughts are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thx Irek. The number of replicas is 3.

I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already
decommissioned), which is further connected to a new 10G switch/network
with 3 servers on it with 12 OSDs each.
I'm decommissioning the old 3 nodes on the 1G network...

So you suggest removing the whole node with 2 OSDs manually from the crush map?
To my knowledge, ceph never places 2 replicas on 1 node; all 3 replicas
were originally distributed over all 3 nodes. So it should anyway be safe
to remove both OSDs at once together with the node itself... since the replica
count is 3...
?

Thx again for your time
On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:

 Once you have only three nodes in the cluster.
 I recommend you add new nodes to the cluster, and then delete the old.

 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 You have a number of replication?

 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Irek,

 yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data
 degraded and moved/recovered.
 When I afterwards removed it from the Crush map with ceph osd crush rm id,
 that's when the stuff with 37% happened.

 And thanks Irek for the help - could you kindly just let me know the
 preferred steps when removing a whole node?
 Do you mean I should first stop all OSDs again, or just remove each OSD from
 the crush map, or perhaps just decompile the crush map, delete the node
 completely, compile it back in, and let it heal/recover ?

 Do you think this would result in less data misplaced and moved around ?

 Sorry for bugging you, I really appreciate your help.

 Thanks

 On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage comes from the rebuild of the cluster map (but a low
 percentage of degradation). If you had not run ceph osd crush rm id, the
 percentage would have stayed low.
 In your case, the correct option is to remove the entire node, rather
 than each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved around -
 these are MISPLACED objects (degraded objects were 0.001% after I removed 1
 OSD from the crush map, out of 44 OSDs or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarounds ?

 I understand this is because of CEPH's object placement algorithm, but
 still, 37% of objects misplaced just by removing 1 OSD out of 44 from the
 crush map makes me wonder why the percentage is so large ?

 This seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I could potentially see 7 x
 the same number of misplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay
 of 10sec, meaning that every once in a while, I will have 10sec od the
 cluster NOT being stressed/overloaded, and then the recovery takes place
 for that PG, and then another 10sec cluster is fine, and then stressed
 again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the
 process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon
 /var/run/ceph/ceph-osd.94.asok config show  | grep 
 osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it
 caused over 37% od the data to rebalance - let's say this is fine 
 (this is
 when I removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 
 1500
 MB/s - and VMs were unusable completely, and then last 4h of the 
 duration
 of recover this recovery rate went down to, say, 100-200 MB.s and 
 during
 this VM performance was still pretty impacted, but at least I could 
 work
 more or a less

 So my question, is this behaviour expected, is throtling here
 working as expected, since first 1h was almoust no throtling applied 
 if I
 check the recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in
 general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on
 one SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users

[ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
Hi people,

I had one OSD crash, so the rebalancing happened - all fine (some 3% of the
data was moved around and rebalanced), my previous recovery/backfill
throttling was applied fine, and we didn't have an unusable cluster.

Now I used the procedure to remove this crashed OSD completely from CEPH
(
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
)

and when I used the ceph osd crush remove osd.0 command, all of a sudden
CEPH started to rebalance once again, this time with 37% of the objects
misplaced - and based on the experience inside the VMs and the recovery rate
in MB/s, I can tell that my throttling of backfilling and recovery is not
being taken into consideration.

Why are 37% of all objects again being moved around? Any help, hint, or
explanation is greatly appreciated.

This is CEPH 0.87.0 from CEPH repo of course. 42 OSD total after the crash
etc.

The throtling that I have applied from before is like folowing:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'

Please advise...
Thanks

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
OK
thx Wido.

Then can we at least update the documentation to say that MAJOR data
rebalancing will happen AGAIN - not 3%, but 37% in my case.
Because I would never run this during work hours, while clients are
hammering VMs...

This reminds me of those tunables changes a couple of months ago, when my
cluster completely collapsed during data rebalancing...

I don't see any option to contribute to the documentation ?

Best




On 2 March 2015 at 16:07, Wido den Hollander w...@42on.com wrote:

 On 03/02/2015 03:56 PM, Andrija Panic wrote:
  Hi people,
 
  I had one OSD crash, so the rebalancing happened - all fine (some 3% of
 the
  data has been moved arround, and rebalanced) and my previous
  recovery/backfill throtling was applied fine and we didnt have a unusable
  cluster.
 
  Now I used the procedure to remove this crashed OSD comletely from the
 CEPH
  (
 
 http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
  )
 
  and when I used the ceph osd crush remove osd.0 command, all of a
 sudden,
  CEPH started to rebalance once again, this time with 37% of the object
 that
  are missplaced and based on the eperience inside VMs, and the Recovery
  RAte in MB/s - I can tell that my throtling of backfilling and recovery
 is
  not taken into consideration.
 
  Why is this, 37% of all objects again being moved arround, any help,
 hint,
  explanation greatly appreciated.
 

 This has been discussed a couple of times on the list. If you remove a
 item from the CRUSHMap, although it has a weight of 0, a rebalance still
 happens since the CRUSHMap changes.

  This is CEPH 0.87.0 from CEPH repo of course. 42 OSD total after the
 crash
  etc.
 
  The throtling that I have applied from before is like folowing:
 
  ceph tell osd.* injectargs '--osd_recovery_max_active 1'
  ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
  ceph tell osd.* injectargs '--osd_max_backfills 1'
 
  Please advise...
  Thanks
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
That was my thought, yes - I found this blog that confirms what you are
saying, I guess:
http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
I will do that... Thx

I guess it doesn't matter that my CRUSH map will still reference the old
OSDs, which are stopped (and the cluster resynced after that)?

Thx again for the help
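
For reference, a sketch of the single-OSD test Robert describes below - the
OSD id is an example, and the service invocation follows what is used
elsewhere in this archive:

# after adding the cluster network to ceph.conf on that node:
service ceph restart osd.12
# the osd dump line for that OSD should now show both a public and a
# cluster address
ceph osd dump | grep '^osd.12 '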

On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote:

 If I remember right, someone has done this on a live cluster without
 any issues. I seem to remember that it had a fallback mechanism if the
 OSDs couldn't be reached on the cluster network to contact them on the
 public network. You could test it pretty easily without much impact.
 Take one OSD that has both networks and configure it and restart the
 process. If all the nodes (specifically the old ones with only one
 network) is able to connect to it, then you are good to go by
 restarting one OSD at a time.

 On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Hi,
 
  I'm having a live cluster with only public network (so no explicit
 network
  configuraion in the ceph.conf file)
 
  I'm wondering what is the procedure to implement dedicated
  Replication/Private and Public network.
  I've read the manual, know how to do it in ceph.conf, but I'm wondering
  since this is already running cluster - what should I do after I change
  ceph.conf on all nodes ?
  Restarting OSDs one by one, or... ? Is there any downtime expected ? -
 for
  the replication network to actually imlemented completely.
 
 
  Another related quetion:
 
  Also, I'm demoting some old OSDs, on old servers, I will have them all
  stoped, but would like to implement replication network before actually
  removing old OSDs from crush map - since lot of data will be moved
 arround.
 
  My old nodes/OSDs (that will be stoped before I implement replication
  network) - do NOT have dedicated NIC for replication network, in
 contrast to
  new nodes/OSDs. So there will be still reference to these old OSD in the
  crush map.
  Will this be a problem - me changing/implementing replication network
 that
  WILL work on new nodes/OSDs, but not on old ones since they don't have
  dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
 like
  opinion.
 
  Or perhaps i might remove OSD from crush map with prior seting of
  nobackfill and   norecover (so no rebalancing happens) and then implement
  replication netwotk?
 
 
  Sorry for old post, but...
 
  Thanks,
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx Wido, I needed this confirmation - thanks!

On 4 March 2015 at 17:49, Wido den Hollander w...@42on.com wrote:

 On 03/04/2015 05:44 PM, Robert LeBlanc wrote:
  If I remember right, someone has done this on a live cluster without
  any issues. I seem to remember that it had a fallback mechanism if the
  OSDs couldn't be reached on the cluster network to contact them on the
  public network. You could test it pretty easily without much impact.
  Take one OSD that has both networks and configure it and restart the
  process. If all the nodes (specifically the old ones with only one
  network) is able to connect to it, then you are good to go by
  restarting one OSD at a time.
 

 In the OSDMap each OSD has a public and cluster network address. If the
 cluster network address is not set, replication to that OSD will be done
 over the public network.

 So you can push a new configuration to all OSDs and restart them one by
 one.

 Make sure the network is ofcourse up and running and it should work.

  On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Hi,
 
  I'm having a live cluster with only public network (so no explicit
 network
  configuraion in the ceph.conf file)
 
  I'm wondering what is the procedure to implement dedicated
  Replication/Private and Public network.
  I've read the manual, know how to do it in ceph.conf, but I'm wondering
  since this is already running cluster - what should I do after I change
  ceph.conf on all nodes ?
  Restarting OSDs one by one, or... ? Is there any downtime expected ? -
 for
  the replication network to actually imlemented completely.
 
 
  Another related quetion:
 
  Also, I'm demoting some old OSDs, on old servers, I will have them all
  stoped, but would like to implement replication network before actually
  removing old OSDs from crush map - since lot of data will be moved
 arround.
 
  My old nodes/OSDs (that will be stoped before I implement replication
  network) - do NOT have dedicated NIC for replication network, in
 contrast to
  new nodes/OSDs. So there will be still reference to these old OSD in the
  crush map.
  Will this be a problem - me changing/implementing replication network
 that
  WILL work on new nodes/OSDs, but not on old ones since they don't have
  dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
 like
  opinion.
 
  Or perhaps i might remove OSD from crush map with prior seting of
  nobackfill and   norecover (so no rebalancing happens) and then
 implement
  replication netwotk?
 
 
  Sorry for old post, but...
 
  Thanks,
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx again - I really appreciate the help, guys!

On 4 March 2015 at 17:51, Robert LeBlanc rob...@leblancnet.us wrote:

 If the data have been replicated to new OSDs, it will be able to
 function properly even them them down or only on the public network.

 On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  I guess it doesnt matter, since my Crush Map will still refernce old
 OSDs,
  that are stoped (and cluster resynced after that) ?
 
  I wanted to say: it doesnt matter (I guess?) that my Crush map is still
  referencing old OSD nodes that are already stoped. Tired, sorry...
 
  On 4 March 2015 at 17:48, Andrija Panic andrija.pa...@gmail.com wrote:
 
  That was my thought, yes - I found this blog that confirms what you are
  saying I guess:
 
 http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
  I will do that... Thx
 
  I guess it doesnt matter, since my Crush Map will still refernce old
 OSDs,
  that are stoped (and cluster resynced after that) ?
 
  Thx again for the help
 
  On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote:
 
  If I remember right, someone has done this on a live cluster without
  any issues. I seem to remember that it had a fallback mechanism if the
  OSDs couldn't be reached on the cluster network to contact them on the
  public network. You could test it pretty easily without much impact.
  Take one OSD that has both networks and configure it and restart the
  process. If all the nodes (specifically the old ones with only one
  network) is able to connect to it, then you are good to go by
  restarting one OSD at a time.
 
  On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com
 
  wrote:
   Hi,
  
   I'm having a live cluster with only public network (so no explicit
   network
   configuraion in the ceph.conf file)
  
   I'm wondering what is the procedure to implement dedicated
   Replication/Private and Public network.
   I've read the manual, know how to do it in ceph.conf, but I'm
 wondering
   since this is already running cluster - what should I do after I
 change
   ceph.conf on all nodes ?
   Restarting OSDs one by one, or... ? Is there any downtime expected ?
 -
   for
   the replication network to actually imlemented completely.
  
  
   Another related quetion:
  
   Also, I'm demoting some old OSDs, on old servers, I will have them
 all
   stoped, but would like to implement replication network before
 actually
   removing old OSDs from crush map - since lot of data will be moved
   arround.
  
   My old nodes/OSDs (that will be stoped before I implement replication
   network) - do NOT have dedicated NIC for replication network, in
   contrast to
   new nodes/OSDs. So there will be still reference to these old OSD in
   the
   crush map.
   Will this be a problem - me changing/implementing replication network
   that
   WILL work on new nodes/OSDs, but not on old ones since they don't
 have
   dedicated NIC ? I guess not since old OSDs are stoped anyway, but
 would
   like
   opinion.
  
   Or perhaps i might remove OSD from crush map with prior seting of
   nobackfill and   norecover (so no rebalancing happens) and then
   implement
   replication netwotk?
  
  
   Sorry for old post, but...
  
   Thanks,
   --
  
   Andrija Panić
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 
 
 
  --
 
  Andrija Panić
 
 
 
 
  --
 
  Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
I guess it doesn't matter, since my CRUSH map will still reference the old
OSDs, which are stopped (and the cluster resynced after that)?

I wanted to say: it doesn't matter (I guess?) that my CRUSH map is still
referencing old OSD nodes that are already stopped. Tired, sorry...

On 4 March 2015 at 17:48, Andrija Panic andrija.pa...@gmail.com wrote:

 That was my thought, yes - I found this blog that confirms what you are
 saying I guess:
 http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
 I will do that... Thx

 I guess it doesnt matter, since my Crush Map will still refernce old OSDs,
 that are stoped (and cluster resynced after that) ?

 Thx again for the help

 On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote:

 If I remember right, someone has done this on a live cluster without
 any issues. I seem to remember that it had a fallback mechanism if the
 OSDs couldn't be reached on the cluster network to contact them on the
 public network. You could test it pretty easily without much impact.
 Take one OSD that has both networks and configure it and restart the
 process. If all the nodes (specifically the old ones with only one
 network) is able to connect to it, then you are good to go by
 restarting one OSD at a time.

 On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Hi,
 
  I'm having a live cluster with only public network (so no explicit
 network
  configuraion in the ceph.conf file)
 
  I'm wondering what is the procedure to implement dedicated
  Replication/Private and Public network.
  I've read the manual, know how to do it in ceph.conf, but I'm wondering
  since this is already running cluster - what should I do after I change
  ceph.conf on all nodes ?
  Restarting OSDs one by one, or... ? Is there any downtime expected ? -
 for
  the replication network to actually imlemented completely.
 
 
  Another related quetion:
 
  Also, I'm demoting some old OSDs, on old servers, I will have them all
  stoped, but would like to implement replication network before actually
  removing old OSDs from crush map - since lot of data will be moved
 arround.
 
  My old nodes/OSDs (that will be stoped before I implement replication
  network) - do NOT have dedicated NIC for replication network, in
 contrast to
  new nodes/OSDs. So there will be still reference to these old OSD in the
  crush map.
  Will this be a problem - me changing/implementing replication network
 that
  WILL work on new nodes/OSDs, but not on old ones since they don't have
  dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
 like
  opinion.
 
  Or perhaps i might remove OSD from crush map with prior seting of
  nobackfill and   norecover (so no rebalancing happens) and then
 implement
  replication netwotk?
 
 
  Sorry for old post, but...
 
  Thanks,
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Hi Robert,

I already have this stuff set. Ceph is 0.87.0 now...

Thanks, will schedule this for the weekend; 10G network and 36 OSDs - it
should move the data in less than 8h. My last experience was around 8h, but
some 1G OSDs were included then...

Thx!
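
For reference, Robert's plan (quoted further down) spelled out as commands -
a sketch only; the OSD ids are assumptions and the loop covers the six
remaining old OSDs:

# per old node: stop its OSDs, then wait for all PGs to go active+clean
service ceph stop osd.36
service ceph stop osd.37
ceph -s

# once all old OSDs are down and the cluster has recovered:
ceph osd set nobackfill
ceph osd set norecover
for id in 36 37 38 39 40 41; do
    ceph osd crush remove osd.$id
    ceph auth del osd.$id
    ceph osd rm $id
done
ceph osd unset nobackfill
ceph osd unset norecover    # one big (throttled) rebalance follows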

On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I think
 there was some priority work done in firefly to help make backfills
 lower priority. I think it has gotten better in later versions.

 On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Thank you Rober - I'm wondering when I do remove total of 7 OSDs from
 crush
  map - weather that will cause more than 37% of data moved (80% or
 whatever)
 
  I'm also wondering if the thortling that I applied is fine or not - I
 will
  introduce the osd_recovery_delay_start 10sec as Irek said.
 
  I'm just wondering hom much will be the performance impact, because:
  - when stoping OSD, the impact while backfilling was fine more or a less
 - I
  can leave with this
  - when I removed OSD from cursh map - first 1h or so, impact was
 tremendous,
  and later on during recovery process impact was much less but still
  noticable...
 
  Thanks for the tip of course !
  Andrija
 
  On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:
 
  I would be inclined to shut down both OSDs in a node, let the cluster
  recover. Once it is recovered, shut down the next two, let it recover.
  Repeat until all the OSDs are taken out of the cluster. Then I would
  set nobackfill and norecover. Then remove the hosts/disks from the
  CRUSH then unset nobackfill and norecover.
 
  That should give you a few small changes (when you shut down OSDs) and
  then one big one to get everything in the final place. If you are
  still adding new nodes, when nobackfill and norecover is set, you can
  add them in so that the one big relocate fills the new drives too.
 
  On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com
  wrote:
   Thx Irek. Number of replicas is 3.
  
   I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
   decommissioned), which is further connected to a new 10G
 switch/network
   with
   3 servers on it with 12 OSDs each.
   I'm decommissioning old 3 nodes on 1G network...
  
   So you suggest removing whole node with 2 OSDs manually from crush
 map?
   Per my knowledge, ceph never places 2 replicas on 1 node, all 3
 replicas
   were originally been distributed over all 3 nodes. So anyway It could
 be
   safe to remove 2 OSDs at once together with the node itself...since
   replica
   count is 3...
   ?
  
   Thx again for your time
  
   On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:
  
   Once you have only three nodes in the cluster.
   I recommend you add new nodes to the cluster, and then delete the
 old.
  
   2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:
  
   You have a number of replication?
  
   2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:
  
   Hi Irek,
  
   yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
   degraded and moved/recovered.
   When I after that removed it from Crush map ceph osd crush rm id,
   that's when the stuff with 37% happened.
  
   And thanks Irek for help - could you kindly just let me know of the
   prefered steps when removing whole node?
   Do you mean I first stop all OSDs again, or just remove each OSD
 from
   crush map, or perhaps, just decompile cursh map, delete the node
   completely,
   compile back in, and let it heal/recover ?
  
   Do you think this would result in less data missplaces and moved
   arround
   ?
  
   Sorry for bugging you, I really appreaciate your help.
  
   Thanks
  
   On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:
  
   A large percentage of the rebuild of the cluster map (But low
   percentage degradation). If you had not made ceph osd crush rm
 id,
   the
   percentage would be low.
   In your case, the correct option is to remove the entire node,
   rather
   than each disk individually
  
   2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com
 :
  
   Another question - I mentioned here 37% of objects being moved
   arround
   - this is MISPLACED object (degraded objects were 0.001%, after I
   removed 1
   OSD from cursh map (out of 44 OSD or so).
  
   Can anybody confirm this is normal behaviour - and are there any
   workarrounds ?
  
   I understand this is because of the object placement algorithm of
   CEPH, but still 37% of object missplaces just by removing 1 OSD
   from crush
   maps out of 44 make me wonder why this large percentage ?
  
   Seems not good

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the
CRUSH map, whether that will cause more than 37% of the data to move (80% or
whatever).

I'm also wondering whether the throttling that I applied is fine or not - I
will introduce osd_recovery_delay_start 10 sec as Irek said.
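
As a sketch, that setting can be injected at runtime the same way as the
other throttles (the value is in seconds):

ceph tell osd.* injectargs '--osd_recovery_delay_start 10'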

I'm just wondering how much the performance impact will be, because:
- when stopping an OSD, the impact while backfilling was fine, more or
less - I can live with this
- when I removed an OSD from the CRUSH map, the impact for the first 1h or
so was tremendous, and later during the recovery process it was much less,
but still noticeable...

Thanks for the tip of course !
Andrija

On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:

 I would be inclined to shut down both OSDs in a node, let the cluster
 recover. Once it is recovered, shut down the next two, let it recover.
 Repeat until all the OSDs are taken out of the cluster. Then I would
 set nobackfill and norecover. Then remove the hosts/disks from the
 CRUSH then unset nobackfill and norecover.

 That should give you a few small changes (when you shut down OSDs) and
 then one big one to get everything in the final place. If you are
 still adding new nodes, when nobackfill and norecover is set, you can
 add them in so that the one big relocate fills the new drives too.

 On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Thx Irek. Number of replicas is 3.
 
  I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
  decommissioned), which is further connected to a new 10G switch/network
 with
  3 servers on it with 12 OSDs each.
  I'm decommissioning old 3 nodes on 1G network...
 
  So you suggest removing whole node with 2 OSDs manually from crush map?
  Per my knowledge, ceph never places 2 replicas on 1 node, all 3 replicas
  were originally been distributed over all 3 nodes. So anyway It could be
  safe to remove 2 OSDs at once together with the node itself...since
 replica
  count is 3...
  ?
 
  Thx again for your time
 
  On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:
 
  Once you have only three nodes in the cluster.
  I recommend you add new nodes to the cluster, and then delete the old.
 
  2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:
 
  You have a number of replication?
 
  2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:
 
  Hi Irek,
 
  yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
  degraded and moved/recovered.
  When I after that removed it from Crush map ceph osd crush rm id,
  that's when the stuff with 37% happened.
 
  And thanks Irek for help - could you kindly just let me know of the
  prefered steps when removing whole node?
  Do you mean I first stop all OSDs again, or just remove each OSD from
  crush map, or perhaps, just decompile cursh map, delete the node
 completely,
  compile back in, and let it heal/recover ?
 
  Do you think this would result in less data missplaces and moved
 arround
  ?
 
  Sorry for bugging you, I really appreaciate your help.
 
  Thanks
 
  On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:
 
  A large percentage of the rebuild of the cluster map (But low
  percentage degradation). If you had not made ceph osd crush rm id,
 the
  percentage would be low.
  In your case, the correct option is to remove the entire node, rather
  than each disk individually
 
  2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:
 
  Another question - I mentioned here 37% of objects being moved
 arround
  - this is MISPLACED object (degraded objects were 0.001%, after I
 removed 1
  OSD from cursh map (out of 44 OSD or so).
 
  Can anybody confirm this is normal behaviour - and are there any
  workarrounds ?
 
  I understand this is because of the object placement algorithm of
  CEPH, but still 37% of object missplaces just by removing 1 OSD
 from crush
  maps out of 44 make me wonder why this large percentage ?
 
  Seems not good to me, and I have to remove another 7 OSDs (we are
  demoting some old hardware nodes). This means I can potentialy go
 with 7 x
  the same number of missplaced objects...?
 
  Any thoughts ?
 
  Thanks
 
  On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com
  wrote:
 
  Thanks Irek.
 
  Does this mean, that after peering for each PG, there will be delay
  of 10sec, meaning that every once in a while, I will have 10sec od
 the
  cluster NOT being stressed/overloaded, and then the recovery takes
 place for
  that PG, and then another 10sec cluster is fine, and then stressed
 again ?
 
  I'm trying to understand process before actually doing stuff
 (config
  reference is there on ceph.com but I don't fully understand the
 process)
 
  Thanks,
  Andrija
 
  On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:
 
  Hi.
 
  Use value osd_recovery_delay_start
  example:
  [root@ceph08 ceph]# ceph --admin-daemon
  /var/run/ceph/ceph-osd.94.asok config show

[ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Hi,

I have a live cluster with only a public network (so no explicit network
configuration in the ceph.conf file).

I'm wondering what the procedure is to implement a dedicated
replication/private network alongside the public one.
I've read the manual and know how to do it in ceph.conf, but since this is
an already running cluster - what should I do after I change ceph.conf on
all nodes? Restart the OSDs one by one, or...? Is there any downtime
expected before the replication network is actually implemented completely?
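
For context, the ceph.conf change being discussed is just the two network
directives - a sketch with made-up subnets:

[global]
public network  = 10.44.253.0/24    # clients and monitors (example subnet)
cluster network = 10.44.254.0/24    # OSD replication traffic (example subnet)

The replies in this thread confirm this can be pushed to all nodes and
picked up by restarting the OSDs one at a time.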


Another related question:

I'm also demoting some old OSDs on old servers. I will have them all
stopped, but would like to implement the replication network before
actually removing the old OSDs from the CRUSH map - since a lot of data
will be moved around.

My old nodes/OSDs (which will be stopped before I implement the replication
network) do NOT have a dedicated NIC for the replication network, in
contrast to the new nodes/OSDs. So there will still be references to these
old OSDs in the CRUSH map.
Will it be a problem that the replication network I implement WILL work on
the new nodes/OSDs, but not on the old ones, since they don't have a
dedicated NIC? I guess not, since the old OSDs are stopped anyway, but I
would like an opinion.

Or perhaps I might remove the OSDs from the CRUSH map after first setting
nobackfill and norecover (so no rebalancing happens) and then implement the
replication network?


Sorry for old post, but...

Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Another question - the 37% of objects I mentioned being moved around are
MISPLACED objects (degraded objects were at 0.001%) after I removed 1 OSD
from the CRUSH map (out of 44 OSDs or so).

Can anybody confirm that this is normal behaviour - and are there any
workarounds?

I understand this is because of Ceph's object placement algorithm, but 37%
of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map
makes me wonder why the percentage is so large.

It doesn't seem good to me, and I have to remove another 7 OSDs (we are
demoting some old hardware nodes). This means I could potentially see 7 x
the same number of misplaced objects...?

Any thoughts ?

Thanks
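
As an aside, a small sketch for watching the degraded/misplaced split while
such a change runs (0.87 reports both percentages in the status output):

ceph -s              # the pgmap line shows % degraded and % misplaced
ceph health detail   # lists the PGs currently backfilling/recovering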

On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay of
 10sec, meaning that every once in a while, I will have 10sec od the cluster
 NOT being stressed/overloaded, and then the recovery takes place for that
 PG, and then another 10sec cluster is fine, and then stressed again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
 over 37% od the data to rebalance - let's say this is fine (this is when I
 removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 1500
 MB/s - and VMs were unusable completely, and then last 4h of the duration
 of recover this recovery rate went down to, say, 100-200 MB.s and during
 this VM performance was still pretty impacted, but at least I could work
 more or a less

 So my question, is this behaviour expected, is throtling here working as
 expected, since first 1h was almoust no throtling applied if I check the
 recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
 SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi Irek,

yes, stopping the OSD (or setting it to OUT) resulted in only 3% of the
data being degraded and moved/recovered.
When I afterwards removed it from the CRUSH map with ceph osd crush rm id,
that's when the 37% happened.

And thanks for the help, Irek - could you kindly let me know the preferred
steps when removing a whole node?
Do you mean I first stop all its OSDs again, or just remove each OSD from
the CRUSH map, or perhaps just decompile the CRUSH map, delete the node
completely, compile it back in, and let it heal/recover?

Do you think this would result in less data being misplaced and moved
around?

Sorry for bugging you, I really appreciate your help.

Thanks

On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage of the rebuild of the cluster map (But low percentage
 degradation). If you had not made ceph osd crush rm id, the percentage
 would be low.
 In your case, the correct option is to remove the entire node, rather than
 each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved arround -
 this is MISPLACED object (degraded objects were 0.001%, after I removed 1
 OSD from cursh map (out of 44 OSD or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarrounds ?

 I understand this is because of the object placement algorithm of CEPH,
 but still 37% of object missplaces just by removing 1 OSD from crush maps
 out of 44 make me wonder why this large percentage ?

 Seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I can potentialy go with 7 x
 the same number of missplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay of
 10sec, meaning that every once in a while, I will have 10sec od the cluster
 NOT being stressed/overloaded, and then the recovery takes place for that
 PG, and then another 10sec cluster is fine, and then stressed again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
 over 37% od the data to rebalance - let's say this is fine (this is when I
 removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 1500
 MB/s - and VMs were unusable completely, and then last 4h of the duration
 of recover this recovery rate went down to, say, 100-200 MB.s and during
 this VM performance was still pretty impacted, but at least I could work
 more or a less

 So my question, is this behaviour expected, is throtling here working
 as expected, since first 1h was almoust no throtling applied if I check 
 the
 recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in
 general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
 SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi Guys,

Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused
over 37% of the data to rebalance - let's say this is fine (this happened
when I removed it from the CRUSH map).

I'm wondering - I had previously set some throttling mechanisms, but during
the first 1h of rebalancing my recovery rate was going up to 1500 MB/s,
and the VMs were completely unusable; then for the last 4h of the recovery
the rate went down to, say, 100-200 MB/s, and during this the VM
performance was still pretty impacted, but at least I could work more or
less.

So my question: is this behaviour expected, and is the throttling here
working as expected? During the first 1h almost no throttling seemed to be
applied, judging by the 1500 MB/s recovery rate and the impact on the VMs,
while the last 4h seemed pretty fine (although still a lot of impact in
general).

I changed these throttles on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'   # max 1 active recovery op per OSD
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'  # lowest priority for recovery ops
ceph tell osd.* injectargs '--osd_max_backfills 1'         # max 1 backfill per OSD at a time

My journals are on SSDs (12 OSDs per server, of which 6 journals are on one
SSD and 6 journals on another SSD) - I have 3 of these hosts.

Any thoughts are welcome.
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-05-06 Thread Andrija Panic
Well, seems like they are on satellite :)

On 6 May 2015 at 02:58, Matthew Monaco m...@monaco.cx wrote:

 On 05/05/2015 08:55 AM, Andrija Panic wrote:
  Hi,
 
  small update:
 
  in 3 months - we lost 5 out of 6 Samsung 128Gb 850 PROs (just few days in
  between of each SSD death) - cant believe it - NOT due to wearing out...
 I
  really hope we got efective series from suplier...
 

 That's ridiculous. Are these drives mounted un-shielded on a satellite? I
 didn't
 know the ISS had a ceph cluster.



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Andrija Panic
Hi,

small update:

in 3 months we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days
between each SSD death) - can't believe it - and NOT due to wearing out...
I really hope we got a defective series from the supplier...

Regards

On 18 April 2015 at 14:24, Andrija Panic andrija.pa...@gmail.com wrote:

 yes I know, but to late now, I'm afraid :)

 On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote:

 Have you looked into the samsung 845 dc? They are not that expensive last
 time I checked.

 /Josef
 On 18 Apr 2015 13:15, Andrija Panic andrija.pa...@gmail.com wrote:

 might be true, yes - we had Intel 128GB (intel S3500 or S3700) - but
 these have horrible random/sequetial speeds - Samsun 850 PROs are 3 times
 at least faster on sequential, and more than 3 times faser on random/IOPS
 measures.
 And ofcourse modern enterprise drives = ...

 On 18 April 2015 at 12:42, Mark Kirkwood mark.kirkw...@catalyst.net.nz
 wrote:

 Yes, it sure is - my experience with 'consumer' SSD is that they die
 with obscure firmware bugs (wrong capacity, zero capacity, not detected in
 bios anymore) rather than flash wearout. It seems that the 'enterprise'
 tagged drives are less inclined to suffer this fate.

 Regards

 Mark

 On 18/04/15 22:23, Andrija Panic wrote:

 these 2 drives, are on the regular SATA (on board)controler, and beside
 this, there is 12 x 4TB on the fron of the servers - normal backplane
 on
 the front.

 Anyway, we are going to check those dead SSDs on a pc/laptop or so,just
 to confirm they are really dead - but this is the way they die, not
 wear
 out, but simply show different space instead of real one - thse were 3
 months old only when they died...

 On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com
 mailto:jose...@gmail.com wrote:

 If the same chassi/chip/backplane is behind both drives and maybe
 other drives in the chassi have troubles,it may be a defect there
 as
 well.

 On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com
 mailto:ste...@me.com wrote:


   On 17/04/2015, at 21.07, Andrija Panic
 andrija.pa...@gmail.com mailto:andrija.pa...@gmail.com
 wrote:
  
   nahSamsun 850 PRO 128GB - dead after 3months - 2 of
 these
 died... wearing level is 96%, so only 4% wasted... (yes I know
 these are not enterprise,etc… )
 Damn… but maybe your surname says it all - Don’t Panic :) But
 making sure same type of SSD devices ain’t of near same age and
 doing preventive replacement rotation might be good practice I
 guess.

 /Steffen

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Hi guys,

I have 1 SSD that hosted the journals for 6 OSDs and it is dead, so 6 OSDs
are down, Ceph rebalanced, etc.

Now I have a new SSD in place, and I will partition it etc. - but I would
like to know how to proceed with the journal recreation for those 6 OSDs
that are down now.

Should I flush the journals (to where? the journals don't exist any
more...), or just recreate the journals from scratch (making symbolic links
again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and start
the OSDs?

I expect the following procedure, but would like confirmation please:

rm -f /var/lib/ceph/osd/ceph-$ID/journal                # remove the stale symlink
ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal    # point it at the new partition
ceph-osd -i $ID --mkjournal                             # initialize a fresh journal
ll /var/lib/ceph/osd/ceph-$ID/journal                   # sanity-check the symlink
service ceph start osd.$ID

Any thoughts greatly appreciated!

Thanks,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
The SSD that hosted the journals for 6 OSDs died - 2 SSDs died in fact, so
12 OSDs are down, and the rebalancing is about to finish... after which I
need to fix the OSDs.

On 17 April 2015 at 19:01, Josef Johansson jo...@oderland.se wrote:

 Hi,

 Did 6 other OSDs go down when re-adding?

 /Josef

 On 17 Apr 2015, at 18:49, Andrija Panic andrija.pa...@gmail.com wrote:

 12 osds down - I expect less work with removing and adding osd?
 On Apr 17, 2015 6:35 PM, Krzysztof Nowicki 
 krzysztof.a.nowi...@gmail.com wrote:

 Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
 existing OSD UUID, copy the keyring and let it populate itself?

 pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic 
 andrija.pa...@gmail.com napisał:

 Thx guys, thats what I will be doing at the end.

 Cheers
 On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us wrote:

 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic andrija.pa...@gmail.com
  wrote:

 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
 down, ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would
 like to know, how to proceed now, with the journal recreation for those 6
 OSDs that are down now.

 Should I flush journal (where to, journals doesnt still exist...?), or
 just recreate journal from scratch (making symboliv links again: ln -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Thx guys, that's what I will be doing in the end.

Cheers
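
For the record, a sketch of that delete-and-re-add - the OSD id, host and
device names are made up, and the ceph-deploy form follows a later post in
this archive:

# remove one dead-journal OSD from the cluster
ceph osd out 7
ceph osd crush remove osd.7
ceph auth del osd.7
ceph osd rm 7
# re-create it with its journal on a partition of the new SSD
ceph-deploy osd create SERVER:sdc:/dev/sdb1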
On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us wrote:

 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
 ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would like
 to know, how to proceed now, with the journal recreation for those 6 OSDs
 that are down now.

 Should I flush journal (where to, journals doesnt still exist...?), or
 just recreate journal from scratch (making symboliv links again: ln -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
12 OSDs down - I expect less work with removing and re-adding the OSDs?
On Apr 17, 2015 6:35 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com
wrote:

 Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
 existing OSD UUID, copy the keyring and let it populate itself?

 pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic andrija.pa...@gmail.com
 napisał:

 Thx guys, thats what I will be doing at the end.

 Cheers
 On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us wrote:

 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD down,
 ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would like
 to know, how to proceed now, with the journal recreation for those 6 OSDs
 that are down now.

 Should I flush journal (where to, journals doesnt still exist...?), or
 just recreate journal from scratch (making symboliv links again: ln -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... the
wear level is at 96%, so only 4% worn... (yes, I know these are not
enterprise, etc...)

On 17 April 2015 at 21:01, Josef Johansson jose...@gmail.com wrote:

 tough luck, hope everything comes up ok afterwards. What models on the SSD?

 /Josef
 On 17 Apr 2015 20:05, Andrija Panic andrija.pa...@gmail.com wrote:

 SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
 down, and rebalancing is about finish... after which I need to fix the OSDs.

 On 17 April 2015 at 19:01, Josef Johansson jo...@oderland.se wrote:

 Hi,

 Did 6 other OSDs go down when re-adding?

 /Josef

 On 17 Apr 2015, at 18:49, Andrija Panic andrija.pa...@gmail.com wrote:

 12 osds down - I expect less work with removing and adding osd?
 On Apr 17, 2015 6:35 PM, Krzysztof Nowicki 
 krzysztof.a.nowi...@gmail.com wrote:

 Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the
 existing OSD UUID, copy the keyring and let it populate itself?

 pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic 
 andrija.pa...@gmail.com napisał:

 Thx guys, thats what I will be doing at the end.

 Cheers
 On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us
 wrote:

 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
 andrija.pa...@gmail.com wrote:

 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
 down, ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would
 like to know, how to proceed now, with the journal recreation for those 
 6
 OSDs that are down now.

 Should I flush journal (where to, journals doesnt still exist...?),
 or just recreate journal from scratch (making symboliv links again: ln 
 -s
 /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting OSDs.

 I expect the folowing procedure, but would like confirmation please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
damn, good news for me, possibly bad news for you :)
what is the wear level (smartctl -a /dev/sdX)? - it's an attribute near the
end of the attribute list...

thx
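
A sketch for pulling just that attribute - the device name is an example;
on Samsung drives it is usually attribute 177, Wear_Leveling_Count:

smartctl -A /dev/sda | grep -i wear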

On 17 April 2015 at 21:12, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com
wrote:

 I have two of them in my cluster (plus one 256GB version) for about half a
 year now. So far so good. I'll be keeping a closer look at them.

 pt., 17 kwi 2015, 21:07 Andrija Panic użytkownik andrija.pa...@gmail.com
 napisał:

 nahSamsun 850 PRO 128GB - dead after 3months - 2 of these died...
 wearing level is 96%, so only 4% wasted... (yes I know these are not
 enterprise,etc... )

 On 17 April 2015 at 21:01, Josef Johansson jose...@gmail.com wrote:

 tough luck, hope everything comes up ok afterwards. What models on the
 SSD?

 /Josef
 On 17 Apr 2015 20:05, Andrija Panic andrija.pa...@gmail.com wrote:

 SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are
 down, and rebalancing is about finish... after which I need to fix the 
 OSDs.

 On 17 April 2015 at 19:01, Josef Johansson jo...@oderland.se wrote:

 Hi,

 Did 6 other OSDs go down when re-adding?

 /Josef

 On 17 Apr 2015, at 18:49, Andrija Panic andrija.pa...@gmail.com
 wrote:

 12 osds down - I expect less work with removing and adding osd?
 On Apr 17, 2015 6:35 PM, Krzysztof Nowicki 
 krzysztof.a.nowi...@gmail.com wrote:

 Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with
 the existing OSD UUID, copy the keyring and let it populate itself?

 pt., 17 kwi 2015 o 18:31 użytkownik Andrija Panic 
 andrija.pa...@gmail.com napisał:

 Thx guys, thats what I will be doing at the end.

 Cheers
 On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us
 wrote:

 Delete and re-add all six OSDs.

 On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic 
 andrija.pa...@gmail.com wrote:

 Hi guys,

 I have 1 SSD that hosted 6 OSD's Journals, that is dead, so 6 OSD
 down, ceph rebalanced etc.

 Now I have new SSD inside, and I will partition it etc - but would
 like to know, how to proceed now, with the journal recreation for 
 those 6
 OSDs that are down now.

 Should I flush journal (where to, journals doesnt still
 exist...?), or just recreate journal from scratch (making symboliv 
 links
 again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and
 starting OSDs.

 I expect the folowing procedure, but would like confirmation
 please:

 rm /var/lib/ceph/osd/ceph-$ID/journal -f (sym link)
 ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
 ceph-osd -i $ID --mkjournal
 ll /var/lib/ceph/osd/ceph-$ID/journal
 service ceph start osd.$ID

 Any thought greatly appreciated !

 Thanks,

 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
yes I know, but it's too late now, I'm afraid :)

On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote:

 Have you looked into the samsung 845 dc? They are not that expensive last
 time I checked.

 /Josef
 On 18 Apr 2015 13:15, Andrija Panic andrija.pa...@gmail.com wrote:

 might be true, yes - we had Intel 128GB (intel S3500 or S3700) - but
 these have horrible random/sequetial speeds - Samsun 850 PROs are 3 times
 at least faster on sequential, and more than 3 times faser on random/IOPS
 measures.
 And ofcourse modern enterprise drives = ...

 On 18 April 2015 at 12:42, Mark Kirkwood mark.kirkw...@catalyst.net.nz
 wrote:

 Yes, it sure is - my experience with 'consumer' SSD is that they die
 with obscure firmware bugs (wrong capacity, zero capacity, not detected in
 bios anymore) rather than flash wearout. It seems that the 'enterprise'
 tagged drives are less inclined to suffer this fate.

 Regards

 Mark

 On 18/04/15 22:23, Andrija Panic wrote:

 these 2 drives, are on the regular SATA (on board)controler, and beside
 this, there is 12 x 4TB on the fron of the servers - normal backplane on
 the front.

 Anyway, we are going to check those dead SSDs on a pc/laptop or so,just
 to confirm they are really dead - but this is the way they die, not wear
 out, but simply show different space instead of real one - thse were 3
 months old only when they died...

 On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com
 mailto:jose...@gmail.com wrote:

 If the same chassi/chip/backplane is behind both drives and maybe
 other drives in the chassi have troubles,it may be a defect there as
 well.

 On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com
 mailto:ste...@me.com wrote:


   On 17/04/2015, at 21.07, Andrija Panic
 andrija.pa...@gmail.com mailto:andrija.pa...@gmail.com
 wrote:
  
   nahSamsun 850 PRO 128GB - dead after 3months - 2 of these
 died... wearing level is 96%, so only 4% wasted... (yes I know
 these are not enterprise,etc… )
 Damn… but maybe your surname says it all - Don’t Panic :) But
 making sure same type of SSD devices ain’t of near same age and
 doing preventive replacement rotation might be good practice I
 guess.

 /Steffen

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
might be true, yes - we had Intel 128GB (Intel S3500 or S3700) - but these
have horrible random/sequential speeds; the Samsung 850 PROs are at least 3
times faster on sequential, and more than 3 times faster on random/IOPS
measures.
And of course modern enterprise drives = ...

On 18 April 2015 at 12:42, Mark Kirkwood mark.kirkw...@catalyst.net.nz
wrote:

 Yes, it sure is - my experience with 'consumer' SSD is that they die with
 obscure firmware bugs (wrong capacity, zero capacity, not detected in bios
 anymore) rather than flash wearout. It seems that the 'enterprise' tagged
 drives are less inclined to suffer this fate.

 Regards

 Mark

 On 18/04/15 22:23, Andrija Panic wrote:

 these 2 drives, are on the regular SATA (on board)controler, and beside
 this, there is 12 x 4TB on the fron of the servers - normal backplane on
 the front.

 Anyway, we are going to check those dead SSDs on a pc/laptop or so,just
 to confirm they are really dead - but this is the way they die, not wear
 out, but simply show different space instead of real one - thse were 3
 months old only when they died...

 On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com
 mailto:jose...@gmail.com wrote:

 If the same chassis/chip/backplane is behind both drives and maybe
 other drives in the chassis have troubles, it may be a defect there as
 well.

 On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com
 mailto:ste...@me.com wrote:


   On 17/04/2015, at 21.07, Andrija Panic
 andrija.pa...@gmail.com mailto:andrija.pa...@gmail.com wrote:
  
   nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these
 died... wear level is at 96%, so only 4% consumed... (yes I know
 these are not enterprise, etc.…)
 Damn… but maybe your surname says it all - Don't Panic :) But
 making sure SSD devices of the same type aren't all of nearly the same
 age, and doing preventive replacement rotation, might be good practice I
 guess.

 /Steffen

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy journal on separate partition - quick info needed

2015-04-17 Thread Andrija Panic
Hi all,

when I run:

ceph-deploy osd create SERVER:sdi:/dev/sdb5

(sdi = previously ZAP-ed 4TB drive)
(sdb5 = previously manually created empty partition with fdisk)

Is ceph-deploy going to create the journal properly on sdb5 (something
similar to: ceph-osd -i $ID --mkjournal), or do I need to do something
before this?
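
For reference, a sgdisk equivalent of the fdisk step, which would also set
the Ceph journal GPT type code that ceph-disk expects, looks roughly like
this (partition number and size here are just examples):

sgdisk --new=5:0:+20G --change-name=5:'ceph journal' \
  --typecode=5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
partprobe /dev/sdb   # re-read the partition table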

I have actually already run this command but haven't seen any mkjournal
commands in the output.

The OSD shows as up and in, but I have doubts whether the journal is fine
(the symlink does point to /dev/sdb5), but again...

Any confirmation is welcome.
Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy journal on separate partition - quick info needed

2015-04-17 Thread Andrija Panic
ok, thx Robert - I expected that, so this is fine then - just did it on 12
OSDs and all is fine...

thx again
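
For the archives, the check boils down to something like this (OSD id 12 is
just an example; paths assume the default layout):

ls -l /var/lib/ceph/osd/ceph-12/journal   # expect: journal -> /dev/sdb5
# recreating a journal by hand is only needed when replacing the device:
# stop the OSD first, then run: ceph-osd -i 12 --mkjournal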

On 17 April 2015 at 23:38, Robert LeBlanc rob...@leblancnet.us wrote:

 If the journal file on the osd is a symlink to the partition and the OSD
 process is running, then the journal was created properly. The OSD would
 not start if the journal was not created.

 On Fri, Apr 17, 2015 at 2:43 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi all,

 when I run:

 ceph-deploy osd create SERVER:sdi:/dev/sdb5

 (sdi = previously ZAP-ed 4TB drive)
 (sdb5 = previously manually created empty partition with fdisk)

 Is ceph-deploy going to create the journal properly on sdb5 (something
 similar to: ceph-osd -i $ID --mkjournal), or do I need to do something
 before this?

 I have actually already run this command but haven't seen any mkjournal
 commands in the output.

 The OSD shows as up and in, but I have doubts whether the journal is fine
 (the symlink does point to /dev/sdb5), but again...

 Any confirmation is welcome.
 Thanks,
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
heh :) yes, interesting last name :)
anyway, all are the exact same age, we implemented the new CEPH nodes at
exactly the same time - but it's not a wearing problem - the dead SSDs were
simply DEAD - smartctl -a showing nothing, except a 600 PB space/size :)
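
For anyone checking their own drives, the counters worth watching look
roughly like this (device name is an example; attribute names differ
between vendors):

smartctl -H /dev/sdb   # overall health verdict
smartctl -A /dev/sdb | egrep -i 'Wear_Leveling|Media_Wearout|Reallocated'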

On 18 April 2015 at 09:41, Steffen W Sørensen ste...@me.com wrote:


  On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote:
 
  nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died...
 wear level is at 96%, so only 4% consumed... (yes I know these are not
 enterprise, etc.…)
 Damn… but maybe your surname says it all - Don't Panic :) But making sure
 SSD devices of the same type aren't all of nearly the same age, and doing
 preventive replacement rotation, might be good practice I guess.

 /Steffen




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
these 2 drives are on the regular SATA (onboard) controller, and besides
this, there are 12 x 4TB on the front of the servers - a normal backplane on
the front.

Anyway, we are going to check those dead SSDs on a pc/laptop or so, just to
confirm they are really dead - but this is the way they die: not wear out,
but simply report a different capacity than the real one - these were only
3 months old when they died...

On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com wrote:

 If the same chassis/chip/backplane is behind both drives and maybe other
 drives in the chassis have troubles, it may be a defect there as well.
 On 18 Apr 2015 09:42, Steffen W Sørensen ste...@me.com wrote:


  On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote:
 
  nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died...
 wear level is at 96%, so only 4% consumed... (yes I know these are not
 enterprise, etc.…)
 Damn… but maybe your surname says it all - Don't Panic :) But making sure
 SSD devices of the same type aren't all of nearly the same age, and doing
 preventive replacement rotation, might be good practice I guess.

 /Steffen

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Andrija Panic
Guys,

I'm Igor's colleague, working a bit on CEPH together with Igor.

This is a production cluster, and we are becoming more desperate as time
goes by.

I'm not sure if this is the appropriate place to seek commercial support,
but anyhow, here it is...

If anyone feels like it and has experience with this particular kind of PG
troubleshooting, we are also ready to pay for commercial support to solve
our issue - company or individual, it doesn't matter.

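For completeness, the basic triage loop we keep running is roughly this
(the pg ids come from ceph health detail; the log path assumes the default
location):

ceph health detail | grep inconsistent
ceph pg deep-scrub 2.490   # likewise for 2.c4
grep ERR /var/log/ceph/ceph.log   # shows which objects/shards disagree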

Thanks,
Andrija

On 20 August 2015 at 19:07, Voloshanenko Igor igor.voloshane...@gmail.com
wrote:

 Inktank:

 https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

 Mail-list:
 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html

 2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com:

 Which docs?
 -Sam

 On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Not yet. I will create one.
  But according to mailing lists and Inktank docs - it's expected
  behaviour when cache is enabled.
 
  2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Is there a bug for this in the tracker?
  -Sam
 
  On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
    The issue is that in forward mode fstrim doesn't work properly, and
    when we take a snapshot the data is not properly updated in the cache
    layer, so the client (ceph) sees a damaged snap, as the headers are
    requested from the cache layer.
  
   2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:
  
   What was the issue?
   -Sam
  
   On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
     Samuel, we turned off the cache layer a few hours ago...
     I will post the ceph.log in a few minutes.

     For the snap - we found the issue; it was connected with the cache tier...
   
2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:
   
Ok, you appear to be using a replicated cache tier in front of a
replicated base tier.  Please scrub both inconsistent pgs and
 post
the
ceph.log from before when you started the scrub until after.
 Also,
what command are you using to take snapshots?
-Sam
   
On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 Hi Samuel, we tried to fix it in a tricky way.

 We checked all the affected rbd_data chunks from the (OSD) logs, then
 queried rbd info to work out which rbd contains the bad rbd_data; after
 that we mounted this rbd as rbd0, created an empty rbd, and dd'd all the
 data from the bad volume to the new one.

 But after that the scrub errors kept growing... It was 15 errors... now
 35... We also tried to out the OSD which was the lead, but after
 rebalancing these 2 pgs still have 35 scrub errors...

 ceph osd getmap -o outfile - attached


 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:

 Is the number of inconsistent objects growing?  Can you attach
 the
 whole ceph.log from the 6 hours before and after the snippet
 you
 linked above?  Are you using cache/tiering?  Can you attach
 the
 osdmap
 (ceph osd getmap -o outfile)?
 -Sam

 On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  ceph - 0.94.2
  It happened during rebalancing.

  I thought too that some OSD was missing a copy, but it looks like all
  miss...
  So, any advice on which direction I need to go?
 
  2015-08-18 14:14 GMT+03:00 Gregory Farnum 
 gfar...@redhat.com:
 
  From a quick peek it looks like some of the OSDs are
 missing
  clones
  of
  objects. I'm not sure how that could happen and I'd expect
 the
  pg
  repair to handle that but if it's not there's probably
  something
  wrong; what version of Ceph are you running? Sam, is this
  something
  you've seen, a new bug, or some kind of config issue?
  -Greg
 
  On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Hi all, at our production cluster, due to high rebalancing ((( we
   have 2 pgs in an inconsistent state...
  
   root@temp:~# ceph health detail | grep inc
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  
   From OSD logs, after recovery attempt:
  
   root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
   while
   read
   i;
   do
   ceph pg repair ${i} ; done
   dumped all in format plain
   instructing pg 2.490 on osd.56 to repair
   instructing pg 2.c4 on osd.56 to repair
  
   /var/log/ceph/ceph-osd.56.log:51:2015-08-18
 07:26:37.035910
   7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   f5759490/rbd_data.1631755377d7e.04da/head//2

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Andrija Panic
This was related to the caching layer, which doesn't support snapshotting
per the docs... for the sake of closing the thread.
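
For anyone who finds this later: "turning off" the cache layer amounted to
draining and detaching the tier, roughly as below (cold-storage is the base
pool from the thread; the cache pool name is an assumption):

rados -p cold-storage-cache cache-flush-evict-all
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage cold-storage-cache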

On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com
wrote:

 Hi all, can you please help me with an unexplained situation...

 All snapshots inside ceph are broken...

 So, as an example, we have a VM template as an rbd inside ceph.
 We can map and mount it to check that all is OK with it:

 root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
 /dev/rbd0
 root@test:~# parted /dev/rbd0 print
 Model: Unknown (unknown)
 Disk /dev/rbd0: 10.7GB
 Sector size (logical/physical): 512B/512B
 Partition Table: msdos

 Number  Start   End     Size    Type     File system  Flags
  1      1049kB  525MB   524MB   primary  ext4         boot
  2      525MB   10.7GB  10.2GB  primary               lvm

 Then I want to create a snap, so I do:
 root@test:~# rbd snap create
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

 And now I want to map it:

 root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 /dev/rbd1
 root@test:~# parted /dev/rbd1 print
 Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
 Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
 Error: /dev/rbd1: unrecognised disk label

 Even the md5 sums differ...
 root@ix-s2:~# md5sum /dev/rbd0
 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
 root@ix-s2:~# md5sum /dev/rbd1
 e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1


 OK, now I protect the snap and create a clone... but same thing...
 the md5 for the clone is the same as for the snap...

 root@test:~# rbd unmap /dev/rbd1
 root@test:~# rbd snap protect
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 root@test:~# rbd clone
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 cold-storage/test-image
 root@test:~# rbd map cold-storage/test-image
 /dev/rbd1
 root@test:~# md5sum /dev/rbd1
 e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1

  but it's broken...
 root@test:~# parted /dev/rbd1 print
 Error: /dev/rbd1: unrecognised disk label


 =

 tech details:

 root@test:~# ceph -v
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

 We have 2 inconsistent pgs, but these images are not placed on those pgs...

 root@test:~# ceph health detail
 HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
 18 scrub errors

 

 root@test:~# ceph osd map cold-storage
 0e23c701-401d-4465-b9b4-c02939d57bb5
 osdmap e16770 pool 'cold-storage' (2) object
 '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
 ([37,15,14], p37) acting ([37,15,14], p37)
 root@test:~# ceph osd map cold-storage
 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
 osdmap e16770 pool 'cold-storage' (2) object
 '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) ->
 up ([12,23,17], p12) acting ([12,23,17], p12)
 root@test:~# ceph osd map cold-storage
 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
 osdmap e16770 pool 'cold-storage' (2) object
 '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
 (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12)


 Also we use a cache layer, which at the current moment is in forward mode...

 Can you please help me with this... as my brain has stopped understanding
 what is going on...

 Thank in advance!





 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
Make sure you test whatever you decide. We just learned this the hard way
with the Samsung 850 Pro, which is total crap, more than you could
imagine...
Andrija
On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:

 I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
 Very cheap, better than Intel 3610 for sure (and I think it beats even
 3700).

 Jan

  On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
 wrote:
 
  On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
  Hi,
 
  most of the time I get the recommendation from resellers to go with
  the Intel S3700 for the journalling.
 
  Check out the Intel s3610. 3 drive writes per day for 5 years. Plus, it
  is cheaper than S3700.
 
  Regards,
 
  --ck
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
First, please read:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

We are getting 200 IOPS, in comparison to the Intel S3500's 18,000 IOPS -
those are sustained performance numbers, meaning avoiding the drive's
cache and running for a longer period of time...
Also, if checking with fio you will get better latencies on the Intel
S3500 (the model tested in our case), along with 20x better IOPS results...
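
The test from the post linked above, for reference (it writes to the
target, so point it at a scratch disk; these are the usual flags):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
  --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting \
  --name=journal-test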

We observed the original issue as high speed at the beginning of e.g. a
file transfer inside a VM, which then halts to zero... We moved the
journals back to HDDs and performance was acceptable... now we are
upgrading to the Intel S3500...

Best
any details on that ?

On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
andrija.pa...@gmail.com wrote:

 Make sure you test whatever you decide. We just learned this the hard way
 with the Samsung 850 Pro, which is total crap, more than you could imagine...

 Andrija
 On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:

  I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
  Very cheap, better than Intel 3610 for sure (and I think it beats even
  3700).
 
  Jan
 
   On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
  wrote:
  
    On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
   Hi,
  
    most of the time I get the recommendation from resellers to go with
    the Intel S3700 for the journalling.
  
   Check out the Intel s3610. 3 drive writes per day for 5 years. Plus,
it
   is cheaper than S3700.
  
   Regards,
  
   --ck
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com
mailto:mariusz.gronczew...@efigence.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
We have some 850 Pro 256GB SSDs if anyone is interested in buying :)

And also there was a new 850 Pro firmware that broke people's disks, which
was revoked later, etc... I'm sticking with only vacuum cleaners from
Samsung for now, maybe... :)
On Aug 25, 2015 12:02 PM, Voloshanenko Igor igor.voloshane...@gmail.com
wrote:

 To be honest, the Samsung 850 PRO is not a 24/7 series... it's more of a
 desktop+ series, but anyway - the results from these drives are very, very
 bad in any scenario acceptable in real life...

 Possibly the 845 PRO is better, but we don't want to experiment anymore...
 So we chose the S3500 240G. Yes, it's cheaper than the S3700 (about 2x),
 and not as durable for writes, but we think it is better to replace 1 SSD
 per year than to pay double the price now.
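
 The arithmetic behind "replace 1 SSD per year", roughly (the ~140 TBW
 rating for the 240G S3500 and the 5 MB/s average journal load are
 assumptions - plug in your own numbers):

 # days of life = rated endurance (MB) / daily writes (MB)
 echo $(( 140000000 / (5 * 86400) ))   # ~324 days at a sustained 5 MB/s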

 2015-08-25 12:59 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 And should I mention that in another CEPH installation we had Samsung 850
 Pro 128GB, and all 6 SSDs died in a 2-month period - they simply
 disappeared from the system, so not wear-out...

 Never again will we buy Samsung :)
 On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 First, please read:

 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 We are getting 200 IOPS, in comparison to the Intel S3500's 18,000 IOPS -
 those are sustained performance numbers, meaning avoiding the drive's cache
 and running for a longer period of time...
 Also, if checking with fio you will get better latencies on the Intel S3500
 (the model tested in our case), along with 20x better IOPS results...

 We observed the original issue as high speed at the beginning of e.g. a file
 transfer inside a VM, which then halts to zero... We moved the journals back
 to HDDs and performance was acceptable... now we are upgrading to the Intel
 S3500...

 Best
 any details on that ?

 On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
 andrija.pa...@gmail.com wrote:

  Make sure you test whatever you decide. We just learned this the hard way
  with the Samsung 850 Pro, which is total crap, more than you could
  imagine...
 
  Andrija
  On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:
 
   I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
   Very cheap, better than Intel 3610 for sure (and I think it beats
 even
   3700).
  
   Jan
  
On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
   wrote:
   
 On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
Hi,
   
 most of the time I get the recommendation from resellers to go with
 the Intel S3700 for the journalling.
   
Check out the Intel s3610. 3 drive writes per day for 5 years.
 Plus, it
is cheaper than S3700.
   
Regards,
   
--ck
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  



 --
 Mariusz Gronczewski, Administrator

 Efigence S. A.
 ul. Wołoska 9a, 02-583 Warszawa
 T: [+48] 22 380 13 13
 F: [+48] 22 380 13 14
 E: mariusz.gronczew...@efigence.com
 mailto:mariusz.gronczew...@efigence.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
And should I mention that in another CEPH installation we had Samsung 850
Pro 128GB, and all 6 SSDs died in a 2-month period - they simply
disappeared from the system, so not wear-out...

Never again will we buy Samsung :)
On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com wrote:

 First, please read:

 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 We are getting 200 IOPS, in comparison to the Intel S3500's 18,000 IOPS -
 those are sustained performance numbers, meaning avoiding the drive's cache
 and running for a longer period of time...
 Also, if checking with fio you will get better latencies on the Intel S3500
 (the model tested in our case), along with 20x better IOPS results...

 We observed the original issue as high speed at the beginning of e.g. a file
 transfer inside a VM, which then halts to zero... We moved the journals back
 to HDDs and performance was acceptable... now we are upgrading to the Intel
 S3500...

 Best
 any details on that ?

 On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
 andrija.pa...@gmail.com wrote:

  Make sure you test whatever you decide. We just learned this the hard way
  with the Samsung 850 Pro, which is total crap, more than you could
  imagine...
 
  Andrija
  On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:
 
   I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
   Very cheap, better than Intel 3610 for sure (and I think it beats even
   3700).
  
   Jan
  
On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
   wrote:
   
 On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
Hi,
   
 most of the time I get the recommendation from resellers to go with
 the Intel S3700 for the journalling.
   
Check out the Intel s3610. 3 drive writes per day for 5 years. Plus,
 it
is cheaper than S3700.
   
Regards,
   
--ck
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  



 --
 Mariusz Gronczewski, Administrator

 Efigence S. A.
 ul. Wołoska 9a, 02-583 Warszawa
 T: [+48] 22 380 13 13
 F: [+48] 22 380 13 14
 E: mariusz.gronczew...@efigence.com
 mailto:mariusz.gronczew...@efigence.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Quentin,

try fio or dd with the O_DIRECT and D_SYNC flags, and you will see less
than 1 MB/s - that is common for most "home" drives - check the post below
to understand
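
A minimal version of that test (writing a scratch file on the SSD's
filesystem; the path is an example):

dd if=/dev/zero of=/mnt/ssd/testfile bs=4k count=10000 oflag=direct,dsync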

We removed all Samsung 850 Pro 256GB drives from our new CEPH installation
and replaced them with the Intel S3500 (18,000 4KB IOPS sustained write
speed with O_DIRECT, D_SYNC, in comparison to 200 IOPS for the Samsung 850
Pro - you can imagine the difference...):

http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

Best

On 4 September 2015 at 21:09, Quentin Hartman <qhart...@direwolfdigital.com>
wrote:

> Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in
> there just because I couldn't find 14 pros at the time we were ordering
> hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves
> as the boot drive  and the journal for 3 OSDs. And similarly, mine just
> started disappearing a few weeks ago. I've now had four fail (three 850
> Pro, one 840 Pro). I expect the rest to fail any day.
>
> As it turns out I had a phone conversation with the support rep who has
> been helping me with RMA's today and he's putting together a report with my
> pertinent information in it to forward on to someone.
>
> FWIW, I tried to get your 845's for this deploy, but couldn't find them
> anywhere, and since the 850's looked about as durable on paper I figured
> they would do ok. Seems not to be the case.
>
> QH
>
> On Fri, Sep 4, 2015 at 12:53 PM, Andrija Panic <andrija.pa...@gmail.com>
> wrote:
>
>> Hi James,
>>
>> I had 3 CEPH nodes as follows: 12 OSDs (HDD) and 2 SSDs (6 journal
>> partitions on each SSD) - the SSDs just vanished with no warning, no
>> smartctl errors, nothing... so 2 SSDs in each of 3 servers vanished in
>> 2-3 weeks, after 3-4 months of being in production (VMs/KVM/CloudStack)
>>
>> Mine were also Samsung 850 PRO 128GB.
>>
>> Best,
>> Andrija
>>
>> On 4 September 2015 at 19:27, James (Fei) Liu-SSI <
>> james@ssi.samsung.com> wrote:
>>
>>> Hi Quentin and Andrija,
>>>
>>> Thanks so much for reporting the problems with Samsung.
>>>
>>>
>>>
>>> Would it be possible to get to know the configuration of your system?
>>> What kind of workload are you running? You use the Samsung SSDs as
>>> separate journaling disks, right?
>>>
>>>
>>>
>>> Thanks so much.
>>>
>>>
>>>
>>> James
>>>
>>>
>>>
>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>> Behalf Of *Quentin Hartman
>>> *Sent:* Thursday, September 03, 2015 1:06 PM
>>> *To:* Andrija Panic
>>> *Cc:* ceph-users
>>> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T
>>> vs. Intel s3700
>>>
>>>
>>>
>>> Yeah, we've ordered some S3700's to replace them already. Should be here
>>> early next week. Hopefully they arrive before we have multiple nodes die at
>>> once and can no longer rebalance successfully.
>>>
>>>
>>>
>>> Most of the drives I have are the 850 Pro 128GB (specifically
>>> MZ7KE128HMGA)
>>>
>>> There are a couple 120GB 850 EVOs in there too, but ironically, none of
>>> them have pooped out yet.
>>>
>>>
>>>
>>> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic <andrija.pa...@gmail.com>
>>> wrote:
>>>
>>> I really advise removing the bastards before they die... no rebalancing
>>> happening, just a temporary OSD down while replacing journals...
>>>
>>> What size and model are your Samsungs?
>>>
>>> On Sep 3, 2015 7:10 PM, "Quentin Hartman" <qhart...@direwolfdigital.com>
>>> wrote:
>>>
>>> We also just started having our 850 Pros die one after the other after
>>> about 9 months of service. 3 down, 11 to go... No warning at all, the drive
>>> is fine, and then it's not even visible to the machine. According to the
>>> stats in hdparm and the calcs I did they should have had years of life
>>> left, so it seems that ceph journals definitely do something they do not
>>> like, which is not reflected in their stats.
>>>
>>>
>>>
>>> QH
>>>
>>>
>>>
>>> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus <t10te...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We got a good deal on the 843T and we are using it in our OpenStack
>>> setup... as journals.
>>> The
