[ceph-users] Basic Ceph Questions

2014-11-05 Thread Lindsay Mathieson
Morning all ..

I have a simple 3 node, 2 OSD cluster setup serving VM images (proxmox). The 
two OSDs are on the two VM hosts. Size is set to 2 for replication on both 
OSDs. SSD journals. 


- if the Ceph Client (VM guest over RBD) is accessing data that is stored on 
the local OSD, will it avoid hitting the network and just access the local 
drive? From monitoring the network bond, that seems to be the case.


- if I added an extra OSD to the local node would that same client then use it 
to stripe reads, improving the read transfer rate?


- Geo-replication - that's done via federated gateways? Looks complicated :(
  * The remote slave, it would be read only?

- Disaster strikes - apart from DR backups, how easy is it to recover your data 
off ceph OSDs? One of the things I liked about gluster was that if I totally 
screwed up the gluster masters, I could always just copy the data off the 
filesystem. Not so much with ceph.


- Am I abusing ceph? :) I just have a small 3 node VM server cluster with 20 
Windows VMs, some servers, some VDI. The shared store is a QNAP NAS which is 
struggling. I'm using ceph for:
- Shared storage
- Replication/redundancy
- Improved performance

It's serving all of this, but the complexity concerns me sometimes.

Thanks,
-- 
Lindsay



[ceph-users] firefly and cache tiers

2014-11-20 Thread Lindsay Mathieson
Are cache tiers reliable in firefly if you *aren't* using erasure pools? 


Secondary to that - do they give a big boost with regard to read/write 
performance for VM images? Any real-world feedback?

thanks,
-- 
Lindsay



Re: [ceph-users] firefly and cache tiers

2014-11-20 Thread Lindsay Mathieson
On Thu, 20 Nov 2014 03:12:44 PM Mark Nelson wrote:
 Personally I'd suggest a lot of testing first.  Not sure if there are 
 any lingering stability issues, but as far as performance goes in 
 firefly you'll only likely see speed ups with very skewed hot/cold 
 distributions and potentially slow downs in the general case unless you 
 have an extremely fast network and cache tier.
 
 In giant and master, there are a lot of improvements being made to 
 decrease unnecessary cache promotions and blocking.


Thanks, good to know. Going to have to add some old disks for testing ;)

I didn't see any mention of ignoring large streaming reads/writes - i.e. backups 
and disk migrations. I imagine doing that would flush the cache of useful 
data?

Planned for giant?
-- 
Lindsay



[ceph-users] Create OSD on ZFS Mount (firefly)

2014-11-25 Thread Lindsay Mathieson
Testing ceph on top of ZFS (zfsonlinux), kernel driver.

- Have created ZFS mount:
   /var/lib/ceph/osd/ceph-0

- followed the instructions at:
  http://ceph.com/docs/firefly/rados/operations/add-or-rm-osds/

failing on step 4, "Initialize the OSD data directory".


ceph-osd -i 0 --mkfs --mkkey
2014-11-25 22:12:26.563666 7ff12b466780 -1 filestore(/var/lib/ceph/osd/ceph-0) 
mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) 
Invalid argument
2014-11-25 22:12:26.563691 7ff12b466780 -1 OSD::mkfs: ObjectStore::mkfs failed 
with error -22
2014-11-25 22:12:26.563765 7ff12b466780 -1  ** ERROR: error creating empty 
object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument


Is this supported?

thanks,

--
Lindsay




Re: [ceph-users] Create OSD on ZFS Mount (firefly)

2014-11-25 Thread Lindsay Mathieson
Thanks Craig, I had a good read of it - from what I read, the standard ceph
packages should work with zfs, just not make use of its extra features
(writeparallel support), whose performance was not all that good
anyway.

I did note the "set xattr to sa" comment, which gave me a different error :)

ceph-osd -i 0 --mkfs --mkkey
2014-11-26 10:51:33.559288 7fd10544c780 -1 journal FileJournal::_open:
disabling aio for non-block journal.  Use journal_force_aio to force
use of aio anyway
2014-11-26 10:51:33.559355 7fd10544c780 -1 journal check: ondisk fsid
---- doesn't match expected
c064615f-d692-4eb0-9211-a26dcb186478, invalid (someone else's?)
journal
2014-11-26 10:51:33.559405 7fd10544c780 -1
filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal
on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument
2014-11-26 10:51:33.559430 7fd10544c780 -1 OSD::mkfs:
ObjectStore::mkfs failed with error -22
2014-11-26 10:51:33.559505 7fd10544c780 -1  ** ERROR: error creating
empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument

nb. The command actually succeeds if I unmount zfs and use the
underlying ext4 system.
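
For reference, the ZFS-side tweak referred to above would be something like 
this (the dataset name is a placeholder):

  zfs set xattr=sa tank/ceph-osd0

sa-style xattrs are stored in the dnode/spill block rather than as hidden 
directories, which suits ceph's heavy xattr use.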

This is on a proxmox (debian wheezy) box:

  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

  zfs: 0.6.3-1~wheezy

  kernel: 3.10

thanks,

On 26 November 2014 at 09:43, Craig Lewis cle...@centraldesktop.com wrote:
 There was a good thread on the mailing list a little while ago.  There were
 several recommendations in that thread, maybe some of them will help.

 Found it:
 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html


 On Tue, Nov 25, 2014 at 4:16 AM, Lindsay Mathieson
 lindsay.mathie...@gmail.com wrote:

 Testing ceph on top of ZFS (zfsonlinux), kernel driver.



 - Have created ZFS mount:

 /var/lib/ceph/osd/ceph-0



 - followed the instructions at:

 http://ceph.com/docs/firefly/rados/operations/add-or-rm-osds/



 failing on the step 4. Initialize the OSD data directory.





 ceph-osd -i 0 --mkfs --mkkey

 2014-11-25 22:12:26.563666 7ff12b466780 -1
 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on
 /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument

 2014-11-25 22:12:26.563691 7ff12b466780 -1 OSD::mkfs: ObjectStore::mkfs
 failed with error -22

 2014-11-25 22:12:26.563765 7ff12b466780 -1 ** ERROR: error creating empty
 object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument





 Is this supported?



 thanks,



 --

 Lindsay







-- 
Lindsay


Re: [ceph-users] Create OSD on ZFS Mount (firefly)

2014-11-25 Thread Lindsay Mathieson
I've found the cause of the problem - ceph was attempting to create
the journal with direct IO, which ZFS doesn't support. I worked around
it by disabling journal dio in ceph.conf:

[osd]
journal dio = false

Dunno if this is a good idea, or whether there is a better way
of doing it :)

On 26 November 2014 at 09:43, Craig Lewis cle...@centraldesktop.com wrote:
 There was a good thread on the mailing list a little while ago.  There were
 several recommendations in that thread, maybe some of them will help.

 Found it:
 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html


 On Tue, Nov 25, 2014 at 4:16 AM, Lindsay Mathieson
 lindsay.mathie...@gmail.com wrote:

 Testing ceph on top of ZFS (zfsonlinux), kernel driver.



 - Have created ZFS mount:

 /var/lib/ceph/osd/ceph-0



 - followed the instructions at:

 http://ceph.com/docs/firefly/rados/operations/add-or-rm-osds/



 failing on the step 4. Initialize the OSD data directory.





 ceph-osd -i 0 --mkfs --mkkey

 2014-11-25 22:12:26.563666 7ff12b466780 -1
 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on
 /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument

 2014-11-25 22:12:26.563691 7ff12b466780 -1 OSD::mkfs: ObjectStore::mkfs
 failed with error -22

 2014-11-25 22:12:26.563765 7ff12b466780 -1 ** ERROR: error creating empty
 object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument





 Is this supported?



 thanks,



 --

 Lindsay







-- 
Lindsay


Re: [ceph-users] Quetions abount osd journal configuration

2014-11-26 Thread Lindsay Mathieson
On Wed, 26 Nov 2014 05:37:43 AM Mark Nelson wrote:
 I don't know if things have changed, but I don't think you want to 
 outright move the journal like that.  Instead, something like:
 
 ceph-osd -i N --flush-journal
 delete old journal

link to the new journal device:
ln -s /dev/XXX /var/lib/ceph/osd/ceph-N/journal

 ceph-osd -i N --mkjournal
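
Putting those steps together, a rough sketch (assuming OSD id N, /dev/sdX1 as 
the new journal device, and that the OSD is stopped while this is done; the 
init commands vary by distro):

  service ceph stop osd.N               # or your init system's equivalent
  ceph-osd -i N --flush-journal
  rm /var/lib/ceph/osd/ceph-N/journal   # old journal file or symlink
  ln -s /dev/sdX1 /var/lib/ceph/osd/ceph-N/journal
  ceph-osd -i N --mkjournal
  service ceph start osd.N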

-- 
Lindsay



Re: [ceph-users] Create OSD on ZFS Mount (firefly)

2014-11-26 Thread Lindsay Mathieson
On Tue, 25 Nov 2014 03:47:08 PM Eric Eastman wrote:
 It has been almost a year since I last tried ZFS, but I had to add to the
 ceph.conf file:
 
filestore zfs_snap = 1   
journal aio = 0
journal dio = 0
 
 Eric


Thanks Eric, I figured it out in the end, though I haven't tried zfs_snap. 
From your posts it didn't seem like it made much difference?
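
Collected in one place, the ZFS-related ceph.conf bits from this thread and the 
earlier one would look roughly like this (the zfs_snap line being the optional, 
untested one):

  [osd]
      journal dio = false
      journal aio = false
      # optional, untested here:
      # filestore zfs_snap = 1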

thanks,
-- 
Lindsay



Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-11-28 Thread Lindsay Mathieson
On Fri, 28 Nov 2014 08:56:24 PM Ilya Dryomov wrote:
 which you are supposed to change on a per-device basis via sysfs.


Is there a way to do this for windows VM's?
-- 
Lindsay



[ceph-users] Rebuild OSD's

2014-11-29 Thread Lindsay Mathieson
I have 2 OSD's on two nodes top of zfs that I'd like to rebuild in a more 
standard (xfs) setup.

Would the following be a non-destructive, if somewhat tedious, way of doing so? 
(See the command sketch after the list.)

Following the instructions from here:

  
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

1. Remove osd.0
2. Recreate osd.0
3. Add osd.0
4. Wait for health to be restored,
   i.e. all data copied from osd.1 to osd.0

5. Remove osd.1
6. Recreate osd.1
7. Add osd.1
8. Wait for health to be restored,
   i.e. all data copied from osd.0 to osd.1

9. Profit!
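
For steps 1-3 the linked page boils down to roughly the following per OSD (a 
sketch only, using osd.0; check the docs for the exact auth/crush arguments):

  ceph osd out 0
  service ceph stop osd.0          # init command varies
  ceph osd crush remove osd.0
  ceph auth del osd.0
  ceph osd rm 0
  # recreate: mkfs/mount the new xfs filesystem at /var/lib/ceph/osd/ceph-0, then
  # ceph osd create; ceph-osd -i 0 --mkfs --mkkey; ceph auth add osd.0 ...;
  # ceph osd crush add osd.0 <weight> host=<node>; start the OSD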


There's 1TB of data total. I can do this after hours while the system & 
network are not being used.

I do have complete backups in case it all goes pear shaped.

thanks,
-- 
Lindsay




[ceph-users] Actual size of rbd vm images

2014-11-29 Thread Lindsay Mathieson
According to the docs, Ceph block devices are thin provisioned. But how do I 
list the actual size of vm images hosted on ceph?

I do something like:
  rbd ls -l rbd

But that only lists the provisioned sizes, not the real usage.

thanks,
-- 
Lindsay



Re: [ceph-users] Actual size of rbd vm images

2014-11-29 Thread Lindsay Mathieson
On Sun, 30 Nov 2014 11:37:06 AM Haomai Wang wrote:
 Yeah, we still have no way to inspect the actual usage of image.
 
 But we already have existing bp to impl it.
 https://wiki.ceph.com/Planning/Blueprints/Hammer/librbd%3A_shared_flag%2C_object_map

Thanks, good to know.

I did find this:

  http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/3684

rbd diff {POOL}/{IMAGE} | awk '{ SUM += $2 } END { print SUM/1024/1024/1024 " GB" }'

Which does the trick nicely in the meantime.
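
A small wrapper to run that across every image in a pool (pool name "rbd" is 
assumed here):

  for img in $(rbd ls rbd); do
      printf '%s: ' "$img"
      rbd diff rbd/"$img" | awk '{ sum += $2 } END { printf "%.2f GB\n", sum/1024/1024/1024 }'
  done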

-- 
Lindsay



[ceph-users] osd was crashing on start - journal flush

2014-11-30 Thread Lindsay Mathieson
I had a problem with an osd starting - log seemed to show the journal was a 
problem. When I tried to flush the journal I got the errors below.

I was in a hurry so attached a spare ssd partion as a new journal, which fixed 
the problem and let it heal.

To fix it for the original ssd journal should I have cleared the ssd partition 
using dd?

errors:


ceph-osd -i 1 --flush-journal
2014-12-01 01:16:05.387607 7f4133d18780 -1 filestore(/var/lib/ceph/osd/ceph-1) 
FileStore::_setattrs: chain_setxattr returned -14
os/FileStore.cc: In function 'unsigned int 
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, 
ThreadPool::TPHandle*)' thread 7f4133d18780 time 2014-12-01 01:16:05.387752
os/FileStore.cc: 2559: FAILED assert(0 == "unexpected error")
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0x9ff) [0x9c63ff]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x6c) [0x9c9efc]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x985) [0x9dfdc5]
 4: (FileStore::mount()+0x32dd) [0x9b654d]
 5: (main()+0xe20) [0x731dc0]
 6: (__libc_start_main()+0xfd) [0x7f4131db8ead]
 7: ceph-osd() [0x736ea9]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.
2014-12-01 01:16:05.390458 7f4133d18780 -1 os/FileStore.cc: In function 
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, 
int, ThreadPool::TPHandle*)' thread 7f4133d18780 time 2014-12-01 01:16:05.387752
os/FileStore.cc: 2559: FAILED assert(0 == "unexpected error")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0x9ff) [0x9c63ff]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x6c) [0x9c9efc]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x985) [0x9dfdc5]
 4: (FileStore::mount()+0x32dd) [0x9b654d]
 5: (main()+0xe20) [0x731dc0]
 6: (__libc_start_main()+0xfd) [0x7f4131db8ead]
 7: ceph-osd() [0x736ea9]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

-4 2014-12-01 01:16:05.387607 7f4133d18780 -1 
filestore(/var/lib/ceph/osd/ceph-1) FileStore::_setattrs: chain_setxattr 
returned -14
 0 2014-12-01 01:16:05.390458 7f4133d18780 -1 os/FileStore.cc: In function 
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, 
int, ThreadPool::TPHandle*)' thread 7f4133d18780 time 2014-12-01 01:16:05.387752
os/FileStore.cc: 2559: FAILED assert(0 == "unexpected error")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0x9ff) [0x9c63ff]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x6c) [0x9c9efc]
 3: (JournalingObjectStore::journal_replay(unsigned long)+0x985) [0x9dfdc5]
 4: (FileStore::mount()+0x32dd) [0x9b654d]
 5: (main()+0xe20) [0x731dc0]
 6: (__libc_start_main()+0xfd) [0x7f4131db8ead]
 7: ceph-osd() [0x736ea9]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7f4133d18780
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: ceph-osd() [0xab54a2]
 2: (()+0xf0a0) [0x7f413325b0a0]
 3: (gsignal()+0x35) [0x7f4131dcc165]
 4: (abort()+0x180) [0x7f4131dcf3e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f413262389d]
 6: (()+0x63996) [0x7f4132621996]
 7: (()+0x639c3) [0x7f41326219c3]
 8: (()+0x63bee) [0x7f4132621bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x40a) [0xb8f97a]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0x9ff) [0x9c63ff]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
ThreadPool::TPHandle*)+0x6c) [0x9c9efc]
 12: (JournalingObjectStore::journal_replay(unsigned long)+0x985) [0x9dfdc5]
 13: (FileStore::mount()+0x32dd) [0x9b654d]
 14: (main()+0xe20) [0x731dc0]
 15: (__libc_start_main()+0xfd) [0x7f4131db8ead]
 16: ceph-osd() [0x736ea9]
2014-12-01 01:16:05.394614 7f4133d18780 -1 *** Caught signal (Aborted) **
 in thread 7f4133d18780

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: ceph-osd() [0xab54a2]
 2: (()+0xf0a0) [0x7f413325b0a0]
 3: (gsignal()+0x35) [0x7f4131dcc165]
 4: (abort()+0x180) [0x7f4131dcf3e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f413262389d]
 6: (()+0x63996) 

Re: [ceph-users] do I have to use sudo for CEPH install

2014-12-01 Thread Lindsay Mathieson
You have to be a root user, either via login, su or sudo.

So no, you don't have to use sudo - just log on as root.

On 2 December 2014 at 00:05, Jiri Kanicky ji...@ganomi.com wrote:
 Hi.

 Do I have to install sudo in Debian Wheezy to deploy CEPH succesfully? I
 dont normally use sudo.

 Thank you
 Jiri



-- 
Lindsay


[ceph-users] VM restore on Ceph *very* slow

2014-12-11 Thread Lindsay Mathieson

Anyone know why a VM live restore would be excessively slow on Ceph? Restoring 
a small VM with a 12GB disk/2GB RAM is taking 18 *minutes*. Larger VMs can be 
over half an hour.

The same VM's on the same disks, but native, or glusterfs take less than 30 
seconds.

VM's are KVM on Proxmox.


thanks,
-- 
Lindsay



[ceph-users] pgs stuck degraded, unclean, undersized

2014-12-12 Thread Lindsay Mathieson
Whereabouts to go with this?

ceph -s
cluster f67ef302-5c31-425d-b0fe-cdc0738f7a62
 health HEALTH_WARN 256 pgs degraded; 256 pgs stuck degraded; 256 pgs 
stuck unclean; 256 pgs stuck undersized; 256 pgs undersized; recovery 
10418/447808 objects degraded (2.326%)
 monmap e7: 3 mons at 
{0=10.10.10.240:6789/0,1=10.10.10.241:6789/0,2=10.10.10.242:6789/0}, election 
epoch 514, quorum 0,1,2 0,1,2
 mdsmap e11: 0/0/1 up
 osdmap e524: 2 osds: 2 up, 2 in
  pgmap v231654: 768 pgs, 6 pools, 795 GB data, 213 kobjects
1632 GB used, 1899 GB / 3532 GB avail
10418/447808 objects degraded (2.326%)
 256 active+undersized+degraded
   1 active+clean+scrubbing+deep
 511 active+clean


Nothing stands out in the osd logs. Nothing seems to be happening with ceph -w


Seems to have started since I added an mds server as per here:
  http://www.sebastien-han.fr/blog/2013/05/13/deploy-a-ceph-mds-server/

I have since removed it:
  
http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/

thanks.

P.S. I never see my own posts or replies to them on the list, only direct 
replies - any ideas? They aren't turning up in spam.


-- 
Lindsay



[ceph-users] pgs stuck degraded, unclean, undersized

2014-12-12 Thread Lindsay Mathieson
Sending a new thread as I can't see my own to reply.

Solved the stuck pgs by deleting the cephfs and the pools I created for it. 
Health returned to OK instantly.

Side note: I had to guess the command "ceph fs rm" as I could not find docs on 
it anywhere, and just doing "ceph fs" gives:

Invalid command:  missing required parameter ls
fs ls :  list filesystems
Error EINVAL: invalid command
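
For the record, the invocation that ended up working appears to be of the form 
below (the exact flag name is an assumption based on ceph's usual confirmation 
flags, and the MDS may need to be stopped/failed first):

  ceph fs rm <fs_name> --yes-i-really-mean-it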

-- 
Lindsay



[ceph-users] Test 2 - plain, unsigned

2014-12-15 Thread Lindsay Mathieson
Test Msg, at request of list owner
-- 
Lindsay


[ceph-users] Test 3

2014-12-15 Thread Lindsay Mathieson
Last one, sorry



[ceph-users] Test 6

2014-12-15 Thread Lindsay Mathieson
-- 
Lindsay


[ceph-users] rbd snapshot slow restore

2014-12-15 Thread Lindsay Mathieson
I'm finding snapshot restores to be very slow. With a small VM, I can
take a snapshot within seconds, but restores can take over 15
minutes, sometimes nearly an hour, depending on how I have tweaked
ceph.

The same vm as a QCOW2 image on NFS or native disk can be restored in
under 30 seconds.

Is this normal? is ceph just really slow at restoring rbd snapshots,
or have I really borked my setup? :)

Very basic setup:
- 3 Monitors
- 2 OSDs, ZFS on WD 3TB Red. Not fast disks
- 2 x 10GB SSD Journals


-- 
Lindsay


[ceph-users] Test 3

2014-12-15 Thread Lindsay Mathieson
Last one, sorry
-- 
Lindsay Mathieson | Senior Developer 
Softlog Australia 
43 Kedron Park Road, Wooloowin, QLD, 4030
[T] +61 7 3632 8804 | [F] +61 1800-818-914| [W] softlog.com.au




Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On Tue, 16 Dec 2014 11:26:35 AM you wrote:
 Is this normal? is ceph just really slow at restoring rbd snapshots,
 or have I really borked my setup?


I'm not looking for a fix or tuning suggestions, just feedback on whether 
this is normal.
-- 
Lindsay



Re: [ceph-users] Test 6

2014-12-16 Thread Lindsay Mathieson
On Tue, 16 Dec 2014 07:57:19 AM Leen de Braal wrote:
 If you are trying to see if your mails come through, don't check on the
 list. You have a gmail account, gmail removes mails that you have sent
 yourself.

Not the case, I am on a dozen other mailman lists via gmail, all of them show 
my posts. ceph-users is the only exception.

However ceph-us...@ceph.com seems to work reliably, rather than using 
ceph-us...@lists.ceph.com

 You can check the archives to see.

A number of my posts are missing from there. Some are there, it seems very 
erratic.

-- 
Lindsay



Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On 17 December 2014 at 04:50, Robert LeBlanc rob...@leblancnet.us wrote:
 There are really only two ways to do snapshots that I know of and they have
 trade-offs:

 COW into the snapshot (like VMware, Ceph, etc):

 When a write is committed, the changes are committed to a diff file and the
 base file is left untouched. This only has a single write penalty,

This is when you are accessing the snapshot image?

I suspect I'm probably looking at this differently - when I take a snapshot
I never access it live, I only ever restore it - would that be merging it
back into the base?


 COW into the base image (like most Enterprise disk systems with snapshots
 for backups):

 When a write is committed, the system reads the blocks to be changed out of
 the base disk and places those original blocks into a diff file, then writes
 the new blocks directly into the base image. The pros to this approach is
 that snapshots can be deleted quickly and the data is merged already. Read
 access for the current data is always fast as it only has to search one
 location. The cons are that each write is really a read and two writes,
 recovering data from a snapshot can be slow as the reads have to search one
 or more snapshots.


Whereabouts does qcow2 fall on this spectrum?

Thanks,




-- 
Lindsay


Re: [ceph-users] rbd snapshot slow restore

2014-12-16 Thread Lindsay Mathieson
On 17 December 2014 at 11:50, Robert LeBlanc rob...@leblancnet.us wrote:


 On Tue, Dec 16, 2014 at 5:37 PM, Lindsay Mathieson
 lindsay.mathie...@gmail.com wrote:

 On 17 December 2014 at 04:50, Robert LeBlanc rob...@leblancnet.us wrote:
  There are really only two ways to do snapshots that I know of and they
  have
  trade-offs:
 
  COW into the snapshot (like VMware, Ceph, etc):
 
  When a write is committed, the changes are committed to a diff file and
  the
  base file is left untouched. This only has a single write penalty,

 This is when you are accessing the snapshot image?

 I suspect I'm probably looking at this differently - when I take a
 snapshot
 I never access it live, I only ever restore it - would that be merging
 it
 back into the base?


 I'm not sure what you mean by this. If you take a snapshot then you
 technically only work on the snapshot. If in VMware (sorry, most of my
 experience comes from VMware, but I believe KVM is the same) you take a
 snapshot, then the VM immediately uses the snapshot for all the
 writes/reads. You then have three options: 1. keep the snapshot
 indefinitely, 2. revert back to the snapshot point, or 3. delete the
 snapshot and merge the changes into the base to make it permanent.

I suspect I'm using terms differently, probably because I don't know
what is really happening underneath. To me a VM snapshot is a static
thing you can roll back to, but all VM activity takes place on the
main image.



 In case 2 the reverting of the snapshot is fast because it only deletes
 the diff file and points back to the original base disk ready to make a new
 diff file.

What happens if you have multiple snapshots? e.g. Snap 1, 2 & 3.
Deleting Snap 2 won't be a simple rollback to the base.


 In case 3 depending on how much write activity to new blocks have
 happened, then it may take a long time to copy the blocks into the base
 disk.

 Rereading your previous post, I understand that you are using rbd snapshots
 and then using the rbd rollback command. You are testing this performance
 vs. the rollback feature in QEMU/KVM when on local/NFS disk. Is that
 accurate?

Yes, though the rollback feature is a function of the image format
used (e.g. qcow2), not something specific to qemu. If you use RAW then
snapshots are not supported.


 I haven't used the rollback feature. If you want to go back to a snapshot,
 would it be faster to create a clone off the snapshot, then run your VM off
 that, then just delete and recreate the clone?

I'll test that, but wouldn't it involve flattening the clone, which is
also a very slow process?

I don't know if this is relevant, but with qcow2 and vmware rolling
back or deleting snapshots are both operations that only take a few
tens of seconds.


[ceph-users] cephfs not mounting on boot

2014-12-17 Thread Lindsay Mathieson
Both fuse and kernel module fail to mount.

The mons & mds are on two other nodes, so they are available when this node is 
booting.

They can be mounted manually after boot.

my fstab:

  id=admin  /mnt/cephfs  fuse.ceph  defaults,nonempty,_netdev 0 0
  vnb.proxmox.softlog,vng.proxmox.softlog,vnt.proxmox.softlog:/   /mnt/test 
 ceph  _netdev,defaults,name=admin,secretfile=/etc/pve/priv/admin.secret 0 0
--
Lindsay




Re: [ceph-users] cephfs not mounting on boot

2014-12-17 Thread Lindsay Mathieson
On Wed, 17 Dec 2014 02:02:52 PM John Spray wrote:
 Can you tell us more about how they fail?  Error messages on console,
 anything in syslog?

Not quite sure what to look for, but I did a quick scan on ceph through dmesg 
& syslog, nothing stood out.


 
 In the absence of other clues, you might want to try checking that the
 network is coming up before ceph tries to mount.

Now I think on it, that might just be it - I seem to recall a similar problem 
with cifs mounts, despite having the _netdev option. I had to issue a mount in  
/etc/network/if-up.d/
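
Something along these lines (a sketch - the interface name and mount point are 
assumptions):

  #!/bin/sh
  # /etc/network/if-up.d/mount-cephfs  (hypothetical name; must be executable, no dots)
  [ "$IFACE" = "bond0" ] || exit 0   # only act when the ceph-facing interface comes up
  mount /mnt/cephfs || true
  exit 0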

I'll test that and get back to you.

-- 
Lindsay



[ceph-users] Reproducable Data Corruption with cephfs kernel driver

2014-12-17 Thread Lindsay Mathieson
I've been experimenting with CephFS for running KVM images (proxmox).

cephfs fuse version - 0.87

cephfs kernel module - kernel version 3.10


Part of my testing involves running a Windows 7 VM up and running
CrystalDiskMark to check the I/O in the VM. It's surprisingly good with
both the fuse and the kernel driver; seq reads & writes are actually
faster than the underlying disk, so I presume the FS is aggressively
caching.

With the fuse driver I have no problems.

With the kernel driver, the benchmark runs fine, but when I reboot the
VM the drive is corrupted and unreadable, every time. Rolling back to
a snapshot fixes the disk. This does not happen unless I run the
benchmark, which I presume is writing a lot of data.

No problems with the same test for Ceph rbd, or NFS.


-- 
Lindsay


Re: [ceph-users] Help with SSDs

2014-12-18 Thread Lindsay Mathieson
On Thu, 18 Dec 2014 10:05:20 PM Mark Kirkwood wrote:
 My m550 
 work vastly better if the journal is a file on a filesystem as opposed 
 to a partition.


Any particular filesystem? ext4? xfs? or doesn't matter?
-- 
Lindsay



Re: [ceph-users] Help with SSDs

2014-12-18 Thread Lindsay Mathieson
On Thu, 18 Dec 2014 10:05:20 PM Mark Kirkwood wrote:
 The effect of this is *highly* dependent to the SSD make/model. My m550 
 work vastly better if the journal is a file on a filesystem as opposed 
 to a partition.
 
 Obviously the Intel S3700/S3500 are a better choice - but the OP has 
 already purchased Sammy 840's, so I'm trying to suggest options to try 
 that don't require him to buy new SSDs!


I have 120GB Samsung 840 EVO's with 10GB journal partitions and just gave this 
a go.

No real change unfortunately :( using rados bench.

However it does make experimenting with different journal sizes easier.
-- 
Lindsay



Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver

2014-12-18 Thread Lindsay Mathieson
On Thu, 18 Dec 2014 08:41:21 PM Udo Lembke wrote:
 have you tried the different cache-options (no cache, write through,
 ...) which proxmox offer, for the drive?


I tried with writeback and it didn't corrupt.
-- 
Lindsay



Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver

2014-12-18 Thread Lindsay Mathieson
On Thu, 18 Dec 2014 11:23:42 AM Gregory Farnum wrote:
 Do you have any information about *how* the drive is corrupted; what
 part Win7 is unhappy with? 

Failure to find the boot sector I think, I'll run it again and take a screen 
shot.

 I don't know how Proxmox configures it, but
 I assume you're storing the disk images as single files on the FS?

It's a single KVM QCOW2 file.

-- 
Lindsay



Re: [ceph-users] Need help from Ceph experts

2014-12-18 Thread Lindsay Mathieson
On 19 December 2014 at 11:14, Christian Balzer ch...@gol.com wrote:

 Hello,

 On Thu, 18 Dec 2014 16:12:09 -0800 Craig Lewis wrote:

 Firstly I'd like to confirm what Craig said about small clusters.
 I just changed my four storage node test cluster from 1 OSD per node to 4
 and it can now saturate a 1GbE link (110MB/s) where before it peaked at
 50-60MB/s.

What min/max sizes do you have set? Anything special in your crush map?

Did it improve your write speed and latency?


[ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
Will this make its way into the debian repo eventually?

http://ceph.com/debian-giant
-- 
Lindsay



Re: [ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
On Fri, 19 Dec 2014 03:27:53 PM you wrote:
 On 19/12/2014 15:12, Lindsay Mathieson wrote:
  Will this make its way into the debian repo eventually?
 
 This is a development release that is not meant to be published in
 distributions such as Debian, CentOS etc.

Ah, thanks.

It's not clear from the blog posting:

 http://ceph.com/releases/v0-88-released/

Does ceph follow some numbering standard for dev vs production releases, or is 
it ad hoc?

-- 
Lindsay



Re: [ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
On Fri, 19 Dec 2014 03:57:42 PM you wrote:
The stable release have real names, that is what makes them different from
 development releases (dumpling, emperor, firefly, giant, hammer).

Ah, so we had two named firefly releases (Firefly 0.86 & Firefly 0.87) - they 
were both production and we have Giant 0.87, but 0.88 is just 0.88 :)

thanks,

-- 
Lindsay



Re: [ceph-users] rbd snapshot slow restore

2014-12-26 Thread Lindsay Mathieson
On Tue, 16 Dec 2014 11:50:37 AM Robert LeBlanc wrote:
 COW into the snapshot (like VMware, Ceph, etc):
 When a write is committed, the changes are committed to a diff file and the
 base file is left untouched. This only has a single write penalty, if you
 want to discard the child, it is fast as you just delete the diff file. The
 negative side effects is that reads may have to query each diff file before
 being satisfied, and if you want to delete the snapshot, but keep the
 changes (merge the snapshot into the base), then you have to copy all the
 diff blocks into the base image.


Sorry to revive an old thread ...

Does this mean with ceph snapshots, if you leave them hanging around, the 
snapshot file will get larger and larger as writes are made? And reads will 
slow down?

So not a good idea to leave snapshots of a VM undeleted for long?
-- 
Lindsay



[ceph-users] xfs/nobarrier

2014-12-26 Thread Lindsay Mathieson
I see a lot of people mount their xfs osd's with nobarrier for extra 
performance, certainly it makes a huge difference to my small system.

However I don't do it, as my understanding is that this runs a risk of data 
corruption in the event of power failure - is this the case, even with ceph?


side note: How do I tell if my disk cache is battery backed? I have WD Red 3TB 
(WD30EFRX-68EUZN0) with 64M cache, but no mention of battery backup in the 
docs. I presume that means it isn't? :)
-- 
Lindsay



Re: [ceph-users] xfs/nobarrier

2014-12-27 Thread Lindsay Mathieson
On Sat, 27 Dec 2014 09:03:16 PM Mark Kirkwood wrote:
 Yep. If you have 'em plugged into a RAID/HBA card with a battery backup 
 (that also disables their individual caches) then it is safe to use 
 nobarrier, otherwise data corruption will result if the server 
 experiences power loss.


Thanks Mark,

do people consider a UPS + Shutdown procedures a suitable substitute?
-- 
Lindsay



Re: [ceph-users] xfs/nobarrier

2014-12-27 Thread Lindsay Mathieson
On Sat, 27 Dec 2014 04:59:51 PM you wrote:
 Power supply means bigger capex and less redundancy, as the emergency
 procedure in case of power failure is less deterministic than with
 controlled battery-backed cache. 

Yes, the whole  auto shut-down procedure is rather more complex and fragile 
for a UPS than a controller cache

 Anyway XFS nobarrier
 does not bring enough performance boost to be enabled by my
 experience.

It makes a non-trivial difference on my (admittedly slow) setup, with write 
bandwidth going from 35 MB/s to 51 MB/s

-- 
Lindsay



Re: [ceph-users] xfs/nobarrier

2014-12-27 Thread Lindsay Mathieson
On Sat, 27 Dec 2014 06:02:32 PM you wrote:
 Are you able to separate log with data in your setup and check the
 difference? 

Do you mean putting the OSD journal on a separate disk? I have the journals on 
SSD partitions, which has helped a lot, previously I was getting 13 MB/s

Its not a good SSD - Samsung 840 EVO :( one of my plans for the new year is to 
get SSD's with better seq write speed and IOPS

I've been trying to figure out if adding more OSD's will improve my 
performance, I only have 2 OSD's (one per node)

  So, depending on type of your benchmark
 (sync/async/IOPS-/bandwidth-hungry) you may win something just for
 crossing journal and data between disks (and increase failure domain
 for a single disk as well  ).

One does tend to focus on raw seq read/writes for benchmarking, but my actual 
usage is solely for hosting KVM images, so really random R/W is probably more 
important.

-- 
Lindsay



[ceph-users] Improving Performance with more OSD's?

2014-12-27 Thread Lindsay Mathieson
I'm looking to improve the raw performance on my small setup (2 Compute Nodes, 
2 OSD's). Only used for hosting KVM images.

Raw read/write is roughly 200/35 MB/s. Starting 4+ VM's simultaneously pushes 
iowaits over 30%, though the system keeps chugging along.

Budget is limited ... :(

I plan to upgrade my SSD journals to something better than the Samsung 840 
EVO's (Intel 520/530?)

One of the things I see mentioned a lot in blogs etc is how ceph's performance 
improves as you add more OSD's and that the quality of the disks does not 
matter so much as the quantity.

How does this work? Does ceph stripe reads and writes across the OSDs to 
improve performance?

If I add 3 cheap OSD's to each node (500GB - 1TB) with 10GB SSD journal 
partition each could I expect a big improvement in performance?

What sort of redundancy to set up? Currently it's min_size=1, size=2. Space is 
not an issue - we already have 150% more than we need; redundancy and 
performance are more important.

Now I think on it, we can live with the slow write performance, but reducing 
iowait would be *really* good.

thanks,
-- 
Lindsay



Re: [ceph-users] xfs/nobarrier

2014-12-28 Thread Lindsay Mathieson
On Sat, 27 Dec 2014 09:41:19 PM you wrote:
   I certainly wouldn't, I've seen utility power fail and the transfer
   switch fail to transition to UPS strings. Had this happened to me with
   nobarrier it would have been a very sad day.
  
  I'd second that. In addition I've heard of cases where the switchover to
  the UPS worked ok but the damn thing had a flat battery! So the
  switchover process and UPS reliability need to be well rehearsed +
  monitored if you want to rely on this type of solution.
 
 Right.
 
 nobarrier is definitely *NOT* recommended under almost any circumstances.

Thanks all, I'll definitely stick with nobarrier :)
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-28 Thread Lindsay Mathieson
Appreciate the detailed reply Christian.

On Sun, 28 Dec 2014 02:49:08 PM Christian Balzer wrote:
 On Sun, 28 Dec 2014 08:59:33 +1000 Lindsay Mathieson wrote:
  I'm looking to improve the raw performance on my small setup (2 Compute
  Nodes, 2 OSD's). Only used for hosting KVM images.
 
 This doesn't really make things clear, do you mean 2 STORAGE nodes with 2
 OSDs (HDDs) each?

2 Nodes, 1 OSD per node

Hardware is identical for all nodes & disks
- Mobo: P9X79 WS
- CPU:Intel  Xeon E5-2620
- RAM: 32 GB ECC
- 1GB Nic Public Access
- 2 * 1GB Bond for ceph
- OSD: 3TB WD Red
- Journal: 10GB on Samsung 840 EVO

3rd Node
 - Monitor only, for quorum
- Intel Nuc 
- 8GB RAM
- CPU: Celeron N2820



 In either case that's a very small setup (and with a replication of 2 a
 risky one, too), so don't expect great performance.

Ok.

 
 Throughput numbers aren't exactly worthless, but you will find IOPS to be
 the killer in most cases. Also without describing how you measured these
 numbers (rados bench, fio, bonnie, on the host, inside a VM) they become
 even more muddled.

- rados bench on the node to test raw write 
- fio in a VM
- Crystal DiskMark in a windows VM to test IOPS


 You really, really want size 3 and a third node for both performance
 (reads) and redundancy.

I can probably scare up a desktop PC to use as a fourth node with another 3TB 
disk.

I'd prefer to use the existing third node (the Intel Nuc), but its expansion 
is limited to USB3 devices. Are there USB3 external drives with decent 
performance stats?


thanks,
-- 
Lindsay



Re: [ceph-users] xfs/nobarrier

2014-12-29 Thread Lindsay Mathieson
On Mon, 29 Dec 2014 07:04:47 PM Mark Kirkwood wrote:
  Thanks all, I'll definitely stick with nobarrier
 
 Maybe you meant to say *barrier* ?


Oops :) Yah
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Mon, 29 Dec 2014 11:12:06 PM Christian Balzer wrote:
 Is that a private cluster network just between Ceph storage nodes or is
 this for all ceph traffic (including clients)?
 The later would probably be better, a private cluster network twice as
 fast as the client one isn't particular helpful 99% of the time.


The later - all ceph traffic including clients (qemu rbd).

  3rd Node
  
   - Monitor only, for quorum
  
  - Intel Nuc
  - 8GB RAM
  - CPU: Celeron N2820
 
 Uh oh, a bit weak for a monitor. Where does the OS live (on this and the
 other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes it fast,
 SSDs preferably.

On a SSD (all the nodes have OS on SSD).

Looks like I misunderstood the purpose of the monitors, I presumed they were 
just for monitoring node health. They do more than that?


 The closer it is to the current storage nodes, the better.
 The slowest OSD in a cluster can impede all (most of) the others.

Closer as in similar hardware specs?




-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
 If you can't add another full host, your best bet would be to add another
 2-3 disks to each server. This should give you a bit more performance. It's
 much better to have lots of small disks rather than large multi-TB ones from
 a performance perspective. So maybe look to see if you can get 500GB/1TB
 drives cheap.


Thanks, will do.

Can you set replica 3 with two nodes and 6-8 OSDs? One would have to tweak 
the crush map?
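
If it came to that, the pieces would look something like this (pool name 
assumed; note that the rule change places replicas per OSD rather than per 
host, which weakens host-level redundancy):

  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2
  # and in the relevant crush rule, change
  #   step chooseleaf firstn 0 type host
  # to
  #   step chooseleaf firstn 0 type osd
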
-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Mon, 29 Dec 2014 11:29:11 PM Christian Balzer wrote:
 Reads will scale up (on a cluster basis, individual clients might
 not benefit as much) linearly with each additional device (host/OSD).

I'm taking that to mean individual clients as a whole will be limited by the 
speed of individual OSD's, but multiple clients will spread their reads 
between multiple OSD's, leading to a higher aggregate bandwidth than 
individual disks could sustain.

I guess the limiting factor there would be network.

 
 Writes will scale up with each additional device divided by replica size. 

So adding OSD's will increase write speed from individual clients? seq writes 
go out to different OSD's simultaneously?

 
 Fun fact, if you have 1 node with replica 1 and add 2 more identical nodes
 and increase the replica to 3, your write performance will be less than 50%
 of the single node. 

Interesting - this seems to imply that writes go to the replica OSD's one 
after another, rather than simultaneously like I expected.

thanks,

-- 
Lindsay



Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
  This should give you a bit more performance. It's
 much better to have lots of small disks rather than large multi-TB ones from
 a performance perspective. So maybe look to see if you can get 500GB/1TB
 drives cheap.

Is this from the docs still relevant in this case?

/A weight is the relative difference between device capacities. We recommend 
using
1.00 as the relative weight for a 1TB storage device. In such a scenario, a 
weight of
0.5 would represent approximately 500GB, and a weight of 3.00 would represent
approximately 3TB/

So I would have maybe 1 x 3TB and 2 x 1TB.

Kinda regret getting the 3TB drives now - learning experience.


--
Lindsay




Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 12:48:58 PM Christian Balzer wrote:
  Looks like I misunderstood the purpose of the monitors, I presumed they
  were just for monitoring node health. They do more than that?
 
  
 
 They keep the maps and the pgmap in particular is of course very busy.
 All that action is at: /var/lib/ceph/mon/monitorname/store.db/ .
 
 In addition monitors log like no tomorrow, also straining the OS storage.


Yikes!

Did a quick check, root & data storage at under 10% usage - Phew!

Could the third under-spec'd monitor (which only has 1GB Eth) be slowing 
things down? Worthwhile removing it as a test?
-- 
Lindsay




Re: [ceph-users] Improving Performance with more OSD's?

2014-12-29 Thread Lindsay Mathieson
On 30 December 2014 at 14:28, Christian Balzer ch...@gol.com wrote:

 Use a good monitoring tool like atop to watch how busy things are.

 And do that while running a normal rados bench like this from a client
 node:
 rados -p rbd bench 60 write -t 32

 And again like this:
 rados -p rbd bench 60 write -t 32 -b 4096

 In particular (but not only), compare the CPU usage during those runs.

Interesting results -

First 14 seconds:
  CPU: 1 core at sys/user 2%/1%, rest idle
  HD:  45% busy
  SSD: 35% busy

After 14 seconds:
  CPU: 1 core at sys/user 20%/7%, rest idle
  HD:  100% busy
  SSD: 30% - 50% busy

Journal size is 10GB
max sync interval = 46.5
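
(For reference, as those two would sit in ceph.conf - assuming the second one 
is filestore's sync interval setting:)

  [osd]
      osd journal size = 10240             # MB, i.e. 10GB
      filestore max sync interval = 46.5   # assumed option name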


Re: [ceph-users] How to remove mds from cluster

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 03:11:25 PM debian Only wrote:
 ceph 0.87 , Debian 7.5,   anyone can help ?

 2014-12-29 20:03 GMT+07:00 debian Only onlydeb...@gmail.com:
 i want to move mds from one host to another.

 how to do it ?

 what did i do as below, but ceph health not ok, mds was not removed :

 root@ceph06-vm:~# ceph mds rm 0 mds.ceph06-vm
 mds gid 0 dne

 root@ceph06-vm:~# ceph health detail
 HEALTH_WARN mds ceph06-vm is laggy
 mds.ceph06-vm at 192.168.123.248:6800/4350 is laggy/unresponsive

I removed an mds using this guide:

  
http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/

and ran into your problem, which is also mentioned there.

resolved it using the guide suggestion:

 $ ceph mds newfs metadata data --yes-i-really-mean-it

--
Lindsay




Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 11:26:08 AM Eneko Lacunza wrote:
 I have a small setup with such a node (only 4 GB RAM, another 2 good 
 nodes for OSD and virtualization) - it works like a charm and CPU max is 
 always under 5% in the graphs. It only peaks when backups are dumped to 
 its 1TB disk using NFS.

Yes, CPU has not been a problem for me at all, I even occasionally run a Windows 
VM on the NUC.

Sounds like we have very similar setups - 2 good nodes that run full OSDs, 
mon and VMs, and a third smaller node for quorum.

Do you have OSDs on your third node as well?

  I'd advise against it.
  That node doing both monitor and OSDs is not going to end well.
 
 My experience has led me not to trust USB disks for continuous 
 operation, I wouldn't do this either.

Yeah, it doesn't sound like a good idea. Pity - the NUCs are so small and quiet.

thanks,

-- 
Lindsay



[ceph-users] Crush Map and SSD Pools

2014-12-30 Thread Lindsay Mathieson
I looked at the section for setting up different pools with different OSDs 
(e.g. an SSD pool):

  
http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds

And it seems to make the assumption that the SSDs and platters all live on 
separate hosts.

Not the case at all for my setup, and I imagine for most people - I have SSDs 
mixed with platters on the same hosts.

In that case should one have the root buckets referencing buckets not based on 
hosts, e.g. something like this:


# devices
# Platters
device 0 osd.0
device 1 osd.1

# SSD
device 2 osd.2
device 3 osd.3

host vnb {
id -2   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.000
item osd.2 weight 1.000
}
host vng {
id -3   # do not change unnecessarily
# weight 1.000
alg straw
hash 0  # rjenkins1
item osd.1 weight 1.000
item osd.3 weight 1.000
}

row disk-platter {
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.000
item osd.1 weight 1.000
}

row disk-ssd {
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.000
item osd.3 weight 1.000
}


root default {
id -1   # do not change unnecessarily
# weight 2.000
alg straw
hash 0  # rjenkins1
item disk-platter weight 2.000
}

root ssd {
  id -4
  alg straw
  hash 0
  item disk-ssd weight 2.000
  }

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

rule ssd {
  ruleset 1
  type replicated
  min_size 0
  max_size 4
  step take ssd
  step chooseleaf firstn 0 type host
  step emit
  }


--
Lindsay


signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weights: Hosts vs. OSDs

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 05:07:31 PM Nico Schottelius wrote:
 While writing this I noted that the relation / factor is exactly 5.5 times
 wrong, so I *guess* that ceph treats all hosts with the same weight (even
 though it looks differently to me in the osd tree and the crushmap)?

I believe if you have the default replication factor of 3, then with 3 hosts 
you will effectively have a weight of 1 per host no matter what you specify, 
because ceph will be forced to place a copy of all data on each host to 
satisfy replication requirements.
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush Map and SSD Pools

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 04:18:07 PM Erik Logtenberg wrote:
 As you can see, I have four hosts: ceph-01 ... ceph-04, but eight host
 entries. This works great.


you have 
 - host ceph-01
 - host ceph-01-ssd

Don't the host names have to match the real host names?
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush Map and SSD Pools

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 10:38:14 PM Erik Logtenberg wrote:
 No, bucket names in crush map are completely arbitrary. In fact, crush
 doesn't really know what a host is. It is just a bucket, like rack
 or datacenter. But they could be called cat and mouse just as well.

Hmmm, I tried that earlier and ran into problems with starting/stopping the 
osd - but maybe I screwed something else up. Will give it another go.

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding Crush Rules

2014-12-30 Thread Lindsay Mathieson
Is there a command to do this without decompiling/editing/compiling the crush 
map? Doing it by hand makes me nervous ...
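For anyone else wondering, the decompile/edit/recompile round trip looks roughly 
like this (file names are just examples), and for simple cases there does seem 
to be a direct command:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt, then recompile and inject it
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new

  # or, without hand-editing the map:
  ceph osd crush rule create-simple ssd-rule ssd host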
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush Map and SSD Pools

2014-12-30 Thread Lindsay Mathieson
On Tue, 30 Dec 2014 11:25:40 PM Erik Logtenberg wrote:
 f you want to be able to start your osd's with /etc/init.d/ceph init
 script, then you better make sure that /etc/ceph/ceph.conf does link
 the osd's to the actual hostname

I tried again and it was ok for a short while, then *something* moved the ssd 
osd's from host-ssd to host. Fortunately I had them weighted at 0.

I suspect it was the cluster manager I'm using (proxmox), which adds a simple 
gui layer over ceph and probably doesn't deal with this usecase yet.

I'll take it to the proxmox list.

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush Map and SSD Pools

2014-12-31 Thread Lindsay Mathieson
On Wed, 31 Dec 2014 11:09:35 AM you wrote:
 I believe that the upstart scripts will do this by default, they call out to
 a bash script (I can't remember precisely what that is off the top of my
 head) which then returns the crush rule, which will default to host=X osd=X
 unless it's overridden somewhere (ceph.conf).
 
 If memory serves there's the ability to provide your own script to call out
 to in order to provide the crush rule.


Good to know, thanks.
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weighting question

2014-12-31 Thread Lindsay Mathieson
As mentioned before :) we have two osd nodes with one 3TB osd each (replica 2).

About to add a smaller (1TB) faster drive to each node

From the docs, normal practice would be to weight them in accordance with size, 
i.e. 3 for the 3TB OSD, 1 for the 1TB OSD.

But I'd like to spread it 50/50 to take better advantage of the faster drive, 
so weight them all at 1. Bad idea?

We only have 1TB of data so I'm presuming the 1TB drives would get 500GB each.
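If I do go 50/50, my understanding is it's just a matter of reweighting each OSD 
in the crush map, something like (osd ids are examples):

  ceph osd crush reweight osd.2 1.0
  ceph osd crush reweight osd.3 1.0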

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] redundancy with 2 nodes

2014-12-31 Thread Lindsay Mathieson
On Thu, 1 Jan 2015 02:59:05 PM Jiri Kanicky wrote:
 I would expect that if I shut down one node, the system will keep 
 running. But when I tested it, I cannot even execute ceph status 
 command on the running node.

2 osd Nodes, 3 Mon nodes here, works perfectly for me.

How many monitors do you have?
Maybe you need a third monitor-only node for quorum?


 
 I set osd_pool_default_size = 2 (min_size=1) on all pools, so I 
 thought that each copy will reside on each node. Which means that if 1 
 node goes down the second one will be still operational.


does:
ceph osd pool get {pool name} size
  return 2

ceph osd pool get {pool name} min_size
  return 1


-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] redundancy with 2 nodes

2014-12-31 Thread Lindsay Mathieson
On Thu, 1 Jan 2015 03:46:33 PM Jiri Kanicky wrote:
 Hi,
 
 I have:
 - 2 monitors, one on each node
 - 4 OSDs, two on each node
 - 2 MDS, one on each node

POOMA U here, but I don't think you can reach quorum with one out of two 
monitors, you need an odd number:

http://ceph.com/docs/master/rados/configuration/mon-config-ref/#monitor-quorum

Perhaps try removing one monitor, so you only have one left, then take the 
node without a monitor down.
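Something along these lines should show the quorum state and drop the extra 
monitor (the mon name is just an example):

  ceph quorum_status
  ceph mon remove node2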

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Worthwhile setting up Cache tier with small leftover SSD partions?

2015-01-02 Thread Lindsay Mathieson
Expanding my tiny ceph setup from 2 OSD's to six, and two extra SSD's for 
journals (Intel 530 120GB)

Yah, I know the 5300's would be much better 

Assuming I use 10GB per OSD for journal and 5GB spare to improve the SSD 
lifetime, that leaves 85GB spare per SSD.


Is it worthwhile setting up a 2 *85GB OSD Cache Tier (Replica 2)? Usage is for 
approx 15 Active VM's, used mainly for development and light database work.

Maybe its way to small and would be continually shuffling hot data.

Also - is writeback dangerous for cache tiering? It seems safe to me, as 
the data is being written safely to the cache tier and will be flushed to the 
backing store on restart after a power failure etc.
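For reference, the tier setup itself looks simple enough - roughly this, I 
believe (pool names and the size limit are just examples):

  ceph osd tier add rbd ssd-cache
  ceph osd tier cache-mode ssd-cache writeback
  ceph osd tier set-overlay rbd ssd-cache
  ceph osd pool set ssd-cache target_max_bytes 80000000000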

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weighting question

2015-01-01 Thread Lindsay Mathieson
On Thu, 1 Jan 2015 08:27:33 AM Dyweni - Ceph-Users wrote:
 I suspect a better configuration would be to leave your weights alone 
 and to
 change your primary affinity so that the osd with the ssd is used first. 

Interesting 

   You
 might a little improvement on the writes (since the spinners have to 
 work too),
 but the reads should have the most improvement (since ceph only has to 
 read
 from the ssd).

Couple of things:
- The SSD will be partitioned for each OSD to have a journal

- I thought Journals were for writes only, not reads?
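Still, the primary affinity suggestion is interesting - from what I can tell it 
is set something like this (needs the mon option enabled first; the osd id is 
just an example):

  # ceph.conf, [global] or [mon] section
  mon osd allow primary affinity = true

  # then lower the affinity of the slower osd
  ceph osd primary-affinity osd.0 0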

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Added OSD's, weighting

2015-01-03 Thread Lindsay Mathieson
I just added 4 OSD's to my 2 OSD cluster (2 Nodes, now have 3 OSD's per
node).

Given it's the weekend and the cluster is not in use, I've set them all to weight 1, but
it looks like it's going to take a while to rebalance ... :)

Is having them all at weight 1 the fastest way to get back to health, or is
it causing contention?

Current health:

ceph -s
cluster f67ef302-5c31-425d-b0fe-cdc0738f7a62
 health HEALTH_WARN 227 pgs backfill; 2 pgs backfilling; 97 pgs
degraded; 29 pgs recovering; 68 pgs recovery_wait; 97 pgs stuck degraded;
326 pgs stuck unclean; recovery 30464/943028 objects degraded (3.230%);
727189/943028 objects misplaced (77.112%); mds cluster is degraded; mds 1
is laggy
 monmap e9: 3 mons at {0=
10.10.10.240:6789/0,1=10.10.10.241:6789/0,2=10.10.10.242:6789/0}, election
epoch 770, quorum 0,1,2 0,1,2
 mdsmap e212: 1/1/1 up {0=1=up:replay(laggy or crashed)}
 osdmap e1474: 6 osds: 6 up, 6 in
  pgmap v828583: 512 pgs, 4 pools, 1073 GB data, 282 kobjects
2072 GB used, 7237 GB / 9310 GB avail
30464/943028 objects degraded (3.230%); 727189/943028 objects
misplaced (77.112%)
 186 active+clean
 227 active+remapped+wait_backfill
  68 active+recovery_wait+degraded
   2 active+remapped+backfilling
  29 active+recovering+degraded
recovery io 24639 kB/s, 6 objects/s


thanks,

-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Added OSD's, weighting

2015-01-03 Thread Lindsay Mathieson
On Sat, 3 Jan 2015 10:40:30 AM Gregory Farnum wrote:
 You might try temporarily increasing the backfill allowance params so that
 the stuff can move around more quickly. Given the cluster is idle it's
 definitely hitting those limits. ;) -Greg


Thanks Greg, but it finished overnight anyway :) OSD's seem to have 
distributed the data as expected.

Playing with benchmarks now.
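For the archives, the params Greg mentions can apparently be bumped at runtime 
with something like this (the values are just an illustration):

  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'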
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] snapshoting on btrfs vs xfs

2015-02-04 Thread Lindsay Mathieson
On 5 February 2015 at 07:22, Sage Weil s...@newdream.net wrote:

  Is the snapshoting performed by ceph or by the fs? Can we switch to
  xfs and have the same capabilities: instant snapshot + instant boot
  from snapshot?

 The feature set and capabilities are identical.  The difference is that on
 btrfs we are letting btrfs do the efficient copy-on-write cloning when we
 touch a snapshotted object while with XFS we literally copy the object
 file (usually 4MB) on the first write.



Are ceph snapshots really that much faster when using btrfs underneath? One
of the problems we have with ceph is that snapshot take/restore is insanely
slow, tens of minutes - but we are using xfs.


-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs not mounting on boot

2015-02-07 Thread Lindsay Mathieson
On Tue, 3 Feb 2015 05:24:19 PM Daniel Schneller wrote:
  Now I think on it, that might just be it - I seem to recall a similar
 problem
  with cifs mounts, despite having the _netdev option. I had to issue a
  mount in /etc/network/if-up.d/
 
  
 
  I'll test than and get back to you
 
 We had similar issues.

Thanks for your reply Daniel, I only just noticed it.

 As you don't say which version you are using, 

Giant - 0.87


 this is
 what should be in /sbin/mount.fuse.ceph:
 
 # strip out '_netdev' option; libfuse doesn't like it
 opts=`echo $opts | sed 's/,_netdev//' | sed 's/_netdev,//'

It was already there unfortunately.

For now I am making do with an init script.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] two mount points, two diffrent data

2015-01-16 Thread Lindsay Mathieson
On Wed, 14 Jan 2015 02:20:21 PM Rafał Michalak wrote:
 Why data not replicating on mounting fs ?
 I try with filesystems ext4 and xfs
 The data is visible only when unmounted and mounted again


Because you are not using a cluster aware filesystem - the respective mounts 
don't know when changes are made to the underlying block device (rbd) by the 
other mount. What you are doing *will* lead to file corruption.

Your need to use a distributed filesystem such as GFS2 or cephfs.

CephFS would probably be the easiest to set up.
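Assuming an mds is already running, mounting cephfs on both clients is something 
like this (monitor address and secret are placeholders):

  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secret=<key>
  # or with the fuse client
  ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs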
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-16 Thread Lindsay Mathieson
On Fri, 16 Jan 2015 08:48:38 AM Wido den Hollander wrote:
 In Ceph world 0.72.2 is ancient en pretty old. If you want to play with
 CephFS I recommend you upgrade to 0.90 and also use at least kernel 3.18

Does the kernel version matter if you are using ceph-fuse?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs degraded with 3 MONs and 1 OSD node

2015-01-19 Thread Lindsay Mathieson
On 20 January 2015 at 14:10, Jiri Kanicky j...@ganomi.com wrote:

  Hi,

 BTW, is there a way how to achieve redundancy over multiple OSDs in one
 box by changing CRUSH map?



I asked that same question myself a few weeks back :)

The answer was yes - but fiddly and why would you do that?
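If memory serves, the fiddly bit is a crush rule that picks leaves of type osd 
instead of host, roughly like this in the decompiled map (rule name and ruleset 
number are examples):

  rule replicate-by-osd {
          ruleset 2
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type osd
          step emit
  }

and then pointing the pool at it with ceph osd pool set <pool> crush_ruleset 2.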

It's kinda breaking the purpose of ceph, which is large amounts of data
stored redundantly over multiple nodes.

Perhaps you should re-examine your requirements. If what you want is data
redundantly stored on hard disks on one node, perhaps you would be better
served by creating a ZFS raid setup. With just one node it would be easier
and more flexible - better performance as well.

Alternatively, could you put some OSD's on your monitor nodes? What spec
are they?





 Thank you
 Jiri


 On 20/01/2015 13:37, Jiri Kanicky wrote:

 Hi,

 Thanks for the reply. That clarifies it. I thought that the redundancy can
 be achieved with multiple OSDs (like multiple disks in RAID) in case you
 don't have more nodes. Obviously the single point of failure would be the
 box.

 My current setting is:
 osd_pool_default_size = 2

 Thank you
 Jiri


 On 20/01/2015 13:13, Lindsay Mathieson wrote:

You only have one osd node (ceph4). The default replication
 requirements  for your pools (size = 3) require osd's spread over three
 nodes, so the data can be replicate on three different nodes. That will be
 why your pgs are degraded.

 You need to either add more osd nodes or reduce your size setting down to
 the number of osd nodes you have.

  Setting your size to 1 would be a bad idea, there would be no redundancy
 in your data at all. Losing one disk would destroy all your data.

  The command to see your pool size is:

  sudo ceph osd pool get poolname size

  assuming default setup:

 ceph osd pool  get rbd size
  returns: 3

 On 20 January 2015 at 10:51, Jiri Kanicky j...@ganomi.com wrote:

 Hi,

 I just would like to clarify if I should expect degraded PGs with 11 OSD
 in one node. I am not sure if a setup with 3 MON and 1 OSD (11 disks) nodes
 allows me to have healthy cluster.

 $ sudo ceph osd pool create test 512
 pool 'test' created

 $ sudo ceph status
 cluster 4e77327a-118d-450d-ab69-455df6458cd4
  health HEALTH_WARN 512 pgs degraded; 512 pgs stuck unclean; 512 pgs
 undersized
  monmap e1: 3 mons at {ceph1=
 172.16.41.31:6789/0,ceph2=172.16.41.32:6789/0,ceph3=172.16.41.33:6789/0},
 election epoch 36, quorum 0,1,2 ceph1,ceph2,ceph3
  osdmap e190: 11 osds: 11 up, 11 in
   pgmap v342: 512 pgs, 1 pools, 0 bytes data, 0 objects
 53724 kB used, 9709 GB / 9720 GB avail
  512 active+undersized+degraded

 $ sudo ceph osd tree
 # id    weight  type name       up/down reweight
 -1      9.45    root default
 -2      9.45            host ceph4
 0       0.45                    osd.0   up      1
 1       0.9                     osd.1   up      1
 2       0.9                     osd.2   up      1
 3       0.9                     osd.3   up      1
 4       0.9                     osd.4   up      1
 5       0.9                     osd.5   up      1
 6       0.9                     osd.6   up      1
 7       0.9                     osd.7   up      1
 8       0.9                     osd.8   up      1
 9       0.9                     osd.9   up      1
 10      0.9                     osd.10  up      1


 Thank you,
 Jiri
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Lindsay






-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs degraded with 3 MONs and 1 OSD node

2015-01-21 Thread Lindsay Mathieson
You only have one osd node (ceph4). The default replication requirements
for your pools (size = 3) require osd's spread over three nodes, so the
data can be replicate on three different nodes. That will be why your pgs
are degraded.

You need to either add more osd nodes or reduce your size setting down to
the number of osd nodes you have.

Setting your size to 1 would be a bad idea; there would be no redundancy in
your data at all. Losing one disk would destroy all your data.

The command to see your pool size is:

sudo ceph osd pool get poolname size

assuming default setup:

ceph osd pool  get rbd size
returns: 3

On 20 January 2015 at 10:51, Jiri Kanicky j...@ganomi.com wrote:

 Hi,

 I just would like to clarify if I should expect degraded PGs with 11 OSD
 in one node. I am not sure if a setup with 3 MON and 1 OSD (11 disks) nodes
 allows me to have healthy cluster.

 $ sudo ceph osd pool create test 512
 pool 'test' created

 $ sudo ceph status
 cluster 4e77327a-118d-450d-ab69-455df6458cd4
  health HEALTH_WARN 512 pgs degraded; 512 pgs stuck unclean; 512 pgs
 undersized
  monmap e1: 3 mons at {ceph1=172.16.41.31:6789/0,
 ceph2=172.16.41.32:6789/0,ceph3=172.16.41.33:6789/0}, election epoch 36,
 quorum 0,1,2 ceph1,ceph2,ceph3
  osdmap e190: 11 osds: 11 up, 11 in
   pgmap v342: 512 pgs, 1 pools, 0 bytes data, 0 objects
 53724 kB used, 9709 GB / 9720 GB avail
  512 active+undersized+degraded

 $ sudo ceph osd tree
 # id    weight  type name       up/down reweight
 -1      9.45    root default
 -2      9.45            host ceph4
 0       0.45                    osd.0   up      1
 1       0.9                     osd.1   up      1
 2       0.9                     osd.2   up      1
 3       0.9                     osd.3   up      1
 4       0.9                     osd.4   up      1
 5       0.9                     osd.5   up      1
 6       0.9                     osd.6   up      1
 7       0.9                     osd.7   up      1
 8       0.9                     osd.8   up      1
 9       0.9                     osd.9   up      1
 10      0.9                     osd.10  up      1


 Thank you,
 Jiri
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache pool tiering SSD journal

2015-01-18 Thread Lindsay Mathieson
On Sun, 18 Jan 2015 10:17:50 AM lidc...@redhat.com wrote:
 No, if you used cache tiering, It is no need to use ssd journal again.


Really? writes are as fast as with ssd journals?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No auto-mount of OSDs after server reboot

2015-01-29 Thread Lindsay Mathieson
On Thu, 29 Jan 2015 03:05:41 PM Alexis KOALLA wrote:
 Hi,
 Today we  encountered an issue in  our Ceph cluster in  LAB.
 Issue: The servers that host the OSDs have rebooted and we have observed 
 that after the reboot there is no auto mount of OSD devices and we need 
 to manually performed the mount and then start the OSD as below:
 
 1- [root@osd.0] mount /dev/sdb2 /var/lib/ceph/osd/ceph-0
 2- [root@osd.0] start ceph-osd id=0


As far as I'm aware, ceph does not handle mounting of the base filesystem - it's 
up to you to create an fstab entry for it.
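i.e. something like this in /etc/fstab (device and mount point taken from your 
example, filesystem type assumed to be xfs):

  /dev/sdb2  /var/lib/ceph/osd/ceph-0  xfs  noatime  0  0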

The osd should autostart, but it will of course fail if the filesystem is not 
mounted.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Lindsay Mathieson
On Mon, 5 Jan 2015 01:15:03 PM Nick Fisk wrote:
 I've been having good results with OMD (Check_MK + Nagios)
 
 There is a plugin for Ceph as well that I made a small modification to, to
 work with a wider range of cluster sizes


Thanks, I'll check it out.

Currently trying zabbix, seems more straightforward than nagios.
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Improving Performance with more OSD's?

2015-01-05 Thread Lindsay Mathieson
On Mon, 5 Jan 2015 09:21:16 AM Nick Fisk wrote:
 Lindsay did this for performance reasons so that the data is spread evenly
 over the disks, I believe it has been accepted that the remaining 2tb on the
 3tb disks will not be used.

Exactly, thanks Nick.

I only have a terabyte of data, and it's not going to grow much, if at all. 
With 3 OSD's per node the 1TB OSD's are only at 40% utilisation, but you can 
bet I'll be keeping a close eye on that.

Next step, get nagios or icinga set up.
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow/Hung IOs

2015-01-06 Thread Lindsay Mathieson
On Tue, 6 Jan 2015 12:07:26 AM Sanders, Bill wrote:
 14 and 18 happened to show up during that run, but its certainly not only
 those OSD's.  It seems to vary each run.  Just from the runs I've done
 today I've seen the following pairs of OSD's:


Could your osd nodes be paging? I know from watching atop, the performance on 
my nodes goes to the toilet when they start hitting the page file.
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow read-performance inside the vm

2015-01-08 Thread Lindsay Mathieson
On Thu, 8 Jan 2015 05:36:43 PM Patrik Plank wrote:

Hi Patrick, just a beginner myself, but have been through a similar process 
recently :)

 With these values above, I get a write performance of 90Mb/s and read
 performance of 29Mb/s, inside the VM. (Windows 2008/R2 with virtio driver
 and writeback-cache enabled) Are these values normal with my configuration
 and hardware? - 

They do seem *very* odd. Your write performance is pretty good, but your read 
performance is abysmal - with a similar setup (3 OSD's, slower than yours) 
I was getting 200 MB/s reads. 

Maybe your network setup is dodgy? Jumbo frames can be tricky. Have you run 
iperf between the nodes?
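A quick check (the IP is a placeholder):

  iperf -s                 # on one node
  iperf -c 192.168.0.10    # on the other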

What are you using for benchmark testing on the windows guest?

Also, probably more useful to turn writeback caching off for benchmarking, the 
cache will totally obscure the real performance.


How is the VM mounted? rbd driver?


 The read-performance seems slow. Would the
 read-performance better if I run for every single disk a osd?

I think so - in general the more OSD's the better. Also, having 8 HD's in RAID0 
is a recipe for disaster, you'll lose the entire OSD if one of those disks 
fails.

I'd be creating an OSD for each HD (8 per node), with a 5-10GB SSD partition 
per OSD for journal. Tedious, but should make a big difference to reads and 
writes.

Might be worthwhile trying
[global]
  filestore max sync interval = 30

as well.

-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] combined ceph roles

2015-02-10 Thread Lindsay Mathieson
Similar setup works well for me - 2 vm hosts, 1 mon-only node. 6 osd's, 3 per 
vm host. Using rbd and cephfs.

The more memory on your vm hosts, the better.

Lindsay Mathieson 

-Original Message-
From: David Graham xtn...@gmail.com
Sent: ‎11/‎02/‎2015 3:07 AM
To: ceph-us...@ceph.com ceph-us...@ceph.com
Subject: [ceph-users] combined ceph roles

Hello, I'm giving thought to a minimal footprint scenario with full redundancy. 
I realize it isn't ideal--and may impact overall performance --  but wondering 
if the below example would work, supported, or known to cause issue?


Example, 3x hosts each running:
-- OSD's
-- Mon
-- Client



I thought I read a post a while back about Client+OSD on the same host possibly 
being an issue -- but i am having difficulty finding that reference.


I would appreciate if anyone has insight into such a setup,

thanks!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Worthwhile setting up Cache tier with small leftover SSD partions?

2015-01-04 Thread Lindsay Mathieson
On 5 January 2015 at 13:02, Christian Balzer ch...@gol.com wrote:

 On Fri, 02 Jan 2015 06:38:49 +1000 Lindsay Mathieson wrote:


 If you research the ML archives you will find that cache tiering currently
 isn't just fraught with peril (there are bugs) but most importantly isn't
 really that fast.



Yah, I had wondered that. Also  it seems to involve a lot of manual
tinkering with the crush map which I really want to avoid.





 Also given your setup, you should be able to saturate your network now, so
 probably negating the need for super fast storage to some extent.



Agreed - now I have it installed and configured, performance all round has
vastly improved - users are already commenting that their VM's are much more
responsive.

Pretty sure that we are now at a stage where I can just leave it alone :)

Though the boss now wants to migrate the big ass vsphere server to proxmox
(KVM), so I could use it as a third OSD server ...


Thanks for the help, much appreciated,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Improving Performance with more OSD's?

2015-01-04 Thread Lindsay Mathieson
Well I upgraded my cluster over the weekend :)
To each node I added:
- Intel SSD 530 for journals
- 2 * 1TB WD Blue

So two OSD Nodes had:
- Samsung 840 EVO SSD for Op. Sys.
- Intel 530 SSD for Journals (10GB Per OSD)
- 3TB WD Red
- 1 TB WD Blue
- 1 TB WD Blue
- Each disk weighted at 1.0
- Primary affinity of the WD Red (slow) set to 0

Took about 8 hours for 1TB of data to rebalance over the OSD's

Very pleased with results so far.

rados benchmark:
- Write bandwidth has increased from 49 MB/s to 140 MB/s
- Reads have stayed roughly the same at 500 MB/s

VM Benchmarks:
- Actually have stayed much the same, but have more depth - multiple VM's
share the bandwidth nicely.

Users are finding their VM's *much* less laggy.
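For anyone wanting to compare, the rados figures above are easy to reproduce 
with the standard bench tool - pool name and runtime here are just examples:

  rados bench -p rbd 60 write --no-cleanup
  rados bench -p rbd 60 seq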

Thanks for all the help and suggestions.

Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove mds from cluster

2015-01-04 Thread Lindsay Mathieson
Did you remove the mds.0 entry from ceph.conf?

On 5 January 2015 at 14:13, debian Only onlydeb...@gmail.com wrote:

 i have tried ' ceph mds newfs 1 0 --yes-i-really-mean-it'but not fix
 the problem

 2014-12-30 17:42 GMT+07:00 Lindsay Mathieson lindsay.mathie...@gmail.com
 :

  On Tue, 30 Dec 2014 03:11:25 PM debian Only wrote:

  ceph 0.87 , Debian 7.5,   anyone can help ?

 

  2014-12-29 20:03 GMT+07:00 debian Only onlydeb...@gmail.com:

  i want to move mds from one host to another.

 

  how to do it ?

 

  what did i do as below, but ceph health not ok, mds was not removed :

 

  root@ceph06-vm:~# ceph mds rm 0 mds.ceph06-vm

  mds gid 0 dne

 

  root@ceph06-vm:~# ceph health detail

  HEALTH_WARN mds ceph06-vm is laggy

  mds.ceph06-vm at 192.168.123.248:6800/4350 is laggy/unresponsive



 I removed an mds using this guide:




 http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/



 and ran into your problem, which is also mentioned there.



 resolved it using the guide suggestion:



 $ ceph mds newfs metadata data --yes-i-really-mean-it



 --

 Lindsay

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time

2015-02-19 Thread Lindsay Mathieson
On Thu, 19 Feb 2015 05:56:46 PM Florian Haas wrote:
 As it is, a simple perf top basically hosing the system wouldn't be
 something that is generally considered expected.


Could the disk or controller be failing?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release timeline

2015-03-15 Thread Lindsay Mathieson
Thanks, that's quite helpful.

On 16 March 2015 at 08:29, Loic Dachary l...@dachary.org wrote:

 Hi Ceph,

 In an attempt to clarify what Ceph release is stable, LTS or development.
 a new page was added to the documentation:
 http://ceph.com/docs/master/releases/ It is a matrix where each cell is a
 release number linked to the release notes from
 http://ceph.com/docs/master/release-notes/. One line per month and one
 column per release.

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Doesn't Support Qcow2 Disk images

2015-03-12 Thread Lindsay Mathieson
On Thu, 12 Mar 2015 12:49:51 PM Vieresjoki, Juha wrote:
 But there's really no point, block storage is the only viable option for
 virtual machines performance-wise. With images you're dealing with multiple
 filesystem layers on top of the actual block devices, plus Ceph as block
 storage supports pretty much everything that qcow2 as a format does.


Not much difference that I've noticed on the small system I run, in fact cephfs 
seems to do some caching which speeds up things considerably in benchmarks. 
Not a huge difference in actual app performance that I've noticed.

Delete/restore of snapshots is a lot quicker with qcow2, and for some reason 
saving memory state is orders of magnitude quicker with qcow2.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Doesn't Support Qcow2 Disk images

2015-03-12 Thread Lindsay Mathieson
On Thu, 12 Mar 2015 09:27:43 AM Andrija Panic wrote:
 ceph is RAW format - should be all fine...so VM will be using that RAW
 format


If you use cephfs you can use qcow2.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Now it seems that could not find keyring

2015-03-10 Thread Lindsay Mathieson
On 11 March 2015 at 06:53, Jesus Chavez (jeschave) jesch...@cisco.com
wrote:

 KeyNotFoundError: Could not find keyring file:
 /etc/ceph/ceph.client.admin.keyring on host aries


 Well - have you verified the keyring is there on host aries and has the
right permissions?
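e.g. something like:

  ls -l /etc/ceph/ceph.client.admin.keyring
  # compare ownership/permissions with a node where the admin commands work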


-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-30 Thread Lindsay Mathieson
On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:
 Hi, all
 I have a two-node Ceph cluster, and both are monitor and osd. When
 they're both up, osd are all up and in, everything is fine... almost:



Two things.

1 - You *really* need a min of three monitors. With just two monitors you cannot 
keep quorum if either one goes down, and you run a risk of split brain.


2 - You also probably have a min size of two set (the default). This means 
that you need a minimum  of two copies of each data object for writes to work. 
So with just two nodes, if one goes down you can't write to the other.


So:
- Install a extra monitor node - it doesn't have to be powerful, we just use a 
Intel Celeron NUC for that.

- reduce your minimum size to 1 (One).
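i.e. for each pool, something like (the pool name is just an example):

  ceph osd pool set rbd size 2
  ceph osd pool set rbd min_size 1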
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87.1 Giant released

2015-02-26 Thread Lindsay Mathieson
On 27 February 2015 at 16:01, Alexandre DERUMIER aderum...@odiso.com
wrote:

 I just upgraded my debian giant cluster,

 1)on each node:




Just done that too, all looking good.

Thanks all.

-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph 0.87-1

2015-02-25 Thread Lindsay Mathieson
The Ceph Debian Giant repo (http://ceph.com/debian-giant) seems to have had
an update from 0.87 to 0.87-1 on the 24-Feb.

Are there release notes anywhere on what changed etc? is there an upgrade
procedure?

thanks,

-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87.1 Giant released

2015-02-26 Thread Lindsay Mathieson
Thanks for the notes Sage

On 27 February 2015 at 00:46, Sage Weil s...@newdream.net wrote:

 We recommend that all v0.87 Giant users upgrade to this release.


When upgrading from 0.87 to 0.87.1 is there any special procedure that
needs to be followed? Or is it sufficient to upgrade each node and restart
ceph services one by one?


Thanks,



 --
 Lindsay

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep scrubbing causes osd down

2015-04-13 Thread Lindsay Mathieson
On 13 April 2015 at 16:00, Christian Balzer ch...@gol.com wrote:

 However the vast majority of people with production clusters will be
 running something stable, mostly Firefly at this moment.

  Sorry, 0.87 is giant.
 
  BTW, you could also set osd_scrub_sleep to your cluster. ceph would
  sleep some time as you defined when it has scrub some objects.
  But I am not sure whether is could works good to you.
 
 Yeah, that bit is backported to Firefly and can definitely help, however
 the suggested initial value is too small for most people who have scrub
 issues, starting with 0.5 seconds and see how it goes seems to work better.



Thanks xinze, Christian.

Yah, I'm on 0.87 in production - I can wait for the next release :)

In the meantime, from the prior msgs I've set this:

[osd]
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_scrub_sleep = 0.5


Do the values look ok? is the [osd] section the right spot?
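They can presumably also be applied on the fly along these lines, to save 
restarting the osds:

  ceph tell osd.* injectargs '--osd_scrub_sleep 0.5 --osd_scrub_chunk_max 5'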

Thanks - Lindsay



-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep scrubbing causes osd down

2015-04-12 Thread Lindsay Mathieson
On 13 April 2015 at 11:02, Christian Balzer ch...@gol.com wrote:

 Yeah, that's a request/question that comes up frequently.
 And so far there's no option in Ceph to do that (AFAIK), it would be
 really nice along with scheduling options (don't scrub during peak hours),
 which have also been talked about.




I was just about to post a question on that ... :)

Just had devs and support bitching to me that all their VM's were running
like dogs (they were). Ceph decided to run a deep scrub Monday afternoon.

It boggles me that there are no options for controlling the schedule,
considering how critically important the timing is. I'd be happy to run a
deep scrub every night at 1am. Anytime during the day is a disaster.

So currently I have noscrub and nodeep-scrub set which is really less than
ideal.
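That is, just the cluster-wide flags, toggled by hand around busy hours:

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # and when it is safe to let scrubbing run again:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub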


-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is ceph.com down?

2015-04-15 Thread Lindsay Mathieson
Can't open it at the moment, neither the website nor apt.

Trying from Brisbane, Australia.
-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

