Re: [ceph-users] v12.2.8 Luminous released

2018-09-05 Thread Adrian Saul
Can I confirm whether the bluestore compression assert issue (https://tracker.ceph.com/issues/23540) is resolved in 12.2.8? I notice it has a backport listed against 12.2.8, but there is no mention of that issue or backport in the release notes.

Re: [ceph-users] SSDs for data drives

2018-07-12 Thread Adrian Saul
We started our cluster with consumer (Samsung EVO) disks and the write performance was pitiful: they had periodic latency spikes (an average of 8 ms, but much higher spikes) and just did not perform anywhere near what we were expecting. When replaced with SM863-based devices the difference

Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Adrian Saul
with automount would probably work for you.

Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Adrian Saul
We run CephFS in a limited fashion in a stretched cluster of about 40km with redundant 10G fibre between sites – link latency is in the order of 1-2ms. Performance is reasonable for our usage but is noticeably slower than comparable local Ceph-based RBD shares. Essentially we just set up the

Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-04 Thread Adrian Saul
We are using Ceph+RBD+NFS under pacemaker for VMware. We are doing iSCSI using SCST but have not used it against VMware, just Solaris and Hyper-V. It generally works and performs well enough – the biggest issues are the clustering for iSCSI ALUA support and NFS failover, most of which we have

Re: [ceph-users] Thick provisioning

2017-10-18 Thread Adrian Saul
I concur - at the moment we need to manually sum the RBD images to see how much we have "provisioned" vs what ceph df shows. In our case we had a rapid run of provisioning new LUNs, but it took a while before usage started to catch up with what was provisioned as data was migrated in.

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Adrian Saul

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Adrian Saul
As an aside, SCST iSCSI will support ALUA and does PGRs through the use of DLM. We have been using that with Solaris and Hyper-V initiators for RBD-backed storage but still have some ongoing issues with ALUA (probably our current config; we need to lab-test the later recommendations).

Re: [ceph-users] bad crc/signature errors

2017-10-04 Thread Adrian Saul
We see the same messages and are similarly on a 4.4 KRBD version that is affected by this. I have seen no impact from it so far that I know about.

Re: [ceph-users] osd create returns duplicate ID's

2017-09-29 Thread Adrian Saul
Do you mean that after you delete and remove the crush and auth entries for the OSD, when you go to create another OSD later it will re-use the previous OSD ID that you have destroyed in the past? Because I have seen that behaviour as well - but only for previously allocated OSD IDs that

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Adrian Saul
Thanks for bringing this to attention Wido - it's of interest to us as we are currently looking to migrate mail platforms onto Ceph using NFS, but this seems far more practical.

Re: [ceph-users] ceph-osd restartd via systemd in case of disk error

2017-09-19 Thread Adrian Saul
> I understand what you mean and it's indeed dangerous, but see: > https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service > > Looking at the systemd docs it's difficult though: > https://www.freedesktop.org/software/systemd/man/systemd.service.html > > If the OSD crashes due to
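
If you do want to bound how aggressively systemd restarts a failing OSD, a drop-in override is one way to do it on the CentOS 7-era systemd discussed here. This is only a sketch - the file name and values are illustrative, not what the stock ceph-osd@.service ships with:

    # /etc/systemd/system/ceph-osd@.service.d/restart.conf
    [Service]
    Restart=on-failure
    RestartSec=20s
    StartLimitInterval=30min
    StartLimitBurst=3

Apply with "systemctl daemon-reload"; with these values an OSD that fails three times within 30 minutes stays down until started manually.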

Re: [ceph-users] Ceph release cadence

2017-09-07 Thread Adrian Saul
> * Drop the odd releases, and aim for a ~9 month cadence. This splits the > difference between the current even/odd pattern we've been doing. > > + eliminate the confusing odd releases with dubious value > + waiting for the next release isn't quite as bad > - required upgrades every 9

Re: [ceph-users] Monitoring a rbd map rbd connection

2017-08-25 Thread Adrian Saul
If you are monitoring to ensure that it is mounted and active, a simple check_disk on the mountpoint should work. If the mount is not present, or the filesystem is non-responsive then this should pick it up. A second check to perhaps test you can actually write files to the file system would
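
A sketch of the two checks described, assuming a Nagios-style setup - the plugin path, mountpoint and thresholds are placeholders:

    # 1. Mount present and filesystem responding
    /usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -p /mnt/rbd-share

    # 2. Prove the filesystem is actually writable
    touch /mnt/rbd-share/.monitor-write-test && rm -f /mnt/rbd-share/.monitor-write-test \
      || echo "CRITICAL: cannot write to /mnt/rbd-share"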

Re: [ceph-users] Ruleset vs replica count

2017-08-25 Thread Adrian Saul
Yes - ams5-ssd would have 2 replicas, ams6-ssd would have 1 (@size 3, -2 = 1). Although for this ruleset the min_size should be set to at least 2, or more practically 3 or 4.
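
For reference, a rule shaped like the one being discussed might look roughly like this - only the ams5-ssd/ams6-ssd bucket names come from the thread, the rest is an illustrative sketch:

    rule ams_ssd {
            ruleset 5
            type replicated
            min_size 2
            max_size 3
            step take ams5-ssd
            step chooseleaf firstn 2 type host
            step emit
            step take ams6-ssd
            step chooseleaf firstn -2 type host
            step emit
    }

With pool size 3, "firstn 2" places two copies under ams5-ssd and "firstn -2" (size minus 2) places the remaining copy under ams6-ssd.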

Re: [ceph-users] Ceph cluster with SSDs

2017-08-20 Thread Adrian Saul
> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ-75E4T0B/AM | Samsung The performance difference between these and the SM or PM863 range is night and day. I would not use these for anything you care about with performance, particularly IOPS or latency. Their write

Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-16 Thread Adrian Saul
> I'd be interested in details of this small versus large bit. The smaller shares are simply there to distribute the workload over more RBDs so the bottleneck doesn’t become the RBD device. The size itself doesn’t particularly matter; it's just the idea of distributing VMs across many shares rather

Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-16 Thread Adrian Saul
We are using Ceph on NFS for VMWare – we are using SSD tiers in front of SATA and some direct SSD pools. The datastores are just XFS file systems on RBD managed by a pacemaker cluster for failover. Lessons so far are that large datastores quickly run out of IOPS and compete for performance –

Re: [ceph-users] Iscsi configuration

2017-08-08 Thread Adrian Saul
Hi Sam, We use SCST for iSCSI with Ceph, and a pacemaker cluster to orchestrate the management of active/passive presentation using ALUA though SCST device groups. In our case we ended up writing our own pacemaker resources to support our particular model and preferences, but I believe there

Re: [ceph-users] Does ceph pg scrub error affect all of I/O in ceph cluster?

2017-08-03 Thread Adrian Saul
Depends on the error case – usually you will see blocked IO messages as well if there is a condition causing OSDs to be unresponsive.

Re: [ceph-users] PGs per OSD guidance

2017-07-19 Thread Adrian Saul
Anyone able to offer any advice on this? Cheers, Adrian

[ceph-users] PGs per OSD guidance

2017-07-14 Thread Adrian Saul
Hi All, I have been reviewing the sizing of our PGs with a view to some intermittent performance issues. When we have scrubs running, even when only a few are, we can sometimes get severe impacts on the performance of RBD images, enough to start causing VMs to appear stalled or
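
For context, the usual starting point is the rule of thumb from the Ceph documentation:

    total PGs ≈ (number of OSDs × 100) / replica count, rounded up to the nearest power of two

e.g. for a hypothetical 114-OSD cluster with 3 replicas: 114 × 100 / 3 ≈ 3800, so roughly 4096 PGs spread across all pools.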

[ceph-users] Deep scrub distribution

2017-07-05 Thread Adrian Saul
batches of PGs to deep scrub over time to push out the distribution again?

Re: [ceph-users] VMware + CEPH Integration

2017-06-18 Thread Adrian Saul
> Hi Alex, > > Have you experienced any problems with timeouts in the monitor action in > pacemaker? Although largely stable, every now and again in our cluster the > FS and Exportfs resources timeout in pacemaker. There's no mention of any > slow requests or any peering..etc from the ceph logs so

Re: [ceph-users] design guidance

2017-06-06 Thread Adrian Saul
> > Early usage will be CephFS, exported via NFS and mounted on ESXi 5.5 > > and > > 6.0 hosts (migrating from a VMWare environment), later to transition to > > qemu/kvm/libvirt using native RBD mapping. I tested iscsi using lio > > and saw much worse performance with the first cluster, so it seems

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Adrian Saul
for krbd. From Nick Fisk: I assume Brady is referring to the death spiral LIO gets into with some initiators, including vmware

Re: [ceph-users] rbd iscsi gateway question

2017-04-05 Thread Adrian Saul
I am not sure if there is a hard and fast rule you are after, but pretty much anything that would cause ceph transactions to be blocked (flapping OSD, network loss, hung host) has the potential to block RBD IO which would cause your iSCSI LUNs to become unresponsive for that period. For the

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Adrian Saul
Thank you Adrian! I'd forgotten this option and I can reproduce the problem. Now, what could be the problem on the Ceph side with O_DSYNC writes? Regards, Matteo

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Adrian Saul
Possibly MySQL is doing sync writes, whereas your fio could be doing buffered writes. Try enabling the sync option on fio and compare results.
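
A sketch of the comparison being suggested - the device path, block size and runtime are placeholders, and fio's --sync=1 (O_SYNC) is only an approximation of MySQL's O_DSYNC behaviour:

    # Buffered/asynchronous random writes (what a naive benchmark often measures)
    fio --name=async --ioengine=libaio --filename=/dev/rbd0 --rw=randwrite --bs=16k \
        --iodepth=32 --direct=1 --runtime=60 --time_based

    # Synchronous random writes, much closer to what the database actually does
    fio --name=sync --ioengine=libaio --filename=/dev/rbd0 --rw=randwrite --bs=16k \
        --iodepth=1 --direct=1 --sync=1 --runtime=60 --time_based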

Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Adrian Saul
I would concur having spent a lot of time on ZFS on Solaris. ZIL will reduce the fragmentation problem a lot (because it is not doing intent logging into the filesystem itself which fragments the block allocations) and write response will be a lot better. I would use different devices for

Re: [ceph-users] When Zero isn't 0 (Crush weight mysteries)

2016-12-20 Thread Adrian Saul
I found the other day that even though I had 0-weighted OSDs, there was still weight in the containing buckets, which triggered some rebalancing. Maybe it is something similar - there was weight added to the bucket even though the OSD underneath was 0.

Re: [ceph-users] Crush rule check

2016-12-12 Thread Adrian Saul

Re: [ceph-users] Crush rule check

2016-12-12 Thread Adrian Saul
Thanks Wido. I had found the show-utilization test, but had not seen show-mappings - that confirmed it for me. Thanks, Adrian
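
For anyone following along, these are crushtool test options run against a compiled CRUSH map - the file name and rule number here are placeholders:

    crushtool -i crushmap.bin --test --rule 1 --num-rep 4 --show-mappings
    crushtool -i crushmap.bin --test --rule 1 --num-rep 4 --show-utilization

--show-mappings prints which OSDs each sample input maps to, and --show-utilization summarises how evenly the rule spreads placements across OSDs.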

[ceph-users] Crush rule check

2016-12-10 Thread Adrian Saul
Hi Ceph-users, I just want to double check a new crush ruleset I am creating - the intent here is that over 2 DCs, it will select one DC, and place two copies on separate hosts in that DC. The pools created on this will use size 4 and min-size 2. I just want to check I have crafted this
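
A sketch of a rule matching that description - the bucket type and names (datacenter, default) are assumptions about the actual map:

    rule dc_pairs {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn 2 type host
            step emit
    }

With pool size 4 this selects both DCs and places two copies on separate hosts within each, which together with min_size 2 keeps the pool writable if one DC is lost.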

Re: [ceph-users] [EXTERNAL] Re: osd set noin ignored for old OSD ids

2016-11-23 Thread Adrian Saul
From my experience noin doesn't stop new OSDs from being marked in. noin only works on OSDs already in the crushmap. To accomplis

[ceph-users] osd set noin ignored for old OSD ids

2016-11-22 Thread Adrian Saul
Hi , As part of migration between hardware I have been building new OSDs and cleaning up old ones (osd rm osd.x, osd crush rm osd.x, auth del osd.x). To try and prevent rebalancing kicking in until all the new OSDs are created on a host I use "ceph osd set noin", however what I have seen
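
The cleanup and flag commands referred to above, written out - osd.12 is a placeholder ID, and note the point of the thread: noin only affects OSDs already in the CRUSH map, not newly created IDs:

    # remove an old OSD completely
    ceph osd rm osd.12
    ceph osd crush rm osd.12
    ceph auth del osd.12

    # try to stop OSDs being marked "in" while rebuilding a host
    ceph osd set noin
    # ... create the new OSDs ...
    ceph osd unset noin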

[ceph-users] Ceph outage - monitoring options

2016-11-21 Thread Adrian Saul
Hi All, We have a jewel cluster (10.2.1) that we built up in a POC state (2 clients also being mons, 12 SSD OSDs on 3 hosts, 20 SATA OSDs on 3 hosts). We have connected up our "prod" environment to it and performed a migration for all the OSDs so it is now 114 OSDs (36 SSD, 78 NL-SAS with

Re: [ceph-users] Snap delete performance impact

2016-09-23 Thread Adrian Saul
limit.

Re: [ceph-users] Snap delete performance impact

2016-09-23 Thread Adrian Saul
that much. Cheers, Adrian

Re: [ceph-users] Snap delete performance impact

2016-09-22 Thread Adrian Saul
of very small FS metadata updates going on and that is what is killing it. Cheers, Adrian

Re: [ceph-users] Snap delete performance impact

2016-09-21 Thread Adrian Saul
? It really should not make the entire platform unusable for 10 minutes.

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Adrian Saul
> But shouldn't freezing the fs and doing a snapshot constitute a "clean > unmount" hence no need to recover on the next mount (of the snapshot) - > Ilya? It's what I thought as well, but XFS seems to want to attempt to replay the log regardless on mount and write to the device to do so. This

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Adrian Saul
I found I could ignore the XFS issues and just mount it with the appropriate options (below from my backup scripts):

    # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
    if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then

Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel

2016-07-17 Thread Adrian Saul
I have SELinux disabled and it does the restorecon on /var/lib/ceph regardless from the RPM post upgrade scripts. In my case I chose to kill the restorecon processes to save outage time – it didn’t affect the upgrade package completion. From: ceph-users

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Adrian Saul
I would suggest caution with "filestore_odsync_write" - it's fine on good SSDs, but on poor SSDs or spinning disks it will kill performance.
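
If you were to experiment with it anyway, it is a plain OSD config option; a minimal sketch, assuming the option name as quoted in the thread:

    [osd]
        # open filestore writes with O_DSYNC - fine on good SSDs,
        # can be disastrous for poor SSDs or spinning disks
        filestore odsync write = true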

[ceph-users] Snap delete performance impact

2016-07-05 Thread Adrian Saul
I recently started a process of using rbd snapshots to set up a backup regime for a few file systems contained in RBD images. While this generally works well, at the time of the snapshots there is a massive increase in latency (10ms to multiple seconds of rbd device latency) across the entire

[ceph-users] OSD out/down detection

2016-06-19 Thread Adrian Saul
Hi All, We have a Jewel (10.2.1) cluster on Centos 7 - I am using an elrepo 4.4.1 kernel on all machines and we have an issue where some of the machines hang - not sure if it's hardware or OS, but essentially the host including the console is unresponsive and can only be recovered with a

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-06 Thread Adrian Saul
Centos 7 - the upgrade was done simply with "yum update -y ceph" on each node one by one, so the package order would have been determined by yum.

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
Odd -- sounds like you might have Jewel and Infernalis class objects and OSDs intermixed. I would double-chec

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
they are failing.

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
/rados-classes/libcls_rbd.so: undefined symbol: _ZN4ceph6buffer4list8iteratorC1EPS1_j Trying to figure out why that is the case.

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
]# rados stat -p glebe-sata rbd_directory
glebe-sata/rbd_directory mtime 2016-06-06 10:18:28.00, size 0

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
Are you able to successfully run the following command successf

[ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul
I upgraded my Infernalis semi-production cluster to Jewel on Friday. The upgrade went through smoothly (aside from a time-wasting restorecon of /var/lib/ceph in the selinux package upgrade) and the services continued running without interruption. However, this morning when I went to create

Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Adrian Saul
> > For two links it should be quite good - it seemed to balance across > > that quite well, but with 4 links it seemed to really prefer 2 in my case. > > > Just for the record, did you also change the LACP policies on the switches? > > From what I gather, having fancy pants L3+4 hashing on the

Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Adrian Saul
I am currently running our Ceph POC environment using dual Nexus 9372TX 10G-T switches; each OSD host has two connections to each switch, formed into a single 4-link VPC (MC-LAG), which is bonded under LACP on the host side. What I have noticed is that the various hashing policies
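
A sketch of the host-side bonding options being discussed, in RHEL/CentOS ifcfg style - the interface layout and the chosen hash policy are illustrative, not a recommendation from the thread:

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"

xmit_hash_policy only affects which slave the host transmits on; the switch applies its own hash for traffic towards the host, so both ends need checking when traffic does not spread evenly.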

Re: [ceph-users] Fwd: [Ceph-community] Wasting the Storage capacity when using Ceph based On high-end storage systems

2016-06-01 Thread Adrian Saul
Also, if for political reasons you need a “vendor” solution – ask Dell about their DSS 7000 servers – 90 8TB disks and two compute nodes in 4RU would go a long way to making up a multi-PB Ceph solution. Supermicro also do similar solutions with 36-, 60- and 90-disk 4RU models. Cisco

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Adrian Saul
Sync will always be lower – it will cause it to wait for previous writes to complete before issuing more so it will effectively throttle writes to a queue depth of 1. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ken Peng Sent: Wednesday, 25 May 2016 6:36 PM To:

Re: [ceph-users] seqwrite gets good performance but random rw gets worse

2016-05-25 Thread Adrian Saul
Are you using image-format 2 RBD images? We found a major performance hit using format 2 images under 10.2.0 today in some testing. When we switched to using format 1 images we literally got 10x the random write IOPS performance (1600 IOPS up to 3 IOPS for the same test).
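
For reference, the image format is chosen at creation time; pool and image names here are placeholders:

    rbd create --size 102400 --image-format 1 ssdpool/testvol1
    rbd create --size 102400 --image-format 2 ssdpool/testvol2

Format 1 lacks features such as layering/cloning and was later deprecated, so this comparison only reflects the Jewel 10.2.0 behaviour described above.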

Re: [ceph-users] RBD removal issue

2016-05-23 Thread Adrian Saul
Thanks - all sorted.

[ceph-users] RBD removal issue

2016-05-23 Thread Adrian Saul
A while back I attempted to create an RBD volume manually - intending it to be an exact size of another LUN around 100G. The command line instead took this to be the default MB argument for size and so I ended up with a 102400 TB volume. Deletion was painfully slow (I never used the volume,
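
The trap described: rbd interprets a bare --size value in megabytes, so a hypothetical 100G volume should be created as one of these rather than by passing a raw byte count:

    rbd create --size 102400 mypool/lun0      # 102400 MB = 100 GB
    rbd create --size 100G mypool/lun0        # later rbd releases also accept a unit suffix

Pool and image names are placeholders.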

Re: [ceph-users] NVRAM cards as OSD journals

2016-05-22 Thread Adrian Saul
I am using Intel P3700DC 400G cards in a similar configuration (two per host) - perhaps you could look at cards of that capacity to meet your needs. I would suggest having such small journals would mean you will be constantly blocking on journal flushes which will impact write performance and

Re: [ceph-users] fibre channel as ceph storage interconnect

2016-04-22 Thread Adrian Saul
> from the responses I've gotten, it looks like there's no viable option to use > fibre channel as an interconnect between the nodes of the cluster. > Would it be worth while development effort to establish a block protocol > between the nodes so that something like fibre channel could be used to

Re: [ceph-users] fibre channel as ceph storage interconnect

2016-04-21 Thread Adrian Saul
I could only see it being done using FCIP as the OSD processes use IP to communicate. I guess it would depend on why you are looking to use something like FC instead of Ethernet or IB.

Re: [ceph-users] Mon placement over wide area

2016-04-12 Thread Adrian Saul
Hi Adrian, Looking at the documentation RadosGW has multi region s

Re: [ceph-users] Mon placement over wide area

2016-04-11 Thread Adrian Saul
Hello again Christian :) > > We are close to being given approval to deploy a 3.5PB Ceph cluster that > > will be distributed over every major capital in Australia. The config > > will be dual sites in each city that will be coupled as HA pairs - 12 > > sites in total. The vast majority of

[ceph-users] Mon placement over wide area

2016-04-11 Thread Adrian Saul
We are close to being given approval to deploy a 3.5PB Ceph cluster that will be distributed over every major capital in Australia. The config will be dual sites in each city that will be coupled as HA pairs - 12 sites in total. The vast majority of CRUSH rules will place data either

Re: [ceph-users] Ceph.conf

2016-03-30 Thread Adrian Saul
These are the monitors that Ceph clients/daemons can contact initially to join the cluster. Once they connect to one of the initial mons they will get a full list of all monitors and be able to connect to any of them to pull updated maps.
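
A minimal sketch of the relevant ceph.conf entries - hostnames and addresses are placeholders:

    [global]
        mon initial members = mon1, mon2, mon3
        mon host = 192.168.0.11, 192.168.0.12, 192.168.0.13

mon initial members only matters for forming the first quorum at cluster creation; after that, clients and daemons learn the full monitor list from the monmap as described above.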

[ceph-users] OSD crash after conversion to bluestore

2016-03-30 Thread Adrian Saul
I upgraded my lab cluster to 10.1.0 specifically to test out bluestore and see what latency difference it makes. I was able to one by one zap and recreate my OSDs to bluestore and rebalance the cluster (the change to having new OSDs start with low weight threw me at first, but once I worked
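
For context, in the 10.1.0 era the per-OSD conversion was roughly the following - device names are placeholders, and bluestore still had to be enabled as an experimental feature in ceph.conf:

    # ceph.conf
    enable experimental unrecoverable data corrupting features = bluestore rocksdb

    # recreate the OSD on bluestore
    ceph-disk zap /dev/sdb
    ceph-disk prepare --bluestore /dev/sdb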

Re: [ceph-users] Ceph RBD latencies

2016-03-06 Thread Adrian Saul
> >The Samsungs are the 850 2TB > > (MZ-75E2T0BW). Chosen primarily on price. > > These are spec'ed at 150TBW, or an amazingly low 0.04 DWPD (over 5 years). > Unless you have a read-only cluster, you will wind up spending MORE on > replacing them (and/or losing data when 2 fail at the same time)
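
As a quick sanity check on that endurance figure:

    150 TBW ÷ (2 TB × 365 days × 5 years) ≈ 0.041 drive writes per day

which matches the ~0.04 DWPD quoted above and is an order of magnitude or more below what datacentre SSDs are typically rated for.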

Re: [ceph-users] Ceph RBD latencies

2016-03-03 Thread Adrian Saul
> Samsung EVO... > Which exact model, I presume this is not a DC one? > > If you had put your journals on those, you would already be pulling your hairs > out due to abysmal performance. > > Also with Evo ones, I'd be worried about endurance. No, I am using the P3700DCs for journals. The