[ceph-users] Tiering to object storage

2015-04-20 Thread Blair Bethwaite
Hi all, I understand the present pool tiering infrastructure is intended to work for 2 layers? We're presently considering backup strategies for large pools and wondered how much of a stretch it would be to have a base tier sitting in e.g. an S3 store... I imagine a pg in the base+1 tier mapping

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Colin Corr
On 04/20/2015 01:46 PM, Robert LeBlanc wrote: On Mon, Apr 20, 2015 at 2:34 PM, Colin Corr co...@pc-doctor.com wrote: On 04/20/2015 11:02 AM, Robert LeBlanc wrote: We have a similar issue, but we wanted three copies across two racks. Turns out,

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Robert LeBlanc
You usually won't end up with more than the size number of replicas, even in a failure situation. Although technically more than size number of OSDs may have the data (if the OSD comes back in service, the journal may be used to quickly get the OSD back up to speed), these would not be active.

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Colin Corr
On 04/20/2015 11:02 AM, Robert LeBlanc wrote: We have a similar issue, but we wanted three copies across two racks. Turns out that we increased size to 4 and left min_size at 2. We didn't want to risk having less than two copies and if we only had three copies, losing a rack would block

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Robert LeBlanc
On Mon, Apr 20, 2015 at 2:34 PM, Colin Corr co...@pc-doctor.com wrote: On 04/20/2015 11:02 AM, Robert LeBlanc wrote: We have a similar issue, but we wanted three copies across two racks. Turns out that we increased size to 4 and left min_size at 2. We didn't want to risk having less than

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Christian Balzer
On Mon, 20 Apr 2015 13:17:18 -0400 J-P Methot wrote: On 4/20/2015 11:01 AM, Christian Balzer wrote: Hello, On Mon, 20 Apr 2015 10:30:41 -0400 J-P Methot wrote: Hi, This is similar to another thread running right now, but since our current setup is completely different from the

[ceph-users] CephFS development since Firefly

2015-04-20 Thread Gregory Farnum
We’ve been hard at work on CephFS over the last year since Firefly was released, and with Hammer coming out it seemed like a good time to go over some of the big developments users will find interesting. Much of this is cribbed from John’s Linux Vault talk

[ceph-users] Online Ceph Tech Talk - This Thursday

2015-04-20 Thread Patrick McGarry
Hey Cephers, Just to remind you, our monthly Online Ceph Tech Talk is coming up again this Thursday at 1p EDT via the BlueJeans tool. We will be recording it and publishing to YouTube for those who can't make it, but if you'd like to ask questions make sure you are there! This month we'll be

[ceph-users] Is CephFS ready for production?

2015-04-20 Thread Ray Sun
Cephers, Many people told me Ceph is ready for production except for CephFS. Is this true? And why is that? Can anyone explain this to me? Thanks a lot. Best Regards -- Ray

Re: [ceph-users] CephFS development since Firefly

2015-04-20 Thread Robert LeBlanc
Thanks for all your hard work on CephFS. This progress is very exciting to hear about. I am constantly amazed at the amount of work that gets done in Ceph in so short an amount of time. On Mon, Apr 20, 2015 at 6:26 PM, Gregory Farnum gfar...@redhat.com wrote: We’ve been hard at work on CephFS

[ceph-users] Single OSD down

2015-04-20 Thread Quenten Grasso
Hi All, Ceph: 0.80.9 (as of Monday 13th), previously 0.80.8 OS: Ubuntu 12.04, 3.13.0-44-generic #73~precise1-Ubuntu SMP Wed Dec 17 00:39:15 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Servers: DELL R515, 1 x 2.7GHz 6C AMD CPU w/ 32GB RAM, 10 x 3TB OSDs w/ 2x Intel DC S3700 100GB journals (5 OSDs per SSD)

Re: [ceph-users] Ceph.com

2015-04-20 Thread Ferber, Dan
Thanks Kurt. I did not use the --repo-url, which I should have in retrospect, but instead I had edited install.py and changed all occurrences of ceph.com to eu.ceph.com And Wido, Kurt has your answers then (see below) as to the one file that was missing for him. Dan Dan Ferber Software
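For reference, a hedged sketch of how the --repo-url route could look with ceph-deploy; the mirror URL, release, and host names here are illustrative, not taken from the thread:

    ceph-deploy install --release hammer \
        --repo-url http://eu.ceph.com/debian-hammer/ \
        --gpg-url http://eu.ceph.com/keys/release.asc \
        node1 node2 node3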

[ceph-users] RBD volume to PG mapping

2015-04-20 Thread Межов Игорь Александрович
Hi! In case of a scrub error we get some PGs in an inconsistent state. What is the best method to check which RBD volumes are mapped into this inconsistent PG? For now we have invented a long and not easy way to do this: - from 'ceph health detail' we take the PG numbers in the inconsistent state - we check logs for
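A possible sketch of one way to do this on a Firefly-era filestore OSD: list the object files stored for the inconsistent PG on its primary OSD and compare their rbd data prefixes against each image's block_name_prefix. The PG id, OSD number, and image name below are placeholders, not values from the thread.

    # placeholders: PG 3.45f with primary osd.12, image rbd/myvolume
    ls /var/lib/ceph/osd/ceph-12/current/3.45f_head/ | grep -o 'data\.[0-9a-f]*' | sort -u
    rbd info rbd/myvolume | grep block_name_prefix    # compare the hex image ids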

Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Ilya Dryomov
On Mon, Apr 20, 2015 at 2:10 PM, Nikola Ciprich nikola.cipr...@linuxbox.cz wrote: Hello Ilya, Have you set your crush tunables to hammer? I've set crush tunables to optimal (therefore I guess they got set to hammer). Your crushmap has straw2 buckets (alg straw2). That's going to be

Re: [ceph-users] RADOS Bench slow write speed

2015-04-20 Thread Kris Gillespie
Hi Pedro, Without knowing much about your actual Ceph config file setup (ceph.conf) or any other factors (pool/replication setup) I'd say you're probably suffering due to the journal sitting on your OSDs. As in, you made the OSDs and didn't specify a SSD (or other disk) as the journal

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Nick Fisk
If possible, it might be worth trying an EXT4 formatted RBD. I've had problems with XFS hanging in the past on simple LVM volumes and never really got to the bottom of it, whereas the same volumes formatted with EXT4 have been running for years without a problem. -Original Message-
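For anyone wanting to try the EXT4 comparison, a minimal sketch; the pool, image, and mount point names are placeholders:

    rbd map rbd/testimage        # exposes the image as /dev/rbdN
    mkfs.ext4 /dev/rbd0
    mount /dev/rbd0 /mnt/test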

Re: [ceph-users] What is a dirty object

2015-04-20 Thread Francois Lafont
Hi, John Spray wrote: As far as I can see, this is only meaningful for cache pools, and an object is dirty in the sense of having been created or modified since its last flush. For a non-cache-tier pool, everything is logically dirty since it is never flushed. I hadn't noticed

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
I'm using xfs on the rbd disks. They are between 1 and 10TB in size. On 20.04.2015 at 14:32, Nick Fisk wrote: Ah ok, good point. What FS are you using on the RBD? -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Eichelmann

Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
Your crushmap has straw2 buckets (alg straw2). That's going to be supported in the 4.1 kernel - when 3.18 was released none of the straw2 stuff existed. I see... maybe this is a bit too radical a setting for the optimal preset? Well, it depends on how you look at it. Generally optimal is
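For anyone in the same situation, a sketch of how one might inspect and walk back the tunables; note that switching profiles triggers data movement:

    ceph osd crush show-tunables         # see what the 'optimal' preset selected
    ceph osd crush tunables firefly      # older profile without straw2 buckets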

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
Hi Nick, I forgot to mention that I was also trying a workaround using the userland (rbd-fuse). The behaviour was exactly the same (worked fine for several hours, testing parallel reading and writing, then IO Wait and system load increased). This is why I don't think it is an issue with the rbd

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Nick Fisk
Ah ok, good point What FS are you using on the RBD? -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Eichelmann Sent: 20 April 2015 13:16 To: Nick Fisk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] 100% IO Wait with CEPH

[ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
Hello, I'm quite new to ceph, so please forgive my ignorance. Yesterday I deployed a small test cluster (3 nodes, 2 SATA + 1 SSD OSD / node). I enabled the MDS server, created cephfs data + metadata pools and created the filesystem. However, upon mount requests I'm getting the following error: [Apr20

Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer

2015-04-20 Thread Scott Laird
They're kind of big; here are links: https://dl.dropboxusercontent.com/u/104949139/osdmap https://dl.dropboxusercontent.com/u/104949139/ceph-osd.36.log On Sun, Apr 19, 2015 at 8:42 PM Samuel Just sj...@redhat.com wrote: I have a suspicion about what caused this. Can you restart one of the

[ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
Hi Ceph-Users! We currently have a problem where I am not sure whether it has its cause in Ceph or something else. First, some information about our ceph setup: * ceph version 0.87.1 * 5 MON * 12 OSD with 60x2TB each * 2 RSYNC gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian

Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
Hello Ilya, Have you set your crush tunables to hammer? I've set crush tunables to optimal (therefore I guess they got set to hammer). Your crushmap has straw2 buckets (alg straw2). That's going to be supported in 4.1 kernel - when 3.18 was released none of the straw2 stuff existed. I

Re: [ceph-users] RADOS Bench slow write speed

2015-04-20 Thread Alexandre DERUMIER
Hi, for writes, Ceph writes twice to the disk: once for the journal and once for the data (so half the write bandwidth). The journal is written with O_DSYNC (you should test your disk with fio --sync=1 to compare). That's why the recommendation is to use SSDs for journal disks. - Mail original - De:
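A minimal fio invocation for that kind of sync-write test, assuming a scratch device /dev/sdX (this sketch overwrites the device):

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test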

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Nick Fisk
Hi Christian, A very non-technical answer but as the problem seems related to the RBD client it might be worth trying the latest Kernel if possible. The RBD client is Kernel based and so there may be a fix which might stop this from happening. Nick -Original Message- From: ceph-users

Re: [ceph-users] What is a dirty object

2015-04-20 Thread John Spray
On 19/04/2015 05:33, Francois Lafont wrote: If I understand well, all objects in the cluster are dirty. Is it normal? What is a dirty object? As far as I can see, this is only meaningful for cache pools, and an object is dirty in the sense of having been created or modified since its last
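For context, the dirty counter being discussed is the per-pool DIRTY column reported by the command below; the exact column set may vary by release:

    ceph df detail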

[ceph-users] RADOS Bench slow write speed

2015-04-20 Thread Pedro Miranda
Hi all!! I'm setting up a Ceph (version 0.80.6) cluster and I'm benchmarking the infrastructure and Ceph itself. I've got 3 rack servers (Dell R630) each with its own disks in enclosures. The cluster network bandwidth is 10Gbps, the bandwidth between the RAID controller (Dell H830) and
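Presumably the benchmark in question is something along these lines; the pool name and options below are illustrative, not from the thread:

    rados bench -p testpool 60 write -t 16 --no-cleanup
    rados bench -p testpool 60 seq -t 16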

Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Ilya Dryomov
On Mon, Apr 20, 2015 at 1:33 PM, Nikola Ciprich nikola.cipr...@linuxbox.cz wrote: Hello, I'm quite new to ceph, so please forgive my ignorance. Yesterday, I've deployed small test cluster (3 nodes, 2 SATA + 1 SSD OSD / node) I enabled MDS server and created cephfs data + metadata pools and

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Barclay Jameson
Are your journals on separate disks? What is your ratio of journal disks to data disks? Are you doing replication size 3 ? On Mon, Apr 20, 2015 at 9:30 AM, J-P Methot jpmet...@gtcomm.net wrote: Hi, This is similar to another thread running right now, but since our current setup is completely

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread J-P Methot
My journals are on-disk, each disk being an SSD. The reason I didn't go with dedicated drives for journals is that when designing the setup, I was told that having dedicated journal SSDs on a full-SSD setup would not give me performance increases. So that makes the journal disk to data disk

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Mark Nelson
The big question is how fast these drives can do O_DSYNC writes. The basic gist of this is that for every write to the journal, an ATA_CMD_FLUSH call is made to ensure that the device (or potentially the controller) knows that this data really needs to be stored safely before the flush is

[ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
4096M 2
template-win2k8-20150420 40960M 2
template-win2k8-20150420@snap 40960M 2
[root@vfnphav1a ~]# rbd snap protect ssd2r/template-win2k8-20150420@snap
rbd: protecting snap failed: 2015-04-20 16:47:31.587489 7f5e9e4fa760 -1 librbd

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Dan van der Ster
Hi, This is similar to what you would observe if you hit the ulimit on open files/sockets in a Ceph client. Though that normally only affects clients in user mode, not the kernel. What are the ulimits of your rbd-fuse client? Also, you could increase the client logging debug levels to see why the
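A quick sketch of how one might check those limits on the running client (the process name rbd-fuse is assumed here):

    grep 'open files' /proc/$(pidof rbd-fuse)/limits
    ulimit -n 65536    # raise the limit in the shell before remounting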

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Onur BEKTAS
Hi, Check the XFS fragmentation factor for the rbd disks, i.e. xfs_db -c frag -r /dev/sdX. If it is high, try defragmenting with xfs_fsr /dev/sdX. Regards, Onur. On 4/20/2015 4:41 PM, Nick Fisk wrote: If possible, it might be worth trying an EXT4 formatted RBD. I've had problems with XFS hanging in the past
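In sketch form, assuming the RBD is mapped at /dev/rbd0:

    xfs_db -c frag -r /dev/rbd0    # report extent fragmentation
    xfs_fsr -v /dev/rbd0           # online defragmentation of the mounted filesystem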

[ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread J-P Methot
Hi, This is similar to another thread running right now, but since our current setup is completely different from the one described in the other thread, I thought it may be better to start a new one. We are running Ceph Firefly 0.80.8 (soon to be upgraded to 0.80.9). We have 6 OSD hosts

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Alexandre DERUMIER
Hi, I'm currently benching a full ssd setup (haven't finished yet), but with 4 osd, ssd intel s3500, (replication x1), with randwrite 4M I'm around 550MB/s; with random 4K I'm around 4iops (1iops by osd, the limit is the disk O_DSYNC write speed). This is with hammer. - Mail original - De:

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Christian Balzer
Hello, On Mon, 20 Apr 2015 10:30:41 -0400 J-P Methot wrote: Hi, This is similar to another thread running right now, but since our current setup is completely different from the one described in the other thread, I thought it may be better to start a new one. We are running Ceph

Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Jason Dillaman
Can you please run 'rbd info' on template-win2k8-20150420 and template-win2k8-20150420@snap? I just want to verify which RBD features are currently enabled on your images. Have you overridden the value of rbd_default_features in your ceph.conf? Did you use the new rbd CLI option '--image
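For illustration, the checks being asked for, plus how a layering-enabled format-2 image can be created explicitly with the hammer rbd CLI (the new image name is a placeholder):

    rbd info ssd2r/template-win2k8-20150420
    rbd info ssd2r/template-win2k8-20150420@snap
    rbd create --image-format 2 --image-feature layering --size 40960 ssd2r/new-image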

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Robert LeBlanc
We have a similar issue, but we wanted three copies across two racks. Turns out that we increased size to 4 and left min_size at 2. We didn't want to risk having less than two copies and if we only had three copies, losing a rack would block I/O. Once we expand to a third rack, we will adjust our

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Gregory Farnum
On Mon, Apr 20, 2015 at 11:17 AM, Dan van der Ster d...@vanderster.com wrote: I haven't tried, but wouldn't something like this work:
step take default
step chooseleaf firstn 2 type host
step emit
step take default
step chooseleaf firstn -2 type osd
step emit
We use something like that

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Barclay Jameson
Using rados bench. It's just a test pool anyway. I will stick with my current OSD setup (16 HDDs and 4 SSDs, a 1:4 ratio of SSD to HDD). I can get 800 MB/s write and about 1GB/s read. On Mon, Apr 20, 2015 at 11:19 AM, Mark Nelson mnel...@redhat.com wrote: How are you measuring the 300MB/s and

[ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Colin Corr
Greetings Cephers, I have hit a bit of a wall between the available documentation and my understanding of it with regards to CRUSH rules. I am trying to determine if it is possible to replicate 3 copies across 2 hosts, such that if one host is completely lost, at least 1 copy is available. The

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Gregory Farnum
On Mon, Apr 20, 2015 at 10:46 AM, Colin Corr co...@pc-doctor.com wrote: Greetings Cephers, I have hit a bit of a wall between the available documentation and my understanding of it with regards to CRUSH rules. I am trying to determine if it is possible to replicate 3 copies across 2 hosts,

Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
Hello Jason, On Mon, Apr 20, 2015 at 01:48:14PM -0400, Jason Dillaman wrote: Can you please run 'rbd info' on template-win2k8-20150420 and template-win2k8-20150420@snap? I just want to verify which RBD features are currently enabled on your images. Have you overridden the value

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Dan van der Ster
I haven't tried, but wouldn't something like this work:
step take default
step chooseleaf firstn 2 type host
step emit
step take default
step chooseleaf firstn -2 type osd
step emit
We use something like that for an asymmetric multi-room rule. Cheers, Dan On Apr 20, 2015 20:02, Robert LeBlanc
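Wrapped in a full rule definition, the suggestion would look roughly like this; the rule name and ruleset number are placeholders, and see the follow-ups in the thread for caveats:

    rule replicated_3_across_2_hosts {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step chooseleaf firstn 2 type host
        step emit
        step take default
        step chooseleaf firstn -2 type osd
        step emit
    }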

[ceph-users] Is it possible to reinitialize the cluster

2015-04-20 Thread 10 minus
Hi, I have an issue with my Ceph cluster where two nodes were lost by accident and have been recreated. ceph osd tree:
# id    weight  type name               up/down reweight
-1      14.56   root default
-6      14.56   datacenter dc1
-7      14.56   row row1
-9      14.56   rack

Re: [ceph-users] What is a dirty object

2015-04-20 Thread Craig Lewis
On Mon, Apr 20, 2015 at 3:38 AM, John Spray john.sp...@redhat.com wrote: I hadn't noticed that we presented this as nonzero for regular pools before, it is a bit weird. Perhaps we should show zero here instead for non-cache-tier pools. I have always planned to add a cold EC tier later,

Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
@vfnphav1a ~]# rbd ls -l ssd2r
NAME                          SIZE    PARENT FMT PROT LOCK
fio_test                       4096M         2
template-win2k8-20150420      40960M         2
template-win2k8-20150420@snap 40960M         2
[root@vfnphav1a ~]# rbd snap

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Barclay Jameson
I have an SSD pool for testing (only 8 drives), but when I use 1 SSD for the journal and 1 SSD for data I get 300 MB/s write. When I change all 8 disks to house the journal alongside the data I get 184MB/s write. On Mon, Apr 20, 2015 at 10:16 AM, Mark Nelson mnel...@redhat.com wrote: The big question is how fast

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-20 Thread Mark Nelson
How are you measuring the 300MB/s and 184MB/s? IE is it per drive, or the client throughput? Also what controller do you have? We've seen some controllers from certain manufacturers start to top out at around 1-2GB/s with write cache enabled. Mark On 04/20/2015 11:15 AM, Barclay Jameson

Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Jason Dillaman
snapshot I'm getting the following error:
[root@vfnphav1a ~]# rbd ls -l ssd2r
NAME                          SIZE    PARENT FMT PROT LOCK
fio_test                       4096M         2
template-win2k8-20150420      40960M         2
template-win2k8-20150420@snap 40960M         2

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-20 Thread Dan van der Ster
On Apr 20, 2015 20:22, Gregory Farnum g...@gregs42.com wrote: On Mon, Apr 20, 2015 at 11:17 AM, Dan van der Ster d...@vanderster.com wrote: I haven't tried, but wouldn't something like this work: step take default step chooseleaf firstn 2 type host step emit step take default step