Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Sanders, Bill
This is interesting. Kudos to you guys for getting the calculator up, I think this'll help some folks. I have 1 pool, 40 OSDs, and a replica count of 3. I based my PG count on http://ceph.com/docs/master/rados/operations/placement-groups/: ''' Less than 5 OSDs: set pg_num to 128. Between 5 and 10 OSDs
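
For anyone doing this by hand, a rough Python sketch of the rule of thumb behind that documentation and the calculator (about 100 PG replicas per OSD, divided by the replica count and rounded up to a power of two). The 40-OSD / size-3 numbers are Bill's; the helper itself is only an illustration:

    import math

    def suggest_pg_num(osds, replicas, target_per_osd=100):
        # Rule of thumb: aim for ~100 PG replicas per OSD, so divide
        # by the replica count, then round up to the next power of two.
        raw = osds * target_per_osd / replicas
        return 2 ** math.ceil(math.log2(raw))

    # Bill's cluster: 1 pool, 40 OSDs, replica count 3
    print(suggest_pg_num(40, 3))   # -> 2048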

[ceph-users] Ceph Minimum Cluster Install (ARM)

2015-01-07 Thread Garg, Pankaj
Hi, I am trying to get a very minimal Ceph cluster up and running (on ARM) and I'm wondering what the smallest unit is that I can run rados-bench on. The documentation at http://ceph.com/docs/next/start/quick-ceph-deploy/ seems to refer to 4 different nodes: Admin Node, Monitor Node and 2 OSD

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Sanders, Bill
Excellent, thanks for the detailed breakdown. Take care, Bill From: Michael J. Kidd [michael.k...@inktank.com] Sent: Wednesday, January 07, 2015 4:50 PM To: Sanders, Bill Cc: Loic Dachary; ceph-us...@ceph.com Subject: Re: [ceph-users] PG num calculator live on

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Mark Nelson
Hi Michael, Good job! It would be really useful to add in calculations to show the expected distribution and max deviation from the mean. I'm dredging this up from an old email I sent out a year ago, but if we treat this as a balls-into-bins problem à la Raab & Steger:
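
For reference, a sketch of the kind of estimate Mark is pointing at. Raab & Steger's balls-into-bins analysis puts the expected maximum load for m balls in n bins at roughly m/n + sqrt(2 * (m/n) * ln n) in the heavily loaded case; the function below is an approximation of that, not the exact calculation from Mark's earlier email:

    import math

    def expected_max_load(m, n):
        # Raab & Steger style estimate for m balls thrown uniformly
        # into n bins: max load ~= m/n + sqrt(2 * (m/n) * ln n).
        mean = m / n
        return mean + math.sqrt(2 * mean * math.log(n))

    # e.g. 2048 PGs x 3 replicas landing on 40 OSDs
    mean = 2048 * 3 / 40
    worst = expected_max_load(2048 * 3, 40)
    print(f"mean {mean:.0f} PGs/OSD, expected max ~{worst:.0f} "
          f"(~{100 * (worst / mean - 1):.0f}% above the mean)")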

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-07 Thread Craig Lewis
On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva ol...@gnu.org wrote: However, I suspect that temporarily setting min size to a lower number could be enough for the PGs to recover. If ceph osd pool pool set min_size 1 doesn't get the PGs going, I suppose restarting at least one of the OSDs
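
A minimal sketch of the workaround being discussed, assuming the current `ceph osd pool set` syntax and a placeholder pool name; remember to raise min_size back once recovery completes:

    import subprocess

    def set_min_size(pool, value):
        # Relax min_size so undersized PGs can go active, then put it
        # back after the PGs have recovered.
        subprocess.run(
            ["ceph", "osd", "pool", "set", pool, "min_size", str(value)],
            check=True)

    set_min_size("rbd", 1)   # "rbd" is a placeholder pool name
    # ... wait for the PGs to recover ...
    set_min_size("rbd", 2)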

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-07 Thread Christian Balzer
On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote: On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva ol...@gnu.org wrote: However, I suspect that temporarily setting min size to a lower number could be enough for the PGs to recover. If ceph osd pool pool set min_size 1 doesn't get the

Re: [ceph-users] Slow/Hung IOs

2015-01-07 Thread Christian Balzer
Hello, On Thu, 8 Jan 2015 00:17:11 + Sanders, Bill wrote: Thanks for your reply, Christian. Sorry for my delay in responding. The kernel logs are silent. Forgot to mention before that ntpd is running and the nodes are sync'd. I'm working on some folks for an updated kernel, but

Re: [ceph-users] Slow/Hung IOs

2015-01-07 Thread Sanders, Bill
Thanks for your reply, Christian. Sorry for my delay in responding. The kernel logs are silent. Forgot to mention before that ntpd is running and the nodes are sync'd. I'm working on some folks for an updated kernel, but I'm not holding my breath. That said, if I'm seeing this problem by

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Michael J. Kidd
Hello Bill, Either 2048 or 4096 should be acceptable. 4096 gives about a 300 PG per OSD ratio, which would leave room for tripling the OSD count without needing to increase the PG number, while 2048 gives about 150 PGs per OSD, leaving room for only about a 50% expansion of the OSD count. The
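
The arithmetic behind those two ratios, as a quick sketch (40 OSDs and size 3 are Bill's numbers from earlier in the thread):

    def pgs_per_osd(pg_num, replicas, osds):
        # Average number of PG replicas each OSD ends up hosting.
        return pg_num * replicas / osds

    for pg_num in (2048, 4096):
        print(pg_num, round(pgs_per_osd(pg_num, replicas=3, osds=40)))
    # 2048 -> ~154 per OSD (the ~150 figure, ~50% growth headroom)
    # 4096 -> ~307 per OSD (the ~300 figure, room to triple the OSDs)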

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Christopher O'Connell
Ah, so I've been doing it wrong all this time (I thought we had to take the size multiple into account ourselves). Thanks! On Wed, Jan 7, 2015 at 4:25 PM, Michael J. Kidd michael.k...@inktank.com wrote: Hello Christopher, Keep in mind that the PGs per OSD (and per pool) calculations take

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Loic Dachary
On 07/01/2015 23:08, Michael J. Kidd wrote: Hello all, Just a quick heads up that we now have a PG calculator to help determine the proper PG per pool numbers to achieve a target PG per OSD ratio. http://ceph.com/pgcalc Please check it out! Happy to answer any questions, and always

[ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Michael J. Kidd
Hello all, Just a quick heads up that we now have a PG calculator to help determine the proper PG per pool numbers to achieve a target PG per OSD ratio. http://ceph.com/pgcalc Please check it out! Happy to answer any questions, and always welcome any feedback on the tool / verbiage, etc...

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Christopher O'Connell
Hi, I'm playing with this with a modest sized ceph cluster (36 x 6TB disks). Based on this, it says that small pools (such as .users) would have just 16 PGs. Is this correct? I've historically always made even these small pools have at least as many PGs as the next power of 2 over my number of OSDs

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Michael J. Kidd
Hello Christopher, Keep in mind that the PGs per OSD (and per pool) calculations take into account the replica count (the pool size= parameter). So, for example, if you're using a default of 3 replicas, 16 * 3 = 48 PGs, which allows for at least one PG per OSD on that pool. Even with a size=2,
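
A tiny sketch of that per-pool coverage check; the size=2 line is simply the same arithmetic extended, since the original message is truncated at that point:

    def total_pg_replicas(pg_num, size):
        # Total PG replicas a pool spreads across the cluster.
        return pg_num * size

    osds = 36   # Christopher's 36-disk cluster from the earlier mail
    for size in (3, 2):
        total = total_pg_replicas(16, size)
        print(f"size={size}: {total} PG replicas over {osds} OSDs")
    # size=3 -> 48 replicas, at least one per OSD on average
    # size=2 -> 32 replicas, fewer than the 36 OSDs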

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-07 Thread Michael J. Kidd
Where is the source? On the page. :) It does link out to jquery and jquery-ui, but all the custom bits are embedded in the HTML. Glad it's helpful :) Michael J. Kidd Sr. Storage Consultant Inktank Professional Services - by Red Hat On Wed, Jan 7, 2015 at 3:46 PM, Loic Dachary

Re: [ceph-users] rbd directory listing performance issues

2015-01-07 Thread Christian Balzer
Hello, On Tue, 6 Jan 2015 15:29:50 + Shain Miley wrote: Hello, We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of 107 x 4TB drives formatted with xfs. The cluster is running ceph version 0.80.7: I assume journals on the same HDD then. How much memory per node?

Re: [ceph-users] rbd resize (shrink) taking forever and a day

2015-01-07 Thread Robert LeBlanc
Seems like a message bus would be nice. Each opener of an RBD could subscribe for messages on the bus for that RBD. Anytime the map is modified a message could be put on the bus to update the others. That opens up a whole other can of worms though. Robert LeBlanc Sent from a mobile device please

Re: [ceph-users] rbd directory listing performance issues

2015-01-07 Thread Robert LeBlanc
I think your free memory is just fine. If you have lots of data change (read/write) then I think it is just aging out your directory cache. If fast directory listing is important to you, you can always write a script to periodically read the directory listing so it stays in cache or use
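
A minimal sketch of the kind of cache-warming script Robert describes; the mount point and interval are placeholders, not values from the thread:

    #!/usr/bin/env python3
    # Walk the mounted filesystem periodically so directory entries
    # and inodes stay warm in the kernel cache; run from cron or a loop.
    import os
    import time

    MOUNTPOINT = "/mnt/rbd"   # hypothetical mount point
    INTERVAL = 300            # seconds between walks

    while True:
        files_seen = 0
        for _root, _dirs, files in os.walk(MOUNTPOINT):
            files_seen += len(files)
        print(f"touched metadata for {files_seen} files")
        time.sleep(INTERVAL)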

Re: [ceph-users] v0.90 released

2015-01-07 Thread Alfredo Deza
On Sat, Dec 20, 2014 at 1:15 AM, Anthony Alba ascanio.al...@gmail.com wrote: Hi Sage, Has the repo metadata been regenerated? One of my reposync jobs can only see up to 0.89, using http://ceph.com/rpm-testing. It was generated but we somehow missed out on properly syncing it. You should now

[ceph-users] EC + RBD Possible?

2015-01-07 Thread deeepdish
Hello. I wasn’t able to obtain a clear answer from my googling and reading the official Ceph docs as to whether Erasure Coded pools are possible/supported for RBD access. The idea is to have block (cold) storage for archival purposes. I would access an RBD device and format it as EXT or XFS for block use.

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-07 Thread Lionel Bouton
On 12/30/14 16:36, Nico Schottelius wrote: Good evening, we also tried to rescue data *from* our old / broken pool by map'ing the rbd devices, mounting them on a host and rsync'ing away as much as possible. However, after some time rsync got completely stuck and eventually the host which

[ceph-users] Cache Tiering vs. OSD Journal

2015-01-07 Thread deeepdish
Hello. Quick question RE: cache tiering vs. OSD journals. As I understand it, SSD acceleration is possible at the pool or OSD level. When considering cache tiering, should I still put OSD journals on SSDs, or should they be disabled altogether? Can a single SSD pool function as a

Re: [ceph-users] Archives haven't been updated since Dec 8?

2015-01-07 Thread Patrick McGarry
Looks like there was (is) a technical issue at Dreamhost that is being actively worked on. I put in a request to get mmarch run manually for now until the issue is resolved. You can always browse the posts in real time from the archive pages: http://lists.ceph.com/pipermail/ceph-users-ceph.com/

Re: [ceph-users] Ceph on Centos 7

2015-01-07 Thread Travis Rhoden
Hello, Can you give the link to the exact instructions you followed? For CentOS7 (EL7), ceph-extras should not be necessary. The instructions at [1] do not have you enable the ceph-extras repo. You will find that there are EL7 packages at [2]. I recently found a README that was incorrectly

Re: [ceph-users] [Ceph-community] Problem with Rados gateway

2015-01-07 Thread Patrick McGarry
This is probably more suited to the ceph-user list. Moving it there. Thanks. Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph On Wed, Jan 7, 2015 at 9:17 AM, Walter Valenti waltervale...@yahoo.it wrote:

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2015-01-07 Thread Yehuda Sadeh
I created a ceph tracker issue: http://tracker.ceph.com/issues/10471 Thanks, Yehuda On Tue, Jan 6, 2015 at 10:19 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 07/01/15 17:43, hemant burman wrote: Hello Yehuda, The issue seems to be with the user data file for the swift subuser not

[ceph-users] ceph-deploy dependency errors on fc20 with firefly

2015-01-07 Thread Noah Watkins
I'm trying to install Firefly on an up-to-date FC20 box. I'm getting the following errors: [nwatkins@kyoto cluster]$ ../ceph-deploy/ceph-deploy install --release firefly kyoto [ceph_deploy.conf][DEBUG ] found configuration file at: /home/nwatkins/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked

Re: [ceph-users] Hanging VMs with Qemu + RBD

2015-01-07 Thread Nico Schottelius
Hello Achim, good to hear someone else running this setup. We have changed the number of backfills using ceph tell osd.\* injectargs '--osd-max-backfills 1' and it seems to mostly take care of the issues we see when rebalancing. One unsolved problem we have is machines kernel panic'ing, when
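
For reference, a small wrapper around the throttle Nico quotes; the value of 1 comes from his mail, and note that injectargs settings are not persistent across OSD restarts:

    import subprocess

    def set_max_backfills(value):
        # Runtime throttle from the mail above; injectargs changes
        # revert when the OSDs are restarted.
        subprocess.run(
            ["ceph", "tell", "osd.*", "injectargs",
             f"--osd-max-backfills {value}"],
            check=True)

    set_max_backfills(1)   # calm the cluster down while it rebalances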

Re: [ceph-users] Block and NAS Services for Non Linux OS

2015-01-07 Thread Steven Sim
Hello Eneko; Firstly, thanks for your comments! You mentioned that machines see a QEMU IDE/SCSI disk and don't know whether it's on Ceph, NFS, local, LVM, ... so it works OK for any VM guest OS. But what if I want the Ceph cluster to serve a whole range of clients in the data center, ranging

Re: [ceph-users] rbd resize (shrink) taking forever and a day

2015-01-07 Thread Josh Durgin
On 01/06/2015 10:24 AM, Robert LeBlanc wrote: Can't this be done in parallel? If the OSD doesn't have an object then it is a no-op and should be pretty quick. The number of outstanding operations can be limited to 100 or 1000, which would provide a balance between speed and performance impact if
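
A sketch of the bounded-concurrency idea Robert is describing, with a stand-in removal function rather than the actual librados/librbd calls, and purely illustrative object names:

    from concurrent.futures import ThreadPoolExecutor

    MAX_OUTSTANDING = 100   # the "100 or 1000" knob from the discussion

    def remove_object(name):
        # Stand-in for the per-object delete a shrink would issue; an
        # object that was never written simply comes back "not found".
        pass

    # Illustrative object names in the format-2 style.
    objects = [f"rbd_data.1234abc.{i:016x}" for i in range(10_000)]

    # The executor caps how many removals are in flight at once.
    with ThreadPoolExecutor(max_workers=MAX_OUTSTANDING) as pool:
        list(pool.map(remove_object, objects))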

[ceph-users] Rebuilding Cluster from complete MON failure with existing OSDs

2015-01-07 Thread Dan Geist
Hi, I have a situation where I moved the interfaces over which my ceph-public network is connected (only the interfaces, not the IPs, etc.). This was done to increase available bandwidth, but it backfired catastrophically. My monitors all failed and somehow became corrupted, and I was unable to

[ceph-users] Erasure code pool overhead

2015-01-07 Thread Italo Santos
Hello, I’d like to know how I can calculate the overhead of an erasure coded pool. Regards. Italo Santos http://italosantos.com.br/

Re: [ceph-users] Monitors and read/write latency

2015-01-07 Thread Robert LeBlanc
Monitors are in charge of the CRUSH map. Whenever there is a change to the CRUSH map (an OSD goes down, a new OSD is added, PGs are increased, etc.), the monitor(s) build a new CRUSH map and distribute it to all clients and OSDs. Once the client has the CRUSH map, it does not need to contact the
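
A toy illustration of why the monitor then drops out of the data path: placement becomes a deterministic computation the client can do locally against its cached map. This uses simple hashing, not the real CRUSH algorithm:

    import hashlib

    def toy_place(obj_name, pg_num, osds, replicas=3):
        # Deterministic placement from a locally cached "map": hash
        # the object name to a PG, then derive OSDs for that PG.
        # Real Ceph uses CRUSH, which also honours failure domains.
        h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
        pg = h % pg_num
        return pg, [osds[(pg + i) % len(osds)] for i in range(replicas)]

    osds = [f"osd.{i}" for i in range(12)]
    print(toy_place("rbd_data.abc.0000000000000001", 512, osds))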

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-07 Thread Dan Van Der Ster
Hi Nico, Yes Ceph is production ready. Yes people are using it in production for qemu. Last time I heard, Ceph was surveyed as the most popular backend for OpenStack Cinder in production. When using RBD in production, it really is critically important to (a) use 3 replicas and (b) pay

Re: [ceph-users] Data recovery after RBD I/O error

2015-01-07 Thread Jérôme Poulin
On Mon, Jan 5, 2015 at 6:59 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: Secondly, I would highly recommend not using ANY non-cluster-aware FS on top of a clustered block device like RBD For my use-case, this is just a single server using the RBD device. No clustering involved on the

[ceph-users] Monitors and read/write latency

2015-01-07 Thread Logan Barfield
Do monitors have any impact on read/write latencies? Everything I've read says no, but since a client needs to talk to a monitor before reading or writing to OSDs it would seem like that would introduce some overhead. I ask for two reasons: 1) We are currently using SSD based OSD nodes for our

Re: [ceph-users] rbd resize (shrink) taking forever and a day

2015-01-07 Thread Josh Durgin
On 01/06/2015 04:45 PM, Robert LeBlanc wrote: Seems like a message bus would be nice. Each opener of an RBD could subscribe for messages on the bus for that RBD. Anytime the map is modified a message could be put on the bus to update the others. That opens up a whole other can of worms though.

Re: [ceph-users] Different disk usage on different OSDs

2015-01-07 Thread Christian Balzer
On Wed, 7 Jan 2015 00:54:13 +0900 Christian Balzer wrote: On Tue, 6 Jan 2015 19:28:44 +0400 ivan babrou wrote: Restarting OSD fixed PGs that were stuck: http://i.imgur.com/qd5vuzV.png Good to hear that. Funny (not really) how often restarting OSDs fixes stuff like that. Still

[ceph-users] 回复: Re: rbd resize (shrink) taking forever and a day

2015-01-07 Thread Chen, Xiaoxi
It is already in parallel; the outstanding ops are limited to ~10 per client (tunable), so enlarging this may help. But please note that there is no no-op here: the OSD has no idea whether it has an object until it fails to find it on disk, which means the op has already traveled almost the whole code path.

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2015-01-07 Thread Mark Kirkwood
On 07/01/15 16:22, Mark Kirkwood wrote: FWIW I can reproduce this too (ceph 0.90-663-ge1384af). The *user* replicates ok (complete with its swift keys and secret). I can authenticate to both zones ok using S3 api (boto version 2.29), but only to the master using swift (swift client versions

Re: [ceph-users] Erasure code pool overhead

2015-01-07 Thread Italo Santos
Thanks Nick. At. Italo Santos http://italosantos.com.br/ On Wednesday, January 7, 2015 at 18:44, Nick Fisk wrote: Hi Italo, usable ratio = k/(k+m), where k is data chunks and m is coding chunks. For example, k=8 m=2 would give you 8/(8+2) = 0.8, or 80% usable storage and 20% used for
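
Nick's formula as a one-liner, with a couple of extra pool geometries added for comparison:

    def ec_usable_fraction(k, m):
        # Usable fraction of raw capacity for a k+m erasure coded pool.
        return k / (k + m)

    for k, m in [(8, 2), (4, 2), (10, 4)]:
        print(f"k={k} m={m}: {ec_usable_fraction(k, m):.0%} usable")
    # 8+2 -> 80%, 4+2 -> ~67%, 10+4 -> ~71%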

Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly

2015-01-07 Thread Travis Rhoden
Hi Noah, I'll try to recreate this on a fresh FC20 install as well. Looks to me like there might be a repo priority issue. It's mixing packages from Fedora downstream repos and the ceph.com upstream repos. That's not supposed to happen. - Travis On Wed, Jan 7, 2015 at 2:15 PM, Noah Watkins

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2015-01-07 Thread Mark Kirkwood
On 06/01/15 06:45, hemant burman wrote: One more thing Yehuda, In radosgw log in Slave Zone: 2015-01-05 17:22:42.188108 7fe4b66d2780 20 enqueued request req=0xbc1f50 2015-01-05 17:22:42.188125 7fe4b66d2780 20 RGWWQ: 2015-01-05 17:22:42.188126 7fe4b66d2780 20 req: 0xbc1f50 2015-01-05

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2015-01-07 Thread Mark Kirkwood
On 07/01/15 17:43, hemant burman wrote: Hello Yehuda, The issue seems to be with the user data file for the swift subuser not getting synced properly. FWIW, I'm seeing exactly the same thing as well (Hemant - that was well spotted)!

[ceph-users] Fwd: Multi-site deployment RBD and Federated Gateways

2015-01-07 Thread Logan Barfield
Hello, I'm re-sending this message since I didn't see it picked up on the list archives yesterday. My apologies if it was received previously. We are currently running a single datacenter Ceph deployment. Our setup is as follows: - 4 HDD OSD nodes (primarily used for RadosGW/Object Storage) -

[ceph-users] pg repair unsuccessful

2015-01-07 Thread Jiri Kanicky
Hi, I have been experiencing issues with several PGs which remain in an inconsistent state (I use BTRFS). ceph pg repair is not able to repair them. The only fix I have found is to delete the corresponding file which is causing the issue (see logs below) from the OSDs. This however means loss of data.

[ceph-users] Undeleted objects - is there a garbage collector?

2015-01-07 Thread Max Power
Hi, the 'current' folder of my OSD has a size of ~360MB, but I do not have any objects inside the corresponding pool; ceph status reports '8 bytes data'. Even with 'rados -p mypool ls --all' I do not see any objects. But there are a few current/12._head folders with files consuming disk space. How to

Re: [ceph-users] 回复: Re: rbd resize (shrink) taking forever and a day

2015-01-07 Thread Sage Weil
On Tue, 6 Jan 2015, Chen, Xiaoxi wrote: It is already in parallel; the outstanding ops are limited to ~10 per client (tunable), so enlarging this may help. But please note that there is no no-op here: the OSD has no idea whether it has an object until it fails to find it on disk, which means the op

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2015-01-07 Thread hemant burman
Hello Yehuda, The issue seems to be with the user data file for the swift subuser not getting synced properly. MasterZone: root@ceph-all:/var/local# ceph osd map .us-1-east-1.users.uid johndoe2 osdmap e796 pool '.us-1-east-1.users.uid' (286) object 'johndoe2' -> pg 286.c384ed51 (286.51) -> up [2]

Re: [ceph-users] librbd cache

2015-01-07 Thread Stuart Longland
Hi all, apologies for the slow reply. Been flat out lately and so any cluster work has been relegated to the back-burner. I'm only just starting to get back to it now. On 06/06/14 01:00, Sage Weil wrote: On Thu, 5 Jun 2014, Wido den Hollander wrote: On 06/05/2014 08:59 AM, Stuart Longland

Re: [ceph-users] rbd resize (shrink) taking forever and a day

2015-01-07 Thread Robert LeBlanc
The bitmap certainly sounds like it would help shortcut a lot of the code that Xiaoxi mentions. Is the idea that the client caches the bitmap for the RBD so it knows which OSDs to contact (thus saving a round trip to the OSD), or only for the OSD to know which objects exist on its disk? On Tue, Jan
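
A minimal sketch of the kind of per-image existence bitmap being discussed here; it illustrates the data structure only, not librbd's actual object map implementation:

    class ObjectExistenceMap:
        # Toy per-image bitmap: one bit per RADOS object backing an
        # RBD image, so a shrink or delete can skip objects that were
        # never written instead of probing every one of them.
        def __init__(self, num_objects):
            self.bits = bytearray((num_objects + 7) // 8)

        def mark_written(self, idx):
            self.bits[idx // 8] |= 1 << (idx % 8)

        def may_exist(self, idx):
            return bool(self.bits[idx // 8] & (1 << (idx % 8)))

    m = ObjectExistenceMap(1 << 20)   # e.g. a 4 TB image in 4 MB objects
    m.mark_written(42)
    print(m.may_exist(42), m.may_exist(43))   # True False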

Re: [ceph-users] rbd resize (shrink) taking forever and a day

2015-01-07 Thread Josh Durgin
On 01/06/2015 04:19 PM, Robert LeBlanc wrote: The bitmap certainly sounds like it would help shortcut a lot of the code that Xiaoxi mentions. Is the idea that the client caches the bitmap for the RBD so it knows which OSDs to contact (thus saving a round trip to the OSD), or only for the OSD to know

Re: [ceph-users] Hanging VMs with Qemu + RBD

2015-01-07 Thread Achim Ledermüller
Hi, We have the same setup, including OpenNebula 4.10.1. We had some backfilling due to node failures and node expansion. If we throttle osd_max_backfills there is no problem at all. If the value for backfilling jobs is too high, we can see delayed reactions within the shell, e.g. `ls -lh` needs

[ceph-users] Making objects available via FTP

2015-01-07 Thread Carlo Santos
Hi all, I'm wondering if it's possible to make the files in Ceph available via FTP just by configuring Ceph. If this is not possible, what are the typical steps to make the files available via FTP? Thanks!

Re: [ceph-users] Data recovery after RBD I/O error

2015-01-07 Thread Austin S Hemmelgarn
On 2015-01-04 15:26, Jérôme Poulin wrote: Happy holiday everyone, TL;DR: Hardware corruption is really bad, if btrfs-restore work, kernel Btrfs can! I'm cross-posting this message since the root cause for this problem is the Ceph RBD device however, my main concern is data loss from a BTRFS

[ceph-users] CEPH: question on journal placement

2015-01-07 Thread Marco Kuendig
Hello, newbie on Ceph here. I have three lab servers with Ceph. Each server has 2 x 3TB SATA disks. Up to now I have run 2 OSDs per server: I partitioned the 2 disks into 4 partitions and split the 2 OSDs over the 4 partitions, i.e. 1 disk = 1 OSD = 2 partitions (data and journal). Now I

Re: [ceph-users] osd tree to show primary-affinity value

2015-01-07 Thread Mykola Golub
On Thu, Dec 25, 2014 at 03:57:15PM +1100, Dmitry Smirnov wrote: Please don't withhold this improvement -- go ahead and submit pull request to let developers decide whether they want this or not. IMHO it is a very useful improvement. Thank you very much for implementing it. Done.

Re: [ceph-users] cephfs usable or not?

2015-01-07 Thread Jiri Kanicky
Hi Max, Thanks for this info. I am planning to use CephFS (ceph version 0.87) at home, because it's more convenient than NFS over RBD. I don't have a large environment (about 20TB), so hopefully it will hold. I back up all important data just in case. :) Thank you. Jiri On 29/12/2014 21:09,

Re: [ceph-users] OSDs with btrfs are down

2015-01-07 Thread Dyweni - BTRFS
Hi, BTRFS crashed because the system ran out of memory... I see these entries in your logs: Jan 4 17:11:06 ceph1 kernel: [756636.535661] kworker/0:2: page allocation failure: order:1, mode:0x204020 Jan 4 17:11:06 ceph1 kernel: [756636.536112] BTRFS: error (device sdb1) in

[ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-07 Thread Nico Schottelius
Good evening, we also tried to rescue data *from* our old / broken pool by map'ing the rbd devices, mounting them on a host and rsync'ing away as much as possible. However, after some time rsync got completely stuck and eventually the host which mounted the rbd mapped devices decided to kernel

[ceph-users] Multi-site deployment RBD and Federated Gateways

2015-01-07 Thread Logan Barfield
Hello, We are currently running a single datacenter Ceph deployment. Our setup is as follows: - 4 HDD OSD nodes (primarily used for RadosGW/Object Storage) - 2 SSD OSD nodes (used for RBD/VM block devices) - 3 Monitor daemons running on 3 of the HDD OSD nodes - The CRUSH rules are set to push