Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up
Sounds to me like you've put yourself at too much risk. *If* I'm reading your message right about your configuration, you have multiple hosts accessing OSDs that are stored on a single shared box - so if that single shared box (a single point of failure for multiple nodes) goes down, it's possible for multiple replicas to disappear at the same time, which could halt the operation of your cluster if the primaries and the replicas are both on OSDs within that single shared storage system...

On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote:

Hi all,

Setup details: Two storage enclosures, each connected to 4 OSD nodes (shared storage). The failure domain is chassis (enclosure) level. Replication count is 2. Each host is allotted 4 drives.

I have active client IO running on the cluster (random write profile with 4M block size, 64 queue depth).

One of the enclosures had a power loss, so all OSDs on the hosts connected to this enclosure went down as expected. But client IO got paused. After some time the hosts connected to it came up, and all OSDs on those hosts came up. Until then, the cluster was not serving IO. Once all OSDs pertaining to that enclosure came up, client IO resumed.

Can anybody help me understand why the cluster was not serving IO during the enclosure failure? Or is it a bug?

-Thanks & regards,
Mallikarjun Biradar

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
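One thing worth checking in a setup like this (an assumption on my part, not a confirmed diagnosis) is the pool's min_size: with size=2 and min_size=2, losing one enclosure leaves only one replica, and writes block until the peer OSDs return. The pool name "rbd" below is just an example:

```shell
# Check whether IO can continue with a single surviving replica.
ceph osd pool get rbd size
ceph osd pool get rbd min_size
# With size=2 and min_size=2, losing one failure domain pauses IO until
# the peers come back. Allowing IO with one replica remaining trades
# away all redundancy during the outage:
ceph osd pool set rbd min_size 1
```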
Re: [ceph-users] The first infernalis dev release will be v9.0.0
So with this, will even numbers then be LTS? Since 9.0.0 is following 0.94.x/Hammer, and every other release is normally LTS, I'm guessing 10.x.x, 12.x.x, etc. will be LTS...

On Tue, May 5, 2015 at 11:45 AM, Sage Weil sw...@redhat.com wrote:

On Tue, 5 May 2015, Joao Eduardo Luis wrote:

On 05/04/2015 05:09 PM, Sage Weil wrote:

The first Ceph release back in Jan of 2008 was 0.1. That made sense at the time. We haven't revised the versioning scheme since then, however, and are now at 0.94.1 (first Hammer point release). To avoid reaching 0.99 (and 0.100 or 1.00?) we have a new strategy. This was discussed a bit on ceph-devel and in #ceph-devel and there doesn't appear to be any scheme that everyone likes. So, we're going to go with something that only a few people dislike:

x.0.z - development releases (for early testers and the brave at heart)
x.1.z - release candidates (for test clusters, brave users)
x.2.z - stable/bugfix releases (for users)

x will start at 9 for Infernalis (I is the 9th letter), making our first development release of the 9th release cycle 9.0.0. Subsequent development releases will be 9.0.1, 9.0.2, etc. In a couple months we'll have a 9.1.0 (and maybe 9.1.1) release candidate. A few weeks after that we'll have the Infernalis release 9.2.0, followed by stable bug fix updates 9.2.1, 9.2.2, etc., and then begin work on the Jewel (10.y.z) release.

We'll see how this works out. We can adjust this in the future to any other 9.y.z scheme (e.g., 9.1, 9.2 etc. dev releases and 9.8.z stable releases); the main commitment here is to the 9 part, indicating Infernalis is the 9th major release cycle.

Looks sane! I'm guessing once 9.1.0 is frozen the dev cycles will move on to 10.0.1?

Yep! Or 10.0.0 I guess since we just did 9.0.0.

sage
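The scheme Sage describes is mechanical enough to sketch in a few lines (my own illustration, not Ceph code): the major number is the release name's position in the alphabet, and y encodes the stage.

```python
# Sketch of the x.y.z scheme described above.
STAGES = {"dev": 0, "rc": 1, "stable": 2}

def ceph_version(release_name: str, stage: str, point: int) -> str:
    # "Infernalis" -> I is the 9th letter -> major version 9
    major = ord(release_name[0].lower()) - ord("a") + 1
    return f"{major}.{STAGES[stage]}.{point}"

print(ceph_version("Infernalis", "dev", 0))     # 9.0.0
print(ceph_version("Infernalis", "stable", 1))  # 9.2.1
print(ceph_version("Jewel", "dev", 0))          # 10.0.0
```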
[ceph-users] Ceph Hammer question..
Hi all, I have a cluster currently on Giant - is Hammer stable/ready for production use? -Tony
[ceph-users] Do I have enough pgs?
Hi all,

I have a cluster of 3 nodes, 18 OSDs. I used the pgcalc to give a suggested number of PGs - here was my list:

Group1: 3 rep, 18 OSDs, 30% data, 512 PGs
Group2: 3 rep, 18 OSDs, 30% data, 512 PGs
Group3: 3 rep, 18 OSDs, 30% data, 512 PGs
Group4: 2 rep, 18 OSDs, 5% data, 256 PGs
Group5: 2 rep, 18 OSDs, 5% data, 256 PGs

My estimated growth is to 27-36 OSDs within the next 18 months; after that, probably pretty stagnant for the next several years. Thoughts?

-Tony
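For reference, the pgcalc heuristic is roughly the following (a sketch, assuming the "expected growth" target of ~200 PGs per OSD, which reproduces the 512 figure for the 30%/3-rep groups; pgcalc itself applies a few extra adjustments):

```python
import math

def suggested_pgs(osds: int, replicas: int, data_fraction: float,
                  target_pgs_per_osd: int = 200) -> int:
    # Aim for target_pgs_per_osd PG copies per OSD, scaled by the pool's
    # share of the data, then round up to the next power of 2.
    raw = osds * target_pgs_per_osd * data_fraction / replicas
    return 2 ** math.ceil(math.log2(raw))

print(suggested_pgs(18, 3, 0.30))  # 512
```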
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
I know I'm not even close to this type of a problem yet with my small cluster (both test and production clusters) - but it would be great if something like that could appear in the cluster HEALTH_WARN: if Ceph could determine the number of used processes and compare them against the current limit, it could throw a health warning if it gets within say 10 or 15% of the max value. That would be a really quick indicator for anyone who frequently checks the health status (like through a web portal), as they may see it more quickly than during their regular log check interval. Just a thought.

-Tony

On Mon, Mar 9, 2015 at 2:01 PM, Sage Weil s...@newdream.net wrote:

On Mon, 9 Mar 2015, Karan Singh wrote:

Thanks guys, kernel.pid_max=4194303 did the trick.

Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch more users? Or maybe a warning issued by the osds themselves or something if they see limits that are low?

sage

- Karan -

On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote:

Hi Karan,

as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high density hardware. In our cluster, one OSD server has, in an idle situation, about 66,000 threads (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think has triggered your problem. Set kernel.pid_max to 4194303 (the maximum) like Azad Aliyar suggested, and the problem should be gone.

Regards,
Christian

Am 09.03.2015 11:41, schrieb Karan Singh:

Hello Community, I need help to fix a long-running Ceph problem. The cluster is unhealthy, multiple OSDs are DOWN.
When I am trying to restart OSDs I am getting this error:

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, 3.17.2-1.el6.elrepo.x86_64

Tried upgrading from 0.80.7 to 0.80.8 but no luck. Tried CentOS stock kernel 2.6.32 but no luck. Memory is not a problem, more than 150+ GB is free.

Did anyone ever face this problem?

Cluster status:

    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e26633: 239 osds: 85 up, 196 in
      pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
            4699 GB used, 707 TB / 711 TB avail
            6061/31080 objects degraded (19.501%)
              14 down+remapped+peering
              39 active
            3289 active+clean
             547 peering
             663 stale+down+peering
             705 stale+active+remapped
               1 active+degraded+remapped
               1 stale+down+incomplete
             484 down+peering
             455 active+remapped
            3696 stale+active+degraded
               4 remapped+peering
              23 stale+down+remapped+peering
              51 stale+active
            3637 active+degraded
            3799 stale+active+clean

OSD logs:

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
 2:
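The health-warning idea from this thread can be sketched as a simple threshold check (my own illustration, not actual Ceph code): compare the current task count against kernel.pid_max and flag when usage approaches the limit.

```python
# Flag when the number of kernel tasks approaches kernel.pid_max.
def pid_usage_warning(current_tasks: int, pid_max: int,
                      threshold: float = 0.85) -> bool:
    """Return True once task usage crosses threshold * pid_max."""
    return current_tasks >= pid_max * threshold

# With the too-low setting from the bug report (65536) and the ~66,000
# threads per dense OSD node mentioned above, the warning would fire:
print(pid_usage_warning(66_000, 65_536))     # True
print(pid_usage_warning(66_000, 4_194_303))  # False
```

To persist the fix itself, the thread's advice amounts to setting `kernel.pid_max = 4194303` in /etc/sysctl.conf and running `sysctl -p`.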
[ceph-users] Question about rados bench
Hi all,

In my reading on the net about various implementations of Ceph, I came across this website blog page (it really doesn't give a lot of good information, but it caused me to wonder): http://avengermojo.blogspot.com/2014/12/cubieboard-cluster-ceph-test.html - near the bottom, the person did a rados bench test. During the write phase, there were several areas where there was a 0 in the cur MB/s column. I figure there must have been a bottleneck somewhere slowing down the operation where data wasn't getting written. Is something like that during a benchmark test something that one should be concerned about? Is there a good procedure for tracking down where the bottleneck is (like if it's a given OSD)? Is the data cached and just taking a long time to write, or is it lost in an instance like that?

-Tony
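For tracking down where such stalls come from, one approach (a sketch; the pool name "testpool" is an assumption) is to re-run the benchmark while watching per-disk and per-OSD latency, so the straggler behind any 0 MB/s intervals stands out:

```shell
# Re-run a write benchmark similar to the blog post's:
rados bench -p testpool 60 write -b 4M -t 16

# Meanwhile, on each OSD node, look for a disk with outsized await/%util:
iostat -x 1

# And from any client/mon node, compare per-OSD journal/apply latency:
ceph osd perf
```

A consistently high outlier in either view usually points at the slow OSD or disk; data in flight during the stall is not lost, it is queued in journals and OS caches until the write completes.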
[ceph-users] New SSD Question
Hi all,

After the previous thread, I'm doing my SSD shopping, and I came across an SSD called the Edge Boost Pro w/ Power Fail. It seems to have some impressive specs - in most places decent user reviews, in one place a poor one - and I was wondering if anyone has had any experience with these drives with Ceph? Does it work well? Reliability issues? etc. Right now I'm looking at getting Intel DC S3700s, but the price on these Edge drives is pretty good for the 240G model - almost TGTBT for the speed and power-fail caps - so I didn't want to take a chance if they were really problematic, as I'd rather just use a drive I know people have had quality success with.

-Tony
Re: [ceph-users] SSD selection
On Sun, Mar 1, 2015 at 11:19 PM, Christian Balzer ch...@gol.com wrote:

I'll be honest, the pricing on Intel's website is far from reality. I haven't been able to find any OEMs, and retail pricing on the 200GB 3610 is ~231 (the $300 must have been a different model in the line). Although $231 does add up real quick if I need to get 6 of them :(

Using the google shopping search (which isn't ideal, but for simplicity's sake) I see the 100GB DC S3700 from 170USD and the 160GB DC S3500 from 150USD, which are a pretty good match for the OEM prices on the Intel site of 180 and 160 respectively.

If I have to buy them personally, that'll work well. If I can get work to get them, then I kinda have to limit myself to whom we have marked as suppliers, as it's a pain to get a new company in the mix.

You really wouldn't want less than 200MB/s, even in your setup, which I take to be 2Gb/s from what you wrote below. Note that the 100GB 3700 is going to perform way better and last immensely longer than the 160GB 3500 while being moderately more expensive, while the 200GB 3610 is faster (IOPS), lasts 10 times longer AND is cheaper than the 240GB 3500. It is pretty much those numbers that made me use 4 100GB 3700s instead of 3500s (240GB): much more bang for the buck, and it still fit my budget and could deal with 80% of the network bandwidth.

So the 3710's would be an ok solution?

No, because they start from 200GB and with a 300USD price tag. The 3710s do not replace the 3700s, they extend the selection upwards (in size mostly).

I thought I had corrected that - I was thinking the 3700's and typed 3710 :) I have seen the 3700s for right about $200, which although doesn't seem a lot cheaper, when getting 6, that does shave about $200 after shipping costs as well...

See above, google shopping.
The lowballer is Walmart, of all places: http://www.walmart.com/ip/26972768?wmlspartner=wlpaselectedSellerId=0

Guesstimate the amount of data written to your cluster per day, break that down to the load a journal SSD will see, and then multiply by at least 5 to be on the safe side. Then see which SSD will fit your expected usage pattern.

Luckily I don't think there will be a ton of data per day written. The majority of servers whose VHDs will be stored in our cluster don't have a lot of frequent activity - aside from a few windows servers that have DB servers in them (and even they don't write a ton of data per day really).

Being able to put even a coarse number on this will tell you if you can skimp on the endurance and have your cluster last like 5 years, or if getting a higher endurance SSD is going to be cheaper.

Any suggestions on how I can get a really accurate number on this? I mean, I could probably get some good numbers from the database servers in terms of their writes in a given day, but when it comes to other processes running in the background I'm not sure how much these might really affect this number.

If you have existing servers that run linux and have been up for a reasonably long time (months), iostat will give you a very good idea. No ideas about Windows, but I bet those stats exist someplace, too.

I can't say months, but at least a month, maybe two - trying to remember when our last extended power outage was - I can find out later.
For example a Ceph storage node, up 74 days with OS and journals on the first 4 drives and OSD HDDs on the other 8:

Device:   tps   kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
sda      9.82       29.88     187.87   191341125  1203171718
sdb      9.79       29.57     194.22   189367432  1243850846
sdc      9.77       29.83     188.89   191061000  1209676622
sdd      8.77       29.57     175.40   189399240  1123294410
sde      5.24      354.19      55.68  2268306443   356604748
sdi      5.02      335.61      63.60  2149338787   407307544
sdj      4.96      350.33      52.43  2243590803   335751320
sdl      5.04      374.62      48.49  2399170183   310559488
sdf      4.85      354.52      50.43  2270401571   322947192
sdh      4.77      332.38      50.60  2128622471   324065888
sdg      6.26      403.97      65.42  2587109283   418931316
sdk      5.86      385.36      55.61  2467921295   356120140

I do have some linux vms that have been up for a while - I can't say how many months since the last extended power outage off hand (granted I'll know once I look at the uptime), but hopefully it will at least give me an idea.

So it's 2x1Gb/s then?

client side 2x1, cluster side, 3x1.

So 500MB/s with a trailing wind on a sunny day. Meaning that something that can do about 400MB/s will do nicely, as you're only ever going to get near that when
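Turning those cumulative iostat counters into a daily write figure is simple arithmetic (a sketch using the sda numbers from the node above; kB_wrtn is cumulative since boot):

```python
# Estimate daily writes from iostat's cumulative kB_wrtn counter.
def gb_written_per_day(kb_wrtn: int, uptime_days: float) -> float:
    return kb_wrtn / 1024 / 1024 / uptime_days

daily = gb_written_per_day(1203171718, 74)
print(round(daily, 1))  # 15.5 GB/day for this OS/journal drive
```

That per-day figure, multiplied by the safety factor of 5 suggested above, is what you compare against an SSD's rated endurance (TBW).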
Re: [ceph-users] SSD selection
Now, I've never set up a journal on a separate disk. I assume you have 4 partitions at 10GB / partition; I noticed the docs referred to 10 GB as a good starter. Would it be better to have 4 partitions @ 10G ea or 4 @ 20? I know I'll take a speed hit, but unless I can get my work to buy the drives, they will have to sit with what my personal budget can afford and be willing to donate ;)

-Tony

On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky and...@arhont.com wrote:

I am not sure about the enterprise grade and underprovisioning, but for the Intel 520s I've got 240GBs (the speeds of the 240 are a bit better than the 120s), and I've left 50% underprovisioned. I've got 10GB for journals and I am using 4 osds per ssd.

Andrei

From: Tony Harris neth...@gmail.com
To: Andrei Mikhailovsky and...@arhont.com
Cc: ceph-users@lists.ceph.com, Christian Balzer ch...@gol.com
Sent: Sunday, 1 March, 2015 8:49:56 PM
Subject: Re: [ceph-users] SSD selection

Ok, any size suggestion? Can I get a 120 and be ok? I see I can get the DC S3500 120GB for within $120/drive so it's possible to get 6 of them...

-Tony

On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky and...@arhont.com wrote:

I would not use a single ssd for 5 osds. I would recommend 3-4 osds max per ssd or you will get a bottleneck on the ssd side. I've had a reasonable experience with Intel 520 ssds (which are not produced anymore). I've found the Samsung 840 Pro to be horrible! Otherwise, it seems that everyone here recommends the DC3500 or DC3700, and it has the best wear per $ ratio out of all the drives.
Andrei

From: Tony Harris neth...@gmail.com
To: Christian Balzer ch...@gol.com
Cc: ceph-users@lists.ceph.com
Sent: Sunday, 1 March, 2015 4:19:30 PM
Subject: Re: [ceph-users] SSD selection

Well, although I have 7 now per node, you make a good point, and I'm in a position where I can either increase to 8 and split 4/4 and have 2 ssds, or reduce to 5 and use a single ssd per node (the system is not in production yet). Do all the DC lines have caps in them or just the DC S line?

-Tony

On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21 osds). I'm looking to improve the write performance a bit though, which I was hoping that using SSDs for journals would do. But I was wondering what people had as recommendations for SSDs to act as journal drives. If I read the docs on ceph.com correctly, I'll need 2 ssds per node (with 7 drives in each node, I think the recommendation was 1 ssd per 4-5 drives?), so I'm looking for drives that will work well without breaking the bank for where I work (I'll probably have to purchase them myself and donate, so my budget is somewhat small). Any suggestions? I'd prefer one that can finish its write in a power outage case; the only one I know of off hand is the intel dcs3700 I think, but at $300 it's WAY above my affordability range.

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the proverbial behind down the road when combined with journal SSDs, as one of those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price, performance, endurance and limiting failure impact. I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing the write paths, IOPS and failure domain, but not the sequential speed or cost.
Depending on what your write load is and the expected lifetime of this cluster, you might be able to get away with DC S3500s or, even better, the new DC S3610s. Keep in mind that buying a cheap, low endurance SSD now might cost you more down the road if you have to replace it after a year (TBW/$). All the cheap alternatives to DC level SSDs tend to wear out too fast, have no powercaps, and tend to have unpredictable (caused by garbage collection) and steadily decreasing performance.

Christian

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com    Global OnLine Japan/Fusion Communications
http://www.gol.com/
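On the 10GB-vs-20GB journal question raised above, the Ceph docs' rule of thumb can be sanity-checked with a one-liner (a sketch; the 150 MB/s HDD figure is an assumption from elsewhere in this thread, and 5 s is the filestore max sync interval default):

```python
# Ceph docs rule of thumb: journal size >= 2 * throughput * sync interval.
def min_journal_size_mb(throughput_mb_s: float,
                        sync_interval_s: float = 5.0) -> float:
    return 2 * throughput_mb_s * sync_interval_s

# A ~150 MB/s HDD behind the journal, default 5 s sync interval:
print(min_journal_size_mb(150))  # 1500.0 MB
```

By that math even a fast HDD only needs ~1.5 GB of journal, so 10GB partitions are already generous and 20GB buys little beyond headroom.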
Re: [ceph-users] SSD selection
On Sun, Mar 1, 2015 at 10:18 PM, Christian Balzer ch...@gol.com wrote:

On Sun, 1 Mar 2015 21:26:16 -0600 Tony Harris wrote:

On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer ch...@gol.com wrote:

Again, ultimately you will need to sit down, compile and compare the numbers. Start with this: http://ark.intel.com/products/family/83425/Data-Center-SSDs Pay close attention to the 3610 SSDs; while slightly more expensive, they offer 10 times the endurance.

Unfortunately, $300 vs $100 isn't really slightly more expensive ;) Although I did notice that the 3710's can be gotten for ~210.

I'm not sure where you get those prices from or what you're comparing with what, but if you look at the OEM prices in the URL up there (which compare quite closely to what you can find when looking at shopping prices), a comparison with closely matched capabilities goes like this: http://ark.intel.com/compare/71913,86640,75680,75679

I'll be honest, the pricing on Intel's website is far from reality. I haven't been able to find any OEMs, and retail pricing on the 200GB 3610 is ~231 (the $300 must have been a different model in the line). Although $231 does add up real quick if I need to get 6 of them :(

You really wouldn't want less than 200MB/s, even in your setup, which I take to be 2Gb/s from what you wrote below. Note that the 100GB 3700 is going to perform way better and last immensely longer than the 160GB 3500 while being moderately more expensive, while the 200GB 3610 is faster (IOPS), lasts 10 times longer AND is cheaper than the 240GB 3500. It is pretty much those numbers that made me use 4 100GB 3700s instead of 3500s (240GB): much more bang for the buck, and it still fit my budget and could deal with 80% of the network bandwidth.

So the 3710's would be an ok solution? I have seen the 3700s for right about $200, which although doesn't seem a lot cheaper, when getting 6, that does shave about $200 after shipping costs as well...
Guesstimate the amount of data written to your cluster per day, break that down to the load a journal SSD will see, and then multiply by at least 5 to be on the safe side. Then see which SSD will fit your expected usage pattern.

Luckily I don't think there will be a ton of data per day written. The majority of servers whose VHDs will be stored in our cluster don't have a lot of frequent activity - aside from a few windows servers that have DB servers in them (and even they don't write a ton of data per day really).

Being able to put even a coarse number on this will tell you if you can skimp on the endurance and have your cluster last like 5 years, or if getting a higher endurance SSD is going to be cheaper.

Any suggestions on how I can get a really accurate number on this? I mean, I could probably get some good numbers from the database servers in terms of their writes in a given day, but when it comes to other processes running in the background I'm not sure how much these might really affect this number.

So it's 2x1Gb/s then?

client side 2x1, cluster side, 3x1.

At that speed a single SSD from the list above would do, if you're a) aware of the risk that this SSD failing will kill all OSDs on that node and b) don't expect your cluster to be upgraded.

I'd really prefer 2 per node from our discussions so far - it's all a matter of cost, but I also don't want to jump to a poor decision just because it can't be afforded immediately. I'd rather gradually upgrade nodes as can be afforded than jump into cheap now only to have to pay a bigger price later.

Well, I'd like to steer away from the consumer models if possible since they (AFAIK) don't contain caps to finish writes should a power loss occur, unless there is one that does?

Not that I'm aware of.
Also note that while Andrei is happy with his 520s (especially compared to the Samsungs), I have various 5x0 Intel SSDs in use as well, and while they are quite nice, the 3700s are so much faster (consistently) in comparison that one can't believe it ain't butter. ^o^

I'll have to see if I can get funding. I've already donated enough to get the (albeit used) servers and nic cards; I just can't personally afford to donate another 1K-1200, but hopefully I'll soon have it nailed down what exact model I would like to have, and maybe I can get them to pay for at least 1/2 of them... God, working for a school can be taxing at times.

-Tony

Christian
Re: [ceph-users] SSD selection
On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer ch...@gol.com wrote:

Again, ultimately you will need to sit down, compile and compare the numbers. Start with this: http://ark.intel.com/products/family/83425/Data-Center-SSDs Pay close attention to the 3610 SSDs; while slightly more expensive, they offer 10 times the endurance.

Unfortunately, $300 vs $100 isn't really slightly more expensive ;) Although I did notice that the 3710's can be gotten for ~210.

Guesstimate the amount of data written to your cluster per day, break that down to the load a journal SSD will see, and then multiply by at least 5 to be on the safe side. Then see which SSD will fit your expected usage pattern.

Luckily I don't think there will be a ton of data per day written. The majority of servers whose VHDs will be stored in our cluster don't have a lot of frequent activity - aside from a few windows servers that have DB servers in them (and even they don't write a ton of data per day really).

You didn't mention your network, but I assume it's 10Gb/s?

Would be nice; if I had access to the kind of cash to get a 10Gb network, I wouldn't be stressing the cost of a set of SSDs ;)

At 135MB/s writes the 100GB DC S3500 will not cut the mustard in any shape or form when journaling for 4 HDDs. With 2 HDDs it might be a so-so choice, but still falling short. Most current 7.2K RPM HDDs these days can do around 150MB/s writes; however, that's neither uniform, nor does Ceph do anything resembling a sequential write (which is where these speeds come from), so in my book 80-120MB/s on the SSD journal per HDD is enough.

The drives I have access to that are in the cluster aren't the fastest, current drives out there; but what you're describing, to have even 3 HDDs per SSD, you'd need an SSD with 240-360MB/s write capability... Why does the ceph documentation then talk about 1 ssd per 4-5 osd drives? It would be near impossible to get an SSD to meet that level of speed..
A speed hit is one thing; more than halving your bandwidth is bad, especially when thinking about backfilling.

Although I'm working with more than 1Gb/s, it's a lot less than 10Gb/s, so there might be a threshold there where we wouldn't experience an issue where someone using 10G would (God I'd love a 10G network, but no budget for it).

Journal size doesn't matter that much: 10GB is fine, 20GB x4 is OK with the 100GB DC drives; with 5xx consumer models I'd leave at least 50% free.

Well, I'd like to steer away from the consumer models if possible since they (AFAIK) don't contain caps to finish writes should a power loss occur, unless there is one that does?

-Tony

Christian

On Sun, 1 Mar 2015 15:08:10 -0600 Tony Harris wrote:

Now, I've never set up a journal on a separate disk. I assume you have 4 partitions at 10GB / partition; I noticed the docs referred to 10 GB as a good starter. Would it be better to have 4 partitions @ 10G ea or 4 @ 20? I know I'll take a speed hit, but unless I can get my work to buy the drives, they will have to sit with what my personal budget can afford and be willing to donate ;)

-Tony

On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky and...@arhont.com wrote:

I am not sure about the enterprise grade and underprovisioning, but for the Intel 520s I've got 240GBs (the speeds of the 240 are a bit better than the 120s), and I've left 50% underprovisioned. I've got 10GB for journals and I am using 4 osds per ssd.

Andrei

From: Tony Harris neth...@gmail.com
To: Andrei Mikhailovsky and...@arhont.com
Cc: ceph-users@lists.ceph.com, Christian Balzer ch...@gol.com
Sent: Sunday, 1 March, 2015 8:49:56 PM
Subject: Re: [ceph-users] SSD selection

Ok, any size suggestion? Can I get a 120 and be ok? I see I can get the DC S3500 120GB for within $120/drive so it's possible to get 6 of them...

-Tony

On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky and...@arhont.com wrote:

I would not use a single ssd for 5 osds.
I would recommend 3-4 osds max per ssd or you will get a bottleneck on the ssd side. I've had a reasonable experience with Intel 520 ssds (which are not produced anymore). I've found the Samsung 840 Pro to be horrible! Otherwise, it seems that everyone here recommends the DC3500 or DC3700, and it has the best wear per $ ratio out of all the drives.

Andrei

From: Tony Harris neth...@gmail.com
To: Christian Balzer ch...@gol.com
Cc: ceph-users@lists.ceph.com
Sent: Sunday, 1 March, 2015 4:19:30 PM
Subject: Re: [ceph-users] SSD selection

Well, although I have 7 now per node, you make a good point, and I'm in a position where I can either increase to 8 and split 4/4 and have 2 ssds, or reduce to 5 and use a single ssd per node (the system
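The HDDs-per-SSD question being debated above is back-of-envelope division, since every client write hits the journal SSD first (a sketch; the 100 MB/s per-HDD figure is the midpoint of the 80-120 MB/s range suggested earlier in the thread):

```python
# How many HDD journals can one SSD carry before its sequential write
# bandwidth becomes the bottleneck?
def hdds_per_journal_ssd(ssd_seq_write_mb_s: float,
                         per_hdd_mb_s: float = 100.0) -> int:
    return int(ssd_seq_write_mb_s // per_hdd_mb_s)

print(hdds_per_journal_ssd(400))  # a ~400 MB/s SSD: 4 HDDs
print(hdds_per_journal_ssd(135))  # 100GB DC S3500 class: 1 HDD
```

This is why the docs' 1 SSD per 4-5 OSDs guidance only holds for SSDs in the ~400-500 MB/s class, and why the slower 100GB models are argued to be a poor fit for 4 HDDs.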
Re: [ceph-users] SSD selection
Well, although I have 7 now per node, you make a good point, and I'm in a position where I can either increase to 8 and split 4/4 and have 2 ssds, or reduce to 5 and use a single ssd per node (the system is not in production yet). Do all the DC lines have caps in them or just the DC S line?

-Tony

On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21 osds). I'm looking to improve the write performance a bit though, which I was hoping that using SSDs for journals would do. But I was wondering what people had as recommendations for SSDs to act as journal drives. If I read the docs on ceph.com correctly, I'll need 2 ssds per node (with 7 drives in each node, I think the recommendation was 1 ssd per 4-5 drives?), so I'm looking for drives that will work well without breaking the bank for where I work (I'll probably have to purchase them myself and donate, so my budget is somewhat small). Any suggestions? I'd prefer one that can finish its write in a power outage case; the only one I know of off hand is the intel dcs3700 I think, but at $300 it's WAY above my affordability range.

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the proverbial behind down the road when combined with journal SSDs, as one of those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price, performance, endurance and limiting failure impact. I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing the write paths, IOPS and failure domain, but not the sequential speed or cost. Depending on what your write load is and the expected lifetime of this cluster, you might be able to get away with DC S3500s or, even better, the new DC S3610s.
Keep in mind that buying a cheap, low endurance SSD now might cost you more down the road if you have to replace it after a year (TBW/$). All the cheap alternatives to DC level SSDs tend to wear out too fast, have no powercaps, and tend to have unpredictable (caused by garbage collection) and steadily decreasing performance.

Christian

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com    Global OnLine Japan/Fusion Communications
http://www.gol.com/
Re: [ceph-users] SSD selection
Ok, any size suggestion? Can I get a 120 and be ok? I see I can get the DC S3500 120GB for within $120/drive so it's possible to get 6 of them...

-Tony

On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky and...@arhont.com wrote:

I would not use a single ssd for 5 osds. I would recommend 3-4 osds max per ssd or you will get a bottleneck on the ssd side. I've had a reasonable experience with Intel 520 ssds (which are not produced anymore). I've found the Samsung 840 Pro to be horrible! Otherwise, it seems that everyone here recommends the DC3500 or DC3700, and it has the best wear per $ ratio out of all the drives.

Andrei

From: Tony Harris neth...@gmail.com
To: Christian Balzer ch...@gol.com
Cc: ceph-users@lists.ceph.com
Sent: Sunday, 1 March, 2015 4:19:30 PM
Subject: Re: [ceph-users] SSD selection

Well, although I have 7 now per node, you make a good point, and I'm in a position where I can either increase to 8 and split 4/4 and have 2 ssds, or reduce to 5 and use a single ssd per node (the system is not in production yet). Do all the DC lines have caps in them or just the DC S line?

-Tony

On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21 osds). I'm looking to improve the write performance a bit though, which I was hoping that using SSDs for journals would do. But I was wondering what people had as recommendations for SSDs to act as journal drives. If I read the docs on ceph.com correctly, I'll need 2 ssds per node (with 7 drives in each node, I think the recommendation was 1 ssd per 4-5 drives?), so I'm looking for drives that will work well without breaking the bank for where I work (I'll probably have to purchase them myself and donate, so my budget is somewhat small). Any suggestions?
I'd prefer one that can finish its writes in a power-outage case; the only one I know of off hand is the Intel DC S3700, I think, but at $300 it's WAY above my affordability range.

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the proverbial behind down the road when combined with journal SSDs, as one of those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price, performance, endurance, and limiting failure impact. I have a cluster where I used 4x 100GB DC S3700s with 8 HDD OSDs, optimizing the write paths, IOPS, and failure domain, but not the sequential speed or cost.

Depending on what your write load is and the expected lifetime of this cluster, you might be able to get away with DC S3500s, or even better the new DC S3610s.

Keep in mind that buying a cheap, low-endurance SSD now might cost you more down the road if you have to replace it after a year (TBW/$). All the cheap alternatives to DC-level SSDs tend to wear out too fast, have no power caps, and suffer unpredictable, steadily decreasing performance (caused by garbage collection).

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
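[Editorial note: the TBW/$ point above can be made concrete with a quick back-of-the-envelope calculation. The sketch below is not from the thread; the drive ratings, prices, write load, and write-amplification factor are illustrative assumptions. It shows why a drive that costs a third as much can still lose badly on price per terabyte written and wear out within a year under journal load.]

```python
# Back-of-the-envelope SSD journal lifetime estimate.
# All drive specs and workload numbers are illustrative assumptions,
# not measurements or ratings quoted in this thread.

def journal_lifetime_years(tbw_rating_tb, daily_writes_gb, write_amplification=2.0):
    """Years until the SSD's rated TBW is exhausted.

    With filestore, every client write hits the journal once, and SSD-internal
    write amplification inflates that further (rough assumed factor).
    """
    daily_tb = daily_writes_gb * write_amplification / 1024
    return tbw_rating_tb / (daily_tb * 365)

# Hypothetical consumer drive vs. a DC-class drive:
consumer = {"price_usd": 90, "tbw": 70}      # assumed ~70 TB written rating
dc_class = {"price_usd": 300, "tbw": 1800}   # assumed ~1.8 PB written rating

for name, d in (("consumer", consumer), ("dc-class", dc_class)):
    years = journal_lifetime_years(d["tbw"], daily_writes_gb=500)
    print(f"{name}: ~{years:.1f} years at 500 GB/day, "
          f"${d['price_usd'] / d['tbw']:.2f} per TB written")
```

Under these assumed numbers the consumer drive is dead in months while the DC-class drive lasts years, which is the "replace it after a year" scenario in dollar terms.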
[ceph-users] Mail not reaching the list?
Hi,

I've sent a couple of emails to the list since subscribing, but I've never seen them reach the list; I was just wondering if there was something wrong?
[ceph-users] Am I reaching the list now?
I was subscribed with a Yahoo email address, but it was getting some grief, so I decided to try using my Gmail address; hopefully this one is working.

-Tony
[ceph-users] SSD selection
Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21 OSDs). I'm looking to improve the write performance a bit, which I was hoping that using SSDs for journals would do. But I was wondering what people had as recommendations for SSDs to act as journal drives. If I read the docs on ceph.com correctly, I'll need 2 SSDs per node (with 7 drives in each node; I think the recommendation was 1 SSD per 4-5 drives?), so I'm looking for drives that will work well without breaking the bank for where I work (I'll probably have to purchase them myself and donate, so my budget is somewhat small). Any suggestions?

I'd prefer one that can finish its writes in a power-outage case; the only one I know of off hand is the Intel DC S3700, I think, but at $300 it's WAY above my affordability range.
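[Editorial note: on sizing those journal partitions, the Ceph docs of this era give a rule of thumb: a journal should be at least twice the product of the expected disk throughput and `filestore max sync interval`. The sketch below applies that rule; the throughput and interval figures are assumptions, not numbers from this thread. It shows that even a small SSD has far more capacity than 4-5 journals need, so capacity is rarely the constraint.]

```python
# Filestore journal sizing per the rule of thumb from the Ceph docs:
#   journal size >= 2 * expected throughput * filestore max sync interval
# Throughput and sync-interval values below are example assumptions.

def journal_size_gb(disk_throughput_mb_s, filestore_max_sync_interval_s=5):
    """Minimum journal partition size in GB for one OSD."""
    return 2 * disk_throughput_mb_s * filestore_max_sync_interval_s / 1024

def ssd_fits_journals(ssd_capacity_gb, osds_per_ssd, disk_throughput_mb_s=120):
    """Return (per-journal GB, total GB, whether it fits on the SSD)."""
    per_osd = journal_size_gb(disk_throughput_mb_s)
    total = per_osd * osds_per_ssd
    return per_osd, total, total <= ssd_capacity_gb

per_osd, total, fits = ssd_fits_journals(120, osds_per_ssd=4)
print(f"{per_osd:.2f} GB per journal, {total:.2f} GB for 4 OSDs "
      f"-> fits on a 120 GB SSD: {fits}")
```

The practical corollary from the thread: buy the small capacity for endurance and power-loss protection, not for space.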
[ceph-users] Ceph - networking question
Hi all,

I've only been using Ceph for a few months now and currently have a small cluster (3 nodes, 18 OSDs). I get decent performance based upon the configuration. My question is: should I have a larger pipe on the client/public network or on the Ceph cluster (private) network? I can only have a larger pipe on one of the two. The most Ceph nodes we'd have in the foreseeable future is 7; the current client VM host count is 3, with a max of 5 in the future.

Currently, when benchmarking with rados with the larger pipe connected to the public/client side, I can just about max out the throughput on the larger pipe with reads, but not even close with writes. With the smaller pipe on the client/public side, I still max out reads but still come nowhere close with writes (well, close being relative to the replication count: with 2 replicas I can get 60% of the theoretical max, with 3 replicas about 40%).

Basically, I'm not sure how to determine when the back-end cluster (private) network starts to become the bottleneck that needs to be expanded.

-Tony
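[Editorial note: one way to reason about the question above: with replication, every client write is forwarded from the primary OSD to the other replicas over the cluster network, so steady-state write traffic on the private side is roughly (size - 1) times the client write traffic, ignoring recovery and backfill, which add more. The minimal sketch below makes that arithmetic explicit; the 1000 MB/s client load is an assumed figure, not from the thread.]

```python
# Sketch of steady-state write traffic per network under replication.
# The client write rate is an illustrative assumption; recovery/backfill
# traffic (which also uses the cluster network) is deliberately ignored.

def network_load(client_write_mb_s, replicas):
    """Return (public_mb_s, cluster_mb_s) for a given replication size.

    The primary OSD receives each write once on the public network and
    forwards (replicas - 1) copies over the cluster network.
    """
    public = client_write_mb_s
    cluster = client_write_mb_s * (replicas - 1)
    return public, cluster

for replicas in (2, 3):
    pub, clu = network_load(1000, replicas)
    print(f"size={replicas}: public {pub} MB/s, cluster {clu} MB/s")
```

This matches the observation in the message: at size 3 the cluster network carries twice the client write traffic, so write throughput drops faster than at size 2, and the private side is the first place a write-heavy workload saturates.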