Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Tony Harris
Sounds to me like you've put yourself at too much risk. *If* I'm reading
your message right about your configuration, you have multiple hosts
whose OSDs live on a single shared box - so if that single shared box
(a single point of failure for multiple nodes) goes down, it's possible
for multiple replicas to disappear at the same time, which can halt the
operation of your cluster if the primaries and the replicas both sit on
OSDs within that single shared storage system...
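
For anyone who lands on this thread with the same symptom, a few hedged
checks that usually explain client IO pausing with a 2-replica pool and a
chassis-level failure domain (the pool name "rbd" below is only an example):

  ceph osd pool get rbd size       # replica count (2 in this setup)
  ceph osd pool get rbd min_size   # if min_size == size, PGs go inactive as
                                   # soon as one chassis worth of replicas is lost
  ceph osd crush rule dump         # confirm the rule really does
                                   # "chooseleaf ... type chassis"
  ceph osd tree                    # verify hosts are nested under chassis buckets

If min_size equals size, or the CRUSH rule only separates replicas by host,
losing one enclosure can legitimately stop IO until those OSDs return.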

On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar 
mallikarjuna.bira...@gmail.com wrote:

 Hi all,

 Setup details:
 Two storage enclosures each connected to 4 OSD nodes (Shared storage).
 Failure domain is Chassis (enclosure) level. Replication count is 2.
 Each host is allotted 4 drives.

 I have active client IO running on the cluster (random write profile with
 4M block size & 64 queue depth).

 One of the enclosures had a power loss, so all OSDs on the hosts
 connected to that enclosure went down, as expected.

 But client IO paused. After some time the enclosure & the hosts connected
 to it came back up, and all OSDs on those hosts came up.

 Until then, the cluster was not serving IO. Once all hosts & OSDs
 belonging to that enclosure came up, client IO resumed.


 Can anybody help me understand why the cluster was not serving IO during
 the enclosure failure? Or is it a bug?

 -Thanks & regards,
 Mallikarjun Biradar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] The first infernalis dev release will be v9.0.0

2015-05-05 Thread Tony Harris
So with this, will even numbers then be LTS?  Since 9.0.0 is following
0.94.x/Hammer, and every other release is normally LTS, I'm guessing
10.x.x, 12.x.x, etc. will be LTS...

On Tue, May 5, 2015 at 11:45 AM, Sage Weil sw...@redhat.com wrote:

 On Tue, 5 May 2015, Joao Eduardo Luis wrote:
  On 05/04/2015 05:09 PM, Sage Weil wrote:
   The first Ceph release back in Jan of 2008 was 0.1.  That made sense at
   the time.  We haven't revised the versioning scheme since then,
 however,
   and are now at 0.94.1 (first Hammer point release).  To avoid reaching
   0.99 (and 0.100 or 1.00?) we have a new strategy.  This was discussed a
   bit on ceph-devel and in #ceph-devel and there doesn't appear to be any
   scheme that everyone likes.
  
   So, we're going to go with something that only a few people dislike:
  
x.0.z - development releases (for early testers and the brave at
 heart)
x.1.z - release candidates (for test clusters, brave users)
x.2.z - stable/bugfix releases (for users)
  
   x will start at 9 for Infernalis (I is the 9th letter), making our
 first
   development release of the 9th release cycle 9.0.0.  Subsequent
   development releases will be 9.0.1, 9.0.2, etc.
  
   In a couple months we'll have a 9.1.0 (and maybe 9.1.1) release
 candidate.
  
   A few weeks after that we'll have the Infernalis release 9.2.0,
 followed
   by stable bug fix updates 9.2.1, 9.2.2, etc., and then begin work on
 the
   Jewel (10.y.z) release.
  
   We'll see how this works out.  We can adjust this in the future to any
   other 9.y.z scheme (e.g., 9.1, 9.2 etc dev releases and 9.8.z stable
   releases); the main commitment here is to the 9 part, indicating
   Infernalis is the 9th major release cycle.
 
  Looks sane!
 
  I'm guessing once 9.1.0 is frozen the dev cycles will move on to 10.0.1?

 Yep!  Or 10.0.0 I guess since we just did 9.0.0.

 sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Hammer question..

2015-04-22 Thread Tony Harris
Hi all,

I have a cluster currently on Giant - is Hammer stable/ready for production
use?

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Do I have enough pgs?

2015-04-15 Thread Tony Harris
Hi all,

I have a cluster of 3 nodes, 18 OSDs.  I used the pgcalc to give a
suggested number of PGs - here was my list:

Group1   3 rep  18 OSDs  30% data  512PGs
Group2   3 rep  18 OSDs  30% data  512PGs
Group3   3 rep  18 OSDs  30% data  512PGs
Group4   2 rep  18 OSDs  5% data  256PGs
Group5   2 rep  18 OSDs  5% data  256PGs

My estimated growth is to 27-36 OSDs within the next 18 months, after that
probably pretty stagnant for the next several years.
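
As a rough cross-check, here is my own sketch of the usual pgcalc arithmetic,
assuming a target of ~200 PGs per OSD (which is what the numbers above look
like they were generated with):

  # pg_num per pool ~= (target PGs per OSD * OSD count * %data) / replicas,
  # rounded up to the next power of two
  awk 'BEGIN { pgs = (200 * 18 * 0.30) / 3;   # Group1-3
               p = 1; while (p < pgs) p *= 2;
               print pgs, "->", p }'
  # prints: 360 -> 512

The same arithmetic for the 2-rep 5% pools comes out well under 256, so those
already carry headroom for the planned growth to 27-36 OSDs.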

Thoughts?

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Tony Harris
I know I'm not even close to this type of problem yet with my small
clusters (both test and production) - but it would be great if something
like that could appear as a cluster HEALTH_WARN: if Ceph could determine
the number of processes/threads in use and compare it against the current
limit, it could throw a health warning once it gets within, say, 10 or 15%
of the max value.  That would be a really quick indicator for anyone who
frequently checks the health status (like through a web portal), as they
may see it more quickly than during their regular log check interval.  Just
a thought.
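
Until something like that exists in Ceph itself, a minimal sketch of the same
check (my own, not an existing feature) could be dropped into cron or a
monitoring agent on each OSD node:

  #!/bin/sh
  # warn when the kernel-wide task count approaches kernel.pid_max
  # (every thread consumes a PID, and OSDs spawn a lot of threads)
  max=$(cat /proc/sys/kernel/pid_max)
  used=$(ps -eLf --no-headers | wc -l)
  pct=$((used * 100 / max))
  if [ "$pct" -ge 85 ]; then
      echo "WARNING: ${used}/${max} PIDs in use (${pct}%)," \
           "OSDs may soon fail to create threads"
  fi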

-Tony

On Mon, Mar 9, 2015 at 2:01 PM, Sage Weil s...@newdream.net wrote:

 On Mon, 9 Mar 2015, Karan Singh wrote:
  Thanks Guys kernel.pid_max=4194303 did the trick.

 Great to hear!  Sorry we missed that you only had it at 65536.

 This is a really common problem that people hit when their clusters start
 to grow.  Is there somewhere in the docs we can put this to catch more
 users?  Or maybe a warning issued by the osds themselves or something if
 they see limits that are low?

 sage

  - Karan -
 
On 09 Mar 2015, at 14:48, Christian Eichelmann
christian.eichelm...@1und1.de wrote:
 
  Hi Karan,
 
  as you actually write in your own book, the problem is the sysctl
  setting kernel.pid_max. I've seen in your bug report that you were
  setting it to 65536, which is still too low for high-density hardware.
 
  In our cluster, one OSD server has about 66,000 threads in an idle
  situation (60 OSDs per server). The number of threads increases when you
  increase the number of placement groups in the cluster, which I think
  has triggered your problem.
 
  Set the kernel.pid_max setting to 4194303 (the maximum) like Azad
  Aliyar suggested, and the problem should be gone.
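
  For reference, a hedged example of applying that both at runtime and
  persistently (the sysctl.conf path is the usual convention):

    sysctl -w kernel.pid_max=4194303                      # immediate
    echo 'kernel.pid_max = 4194303' >> /etc/sysctl.conf   # survives reboots
    sysctl -p                                             # reload and verify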
 
  Regards,
  Christian
 
  Am 09.03.2015 11:41, schrieb Karan Singh:
    Hello Community, I need help fixing a long-standing Ceph problem.

    The cluster is unhealthy and multiple OSDs are DOWN. When I try to
    restart the OSDs I get this error:
 
 
     2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
     'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
     common/Thread.cc: 129: FAILED assert(ret == 0)

    Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
    3.17.2-1.el6.elrepo.x86_64
 
    Tried upgrading from 0.80.7 to 0.80.8, but no luck.

    Tried the CentOS stock kernel 2.6.32, but no luck.

    Memory is not a problem, more than 150+GB is free.

    Did anyone ever face this problem?
 
    Cluster status:

     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
      health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete;
             1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive;
             8938 pgs stuck stale; 10320 pgs stuck unclean;
             recovery 6061/31080 objects degraded (19.501%);
             111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
      monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
             election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
      osdmap e26633: 239 osds: 85 up, 196 in
       pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
             4699 GB used, 707 TB / 711 TB avail
             6061/31080 objects degraded (19.501%)
                14 down+remapped+peering
                39 active
              3289 active+clean
               547 peering
               663 stale+down+peering
               705 stale+active+remapped
                 1 active+degraded+remapped
                 1 stale+down+incomplete
               484 down+peering
               455 active+remapped
              3696 stale+active+degraded
                 4 remapped+peering
                23 stale+down+remapped+peering
                51 stale+active
              3637 active+degraded
              3799 stale+active+clean

    OSD logs:

     2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
     'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
     common/Thread.cc: 129: FAILED assert(ret == 0)

      ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
      1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
      2:

[ceph-users] Question about rados bench

2015-03-03 Thread Tony Harris
Hi all,

In my reading on the net about various implementations of Ceph, I came
across this website blog page (really doesn't give a lot of good
information but caused me to wonder):

http://avengermojo.blogspot.com/2014/12/cubieboard-cluster-ceph-test.html

near the bottom, the person did a rados bench test.  During the write
phase, there were several areas where there was a 0 in the cur MB/s.  I
figure there must have been a bottleneck somewhere slowing things down so
that data wasn't getting written.  Is something like that something one
should be concerned about during a benchmark?  Is there a good procedure for
tracking down where the bottleneck is (like whether it's a given OSD)?  Is
the data cached and just taking a long time to write, or is it lost in an
instance like that?
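
For what it's worth, a hedged way to reproduce that kind of test and look for
a lagging OSD while it runs (the pool name "bench" is just an example):

  # 60-second 4MB write test; keep the objects so a read pass can follow
  rados bench -p bench 60 write -b 4M -t 16 --no-cleanup

  # in another terminal while the bench runs:
  ceph osd perf      # per-OSD commit/apply latency; one outlier = likely culprit
  ceph -w            # watch for slow request warnings naming specific OSDs
  iostat -xk 5       # on the suspect node, look for a disk pinned near 100% util

As for the data question: rados bench writes are only acknowledged once they
have been journaled on all replicas, so a stall shows up as the client
waiting, not as lost data.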

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New SSD Question

2015-03-02 Thread Tony Harris
Hi all,

After the previous thread, I'm doing my SSD shopping and I came across an
SSD called the Edge Boost Pro w/ Power Fail.  It seems to have some
impressive specs - decent user reviews in most places, a poor one in one
place - so I was wondering if anyone has had any experience with these
drives with Ceph?  Does it work well?  Reliability issues?  Etc.  Right now
I'm looking at getting Intel DC S3700's, but the price on these Edge drives
is pretty good for the 240G model - almost TGTBT for the speed and
power-fail caps - so I didn't want to take a chance if they are really
problematic, as I'd rather just use a drive I know people have had quality
success with.
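
One hedged way to vet an unfamiliar drive before trusting it with journals is
the usual single-job O_DSYNC write test (the device name is a placeholder,
and this will overwrite data on it):

  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Drives with real power-loss protection (the DC S3700 class) typically sustain
tens of thousands of 4k sync IOPS here, while many consumer drives collapse
to a few hundred.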

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-02 Thread Tony Harris
On Sun, Mar 1, 2015 at 11:19 PM, Christian Balzer ch...@gol.com wrote:


  
  I'll be honest, the pricing on Intel's website is far from reality.  I
  haven't been able to find any OEMs, and retail pricing on the 200GB 3610
  is ~231 (the $300 must have been a different model in the line).
  Although $231 does add up real quick if I need to get 6 of them :(
 
 
 Using a Google Shopping search (which isn't ideal, but for simplicity's
 sake) I see the 100GB DC S3700 from 170USD and the 160GB DC S3500 from
 150USD, which are a pretty good match to the OEM prices on the Intel site
 of 180 and 160 respectively.


If I have to buy them personally, that'll work well.  If I can get work to
get them, then I kinda have to limit myself to the companies we have marked
as suppliers, as it's a pain to get a new company into the mix.



   You really wouldn't want less than 200MB/s, even in your setup which I
   take to be 2Gb/s from what you wrote below.
 
 
 
    Note that the 100GB 3700 is going to perform way better and last
    immensely longer than the 160GB 3500 while being moderately more
    expensive, while the 200GB 3610 is faster (IOPS), lasts 10 times
    longer AND is cheaper than the 240GB 3500.
  
   It is pretty much those numbers that made me use 4 100GB 3700s instead
   of 3500s (240GB), much more bang for the buck and it still did fit my
   budget and could deal with 80% of the network bandwidth.
  
 
  So the 3710's would be an ok solution?

 No, because they start from 200GB and with a 300USD price tag. The 3710s
 do not replace the 3700s, they extend the selection upwards (in size
 mostly).


I thought I had corrected that - I was thinking the 3700's and typed 3710 :)



 I have seen the 3700s for right
  about $200, which although it doesn't seem a lot cheaper, does shave about
  $200 (after shipping costs) when getting 6...
 
 See above, google shopping. The lowballer is Walmart, of all places:

 http://www.walmart.com/ip/26972768?wmlspartner=wlpaselectedSellerId=0


 
  
   

 Guestimate the amount of data written to your cluster per day,
 break that down to the load a journal SSD will see and then
 multiply by at least 5 to be on the safe side. Then see which SSD
 will fit your expected usage pattern.

   
Luckily I don't think there will be a ton of data per day written.
The majority of servers whose VHDs will be stored in our cluster
don't have a lot of frequent activity - aside from a few windows
   servers that have DB servers in them (and even they don't write a
ton of data per day really).
   
  
   Being able to put even a coarse number on this will tell you if you can
   skim on the endurance and have your cluster last like 5 years or if
   getting a higher endurance SSD is going to be cheaper.
  
 
  Any suggestions on how I can get a really accurate number on this?  I
  mean, I could probably get some good numbers from the database servers
  in terms of their writes in a given day, but when it comes to other
  processes running in the background I'm not sure how much these  might
  really affect this number.
 

 If you have existing servers that run Linux and have been up for a
  reasonably long time (months), iostat will give you a very good idea.
  No idea about Windows, but I bet those stats exist someplace, too.


I can't say months, but at least a month, maybe two - trying to remember
when our last extended power outage was - I can find out later.



 For example a Ceph storage node, up 74 days with OS and journals on the
 first 4 drives and OSD HDDs on the other 8:

 Device:            tps    kB_read/s    kB_wrtn/s     kB_read      kB_wrtn
 sda               9.82        29.88       187.87   191341125   1203171718
 sdb               9.79        29.57       194.22   189367432   1243850846
 sdc               9.77        29.83       188.89   191061000   1209676622
 sdd               8.77        29.57       175.40   189399240   1123294410
 sde               5.24       354.19        55.68  2268306443    356604748
 sdi               5.02       335.61        63.60  2149338787    407307544
 sdj               4.96       350.33        52.43  2243590803    335751320
 sdl               5.04       374.62        48.49  2399170183    310559488
 sdf               4.85       354.52        50.43  2270401571    322947192
 sdh               4.77       332.38        50.60  2128622471    324065888
 sdg               6.26       403.97        65.42  2587109283    418931316
 sdk               5.86       385.36        55.61  2467921295    356120140
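
A hedged sketch of turning that kind of iostat output into an endurance
estimate, using sda's kB_wrtn/s column above and the x5 safety factor from
earlier in the thread:

  awk 'BEGIN { kbps = 187.87;
               gb_day = kbps * 86400 / 1024 / 1024;
               tb_5yr = gb_day * 365 * 5 * 5 / 1024;   # 5 years, 5x margin
               printf "%.1f GB/day, ~%.0f TB written in 5 years\n", gb_day, tb_5yr }'
  # prints roughly: 15.5 GB/day, ~138 TB written in 5 years

Even with the margin that sits far below the roughly 1.8 PB write rating a
100GB DC S3700 carries (10 drive writes per day for 5 years), so journal wear
is a non-issue at this write rate.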


I do have some linux vms that have been up for a while, can't say how many
months since the last extended power outage off hand (granted I know once I
look at the uptime), but hopefully it will at least give me an idea.


 
  
  
   
   So it's 2x1Gb/s then?
  
 
  client side 2x1, cluster side, 3x1.
 
 So 500MB/s with trailing wind on a sunny day.

 Meaning that something that can do about 400MB/s will do nicely, as you're
 only even going to get near that when 

Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Now, I've never set up a journal on a separate disk.  I assume you have 4
partitions at 10GB per partition; I noticed the docs refer to 10GB as a good
starting point.  Would it be better to have 4 partitions @ 10GB each or 4 @
20GB?
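
For reference, the sizing rule of thumb from the Ceph docs, sketched with
illustrative numbers (a ~120MB/s spinner and the default 5s filestore sync
interval):

  # osd journal size >= 2 * (expected throughput * filestore max sync interval)
  echo "$((2 * 120 * 5)) MB minimum per journal"   # -> 1200 MB

So 10GB partitions already leave plenty of headroom; 20GB only buys anything
if "filestore max sync interval" is raised as well.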

I know I'll take a speed hit, but unless I can get my work to buy the
drives, they will have to sit with what my personal budget can afford and
be willing to donate ;)

-Tony

On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky and...@arhont.com
wrote:

 I am not sure about the enterprise grade and underprovisioning, but for
  the Intel 520s I've got the 240GB ones (the speed of the 240 is a bit better
  than the 120s), and I've left 50% underprovisioned. I've got 10GB for
  journals and I am using 4 osds per ssd.

 Andrei


 --

 *From: *Tony Harris neth...@gmail.com
 *To: *Andrei Mikhailovsky and...@arhont.com
 *Cc: *ceph-users@lists.ceph.com, Christian Balzer ch...@gol.com
 *Sent: *Sunday, 1 March, 2015 8:49:56 PM

 *Subject: *Re: [ceph-users] SSD selection

 Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
 DCS3500 120GB for within $120/drive so it's possible to get 6 of them...

 -Tony

 On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky and...@arhont.com
 wrote:


 I would not use a single ssd for 5 osds. I would recommend the 3-4 osds
 max per ssd or you will get the bottleneck on the ssd side.

 I've had a reasonable experience with Intel 520 ssds (which are not
 produced anymore). I've found Samsung 840 Pro to be horrible!

 Otherwise, it seems that everyone here recommends the DC3500 or DC3700
 and it has the best wear per $ ratio out of all the drives.

 Andrei


 --

 *From: *Tony Harris neth...@gmail.com
 *To: *Christian Balzer ch...@gol.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Sunday, 1 March, 2015 4:19:30 PM
 *Subject: *Re: [ceph-users] SSD selection


 Well, although I have 7 now per node, you make a good point and I'm in a
 position where I can either increase to 8 and split 4/4 and have 2 ssds, or
  reduce to 5 and use a single ssd per node (the system is not in production
 yet).

 Do all the DC lines have caps in them or just the DC S line?

 -Tony

 On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

 On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

  Hi all,
 
  I have a small cluster together and it's running fairly well (3 nodes,
 21
  osds).  I'm looking to improve the write performance a bit though,
 which
  I was hoping that using SSDs for journals would do.  But, I was
 wondering
  what people had as recommendations for SSDs to act as journal drives.
  If I read the docs on ceph.com correctly, I'll need 2 ssds per node
  (with 7 drives in each node, I think the recommendation was 1ssd per
 4-5
  drives?) so I'm looking for drives that will work well without breaking
  the bank for where I work (I'll probably have to purchase them myself
  and donate, so my budget is somewhat small).  Any suggestions?  I'd
  prefer one that can finish its write in a power outage case, the only
  one I know of off hand is the intel dcs3700 I think, but at $300 it's
  WAY above my affordability range.

  Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
  proverbial behind down the road when combined with journal SSDs, as one of
  those SSDs will wear out faster than the other.

 Secondly, how many SSDs you need is basically a trade-off between price,
 performance, endurance and limiting failure impact.

  I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
 the write paths and IOPS and failure domain, but not the sequential speed
 or cost.

 Depending on what your write load is and the expected lifetime of this
 cluster, you might be able to get away with DC S3500s or even better the
 new DC S3610s.
 Keep in mind that buying a cheap, low endurance SSD now might cost you
 more down the road if you have to replace it after a year (TBW/$).

 All the cheap alternatives to DC level SSDs tend to wear out too fast,
 have no powercaps and tend to have unpredictable (caused by garbage
 collection) and steadily decreasing performance.

 Christian
 --
  Christian Balzer           Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
On Sun, Mar 1, 2015 at 10:18 PM, Christian Balzer ch...@gol.com wrote:

 On Sun, 1 Mar 2015 21:26:16 -0600 Tony Harris wrote:

  On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer ch...@gol.com wrote:
 
  
   Again, penultimately you will need to sit down, compile and compare the
   numbers.
  
   Start with this:
   http://ark.intel.com/products/family/83425/Data-Center-SSDs
  
   Pay close attention to the 3610 SSDs, while slightly more expensive
   they offer 10 times the endurance.
  
 
  Unfortunately, $300 vs $100 isn't really slightly more expensive ;)
   Although I did notice that the 3710's can be gotten for ~210.
 
 
 I'm not sure where you get those prices from or what you're comparing with
 what but if you look at the OEM prices in the URL up there (which compare
 quite closely to what you can find when looking at shopping prices) a
 comparison with closely matched capabilities goes like this:

 http://ark.intel.com/compare/71913,86640,75680,75679


I'll be honest, the pricing on Intel's website is far from reality.  I
haven't been able to find any OEMs, and retail pricing on the 200GB 3610 is
~231 (the $300 must have been a different model in the line).  Although
$231 does add up real quick if I need to get 6 of them :(


 You really wouldn't want less than 200MB/s, even in your setup which I
 take to be 2Gb/s from what you wrote below.



 Note that the 100GB 3700 is going to perform way better and last immensely
  longer than the 160GB 3500 while being moderately more expensive, while
  the 200GB 3610 is faster (IOPS), lasts 10 times longer AND is cheaper than
  the 240GB 3500.

 It is pretty much those numbers that made me use 4 100GB 3700s instead of
 3500s (240GB), much more bang for the buck and it still did fit my budget
 and could deal with 80% of the network bandwidth.


So the 3710's would be an ok solution?  I have seen the 3700s for right
about $200, which although it doesn't seem a lot cheaper, does shave about
$200 (after shipping costs) when getting 6...



 
  
   Guestimate the amount of data written to your cluster per day, break
   that down to the load a journal SSD will see and then multiply by at
   least 5 to be on the safe side. Then see which SSD will fit your
   expected usage pattern.
  
 
  Luckily I don't think there will be a ton of data per day written.  The
  majority of servers whose VHDs will be stored in our cluster don't have a
  lot of frequent activity - aside from a few Windows servers that have DB
  servers in them (and even they don't write a ton of data per day really).
 

 Being able to put even a coarse number on this will tell you if you can
 skim on the endurance and have your cluster last like 5 years or if
 getting a higher endurance SSD is going to be cheaper.


Any suggestions on how I can get a really accurate number on this?  I mean,
I could probably get some good numbers from the database servers in terms
of their writes in a given day, but when it comes to other processes
running in the background I'm not sure how much these  might really affect
this number.




 
 So it's 2x1Gb/s then?


client side 2x1, cluster side, 3x1.



 At that speed a single SSD from the list above would do, if you're
 a) aware of the risk that this SSD failing will kill all OSDs on that node
 and
 b) don't expect your cluster to be upgraded


I'd really prefer 2 per node from our discussions so far - it's all a
matter of cost, but I also don't want to jump to a poor decision just
because it can't be afforded immediately.  I'd rather gradually upgrade
nodes as it can be afforded than jump into cheap now only to have to pay a
bigger price later.



  Well, I'd like to steer away from the consumer models if possible since
  they (AFAIK) don't contain caps to finish writes should a power loss
  occur, unless there is one that does?
 
 Not that I'm aware of.

 Also note that while Andrei is happy with his 520s (especially compared to
 the Samsungs) I have various 5x0 Intel SSDs in use as well and while they
 are quite nice the 3700s are so much faster (consistently) in comparison
 that one can't believe it ain't butter. ^o^


I'll have to see if I can get funding; I've already donated enough to get
the (albeit used) servers and NIC cards, and I just can't personally afford
to donate another $1K-1200.  But hopefully I'll soon have nailed down which
exact model I would like, and maybe I can get them to pay for at least 1/2
of them...  God, working for a school can be taxing at times.

-Tony




 Christian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer ch...@gol.com wrote:


 Again, penultimately you will need to sit down, compile and compare the
 numbers.

 Start with this:
 http://ark.intel.com/products/family/83425/Data-Center-SSDs

 Pay close attention to the 3610 SSDs, while slightly more expensive they
 offer 10 times the endurance.


Unfortunately, $300 vs $100 isn't really slightly more expensive ;)
 Although I did notice that the 3710's can be gotten for ~210.




 Guestimate the amount of data written to your cluster per day, break that
 down to the load a journal SSD will see and then multiply by at least 5 to
 be on the safe side. Then see which SSD will fit your expected usage
 pattern.


Luckily I don't think there will be a ton of data per day written.  The
majority of servers whose VHDs will be stored in our cluster don't have a
lot of frequent activity - aside from a few Windows servers that have DB
servers in them (and even they don't write a ton of data per day really).




 You didn't mention your network, but I assume it's 10Gb/s?


Would be nice, if I had access to the kind of cash to get a 10Gb network, I
wouldn't be stressing the cost of a set of SSDs ;)



 At 135MB/s writes the 100GB DC S3500 will not cut the mustard in any shape
 or form when journaling for 4 HDDs.
 With 2 HDDs it might be a so-so choice, but still falling short.
 Most current 7.2K RPM HDDs these days can do around 150MB/s writes,
 however that's neither uniform, nor does Ceph do anything resembling a
 sequential write (which is where these speeds come from), so in my book
 80-120MB/s on the SSD journal per HDD are enough.


The drives I have access to that are in the cluster aren't the fastest,
most current drives out there; but by what you're describing, to have even 3
HDDs per SSD you'd need an SSD with 240-360MB/s of write capability...  Why
then does the ceph documentation talk about 1 SSD per 4-5 OSD drives?  It
would be near impossible to get an SSD to meet that level of speed...
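
A hedged back-of-envelope for this kind of node, using the link speeds
mentioned elsewhere in these threads (2Gb/s client side, 3Gb/s cluster side):
the journals only ever have to absorb what the network can deliver, which is
less than (HDD count x HDD write speed) suggests.

  # client writes arrive on the public link, replica writes on the cluster link
  echo "$(( (2 + 3) * 1000 / 8 )) MB/s theoretical ceiling per node"   # 625 MB/s
  # real world is closer to ~500 MB/s; split across two journal SSDs that is
  # ~250 MB/s each, which is why 1 SSD per 4-5 OSDs can still hold up on
  # 1-3 Gb/s links even though the per-HDD math looks worse

On a 10Gb/s network the arithmetic changes and the per-SSD requirement climbs
accordingly.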



 A speed hit is one thing, more than halving your bandwidth is bad,
 especially when thinking about backfilling.


Although I'm working with more than 1Gb/s, it's a lot less than 10Gb/s, so
there might be a threshold there where we wouldn't experience an issue
where someone using 10G would (God I'd love a 10G network, but no budget
for it)



 Journal size doesn't matter that much, 10GB is fine, 20GB x4 is OK with
 the 100GB DC drives, with 5xx consumer models I'd leave at least 50% free.


Well, I'd like to steer away from the consumer models if possible since
they (AFAIK) don't contain caps to finish writes should a power loss occur,
unless there is one that does?

-Tony



 Christian

 On Sun, 1 Mar 2015 15:08:10 -0600 Tony Harris wrote:

  Now, I've never setup a journal on a separate disk, I assume you have 4
  partitions at 10GB / partition, I noticed in the docs they referred to 10
  GB, as a good starter.  Would it be better to have 4 partitions @ 10g ea
  or 4 @20?
 
  I know I'll take a speed hit, but unless I can get my work to buy the
  drives, they will have to sit with what my personal budget can afford and
  be willing to donate ;)
 
  -Tony
 
  On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky and...@arhont.com
  wrote:
 
   I am not sure about the enterprise grade and underprovisioning, but for
   the Intel 520s i've got 240gbs (the speeds of 240 is a bit better than
   120s). and i've left 50% underprovisioned. I've got 10GB for journals
   and I am using 4 osds per ssd.
  
   Andrei
  
  
   --
  
   *From: *Tony Harris neth...@gmail.com
   *To: *Andrei Mikhailovsky and...@arhont.com
   *Cc: *ceph-users@lists.ceph.com, Christian Balzer ch...@gol.com
   *Sent: *Sunday, 1 March, 2015 8:49:56 PM
  
   *Subject: *Re: [ceph-users] SSD selection
  
   Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
   DCS3500 120GB for within $120/drive so it's possible to get 6 of
   them...
  
   -Tony
  
   On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky
   and...@arhont.com wrote:
  
  
   I would not use a single ssd for 5 osds. I would recommend the 3-4
   osds max per ssd or you will get the bottleneck on the ssd side.
  
   I've had a reasonable experience with Intel 520 ssds (which are not
   produced anymore). I've found Samsung 840 Pro to be horrible!
  
   Otherwise, it seems that everyone here recommends the DC3500 or DC3700
   and it has the best wear per $ ratio out of all the drives.
  
   Andrei
  
  
   --
  
   *From: *Tony Harris neth...@gmail.com
   *To: *Christian Balzer ch...@gol.com
   *Cc: *ceph-users@lists.ceph.com
   *Sent: *Sunday, 1 March, 2015 4:19:30 PM
   *Subject: *Re: [ceph-users] SSD selection
  
  
   Well, although I have 7 now per node, you make a good point and I'm
   in a position where I can either increase to 8 and split 4/4 and have
   2 ssds, or reduce to 5 and use a single ssd per node (the system

Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Well, although I have 7 now per node, you make a good point and I'm in a
position where I can either increase to 8 and split 4/4 and have 2 ssds, or
reduce to 5 and use a single ssd per node (the system is not in production
yet).

Do all the DC lines have caps in them or just the DC S line?

-Tony

On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

 On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

  Hi all,
 
  I have a small cluster together and it's running fairly well (3 nodes, 21
  osds).  I'm looking to improve the write performance a bit though, which
  I was hoping that using SSDs for journals would do.  But, I was wondering
  what people had as recommendations for SSDs to act as journal drives.
  If I read the docs on ceph.com correctly, I'll need 2 ssds per node
  (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
  drives?) so I'm looking for drives that will work well without breaking
  the bank for where I work (I'll probably have to purchase them myself
  and donate, so my budget is somewhat small).  Any suggestions?  I'd
  prefer one that can finish its write in a power outage case, the only
  one I know of off hand is the intel dcs3700 I think, but at $300 it's
  WAY above my affordability range.

 Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
 proverbial behind down the road when combined with journal SSDs, as one of
 those SSDs will wear out faster than the other.

 Secondly, how many SSDs you need is basically a trade-off between price,
 performance, endurance and limiting failure impact.

 I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
 the write paths and IOPS and failure domain, but not the sequential speed
 or cost.

 Depending on what your write load is and the expected lifetime of this
 cluster, you might be able to get away with DC S3500s or even better the
 new DC S3610s.
 Keep in mind that buying a cheap, low endurance SSD now might cost you
 more down the road if you have to replace it after a year (TBW/$).

 All the cheap alternatives to DC level SSDs tend to wear out too fast,
 have no powercaps and tend to have unpredictable (caused by garbage
 collection) and steadily decreasing performance.

 Christian
 --
 Christian Balzer           Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
DCS3500 120GB for within $120/drive so it's possible to get 6 of them...

-Tony

On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky and...@arhont.com
wrote:


 I would not use a single ssd for 5 osds. I would recommend the 3-4 osds
 max per ssd or you will get the bottleneck on the ssd side.

 I've had a reasonable experience with Intel 520 ssds (which are not
 produced anymore). I've found Samsung 840 Pro to be horrible!

 Otherwise, it seems that everyone here recommends the DC3500 or DC3700 and
 it has the best wear per $ ratio out of all the drives.

 Andrei


 --

 *From: *Tony Harris neth...@gmail.com
 *To: *Christian Balzer ch...@gol.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Sunday, 1 March, 2015 4:19:30 PM
 *Subject: *Re: [ceph-users] SSD selection


 Well, although I have 7 now per node, you make a good point and I'm in a
 position where I can either increase to 8 and split 4/4 and have 2 ssds, or
  reduce to 5 and use a single ssd per node (the system is not in production
 yet).

  Do all the DC lines have caps in them or just the DC S line?

 -Tony

 On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer ch...@gol.com wrote:

 On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

  Hi all,
 
  I have a small cluster together and it's running fairly well (3 nodes,
 21
  osds).  I'm looking to improve the write performance a bit though, which
  I was hoping that using SSDs for journals would do.  But, I was
 wondering
  what people had as recommendations for SSDs to act as journal drives.
  If I read the docs on ceph.com correctly, I'll need 2 ssds per node
  (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
  drives?) so I'm looking for drives that will work well without breaking
  the bank for where I work (I'll probably have to purchase them myself
  and donate, so my budget is somewhat small).  Any suggestions?  I'd
  prefer one that can finish its write in a power outage case, the only
  one I know of off hand is the intel dcs3700 I think, but at $300 it's
  WAY above my affordability range.

 Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
 proverbial behind down the road when combined with journal SSDs, as one of
  those SSDs will wear out faster than the other.

 Secondly, how many SSDs you need is basically a trade-off between price,
 performance, endurance and limiting failure impact.

  I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
 the write paths and IOPS and failure domain, but not the sequential speed
 or cost.

 Depending on what your write load is and the expected lifetime of this
 cluster, you might be able to get away with DC S3500s or even better the
 new DC S3610s.
 Keep in mind that buying a cheap, low endurance SSD now might cost you
 more down the road if you have to replace it after a year (TBW/$).

 All the cheap alternatives to DC level SSDs tend to wear out too fast,
 have no powercaps and tend to have unpredictable (caused by garbage
 collection) and steadily decreasing performance.

 Christian
 --
  Christian Balzer           Network/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mail not reaching the list?

2015-02-28 Thread Tony Harris
Hi,

I've sent a couple of emails to the list since subscribing, but I've never
seen them reach the list; I was just wondering if there was something wrong?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Am I reaching the list now?

2015-02-28 Thread Tony Harris
I was subscribed with a Yahoo email address, but it was getting some grief,
so I decided to try using my Gmail address - hopefully this one is working.

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD selection

2015-02-28 Thread Tony Harris
Hi all,

I have a small cluster together and it's running fairly well (3 nodes, 21
osds).  I'm looking to improve the write performance a bit though, which I
was hoping that using SSDs for journals would do.  But, I was wondering
what people had as recommendations for SSDs to act as journal drives.  If I
read the docs on ceph.com correctly, I'll need 2 ssds per node (with 7
drives in each node, I think the recommendation was 1ssd per 4-5 drives?)
so I'm looking for drives that will work well without breaking the bank for
where I work (I'll probably have to purchase them myself and donate, so my
budget is somewhat small).  Any suggestions?  I'd prefer one that can
finish its write in a power outage case, the only one I know of off hand is
the intel dcs3700 I think, but at $300 it's WAY above my affordability
range.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph - networking question

2015-02-27 Thread Tony Harris
Hi all,
I've only been using Ceph for a few months now and currently have a small
cluster (3 nodes, 18 OSDs).  I get decent performance based upon the
configuration.

My question is, should I have a larger pipe on the client/public network or
on the Ceph cluster private network?  I can only have a larger pipe on one
of the two.  The most Ceph nodes we'd have in the foreseeable future is 7;
current client VM host count is 3, with a max of 5 in the future.

Currently I can just about max out the throughput of the larger pipe on
reads, but not even close on writes, when the larger pipe is connected to
the public/client side and I benchmark with rados.  With the smaller pipe on
the client/public side I still max out reads, but still come nowhere close
on writes (close being relative to the replication count: with 2 replicas I
can get 60% of theoretical max, with 3 replicas about 40%).

Basically, I'm not sure how to determine when the back-end cluster private
network starts to become the bottleneck that needs to be expanded.
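
For reference, the split itself lives in ceph.conf, and for writes the
cluster side carries roughly (replica count - 1) times the client write
traffic, so it is usually the first side to saturate.  A hedged way to see
which side is the limit while benchmarking (subnets and interface are
placeholders):

  # ceph.conf [global]:
  #   public network  = 192.168.1.0/24   # clients <-> OSDs/mons
  #   cluster network = 192.168.2.0/24   # replication and recovery between OSDs
  # while a rados bench write runs, watch both NICs; whichever pins at line
  # rate first is the pipe that needs the upgrade
  sar -n DEV 5
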
-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com