Wow, thanks for an awesome reply, Steve!

On Friday, October 12, 2012, Steve Loughran wrote:
> On 11 October 2012 20:47, Goldstone, Robin J. <[email protected]> wrote:
>
>> Be sure you are comparing apples to apples. The E5-2650 has a larger
>> cache than the E5-2640, a faster system bus, and can support faster
>> (1600 MHz vs 1333 MHz) DRAM, resulting in greater potential memory
>> bandwidth.
>>
>> http://ark.intel.com/compare/64590,64591
>>
> mmm. There is more L3 cache, and in-CPU sync can be done better than over
> the inter-socket bus -- you're also less vulnerable to NUMA memory
> allocation issues (*).
>
> There's another issue that drives these recommendations, namely the price
> curve that server parts follow over time, the Bill-of-Materials curve, aka
> the "BOM curve". Most parts come in at one price, and that price drops
> over time as a function of volume parts shipped covering the Non-Recurring
> Engineering (NRE) costs, improvements in yield and manufacturing quality
> in that specific process, etc., until it levels out at the actual selling
> price (ASP) to the people who make the boxes (Original Design
> Manufacturers == ODMs), where it tends to stay for the rest of that part's
> lifespan.
>
> DRAM and HDDs follow a fairly predictable exponential decay curve. You can
> look at the cost of a part and its history, determine the variables, and
> then come up with a prediction of how much it will cost at a time in the
> near future. These BOM curves were key to Dell's business model -- direct
> sales to the customer meant they didn't need so much inventory and could
> actually get into a situation where they had the cash from the customer
> before the ODM had built the box, let alone been paid for it. There was a
> price: utter unpredictability of what DRAM and HDDs you were going to get.
> Server-side, things have stabilised and all the tier-1 PC vendors qualify
> a set of DRAM and storage options, so they can source from multiple
> vendors, eliminating a single vendor as a SPOF and allowing them to
> negotiate better on cost of parts -- which again changes that BOM curve.
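(Taking a stab at making that concrete: below is a rough sketch of what fitting such a decay curve might look like. The price history, the assumed price floor and the two-year extrapolation are all invented for illustration -- it's just the shape of the calculation, not real BOM data.)

    # Sketch: fit an exponential decay with a floor, P(t) = floor + A * exp(k * t),
    # to a part's price history and extrapolate a few quarters out.
    # All numbers below are invented for illustration -- not real BOM data.
    import numpy as np

    months = np.array([0, 3, 6, 9, 12, 15, 18])              # time since introduction
    price  = np.array([310, 240, 195, 165, 145, 132, 124])   # observed street price, $

    floor = 110.0                               # assumed long-run ASP the part levels out at
    k, log_a = np.polyfit(months, np.log(price - floor), 1)  # log-linear fit of the premium
    a = np.exp(log_a)                           # initial premium above the floor

    def predicted_price(t_months):
        # Predicted price t_months after introduction.
        return floor + a * np.exp(k * t_months)

    print(predicted_price(24))                  # rough estimate two years in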
> This may seem strange, but you should all know that the retail price of a
> laptop, flatscreen TV, etc. comes down over time -- what's not so obvious
> is the maths behind the changes in its price.
>
> One of the odd parts in this business is the CPU. There is a near-monopoly
> in supply, and Intel don't want their business at the flat bit of the
> curve. They need the money not just to keep their shareholders happy, but
> for the $B needed to build the next generation of fabs and hence continue
> to keep their shareholders happy in future. Intel parts come in high when
> they initially ship, and stay at that price until the next time Intel
> change their price list, which is usually quarterly. The first price
> change is very steep, then the gradient d$/dT reduces, and once it gets
> low enough the part drops off the price list, never to be seen again,
> except maybe in embedded designs.
>
> What does that mean? It means you pay a lot for the top-of-the-line x86
> CPUs, and unless you are 100% sure that you really need them, you may be
> better off investing your money in:
>
> - more DRAM with better ECC (product placement: Chipkill) and buffering:
>   less swapping, ability to run more reducers/node.
> - more HDDs: more storage in the same # of racks, assuming your site can
>   take the weight.
> - SFF HDDs: less storage but more IO bandwidth off the disks.
> - SSDs: faster storage.
> - GPUs: very good performance for algorithms you can recompile onto them.
> - support from Hortonworks to keep your Hadoop cluster going.
> - 10 GbE networking, or multiple bonded 1 GbE links.
> - more servers (this becomes more of a factor on larger clusters, where
>   the cost savings of the less expensive parts scale up).
> - paying the electricity bill.
> - keeping the cost of building up a Hadoop cluster down, making it more
>   affordable to store PBs of data whose value will only appreciate over
>   time.
> - paying your ops team more money, keeping them happier and so increasing
>   the probability they will field the 4am support crisis.
>
> That's why it isn't clear cut that 8 cores are better. It's not just a
> simple performance question -- it's the opportunity cost of the price
> difference, scaled up by the number of nodes. You do -- as Ted pointed
> out -- need to know what you actually want.
>
> Finally, as a basic "data science" exercise for the reader:
>
> 1. Calculate the price curve of, say, a Dell laptop, and compare it with
>    the price curve of an Apple laptop introduced with the same CPU at the
>    same time. Don't look at the absolute values -- normalising them to a
>    percentage is better to view.
> 2. Look at which one follows a soft gradient and which follows more of a
>    step function.
> 3. Add the Intel pricing to the graph and see how it correlates with the
>    ASP.
> 4. Determine from this which vendor has the best margins -- not just at
>    time of release, but over the lifespan of a product. Integration is a
>    useful technique here. Bear in mind that Apple's NRE costs on a laptop
>    are higher due to the better HW design, and that their software
>    development is funded from their sales alone.
> 5. Using this information, decide when is the best time to buy a Dell or
>    an Apple laptop.
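Taking a quick stab at steps 1-4 myself -- this is only a sketch, every price series in it is a made-up placeholder rather than real Dell/Apple/Intel data, and it assumes you've already scraped the actual street prices from somewhere:

    # Sketch for the exercise above: normalise each price history to a percentage
    # of its launch price, then integrate over the product's life to compare the
    # curves. Every series below is a placeholder, not real Dell/Apple/Intel data.
    import numpy as np

    weeks       = np.arange(0, 52, 4)   # one data point every 4 weeks
    dell_price  = np.array([1000, 970, 930, 880, 840, 800, 770, 740, 720, 700, 690, 680, 675])
    apple_price = np.array([1200, 1200, 1200, 1200, 1200, 1200, 1150, 1150, 1150, 1150, 1100, 1100, 1100])
    cpu_price   = np.array([300, 300, 300, 250, 250, 250, 210, 210, 210, 185, 185, 185, 185])

    def normalise(prices):
        # Step 1: express a price history as a percentage of the launch price.
        return 100.0 * prices / prices[0]

    for name, series in [("Dell", dell_price), ("Apple", apple_price), ("Intel CPU", cpu_price)]:
        pct = normalise(series)
        # Step 2: a soft gradient shows up as many small drops, a step function as a few large ones.
        biggest_drop = -np.min(np.diff(pct))
        print(f"{name}: ends at {pct[-1]:.0f}% of launch price, biggest single drop {biggest_drop:.1f} points")

    # Step 4: integrate (selling price - CPU cost) over the lifespan as a crude margin proxy.
    dell_margin_proxy  = np.trapz(dell_price - cpu_price, weeks)
    apple_margin_proxy = np.trapz(apple_price - cpu_price, weeks)
    print(f"margin proxy ($-weeks): Dell {dell_margin_proxy:.0f}, Apple {apple_margin_proxy:.0f}")

Plotting the normalised series makes step 2 obvious; the integral in step 4 is only a crude margin proxy, since -- as Steve notes -- it ignores NRE and software costs.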
> I should make a blog post out of this: "server prices: it's all down to
> the exponential decay equations of the individual parts".
>
> Steve "why yes, I have spent time in the PC industry" Loughran
>
> (*) If you don't know what NUMA is, do some research and think about its
> implications for heap allocation.
>
>
>> From: Patrick Angeles <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, October 11, 2012 12:36 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Why they recommend this (CPU) ?
>>
>> If you look at comparable Intel parts:
>>
>> Intel E5-2640
>> 6 cores @ 2.5 GHz
>> 95W - $885
>>
>> Intel E5-2650
>> 8 cores @ 2.0 GHz
>> 95W - $1107
>>
>> So, for $400 more on a dual-proc system -- which really isn't much -- you
>> get 2 more cores for a 20% drop in clock speed. I can believe that for
>> some scenarios the faster cores would fare better. Gzip compression is
>> one that comes to mind, where you are aggressively trading CPU for lower
>> storage volume and IO. An HBase cluster is another example.
>>
>> On Thu, Oct 11, 2012 at 3:03 PM, Russell Jurney <[email protected]> wrote:
>>
>>> My own clusters are too temporary and virtual for me to notice. I
>>> haven't thought of clock speed as having mattered in a long time, so I'm
>>> curious what kind of use cases might benefit from faster cores. Is there
>>> a category in some way where this sweet spot for faster cores occurs?
>>>
>>> Russell Jurney http://datasyndrome.com
>>>
>>> On Oct 11, 2012, at 11:39 AM, Ted Dunning <[email protected]> wrote:
>>>
>>> You should measure your workload. Your experience will vary dramatically
>>> with different computations.
>>>
>>> On Thu, Oct 11, 2012 at 10:56 AM, Russell Jurney <[email protected]> wrote:
>>>
>>>> Anyone got data on this? This is interesting, and somewhat
>>>> counter-intuitive.
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On Oct 11, 2012, at 10:47 AM, Jay Vyas <[email protected]> wrote:
>>>>
>>>> Presumably, if you have a reasonable number of cores, speeding the
>>>> cores up will be better than forking a task into smaller and smaller
>>>> chunks, because at some point the overhead of multiple processes would
>>>> be a bottleneck -- maybe due to streaming reads and writes? I'm sure
>>>> each and every problem has a different sweet spot.
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
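One closing back-of-envelope on Patrick's two parts, using only the list prices and clocks quoted above (so it ignores the cache and memory-speed differences Robin pointed out, and real throughput depends on the workload, as Ted says):

    # Back-of-envelope on the two parts Patrick quoted above: aggregate core-GHz
    # per socket versus list price. This ignores cache, memory speed and turbo,
    # so it is only a starting point -- measure your own workload before deciding.
    parts = {
        "E5-2640": {"cores": 6, "ghz": 2.5, "price": 885},
        "E5-2650": {"cores": 8, "ghz": 2.0, "price": 1107},
    }

    for name, p in parts.items():
        aggregate = p["cores"] * p["ghz"]      # crude core-GHz per socket
        print(f'{name}: {aggregate:.1f} core-GHz, ${p["price"] / aggregate:.0f} per core-GHz')

That's roughly 7% more aggregate core-GHz per socket for about 25% more money per CPU -- the ~$400 per dual-socket node that Steve's opportunity-cost list is really about.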
