One issue is that 'big' becomes 'not so big' reasonably quickly. A couple of terabytes is not that challenging these days (depending on the algorithm), whereas 5 years ago it was a big challenge. We have a bit over a petabyte (not using Spark), and using a distributed system is the only viable way to get reasonable performance for reasonable cost.

cheers

On Tue, Mar 31, 2015 at 4:55 AM, Steve Loughran <ste...@hortonworks.com> wrote:

>
> On 30 Mar 2015, at 13:27, jay vyas <jayunit100.apa...@gmail.com> wrote:
>
> Just the same as spark was disrupting the hadoop ecosystem by changing
> the assumption that "you can't rely on memory in distributed
> analytics"...now maybe we are challenging the assumption that "big data
> analytics need to be distributed"?
>
> I've been asking the same question lately and seen similarly that spark
> performs quite reliably and well on a local single node system, even for
> a streaming app which I ran for ten days in a row... I almost felt guilty
> that I never put it on a cluster....!
>
>
> Modern machines can be pretty powerful: 16 physical cores HT'd to 32,
> 384+ GB, GPU, giving you lots of compute. What you don't get is the storage
> capacity to match, and especially, the IO bandwidth. RAID-0 striping 2-4
> HDDs gives you some boost, but if you are reading, say, a 4 GB file from
> HDFS broken into 256 MB blocks, you have that data replicated into (4*4*3)
> blocks: 48. Algorithm and capacity permitting, you've just massively
> boosted your load time. Downstream, if data can be thinned down, then you
> can start looking more at things you can do on a single host: a machine
> that can be in your Hadoop cluster. Ask YARN nicely and you can get a
> dedicated machine for a couple of days (i.e. until your Kerberos tokens
> expire).
>

--
Franc Carter | Systems Architect | RoZetta Technology