There's that. There's also the fact that a 32-way machine almost certainly doesn't have 32 times the I/O bandwidth, let alone seeks that are 32 times faster. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs you could end up with an I/O bottleneck.

Speaking of AWS and EMR, I find that the I/O bottleneck is by far the biggest issue there. I spread my jobs across as many instances and racks as possible just to try to steal more of the little machines' I/O seeks!

On Sat, Jun 25, 2011 at 3:17 AM, edwin <[email protected]> wrote:

> Hi Ted,
> I'm wondering for "isn't going to work well", you refer to inevitable
> unnecessary hadoop overhead running on a single machine or there are other
> implications to run big jobs on a single machine?
>
> - edwin
>
> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>
> > I have done this with VM's but I would not generally recommend it. Without
> > VM's you will have a pretty ugly configuration issue because Hadoop usually
> > assumes it owns the machine.
> >
> > Besides, this is a seriously square peg into a round hole kind of problem
> > here. Hadoop (map-reduce) was designed so that you could use several little
> > machines instead of one big one. It just isn't going to work well on a
> > single computer.
> >
> > On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <[email protected]> wrote:
> >
> >> Do you have any experience in running multiple data nodes and task
> >> trackers on a single SMP server.
> >>
> >>> -----Original Message-----
> >>> From: Ted Dunning [mailto:[email protected]]
> >>> Sent: Saturday, June 25, 2011 9:26 AM
> >>> To: [email protected]
> >>> Cc: [email protected]
> >>> Subject: Re: Can all the algorithms in Mahout be run locally without a
> >>> Hadoop cluster.
> >>>
> >>> Pretty big. Should scream for local classifier learning.
> >>>
> >>> Local Hadoop should run pretty fast as well.
> >>>
> >>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <[email protected]> wrote:
> >>>
> >>>> 32Core, 256G RAM
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ted Dunning [mailto:[email protected]]
> >>>>> Sent: Saturday, June 25, 2011 1:37 AM
> >>>>> To: [email protected]
> >>>>> Cc: [email protected]
> >>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a
> >>>>> Hadoop cluster.
> >>>>>
> >>>>> Big iron is fine for some of the classifier stuff, but throughput per $ can
> >>>>> be higher for other algorithms with a cluster of smaller machines.
> >>>>>
> >>>>> How big a machine are you talking about? Even relatively small machines are
> >>>>> pretty massive any more. 8 core = 16 hyper-thread machines with 48GB seem
> >>>>> to be not even very impressive any more.
> >>>>>
> >>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <[email protected]> wrote:
> >>>>>
> >>>>>> We will put a big SMP server to deploy Mahout.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Xiaobo Gu
