Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

Xiaobo Gu Sat, 25 Jun 2011 00:58:45 -0700

Our server is attached to a local disk array via four 6 Gb SAS channels, and
we can see 2 GB read and 1 GB write performance. How many data nodes would be
suitable?


2011/6/25, Sean Owen <[email protected]>:
> There's that. There's also the fact that a 32-way machine almost certainly
> doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
> latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
> you could end up with an I/O bottleneck.
>
> Speaking of AWS and EMR, I find that the I/O bottleneck is by far the biggest
> issue there. I spread my jobs there as far across instances and racks as
> possible just to try to steal more little machines' I/O seeks!
>
> On Sat, Jun 25, 2011 at 3:17 AM, edwin <[email protected]> wrote:
>
>> Hi Ted,
>> By "isn't going to work well", do you mean the inevitable, unnecessary
>> Hadoop overhead of running on a single machine, or are there other
>> implications of running big jobs on a single machine?
>>
>> - edwin
>>
>> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>>
>> > I have done this with VMs, but I would not generally recommend it. Without
>> > VMs you will have a pretty ugly configuration issue because Hadoop usually
>> > assumes it owns the machine.
>> >
>> > Besides, this is a seriously square-peg-into-a-round-hole kind of problem
>> > here. Hadoop (map-reduce) was designed so that you could use several little
>> > machines instead of one big one. It just isn't going to work well on a
>> > single computer.
>> >
>> > On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <[email protected]> wrote:
>> >
>> >> Do you have any experience in running multiple data nodes and task
>> >> trackers on a single SMP server?
>> >>
>> >>> -----Original Message-----
>> >>> From: Ted Dunning [mailto:[email protected]]
>> >>> Sent: Saturday, June 25, 2011 9:26 AM
>> >>> To: [email protected]
>> >>> Cc: [email protected]
>> >>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>> >>>
>> >>> Pretty big.  Should scream for local classifier learning.
>> >>>
>> >>> Local Hadoop should run pretty fast as well.
>> >>>
>> >>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <[email protected]> wrote:
>> >>>
>> >>>> 32 cores, 256 GB RAM
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Ted Dunning [mailto:[email protected]]
>> >>>>> Sent: Saturday, June 25, 2011 1:37 AM
>> >>>>> To: [email protected]
>> >>>>> Cc: [email protected]
>> >>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>> >>>>>
>> >>>>> Big iron is fine for some of the classifier stuff, but throughput per $
>> >>>>> can be higher for other algorithms with a cluster of smaller machines.
>> >>>>>
>> >>>>> How big a machine are you talking about?  Even relatively small machines
>> >>>>> are pretty massive these days.  8-core (16 hyper-thread) machines with
>> >>>>> 48GB don't seem to be even very impressive any more.
>> >>>>>
>> >>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <[email protected]> wrote:
>> >>>>>
>> >>>>>> We will use a big SMP server to deploy Mahout.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>>
>> >>>>>> Xiaobo Gu
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>
