Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

Ken Krugler Sat, 25 Jun 2011 12:15:37 -0700

Hi Xiaobo,

For Hadoop config questions such as these, you'll probably get more input on 
the Hadoop user list.


-- Ken

On Jun 25, 2011, at 12:58am, Xiaobo Gu wrote:

> Our server is attached to a local disk array via four 6Gb SAS channels. We
> can see 2 GB read and 1 GB write performance. How many data nodes would be
> suitable?
> 
> 2011/6/25, Sean Owen <[email protected]>:
>> There's that. There's also the fact that a 32-way machine almost certainly
>> doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
>> latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
>> you could end up with an I/O bottleneck.
>> 
>> Speaking of AWS and EMR, I find that the I/O bottleneck is by far the main
>> issue there. I spread my jobs there as far across instances and racks as
>> possible just to try to steal more little machines' I/O seeks!
>> 
>> On Sat, Jun 25, 2011 at 3:17 AM, edwin <[email protected]> wrote:
>> 
>>> Hi Ted,
>>> By "isn't going to work well", do you mean the unavoidable Hadoop overhead
>>> of running on a single machine, or are there other implications of running
>>> big jobs on a single machine?
>>> 
>>> - edwin
>>> 
>>> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>>> 
>>>> I have done this with VMs but I would not generally recommend it. Without
>>>> VMs you will have a pretty ugly configuration issue because Hadoop usually
>>>> assumes it owns the machine.
>>>> 
>>>> Besides, this is a seriously square peg into a round hole kind of problem
>>>> here.  Hadoop (map-reduce) was designed so that you could use several little
>>>> machines instead of one big one.  It just isn't going to work well on a
>>>> single computer.
>>>> 
>>>> On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <[email protected]> wrote:
>>>> 
>>>>> Do you have any experience running multiple data nodes and task
>>>>> trackers on a single SMP server?
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Ted Dunning [mailto:[email protected]]
>>>>>> Sent: Saturday, June 25, 2011 9:26 AM
>>>>>> To: [email protected]
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a
>>>>>> Hadoop cluster.
>>>>>> 
>>>>>> Pretty big.  Should scream for local classifier learning.
>>>>>> 
>>>>>> Local Hadoop should run pretty fast as well.
>>>>>> 
>>>>>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <[email protected]> wrote:
>>>>>> 
>>>>>>> 32Core, 256G RAM
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ted Dunning [mailto:[email protected]]
>>>>>>>> Sent: Saturday, June 25, 2011 1:37 AM
>>>>>>>> To: [email protected]
>>>>>>>> Cc: [email protected]
>>>>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a
>>>>>>>> Hadoop cluster.
>>>>>>>> 
>>>>>>>> Big iron is fine for some of the classifier stuff, but throughput per $ can
>>>>>>>> be higher for other algorithms with a cluster of smaller machines.
>>>>>>>> 
>>>>>>>> How big a machine are you talking about?  Even relatively small machines are
>>>>>>>> pretty massive any more.  8 core = 16 hyper-thread machines with 48GB seem
>>>>>>>> to be not even very impressive any more.
>>>>>>>> 
>>>>>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> We will set up a big SMP server to deploy Mahout.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> 
>>>>>>>>> Xiaobo Gu

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
