Hi Xiaobo,

For Hadoop config questions such as these, you'll probably get more input on the Hadoop user list.
-- Ken

On Jun 25, 2011, at 12:58am, Xiaobo Gu wrote:

> Our server is attached to a local disk array via four 6Gb SAS channels; we
> can see 2 GB read and 1 GB write performance. How many data nodes would be
> suitable?
>
> 2011/6/25, Sean Owen <[email protected]>:
>> There's that. There's also the fact that a 32-way machine almost certainly
>> doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
>> latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
>> you could end up with an I/O bottleneck.
>>
>> Speaking of AWS and EMR, I find that I/O bottleneck is by far the issue
>> there. I spread my jobs there as far across instances and racks as possible
>> just to try to steal more of the little machines' I/O seeks!
>>
>> On Sat, Jun 25, 2011 at 3:17 AM, edwin <[email protected]> wrote:
>>
>>> Hi Ted,
>>> I'm wondering: by "isn't going to work well", are you referring to the
>>> inevitable, unnecessary Hadoop overhead of running on a single machine, or
>>> are there other implications of running big jobs on a single machine?
>>>
>>> - edwin
>>>
>>> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>>>
>>>> I have done this with VMs but I would not generally recommend it. Without
>>>> VMs you will have a pretty ugly configuration issue because Hadoop usually
>>>> assumes it owns the machine.
>>>>
>>>> Besides, this is a seriously square-peg-into-a-round-hole kind of problem
>>>> here. Hadoop (map-reduce) was designed so that you could use several little
>>>> machines instead of one big one. It just isn't going to work well on a
>>>> single computer.
>>>>
>>>> On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <[email protected]> wrote:
>>>>
>>>>> Do you have any experience in running multiple data nodes and task
>>>>> trackers on a single SMP server?
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ted Dunning [mailto:[email protected]]
>>>>>> Sent: Saturday, June 25, 2011 9:26 AM
>>>>>> To: [email protected]
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>>>>>>
>>>>>> Pretty big. Should scream for local classifier learning.
>>>>>>
>>>>>> Local Hadoop should run pretty fast as well.
>>>>>>
>>>>>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <[email protected]> wrote:
>>>>>>
>>>>>>> 32 cores, 256 GB RAM
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ted Dunning [mailto:[email protected]]
>>>>>>>> Sent: Saturday, June 25, 2011 1:37 AM
>>>>>>>> To: [email protected]
>>>>>>>> Cc: [email protected]
>>>>>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>>>>>>>>
>>>>>>>> Big iron is fine for some of the classifier stuff, but throughput per $
>>>>>>>> can be higher for other algorithms with a cluster of smaller machines.
>>>>>>>>
>>>>>>>> How big a machine are you talking about? Even relatively small machines
>>>>>>>> are pretty massive anymore. 8-core (16 hyper-thread) machines with 48GB
>>>>>>>> seem to be not even very impressive anymore.
>>>>>>>>
>>>>>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> We will deploy Mahout on a big SMP server.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Xiaobo Gu

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
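A rough back-of-envelope check of the I/O point Sean Owen makes in this thread, as a minimal Python sketch. It assumes (none of this is confirmed in the thread) that each of the four 6 Gb SAS channels is a single SAS-2 lane (6 Gbit/s line rate, 8b/10b encoding), that the quoted "2 GB read and 1 GB write" figures are aggregate throughput in GB/s, and that all 32 cores run I/O-bound tasks at the same time. The helper names are hypothetical, chosen only for illustration.

```python
# Back-of-envelope I/O budget for the SMP server described in this thread.
# ASSUMPTIONS (not confirmed in the thread): each "6Gb SAS channel" is one
# SAS-2 lane, the 2 GB / 1 GB figures are aggregate GB/s, and all 32 cores
# run I/O-bound tasks concurrently.

SAS_LANES = 4                 # four 6Gb SAS channels
LANE_GBIT_PER_S = 6.0         # SAS-2 line rate per lane, in Gbit/s
ENCODING_EFFICIENCY = 8 / 10  # SAS-2 uses 8b/10b encoding

CORES = 32                    # "32 cores, 256 GB RAM"
READ_GB_PER_S = 2.0           # observed aggregate read (assumed GB/s)
WRITE_GB_PER_S = 1.0          # observed aggregate write (assumed GB/s)


def link_ceiling_gb_per_s(lanes, gbit_per_lane, efficiency):
    """Hypothetical helper: usable payload bandwidth of the SAS links, in GB/s."""
    return lanes * gbit_per_lane * efficiency / 8  # 8 bits per byte


def per_core_mb_per_s(aggregate_gb_per_s, cores):
    """Hypothetical helper: each core's share of the aggregate bandwidth, in MB/s."""
    return aggregate_gb_per_s * 1000 / cores


ceiling = link_ceiling_gb_per_s(SAS_LANES, LANE_GBIT_PER_S, ENCODING_EFFICIENCY)
read_share = per_core_mb_per_s(READ_GB_PER_S, CORES)
write_share = per_core_mb_per_s(WRITE_GB_PER_S, CORES)

print(f"SAS link ceiling:    {ceiling:.1f} GB/s")      # 2.4 GB/s
print(f"read per busy core:  {read_share:.1f} MB/s")   # 62.5 MB/s
print(f"write per busy core: {write_share:.2f} MB/s")  # 31.25 MB/s
```

Under these assumptions the observed 2 GB/s read is already close to the roughly 2.4 GB/s link ceiling, and splitting it across 32 simultaneously busy cores leaves each one about 62.5 MB/s of read and 31.25 MB/s of write bandwidth. That is the sense in which a 32-way machine does not have 32 times the I/O of a small one, independent of how many data nodes are configured on it.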
