Our server is attached to a local disk array via four 6 Gb/s SAS channels, and we see about 2 GB/s read and 1 GB/s write throughput. How many data nodes would be suitable?
2011/6/25, Sean Owen <[email protected]>:
> There's that. There's also the fact that a 32-way machine almost certainly
> doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
> latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
> you could end up with an I/O bottleneck.
>
> Speaking of AWS and EMR, I find that I/O bottleneck is by far the issue
> there. I spread my jobs there as far across instances and racks as possible
> just to try to steal more little machine's I/O seeks!
>
> On Sat, Jun 25, 2011 at 3:17 AM, edwin <[email protected]> wrote:
>
>> Hi Ted,
>> I'm wondering for "isn't going to work well", you refer to inevitable
>> unnecessary hadoop overhead running on a single machine or there are other
>> implications to run big jobs on a single machine?
>>
>> - edwin
>>
>> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>>
>> > I have done this with VM's but I would not generally recommend it. Without
>> > VM's you will have a pretty ugly configuration issue because Hadoop usually
>> > assumes it owns the machine.
>> >
>> > Besides, this is a seriously square peg into a round hole kind of problem
>> > here. Hadoop (map-reduce) was designed so that you could use several little
>> > machines instead of one big one. It just isn't going to work well on a
>> > single computer.
>> >
>> > On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <[email protected]> wrote:
>> >
>> >> Do you have any experience in running multiple data nodes and task
>> >> trackers on a single SMP server?
>> >>
>> >>> -----Original Message-----
>> >>> From: Ted Dunning [mailto:[email protected]]
>> >>> Sent: Saturday, June 25, 2011 9:26 AM
>> >>> To: [email protected]
>> >>> Cc: [email protected]
>> >>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>> >>>
>> >>> Pretty big. Should scream for local classifier learning.
>> >>>
>> >>> Local Hadoop should run pretty fast as well.
>> >>>
>> >>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <[email protected]> wrote:
>> >>>
>> >>>> 32Core, 256G RAM
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Ted Dunning [mailto:[email protected]]
>> >>>>> Sent: Saturday, June 25, 2011 1:37 AM
>> >>>>> To: [email protected]
>> >>>>> Cc: [email protected]
>> >>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
>> >>>>>
>> >>>>> Big iron is fine for some of the classifier stuff, but throughput per $ can
>> >>>>> be higher for other algorithms with a cluster of smaller machines.
>> >>>>>
>> >>>>> How big a machine are you talking about? Even relatively small machines are
>> >>>>> pretty massive any more. 8 core = 16 hyper-thread machines with 48GB seem
>> >>>>> to be not even very impressive any more.
>> >>>>>
>> >>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <[email protected]> wrote:
>> >>>>>
>> >>>>>> We will put a big SMP server to deploy Mahout.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>>
>> >>>>>> Xiaobo Gu
