Well, it's probably worth knowing that 30GB is really rock bottom when you 
talk about big data. Hadoop scales roughly linearly, so going to 3 or 4 
similar machines could probably get you below the mysql time, but it's hardly 
a fair comparison.
For setting it up I would suggest reading the hadoop docs: 
http://hadoop.apache.org/docs/current/
These hardware specs give you an idea why it's an unusual case: 
http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/

To give you some hints: each node needs to be configured with how many 
resources it's allowed to take. This is a balance between several parameters:
mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum, 
mapred.child.java.opts
There are tons more configuration options, but this is where you start. 
Different hardware and different jobs require different configurations, so try 
it out.
Since you are extremely tight on RAM, you probably want to reduce memory usage 
on most processes like the namenode/jobtracker/hive, and on each node drop the 
memory requirements for tasktracker/datanode.
Also, don't put your nodes on 100MB links; they are almost always too slow.
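
As a rough illustration of that balance, here is a minimal Python sketch that 
works out how many map and reduce slots fit on a node and prints the three 
properties in mapred-site.xml form. The helper names (plan_slots, 
to_mapred_site), the two-thirds map share and the memory figures are all 
assumptions made up for the example; they are not Hadoop defaults and not 
recommendations.

```python
# Rough sizing sketch for Hadoop 1.x (MRv1) task slots on one worker node.
# Every number and helper name here is an illustrative assumption.


def plan_slots(node_ram_mb, daemon_ram_mb, child_heap_mb):
    """Return (map_slots, reduce_slots) that fit in the RAM left over
    after the OS and the datanode/tasktracker daemons take their share."""
    usable_mb = node_ram_mb - daemon_ram_mb
    total_slots = usable_mb // child_heap_mb
    if total_slots < 2:
        raise ValueError("not enough RAM for one map and one reduce slot")
    # Assumed split: about two thirds of the slots for maps, rest for reduces.
    map_slots = max(1, (total_slots * 2) // 3)
    reduce_slots = total_slots - map_slots
    return map_slots, reduce_slots


def to_mapred_site(map_slots, reduce_slots, child_heap_mb):
    """Render the three parameters as a mapred-site.xml style fragment."""
    props = {
        "mapred.tasktracker.map.tasks.maximum": str(map_slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(reduce_slots),
        "mapred.child.java.opts": f"-Xmx{child_heap_mb}m",
    }
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append(
            f"  <property><name>{name}</name><value>{value}</value></property>"
        )
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed example: a 2GB node with 1GB reserved for the OS and daemons,
    # and a 256MB heap per child task.
    maps, reduces = plan_slots(2048, 1024, 256)
    print(to_mapred_site(maps, reduces, 256))
```

Keep in mind that -Xmx only caps the Java heap; each child JVM uses somewhat 
more than that in practice, so leave some headroom and check the result 
against real jobs.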

Bennie.

From: Gobinda Paul [mailto:[email protected]]
Sent: Tuesday, March 12, 2013 11:01 AM
To: [email protected]
Subject: RE: Getting Slow Query Performance!


Thanks for your reply. I am new to hadoop and hive. My goal is to process big 
data using hadoop. This is my university project (Data Mining), and I need to 
show that hadoop is better than mysql for big data (30-100GB+) processing. I 
know hadoop does that. To do so, can you please suggest how many nodes are 
required to show the performance, and what type of configuration is required 
for each node?


From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>
CC: [email protected]<mailto:[email protected]>
Date: Tue, 12 Mar 2013 10:40:33 +0100
Subject: RE: Getting Slow Query Performance!
Generally a single hadoop machine will perform worse than a single mysql 
machine. People normally use hadoop when they have so much data that it won't 
really fit on a single machine and would require specialized hardware (stuff 
like SANs) to run.
30GB of data really isn't that much, and 2GB of RAM is really not what hadoop 
is designed to work on. It really likes to have lots of memory.
I also don't see the hadoop configuration files, so perhaps you only have 1 
mapper and 1 reducer. But this is not a typical use case, so I doubt you'll see 
snappy performance after tweaking the configs.

