Generally, a single Hadoop machine will perform worse than a single MySQL machine. People normally use Hadoop when they have so much data that it won't really fit on a single machine and would otherwise require specialized hardware (such as a SAN) to run. 30 GB of data really isn't that much, and 2 GB of RAM is not what Hadoop is designed to work with; it likes to have lots of memory. I also don't see the Hadoop configuration files, so perhaps you only have 1 mapper and 1 reducer. But this is not a typical use case, so I doubt you'll see snappy performance even after tweaking the configs.
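If you want to check the mapper/reducer point, the per-node task slots are set in mapred-site.xml rather than hive-site.xml. The fragment below is only a sketch: it assumes a Hadoop 1.x (MRv1) install, which I am guessing from the Hive 0.9.0 WAR file in your config, and the values are illustrative for a dual-core box with 2 GB of RAM, not tuned recommendations.

```xml
<!-- mapred-site.xml fragment (assumes Hadoop 1.x / MRv1; values are illustrative guesses) -->
<configuration>
  <property>
    <!-- maximum number of map tasks this TaskTracker runs at the same time -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- maximum number of reduce tasks this TaskTracker runs at the same time -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <!-- JVM options (heap size) for each task; must fit within the 2 GB of physical RAM -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx256m</value>
  </property>
</configuration>
```

With only 2 GB of RAM, every extra slot competes with the DataNode, TaskTracker and MySQL metastore for memory, so raising these numbers may well make things slower rather than faster.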
From: Gobinda Paul [mailto:[email protected]]
Sent: Tuesday, March 12, 2013 10:10 AM
To: [email protected]
Subject: Getting Slow Query Performance!

i use sqoop to import 30GB data ( two table employee(aprox 21 GB) and salary(aprox 9GB ) into hadoop(Single Node) via hive.

i run a sample query like

SELECT EMPLOYEE.ID,EMPLOYEE.NAME,EMPLOYEE.DEPT,SALARY.AMOUNT FROM EMPLOYEE JOIN SALARY WHERE EMPLOYEE.ID=SALARY.EMPLOYEE_ID AND SALARY.AMOUNT>900000;

In Hive it's take 15 Min(aprox.) where as mySQL take 4.5 min( aprox ) to execute that query .

CPU: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
RAM: 2GB
HDD: 500GB

Here IS My hive-site.xml conf.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>/lib/hive-hwi-0.9.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out what should be the number of reducers. </description>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>size per reducer.The default is 1G, i.e if the input size is 10G, it will use 10 reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>999</value>
    <description>max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is negative, hive will use this one as the max number of reducers when automatically determine number of reducers. </description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
</configuration>

Any IDEA ??
