RE: Getting Slow Query Performance!

Bennie Schut Tue, 12 Mar 2013 02:41:07 -0700

Generally a single Hadoop machine will perform worse than a single MySQL
machine. People normally use Hadoop when they have so much data that it won't
fit on a single machine, or would otherwise require specialized hardware
(such as a SAN) to run.
30GB of data really isn't that much, and 2GB of RAM is far less than Hadoop is
designed to work with. It likes to have lots of memory.
I also don't see the Hadoop configuration files, so perhaps you only have 1
mapper and 1 reducer. But this is not a typical use case, so I doubt you'll see
snappy performance even after tweaking the configs.
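
As a rough guide to what the posted hive-site.xml asks for: with
mapred.reduce.tasks set to -1, Hive estimates the reducer count from the input
size, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max. The
Python sketch below only illustrates that arithmetic and is not Hive's actual
code. The function name estimate_reducers is made up for this example, and it
assumes the 30GB figure means 30 * 10^9 bytes of input.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # hive.exec.reducers.max
    """Approximate the reducer count Hive picks when mapred.reduce.tasks is -1.

    Hypothetical helper for illustration: one reducer per bytes_per_reducer
    of input (rounded up), at least 1, capped at max_reducers.
    """
    if input_bytes <= 0:
        return 1
    # Integer ceiling division, so large sizes are not rounded by floats.
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, wanted))


# Assumed input size: employee (approx. 21GB) + salary (approx. 9GB).
total_input = 30 * 1_000_000_000
print(estimate_reducers(total_input))
```

With the values from the posted config this comes out at about 30 reducers for
the join. How many of them actually run at the same time depends on the task
slots in the Hadoop configuration, which was not posted, so on a single node
most of them would likely have to queue.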


From: Gobinda Paul [mailto:[email protected]]
Sent: Tuesday, March 12, 2013 10:10 AM
To: [email protected]
Subject: Getting Slow Query Performance!



I used Sqoop to import 30GB of data (two tables: employee, approx. 21GB, and
salary, approx. 9GB) into Hadoop (single node) via Hive.

I ran a sample query like this:

SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT
FROM EMPLOYEE JOIN SALARY
WHERE EMPLOYEE.ID = SALARY.EMPLOYEE_ID AND SALARY.AMOUNT > 900000;

In Hive it takes approx. 15 minutes, whereas MySQL takes approx. 4.5 minutes
to execute that query.

CPU: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
RAM: 2GB
HDD: 500GB


Here is my hive-site.xml configuration:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>/lib/hive-hwi-0.9.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job.  Typically set
      to a prime close to the number of available hosts.  Ignored when
      mapred.job.tracker is "local". Hadoop set this to 1 by default,
      whereas hive uses -1 as its default value.
      By setting this property to -1, Hive will automatically figure out
      what should be the number of reducers.
    </description>
  </property>

  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
  </property>


  <property>
    <name>hive.exec.reducers.max</name>
    <value>999</value>
    <description>max number of reducers will be used. If the one
      specified in the configuration parameter mapred.reduce.tasks is
      negative, hive will use this one as the max number of reducers when
      automatically determine number of reducers.
    </description>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>

</configuration>


Any ideas?
