Re: Hbase import Tsv performance (slow import)

Nick maillard Tue, 23 Oct 2012 10:14:00 -0700

Thanks for the help!

My conf files are as follows.

Hadoop:
hdfs-site.xml


<configuration>
 <property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/runner/app/hadoop/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks. If this is a comma-delimited list of directories,
  then data will be stored in all named directories, typically on different
  devices. Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
</configuration>


mapred-site.xml

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>14</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>14</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is replaced
  by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
        -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

  The configuration variable mapred.child.ulimit can be used to control the
  maximum virtual memory of the child processes.
  </description>
</property>
</configuration>
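
As a rough check of what the slot settings above imply: if every map and reduce slot on a TaskTracker is occupied at once, the child JVMs can together claim up to (map slots + reduce slots) x maximum child heap. The fragment below is only a sketch of that arithmetic; the slot counts 4 and 2 are hypothetical values chosen for illustration, and whether either total fits depends on the RAM and cores per node, which are not stated here.

```xml
<!-- mapred-site.xml fragment: hypothetical values, for illustration only.
     Worst-case child heap per TaskTracker =
       (map slots + reduce slots) x maximum child heap (-Xmx)
     Posted settings:  (14 + 14) x 400 MB = 11200 MB
     Values below:     (4 + 2)   x 400 MB =  2400 MB -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

The -Xmx figure bounds only the Java heap, so the real footprint per child process is somewhat higher, and the DataNode, TaskTracker and any HBase RegionServer on the same machine need memory on top of that.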


core-site.xml

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/runner/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


For HBase:
hbase-site.xml
<configuration>
 <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:54310/hbase</value>
 </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
<property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2222</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ks25937.kimsufi.com</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/runner/hbase/hbase-0.94.2/tmp</value>
    </property>
</configuration>




I am currently running the import and looking at the logs to try to understand
what is going on. This definitely seems fishy:

2012-10-23 18:39:49,107 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000041_0 0.21332978%
2012-10-23 18:39:50,363 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000028_0 0.20936884%
2012-10-23 18:49:38,098 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201210231145_0010_m_000030_0: Task attempt_201210231145_0010_m_000030_0 failed to report status for 602 seconds. Killing!
2012-10-23 18:49:38,116 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
90 active threads
Thread 742 (process reaper):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.UNIXProcess.waitForProcessExit(Native Method)
    java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
    java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    java.lang.Thread.run(Thread.java:722)
Thread 740 (process reaper):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.UNIXProcess.waitForProcessExit(Native Method)
    java.lang.UNIXProcess.access$200(UNIXProcess.java:54)
    java.lang.UNIXProcess$3.run(UNIXProcess.java:174)
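
The "failed to report status for 602 seconds" kill in the log above is consistent with the MapReduce task timeout, mapred.task.timeout, whose default in Hadoop 1.x is 600000 ms (10 minutes). A minimal sketch of setting that threshold explicitly in mapred-site.xml follows, assuming it has not already been overridden elsewhere. The value 1200000 is a hypothetical choice for illustration only; raising it would merely delay the kill and does not explain why the map task stops making progress.

```xml
<!-- mapred-site.xml fragment: illustrative only -->
<property>
  <name>mapred.task.timeout</name>
  <!-- Default is 600000 ms; 1200000 ms (20 minutes) is a hypothetical value. -->
  <value>1200000</value>
  <description>The number of milliseconds before a task will be
  terminated if it neither reads an input, writes an output, nor
  updates its status string.
  </description>
</property>
```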
