Hi Stack,

I decompiled the ImportTsv class and added some System.out statements in main() to figure out the problem. The modified class is here: http://pastebin.com/sKQcMXe4
With Keshav's help, I found that the CSV import works fine when I provide "-Dimporttsv.separator=," as the first command-line parameter after the class name. Here are the command and console log of the successful import of the CSV file:

sudo -u hdfs hadoop jar /usr/lib/hadoop/importdata.jar com.intuit.ihub.hbase.poc.ImportData -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city testload /temp/csv -Dimporttsv.skip.bad.lines=true

Command line Arguments::-Dimporttsv.separator=,
Command line Arguments::-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city
Command line Arguments::testload
Command line Arguments::/temp/csv
Command line Arguments::-Dimporttsv.skip.bad.lines=true
OtherArguments==>testload
OtherArguments==>/temp/csv
OtherArguments==>-D
OtherArguments==>importtsv.skip.bad.lines=true
SEPARATOR as per jobconf:,

12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hbase.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-71.el6.x86_64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.home=/usr/lib/hadoop-0.20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ihub-jobtracker1:2181 sessionTimeout=180000 watcher=hconnection
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Opening socket connection to server ihub-jobtracker1/192.168.1.98:2181
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Socket connection established to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Session establishment complete on server ihub-jobtracker1/192.168.1.98:2181, sessionid = 0x135d53c669a00ab, negotiated timeout = 40000
12/03/07 10:01:33 INFO mapreduce.TableOutputFormat: Created table instance for testload
12/03/07 10:01:33 INFO input.FileInputFormat: Total input paths to process : 1
12/03/07 10:01:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/07 10:01:33 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/07 10:01:34 INFO mapred.JobClient: Running job: job_201203021306_0028
12/03/07 10:01:35 INFO mapred.JobClient: map 0% reduce 0%
12/03/07 10:01:40 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 10:01:41 INFO mapred.JobClient: Job complete: job_201203021306_0028
12/03/07 10:01:41 INFO mapred.JobClient: Counters: 13
12/03/07 10:01:41 INFO mapred.JobClient:   Job Counters
12/03/07 10:01:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5177
12/03/07 10:01:41 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/07 10:01:41 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/07 10:01:41 INFO mapred.JobClient:     Launched map tasks=1
12/03/07 10:01:41 INFO mapred.JobClient:     Data-local map tasks=1
12/03/07 10:01:41 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/07 10:01:41 INFO mapred.JobClient:   FileSystemCounters
12/03/07 10:01:41 INFO mapred.JobClient:     HDFS_BYTES_READ=160
12/03/07 10:01:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=61534
12/03/07 10:01:41 INFO mapred.JobClient:   ImportData
12/03/07 10:01:41 INFO mapred.JobClient:     Bad Lines=0
12/03/07 10:01:41 INFO mapred.JobClient:   Map-Reduce Framework
12/03/07 10:01:41 INFO mapred.JobClient:     Map input records=8
12/03/07 10:01:41 INFO mapred.JobClient:     Spilled Records=0
12/03/07 10:01:41 INFO mapred.JobClient:     Map output records=8
12/03/07 10:01:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=104

If I run the same command but put the "-Dimporttsv.separator=," argument at the end, the separator in the jobconf won't be set and the import will fail, because the default separator is the tab character.
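This matches how GenericOptionsParser behaves: it stops treating arguments as generic options once it sees the first positional argument, so a trailing -D never reaches the job conf and is instead handed back among the remaining arguments. Below is a toy re-implementation of that behavior, just to illustrate the ordering rule; ArgOrderSketch, parse(), and the conf map are my own illustrative names, not Hadoop API, and this is a sketch of the observed behavior rather than the real parser:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the observed option handling: -Dkey=value tokens are
// applied to the conf only while they appear BEFORE the first positional
// argument; after that, later -D tokens are passed through as plain
// remaining arguments, split into "-D" and "key=value" (which is exactly
// what the OtherArguments==> lines above show).
public class ArgOrderSketch {
    static Map<String, String> conf = new HashMap<>();
    static List<String> remaining = new ArrayList<>();

    static void parse(String[] args) {
        boolean optionsDone = false;
        for (String a : args) {
            if (!optionsDone && a.startsWith("-D") && a.length() > 2) {
                String kv = a.substring(2);
                int eq = kv.indexOf('=');
                conf.put(kv.substring(0, eq), kv.substring(eq + 1)); // applied to conf
            } else if (a.startsWith("-D") && a.length() > 2) {
                remaining.add("-D");              // token is split, not applied
                remaining.add(a.substring(2));
            } else {
                optionsDone = true;               // first positional arg ends option parsing
                remaining.add(a);
            }
        }
    }

    public static void main(String[] args) {
        parse(new String[] {"-Dimporttsv.separator=,", "testload", "/temp/csv"});
        System.out.println(conf.get("importtsv.separator"));   // prints ,
        conf.clear();
        remaining.clear();
        parse(new String[] {"testload", "/temp/csv", "-Dimporttsv.separator=,"});
        System.out.println(conf.get("importtsv.separator"));   // prints null
    }
}
```

With the separator option first, the conf gets "," and the remaining arguments are just the table name and input path; with it last, the conf entry stays null, which is exactly the "SEPARATOR as per jobconf:null" case below.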
Here are the modified command and its output:

sudo -u hdfs hadoop jar /usr/lib/hadoop/importdata.jar com.intuit.ihub.hbase.poc.ImportData -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city testload /temp/csv -Dimporttsv.skip.bad.lines=true -Dimporttsv.separator=,

Command line Arguments::-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city
Command line Arguments::testload
Command line Arguments::/temp/csv
Command line Arguments::-Dimporttsv.skip.bad.lines=true
Command line Arguments::-Dimporttsv.separator=,
OtherArguments==>testload
OtherArguments==>/temp/csv
OtherArguments==>-D
OtherArguments==>importtsv.skip.bad.lines=true
OtherArguments==>-D
OtherArguments==>importtsv.separator=,
SEPARATOR as per jobconf:null

12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hbase.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-71.el6.x86_64
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:user.home=/usr/lib/hadoop-0.20
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
12/03/07 10:02:17 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ihub-jobtracker1:2181 sessionTimeout=180000 watcher=hconnection
12/03/07 10:02:17 INFO zookeeper.ClientCnxn: Opening socket connection to server ihub-jobtracker1/192.168.1.98:2181
12/03/07 10:02:17 INFO zookeeper.ClientCnxn: Socket connection established to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/07 10:02:17 INFO zookeeper.ClientCnxn: Session establishment complete on server ihub-jobtracker1/192.168.1.98:2181, sessionid = 0x135d53c669a00af, negotiated timeout = 40000
12/03/07 10:02:18 INFO mapreduce.TableOutputFormat: Created table instance for testload
12/03/07 10:02:18 INFO input.FileInputFormat: Total input paths to process : 1
12/03/07 10:02:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/07 10:02:18 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/07 10:02:18 INFO mapred.JobClient: Running job: job_201203021306_0029
12/03/07 10:02:19 INFO mapred.JobClient: map 0% reduce 0%
12/03/07 10:02:25 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 10:02:27 INFO mapred.JobClient: Job complete: job_201203021306_0029
12/03/07 10:02:27 INFO mapred.JobClient: Counters: 13
12/03/07 10:02:27 INFO mapred.JobClient:   Job Counters
12/03/07 10:02:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7515
12/03/07 10:02:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/07 10:02:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/07 10:02:27 INFO mapred.JobClient:     Launched map tasks=1
12/03/07 10:02:27 INFO mapred.JobClient:     Data-local map tasks=1
12/03/07 10:02:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/07 10:02:27 INFO mapred.JobClient:   FileSystemCounters
12/03/07 10:02:27 INFO mapred.JobClient:     HDFS_BYTES_READ=160
12/03/07 10:02:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=61370
12/03/07 10:02:27 INFO mapred.JobClient:   ImportData
12/03/07 10:02:27 INFO mapred.JobClient:     Bad Lines=8
12/03/07 10:02:27 INFO mapred.JobClient:   Map-Reduce Framework
12/03/07 10:02:27 INFO mapred.JobClient:     Map input records=8
12/03/07 10:02:27 INFO mapred.JobClient:     Spilled Records=0
12/03/07 10:02:27 INFO mapred.JobClient:     Map output records=0
12/03/07 10:02:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=104

I tried to analyze the problem, and as per my analysis there is a problem with "String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();" on line #102. Let me know your views.

On Mon, Mar 5, 2012 at 5:06 PM, Shrijeet Paliwal <[email protected]> wrote:

> Anil,
> Stack meant adding debug statements yourself in tool.
>
> -Shrijeet
>
> On Mon, Mar 5, 2012 at 4:54 PM, anil gupta <[email protected]> wrote:
>
> > Hi St.Ack,
> >
> > Thanks for the response. Both the tsv and csv are UTF-8 files. Could you
> > please let me know how to run bulk loading in debug mode? I don't know of
> > any hadoop option which can run a job in debug mode.
> >
> > Thanks,
> > Anil
> >
> > On Mon, Mar 5, 2012 at 2:58 PM, Stack <[email protected]> wrote:
> >
> > > On Mon, Mar 5, 2012 at 11:48 AM, anil gupta <[email protected]> wrote:
> > > > I am getting a "Bad line at offset" error in the stderr log of tasks
> > > > while testing bulk loading a CSV file into HBase. I am using cdh3u2.
> > > > Import of a TSV works fine.
> > >
> > > Its your encoding of the tsv and csv or its a problem w/ the parsing
> > > code in importtsv tool. Can you figure which it is? Can you add a
> > > bit of debug for the next time you run the job?
> > >
> > > Thanks,
> > > St.Ack
> >
> > --
> > Thanks & Regards,
> > Anil Gupta

--
Thanks & Regards,
Anil Gupta
