Glad you worked through that and that everything is working. I will add an example of an HBase-to-HDFS MR job to the book.
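A sketch of what that example could look like, assembled from the code and fixes in the thread below. It assumes the same setup as the thread (a "users" table, the new `org.apache.hadoop.mapreduce` API from Hadoop 0.20.x, HBase 0.90.x, and a NameNode at master:54310); the body of ReadWriteMapper is hypothetical, since the thread never shows it.

```java
// Sketch of an HBase-to-HDFS MapReduce job (new API, Hadoop 0.20.x / HBase 0.90.x).
// The "users" table, ZooKeeper quorum, and output path mirror this thread; the
// mapper logic (emit one count per row) is an illustrative stand-in.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
// The new-API FileOutputFormat -- this is the import that resolves the
// Job-vs-JobConf mismatch discussed in the thread.
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {

  // TableMapper fixes the input types to the HBase row key and scan Result.
  static class ReadWriteMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Hypothetical per-row logic: emit (rowKey, 1) for every row scanned.
      context.write(new Text(row.get()), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "master");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    Job job = new Job(config, "Hbase_Read_Write");
    job.setJarByClass(ReadWriteDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner cache for MR throughput
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    // HBase supplies the input: table scan in, (Text, IntWritable) out of the map.
    TableMapReduceUtil.initTableMapperJob("users", scan,
        ReadWriteMapper.class, Text.class, IntWritable.class, job);

    // HDFS takes the output. A fully qualified path ensures that a client whose
    // default filesystem is local (e.g. launched from Eclipse) still writes to HDFS.
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With no reducer set, the default identity reducer simply writes the mapper's (key, value) pairs out as text, which matches the five reduce output records in the log below.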
On 11/14/11 1:24 AM, "Stuti Awasthi" <[email protected]> wrote:
>Hi,
>I think the issue is with the filesystem configuration, since the config
>object is an HBaseConfiguration. When I changed my output directory to an
>absolute HDFS path:
>FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));
>
>the MR job runs successfully and I can see the stuti3 directory inside
>HDFS at the desired path.
>
>
>-----Original Message-----
>From: Stuti Awasthi
>Sent: Monday, November 14, 2011 11:40 AM
>To: [email protected]
>Subject: RE: MR - Input from Hbase output to HDFS
>
>Hi Joey,
>Thanks for pointing this out. After importing "FileOutputFormat" as you
>suggested, I am able to run the MR job from Eclipse (Windows). The only
>problem is that I cannot see the output directory this code is creating.
>HDFS and HBase are on a Linux machine.
>
>Code :
>    Configuration config = HBaseConfiguration.create();
>    config.set("hbase.zookeeper.quorum", "master");
>    config.set("hbase.zookeeper.property.clientPort", "2181");
>
>    Job job = new Job(config, "Hbase_Read_Write");
>    job.setJarByClass(ReadWriteDriver.class);
>    Scan scan = new Scan();
>    scan.setCaching(500);
>    scan.setCacheBlocks(false);
>    TableMapReduceUtil.initTableMapperJob("users", scan,
>        ReadWriteMapper.class, Text.class, IntWritable.class, job);
>    job.setOutputFormatClass(TextOutputFormat.class);
>    FileOutputFormat.setOutputPath(job, new Path("/stuti2"));
>
>After executing this code, the MR job runs successfully, but when I look
>in HDFS no "/stuti2" directory has been created. I also looked in the
>local filesystem of the Linux machine as well as the Windows machine, but
>could not find the output folder anywhere.
>
>Eclipse console output :
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows 7
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.name=stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.home=C:\Users\stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ec, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ed, negotiated timeout = 180000
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ee, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
>11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
>...............................................
>11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 103 bytes
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
>11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /stuti2
>11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
>11/11/14 11:21:47 INFO mapred.JobClient: map 100% reduce 100%
>11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>
>
>Please suggest.
>
>-----Original Message-----
>From: Joey Echeverria [mailto:[email protected]]
>Sent: Friday, November 11, 2011 10:38 PM
>To: [email protected]
>Subject: Re: MR - Input from Hbase output to HDFS
>
>There are two MapReduce APIs (old and new), and you appear to be mixing
>them. TableMapReduceUtil only works with the new API. The solution is to
>import the new version of FileOutputFormat, which takes a Job:
>
>import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>-Joey
>
>On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <[email protected]> wrote:
>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a
>> parameter, not a Job object.
>> At least this is the error I'm getting while compiling against the
>> Hadoop 0.20.2 jar in Eclipse.
>>
>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>
>> -----Original Message-----
>> From: Prashant Sharma [mailto:[email protected]]
>> Sent: Friday, November 11, 2011 11:20 AM
>> To: [email protected]
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Hi Stuti,
>> I was wondering why you are not using the Job object to set the output
>> path, like this:
>>
>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite"));
>>
>> thanks
>>
>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <[email protected]> wrote:
>>
>>> Hi Andrei,
>>> Well, I am a bit confused. When I use a JobConf and associate it with
>>> a JobClient to run the job, I get the error "Input directory is not set".
>>> Since I want the input to come from an HBase table, which I have
>>> already configured with "TableMapReduceUtil.initTableMapperJob", I
>>> don't want to set an input directory via the JobConf.
>>> How do I mix these two so that I can read input from HBase and write
>>> output to HDFS?
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Andrei Cojocaru [mailto:[email protected]]
>>> Sent: Thursday, November 10, 2011 7:09 PM
>>> To: [email protected]
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>
>>> Stuti,
>>>
>>> I don't see you associating the JobConf with a Job anywhere.
>>> -Andrei
>>>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
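Joey's diagnosis is the crux of the thread: Hadoop 0.20.x ships two parallel APIs, and `TableMapReduceUtil` only speaks the new one. A minimal sketch of the difference (the import package names are the real Hadoop ones; the driver around them is illustrative):

```java
// Old vs. new MapReduce API, as discussed in the thread.
//
// Old API (won't compile against a Job -- its setOutputPath takes a JobConf):
//   import org.apache.hadoop.mapred.FileOutputFormat;
//
// New API (matches the Job object that TableMapReduceUtil configures):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; // setOutputPath(Job, Path)

public class NewApiOutputExample {
  public static void main(String[] args) throws Exception {
    // With the new-API import in scope, Job is accepted directly.
    Job job = new Job(new Configuration(), "example");
    FileOutputFormat.setOutputPath(job, new Path("/output"));
    System.out.println(FileOutputFormat.getOutputPath(job));
  }
}
```

The practical rule: if the driver builds a `Job`, every format and utility class must come from `org.apache.hadoop.mapreduce.*`; if it builds a `JobConf`, everything must come from `org.apache.hadoop.mapred.*`.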

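Stuti's final fix (the absolute hdfs:// path) works because the job launched from Eclipse ran with the LocalJobRunner (note `job_local_0001` in the log), whose default filesystem is the local one, so `new Path("/stuti2")` resolved to the local disk. A sketch of the two equivalent remedies, assuming the NameNode address master:54310 from the thread:

```java
// Two ways to make the output land in HDFS when the client's default
// filesystem is local. The NameNode URI is the one from this thread.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OutputPathFix {
  public static void main(String[] args) {
    Configuration config = HBaseConfiguration.create();

    // Option 1: fully qualify the output path, as Stuti did.
    Path qualified = new Path("hdfs://master:54310/MR/stuti3");

    // Option 2: point the client's default filesystem at HDFS (the 0.20-era
    // key is "fs.default.name"); plain paths like "/stuti2" then resolve there.
    config.set("fs.default.name", "hdfs://master:54310");
    Path plain = new Path("/stuti2");

    System.out.println(qualified + " and " + plain);
  }
}
```

Option 2 is what picking up a proper core-site.xml on the classpath would do automatically; hard-coding it, as here, is only a sketch for a quick Eclipse-side test.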