When using HBase, prefer the new MapReduce API (org.apache.hadoop.mapreduce.*). Note, however, that the mapred.* package upstream in Hadoop is no longer deprecated.
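To make the old/new distinction concrete, here is a minimal sketch (class and path names are illustrative, not from the thread) of setting an output path with the new API; the thread below boils down to accidentally mixing the two APIs:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
// New-API FileOutputFormat: setOutputPath(Job, Path)
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "example");   // new-API Job
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        // The old-API class of the same name,
        // org.apache.hadoop.mapred.FileOutputFormat, instead expects a
        // JobConf: setOutputPath(JobConf, Path). Importing that one while
        // using a new-API Job produces the compile error seen in the thread.
    }
}
```

The two FileOutputFormat classes share a simple name, so which import statement you pick silently selects an API generation; Eclipse's auto-import often picks the wrong one.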
On 22-Nov-2011, at 1:21 AM, Denis Kreis wrote:
> Hi
>
> Is org.apache.hadoop.mapred.FileInputFormat to be considered
> as obsolete/deprecated?
>
> Thanks!
>
> 2011/11/15 Stuti Awasthi <[email protected]>
>
>> Sure Doug,
>> Thanks
>>
>> -----Original Message-----
>> From: Doug Meil [mailto:[email protected]]
>> Sent: Monday, November 14, 2011 9:08 PM
>> To: [email protected]
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Glad you worked through that and everything is working. I will add an
>> example of MR from HBase to HDFS in the book.
>>
>> On 11/14/11 1:24 AM, "Stuti Awasthi" <[email protected]> wrote:
>>
>>> Hi,
>>> I think the issue is with the filesystem configuration, as the config
>>> is picking up HBaseConfiguration. When I modified my output directory
>>> path to the absolute HDFS path:
>>> FileOutputFormat.setOutputPath(job, new
>>> Path("hdfs://master:54310/MR/stuti3"));
>>>
>>> the MR job runs successfully and I am able to see the stuti3 directory
>>> inside HDFS at the desired path.
>>>
>>> -----Original Message-----
>>> From: Stuti Awasthi
>>> Sent: Monday, November 14, 2011 11:40 AM
>>> To: [email protected]
>>> Subject: RE: MR - Input from Hbase output to HDFS
>>>
>>> Hi Joey,
>>> Thanks for pointing this out. After importing "FileOutputFormat" as you
>>> suggested, I am able to run the MR job from Eclipse (Windows); the only
>>> problem is that I am not able to see the output directory this code is
>>> creating. HDFS and HBase are on a Linux machine.
>>>
>>> Code:
>>> Configuration config = HBaseConfiguration.create();
>>> config.set("hbase.zookeeper.quorum", "master");
>>> config.set("hbase.zookeeper.property.clientPort", "2181");
>>>
>>> Job job = new Job(config, "Hbase_Read_Write");
>>> job.setJarByClass(ReadWriteDriver.class);
>>> Scan scan = new Scan();
>>> scan.setCaching(500);
>>> scan.setCacheBlocks(false);
>>> TableMapReduceUtil.initTableMapperJob("users", scan,
>>> ReadWriteMapper.class, Text.class, IntWritable.class, job);
>>> job.setOutputFormatClass(TextOutputFormat.class);
>>> FileOutputFormat.setOutputPath(job, new Path("/stuti2"));
>>>
>>> After executing this code, the MR job runs successfully, but when I
>>> look in HDFS, no "/stuti2" directory has been created. I also looked
>>> in the local filesystem of the Linux machine as well as the Windows
>>> machine, but could not find the output folder anywhere.
>>>
>>> Eclipse console output:
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows 7
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.name=stutiawasthi
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.home=C:\Users\stutiawasthi
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ec, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
>>> 11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ed, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ee, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
>>> 11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>>> 11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
>>> ...............................................
>>> 11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>>> 11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 103 bytes
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
>>> 11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /stuti2
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
>>> 11/11/14 11:21:47 INFO mapred.JobClient: map 100% reduce 100%
>>> 11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>>> 11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>>> 11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>>> 11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>>>
>>> Please suggest.
>>>
>>> -----Original Message-----
>>> From: Joey Echeverria [mailto:[email protected]]
>>> Sent: Friday, November 11, 2011 10:38 PM
>>> To: [email protected]
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>
>>> There are two APIs (old and new), and you appear to be mixing them.
>>> TableMapReduceUtil only works with the new API.
>>> The solution is to import the new-API version of FileOutputFormat,
>>> which takes a Job:
>>>
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>>
>>> -Joey
>>>
>>> On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <[email protected]> wrote:
>>>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a
>>>> parameter, not a Job object.
>>>> At least that is the error I'm getting while compiling against the
>>>> Hadoop 0.20.2 jar in Eclipse.
>>>>
>>>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>>>
>>>> -----Original Message-----
>>>> From: Prashant Sharma [mailto:[email protected]]
>>>> Sent: Friday, November 11, 2011 11:20 AM
>>>> To: [email protected]
>>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>>
>>>> Hi Stuti,
>>>> I was wondering why you are not using the job object to set the
>>>> output path, like this:
>>>>
>>>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite"));
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <[email protected]> wrote:
>>>>
>>>>> Hi Andrei,
>>>>> Well, I am a bit confused. When I use JobConf and associate it with
>>>>> JobClient to run the job, I get the error that the "Input directory
>>>>> is not set".
>>>>> Since I want my input to come from the HBase table, which I already
>>>>> configured with "TableMapReduceUtil.initTableMapperJob", I don't want
>>>>> to set an input directory via JobConf.
>>>>> How do I mix these two so that I can get input from HBase and write
>>>>> output to HDFS?
>>>>>
>>>>> Thanks
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrei Cojocaru [mailto:[email protected]]
>>>>> Sent: Thursday, November 10, 2011 7:09 PM
>>>>> To: [email protected]
>>>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>>>
>>>>> Stuti,
>>>>>
>>>>> I don't see you associating JobConf with Job anywhere.
>>>>> -Andrei
>>>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
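Putting the thread's resolution together: below is a hedged sketch of the full driver, reading from an HBase table and writing text files to HDFS using only new-API classes. The table name, mapper class name, ZooKeeper settings, and the hdfs:// URI are taken from the messages above; the ReadWriteMapper class is assumed to exist and emit Text/IntWritable pairs as described, so this is a sketch rather than a verified program.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
// New-API output classes -- not the ones from org.apache.hadoop.mapred.*
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "master");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = new Job(config, "Hbase_Read_Write");
        job.setJarByClass(ReadWriteDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows in batches per RPC
        scan.setCacheBlocks(false);  // don't pollute the block cache in MR scans

        // Source: the "users" HBase table, via the mapper from the thread
        // (ReadWriteMapper is assumed, not shown in the messages).
        TableMapReduceUtil.initTableMapperJob("users", scan,
                ReadWriteMapper.class, Text.class, IntWritable.class, job);

        // Sink: plain text files on HDFS. The absolute hdfs:// URI avoids
        // writing to whatever default filesystem the HBase config implies,
        // which is why the relative path "/stuti2" never showed up in HDFS.
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://master:54310/MR/stuti3"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The key design point is that the input side is configured entirely by TableMapReduceUtil (no input directory at all), while the output side uses ordinary file-based output, so the two halves of the job talk to different storage systems.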
