Re: MR - Input from Hbase output to HDFS

Denis Kreis Mon, 21 Nov 2011 11:52:12 -0800

Hi

Should org.apache.hadoop.mapred.FileInputFormat be considered
obsolete/deprecated?


Thanks!
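Whether the old-API class is deprecated depends on which Hadoop release is on the classpath, so one way to check is to ask the JVM directly. This is a minimal reflection-based sketch. It assumes that a deprecated class carries the runtime @Deprecated annotation; a javadoc-only @deprecated tag is not visible this way. The class name DeprecationCheck is hypothetical.

```java
// Hypothetical helper: reports whether a class on the current classpath carries
// the runtime @Deprecated annotation. Javadoc-only @deprecated tags are not visible here.
class DeprecationCheck {

    static String status(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, DeprecationCheck.class.getClassLoader());
            return cls.isAnnotationPresent(Deprecated.class) ? "deprecated" : "not deprecated";
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose dependencies are missing
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // The old-API class asked about above; the answer depends on the Hadoop jar in use.
        System.out.println(status("org.apache.hadoop.mapred.FileInputFormat"));
    }
}
```

Method-level deprecation is separate. A class reported as "not deprecated" may still have individual deprecated methods.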

2011/11/15 Stuti Awasthi <[email protected]>

> Sure Doug,
> Thanks
>
> -----Original Message-----
> From: Doug Meil [mailto:[email protected]]
> Sent: Monday, November 14, 2011 9:08 PM
> To: [email protected]
> Subject: Re: MR - Input from Hbase output to HDFS
>
>
> Glad you worked through that and that everything is working. I will add an
> example of an HBase-to-HDFS MapReduce job to the book.
>
>
>
>
>
> On 11/14/11 1:24 AM, "Stuti Awasthi" <[email protected]> wrote:
>
> >Hi,
> >I think the issue is with the filesystem configuration: my config is
> >created by HBaseConfiguration. When I changed my output directory to an
> >absolute HDFS path:
> >FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));
> >
> >the MR job ran successfully and I can see the stuti3 directory
> >inside HDFS at the desired path.
> >
> >
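The absolute URI works because a path such as /stuti2 carries no scheme or authority, so Hadoop qualifies it against the default filesystem. Hadoop 0.20 reads that default from fs.default.name, which falls back to file:/// when no core-site.xml is found on the classpath. Below is a minimal stdlib-only model of that qualification step. It uses java.net.URI rather than Hadoop's own Path/FileSystem code, and the class name DefaultFsResolution is hypothetical.

```java
import java.net.URI;

// Stdlib-only model (an assumption, not Hadoop's Path/FileSystem code) of how an
// output path is qualified against the default filesystem URI.
class DefaultFsResolution {

    // A path with its own scheme is kept as-is; a rooted path like "/stuti2"
    // takes the scheme and authority of the default filesystem.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(URI.create(path));
    }

    public static void main(String[] args) {
        // fs.default.name left at its default: the path resolves to the local filesystem.
        System.out.println(qualify("file:///", "/stuti2"));
        // fs.default.name pointing at the hdfs:// authority used in this thread.
        System.out.println(qualify("hdfs://master:54310", "/stuti2"));
        // Fully qualified path, as in the working fix: the default is ignored.
        System.out.println(qualify("file:///", "hdfs://master:54310/MR/stuti3"));
    }
}
```

Under the same assumption, there are two other ways to let the unqualified /stuti2 resolve to HDFS. One is to set config.set("fs.default.name", "hdfs://master:54310") on the job configuration. The other is to put the cluster's core-site.xml on the Eclipse classpath.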
> >-----Original Message-----
> >From: Stuti Awasthi
> >Sent: Monday, November 14, 2011 11:40 AM
> >To: [email protected]
> >Subject: RE: MR - Input from Hbase output to HDFS
> >
> >Hi Joey,
> >Thanks for pointing this out. After importing "FileOutputFormat" as you
> >suggested, I am able to run the MR job from Eclipse (Windows). The only
> >problem is that I cannot see the output directory this code is
> >creating. HDFS and HBase are on a Linux machine.
> >
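> >Code:
> >    import org.apache.hadoop.conf.Configuration;
> >    import org.apache.hadoop.fs.Path;
> >    import org.apache.hadoop.hbase.HBaseConfiguration;
> >    import org.apache.hadoop.hbase.client.Scan;
> >    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> >    import org.apache.hadoop.io.IntWritable;
> >    import org.apache.hadoop.io.Text;
> >    import org.apache.hadoop.mapreduce.Job;
> >    // New-API (mapreduce) output classes, not the old mapred ones
> >    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> >
> >    Configuration config = HBaseConfiguration.create();
> >    config.set("hbase.zookeeper.quorum", "master");
> >    config.set("hbase.zookeeper.property.clientPort", "2181");
> >
> >    Job job = new Job(config, "Hbase_Read_Write");
> >    job.setJarByClass(ReadWriteDriver.class);
> >    Scan scan = new Scan();
> >    scan.setCaching(500);        // rows fetched per scanner RPC
> >    scan.setCacheBlocks(false);  // don't fill the block cache from an MR scan
> >    TableMapReduceUtil.initTableMapperJob("users", scan,
> >            ReadWriteMapper.class, Text.class, IntWritable.class, job);
> >    job.setOutputFormatClass(TextOutputFormat.class);
> >    // No scheme in "/stuti2", so it is resolved against the default filesystem
> >    FileOutputFormat.setOutputPath(job, new Path("/stuti2"));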
> >
> >After executing this code, the MR job runs successfully, but when I
> >look in HDFS, no "/stuti2" directory has been created. I also looked for
> >the directory in the local filesystem of both the Linux machine and the
> >Windows machine, but could not find the output folder anywhere.
> >
> >Eclipse console output:
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.version=1.6.0_27
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.vendor=Sun Microsystems Inc.
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:java.compiler=<NA>
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:os.name=Windows 7
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:os.arch=x86
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:os.version=6.1
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:user.name=stutiawasthi
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:user.home=C:\Users\stutiawasthi
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
> >environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
> >11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client
> >connection,
> >connectString=master:2181 sessionTimeout=180000 watcher=hconnection
> >11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection
> >to server master/10.33.64.235:2181
> >11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection
> >established to master/10.33.64.235:2181, initiating session
> >11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment
> >complete on server master/10.33.64.235:2181, sessionid =
> >0x33879243de00ec, negotiated timeout = 180000
> >11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
> >11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client
> >connection,
> >connectString=master:2181 sessionTimeout=180000 watcher=hconnection
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection
> >to server master/10.33.64.235:2181
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection
> >established to master/10.33.64.235:2181, initiating session
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment
> >complete on server master/10.33.64.235:2181, sessionid =
> >0x33879243de00ed, negotiated timeout = 180000
> >11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client
> >connection,
> >connectString=master:2181 sessionTimeout=180000 watcher=hconnection
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection
> >to server master/10.33.64.235:2181
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection
> >established to master/10.33.64.235:2181, initiating session
> >11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment
> >complete on server master/10.33.64.235:2181, sessionid =
> >0x33879243de00ee, negotiated timeout = 180000
> >11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
> >11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
> >11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
> >...............................................
> >11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
> >11/11/14 11:21:46 INFO mapred.TaskRunner:
> >Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> >commiting
> >11/11/14 11:21:46 INFO mapred.LocalJobRunner:
> >11/11/14 11:21:46 INFO mapred.TaskRunner: Task
> >'attempt_local_0001_m_000000_0' done.
> >11/11/14 11:21:46 INFO mapred.LocalJobRunner:
> >11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
> >11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with
> >1 segments left of total size: 103 bytes
> >11/11/14 11:21:46 INFO mapred.LocalJobRunner:
> >11/11/14 11:21:46 INFO mapred.TaskRunner:
> >Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> >commiting
> >11/11/14 11:21:46 INFO mapred.LocalJobRunner:
> >11/11/14 11:21:46 INFO mapred.TaskRunner: Task
> >attempt_local_0001_r_000000_0 is allowed to commit now
> >11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task
> >'attempt_local_0001_r_000000_0' to /stuti2
> >11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
> >11/11/14 11:21:46 INFO mapred.TaskRunner: Task
> >'attempt_local_0001_r_000000_0' done.
> >11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
> >11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
> >11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
> >11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
> >11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
> >11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
> >11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
> >11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
> >11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
> >11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
> >11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
> >11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
> >11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
> >11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
> >11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
> >11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
> >11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
> >
> >
> >Please suggest.
> >
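One detail in the log above is worth checking: the job ID job_local_0001 and the mapred.LocalJobRunner lines. In Hadoop 0.20 that ID form comes from LocalJobRunner, which is used when mapred.job.tracker is left at its default of "local". A cluster JobTracker would instead issue IDs built from its start time (yyyyMMddHHmm), such as the illustrative job_201111141020_0001. That suggests the job ran inside the Eclipse JVM rather than on the cluster. Below is a minimal sketch of checking a captured log line for this. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the job ID from a JobClient "Running job:" log line
// and reports whether it has the LocalJobRunner form.
class JobModeCheck {

    private static final Pattern RUNNING_JOB = Pattern.compile("Running job: (job_\\S+)");

    // Returns the job ID on the line, or null if the line is not a "Running job:" line.
    static String jobId(String logLine) {
        Matcher m = RUNNING_JOB.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    // LocalJobRunner IDs use "local" where a JobTracker would put its start time.
    static boolean ranLocally(String jobId) {
        return jobId != null && jobId.startsWith("job_local_");
    }

    public static void main(String[] args) {
        String line = "11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001";
        String id = jobId(line);
        System.out.println(id + " ran locally: " + ranLocally(id));
    }
}
```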
> >-----Original Message-----
> >From: Joey Echeverria [mailto:[email protected]]
> >Sent: Friday, November 11, 2011 10:38 PM
> >To: [email protected]
> >Subject: Re: MR - Input from Hbase output to HDFS
> >
> >There are two APIs (old and new), and you appear to be mixing them.
> >The TableMapReduceUtil you are using (org.apache.hadoop.hbase.mapreduce)
> >only works with the new API. The solution is to import the new version
> >of FileOutputFormat, which takes a Job:
> >
> >
> >import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> >-Joey
> >
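When it is unclear which FileOutputFormat a project is compiling against, reflection can list the setOutputPath overloads that each class exposes. This is a minimal sketch. It assumes the two class names below, from the old org.apache.hadoop.mapred package and the new org.apache.hadoop.mapreduce.lib.output package, as shipped with Hadoop 0.20.2. SignatureCheck is a hypothetical helper name. A class that is not on the classpath simply reports an empty list.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the public static overloads of a method, by parameter types,
// for a class that may or may not be on the classpath.
class SignatureCheck {

    static List<String> staticOverloads(String className, String methodName) {
        List<String> out = new ArrayList<>();
        try {
            Class<?> cls = Class.forName(className, false, SignatureCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (!m.getName().equals(methodName) || !Modifier.isStatic(m.getModifiers())) {
                    continue;
                }
                List<String> params = new ArrayList<>();
                for (Class<?> p : m.getParameterTypes()) {
                    params.add(p.getName());
                }
                out.add(methodName + "(" + String.join(", ", params) + ")");
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its parameter types) is not on the classpath.
        }
        return out;
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.hadoop.mapred.FileOutputFormat",               // old API, takes JobConf
            "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat"  // new API, takes Job
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + staticOverloads(c, "setOutputPath"));
        }
    }
}
```

With Hadoop 0.20.2 on the classpath, the old class should report a JobConf parameter and the new one a Job parameter. That matches the compile error described in the next message.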
> >On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <[email protected]>
> >wrote:
> >> The method setOutputPath(JobConf, Path) takes a JobConf as a
> >>parameter, not a Job object.
> >> At least, that is the error I'm getting while compiling against the
> >>Hadoop 0.20.2 jar in Eclipse.
> >>
> >> FileOutputFormat.setOutputPath(conf, new Path("/output"));
> >>
> >> -----Original Message-----
> >> From: Prashant Sharma [mailto:[email protected]]
> >> Sent: Friday, November 11, 2011 11:20 AM
> >> To: [email protected]
> >> Subject: Re: MR - Input from Hbase output to HDFS
> >>
> >> Hi Stuti,
> >> I was wondering why you are not using the Job object to set the output
> >>path, like this:
> >>
> >> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite"));
> >>
> >>
> >> thanks
> >>
> >> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi
> >><[email protected]>wrote:
> >>
> >>> Hi Andrei,
> >>> Well, I am a bit confused. When I use JobConf and associate it with
> >>>JobClient to run the job, I get the error "Input directory
> >>>is not set".
> >>> Since I want my input to come from an HBase table, which I have already
> >>>configured with "TableMapReduceUtil.initTableMapperJob", I don't want
> >>>to set an input directory via JobConf.
> >>> How can I combine these two so that I can read input from HBase and
> >>>write output to HDFS?
> >>>
> >>> Thanks
> >>>
> >>> -----Original Message-----
> >>> From: Andrei Cojocaru [mailto:[email protected]]
> >>> Sent: Thursday, November 10, 2011 7:09 PM
> >>> To: [email protected]
> >>> Subject: Re: MR - Input from Hbase output to HDFS
> >>>
> >>> Stuti,
> >>>
> >>> I don't see you associating JobConf with Job anywhere.
> >>> -Andrei
> >>>
> >>
> >
> >
> >
> >--
> >Joseph Echeverria
> >Cloudera, Inc.
> >443.305.9434
> >
>
>
>
