Glad you worked through that and that everything is working. I will add an example of an HBase-to-HDFS MR job to the book.
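A sketch of what that example could look like, assembled from the code and fixes in the thread below. It assumes the same setup as the thread (a "users" table, the new `org.apache.hadoop.mapreduce` API from Hadoop 0.20.x, HBase 0.90.x, and a NameNode at master:54310); the body of ReadWriteMapper is hypothetical, since the thread never shows it.

```java
// Sketch of an HBase-to-HDFS MapReduce job (new API, Hadoop 0.20.x / HBase 0.90.x).
// The "users" table, ZooKeeper quorum, and output path mirror this thread; the
// mapper logic (emit one count per row) is an illustrative stand-in.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
// The new-API FileOutputFormat -- this is the import that resolves the
// Job-vs-JobConf mismatch discussed in the thread.
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {

  // TableMapper fixes the input types to the HBase row key and scan Result.
  static class ReadWriteMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Hypothetical per-row logic: emit (rowKey, 1) for every row scanned.
      context.write(new Text(row.get()), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "master");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    Job job = new Job(config, "Hbase_Read_Write");
    job.setJarByClass(ReadWriteDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner cache for MR throughput
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    // HBase supplies the input: table scan in, (Text, IntWritable) out of the map.
    TableMapReduceUtil.initTableMapperJob("users", scan,
        ReadWriteMapper.class, Text.class, IntWritable.class, job);

    // HDFS takes the output. A fully qualified path ensures that a client whose
    // default filesystem is local (e.g. launched from Eclipse) still writes to HDFS.
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With no reducer set, the default identity reducer simply writes the mapper's (key, value) pairs out as text, which matches the five reduce output records in the log below.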
On 11/14/11 1:24 AM, "Stuti Awasthi" <[email protected]> wrote:
>Hi,
>I think the issue is with the filesystem configuration, since the config
>object is an HBaseConfiguration. When I changed my output directory to an
>absolute HDFS path:
>FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));
>
>the MR job runs successfully and I can see the stuti3 directory inside
>HDFS at the desired path.
>
>
>-----Original Message-----
>From: Stuti Awasthi
>Sent: Monday, November 14, 2011 11:40 AM
>To: [email protected]
>Subject: RE: MR - Input from Hbase output to HDFS
>
>Hi Joey,
>Thanks for pointing this out. After importing "FileOutputFormat" as you
>suggested, I am able to run the MR job from Eclipse (Windows). The only
>problem is that I cannot see the output directory this code is creating.
>HDFS and HBase are on a Linux machine.
>
>Code :
>    Configuration config = HBaseConfiguration.create();
>    config.set("hbase.zookeeper.quorum", "master");
>    config.set("hbase.zookeeper.property.clientPort", "2181");
>
>    Job job = new Job(config, "Hbase_Read_Write");
>    job.setJarByClass(ReadWriteDriver.class);
>    Scan scan = new Scan();
>    scan.setCaching(500);
>    scan.setCacheBlocks(false);
>    TableMapReduceUtil.initTableMapperJob("users", scan,
>        ReadWriteMapper.class, Text.class, IntWritable.class, job);
>    job.setOutputFormatClass(TextOutputFormat.class);
>    FileOutputFormat.setOutputPath(job, new Path("/stuti2"));
>
>After executing this code, the MR job runs successfully, but when I look
>in HDFS no "/stuti2" directory has been created. I also looked in the
>local filesystem of the Linux machine as well as the Windows machine, but
>could not find the output folder anywhere.
>
>Eclipse console output :
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows 7
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.name=stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.home=C:\Users\stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ec, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ed, negotiated timeout = 180000
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ee, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
>11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
>...............................................
>11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 103 bytes
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
>11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /stuti2
>11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
>11/11/14 11:21:47 INFO mapred.JobClient: map 100% reduce 100%
>11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>
>
>Please suggest.
>
>-----Original Message-----
>From: Joey Echeverria [mailto:[email protected]]
>Sent: Friday, November 11, 2011 10:38 PM
>To: [email protected]
>Subject: Re: MR - Input from Hbase output to HDFS
>
>There are two MapReduce APIs (old and new), and you appear to be mixing
>them. TableMapReduceUtil only works with the new API. The solution is to
>import the new version of FileOutputFormat, which takes a Job:
>
>import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>-Joey
>
>On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <[email protected]> wrote:
>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a
>> parameter, not a Job object.
>> At least this is the error I'm getting while compiling against the
>> Hadoop 0.20.2 jar in Eclipse.
>>
>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>
>> -----Original Message-----
>> From: Prashant Sharma [mailto:[email protected]]
>> Sent: Friday, November 11, 2011 11:20 AM
>> To: [email protected]
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Hi Stuti,
>> I was wondering why you are not using the Job object to set the output
>> path, like this:
>>
>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite"));
>>
>> thanks
>>
>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <[email protected]> wrote:
>>
>>> Hi Andrei,
>>> Well, I am a bit confused. When I use a JobConf and associate it with
>>> a JobClient to run the job, I get the error "Input directory is not set".
>>> Since I want the input to come from an HBase table, which I have
>>> already configured with "TableMapReduceUtil.initTableMapperJob", I
>>> don't want to set an input directory via the JobConf.
>>> How do I mix these two so that I can read input from HBase and write
>>> output to HDFS?
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Andrei Cojocaru [mailto:[email protected]]
>>> Sent: Thursday, November 10, 2011 7:09 PM
>>> To: [email protected]
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>
>>> Stuti,
>>>
>>> I don't see you associating the JobConf with a Job anywhere.
>>> -Andrei
>>>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
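Joey's diagnosis is the crux of the thread: Hadoop 0.20.x ships two parallel APIs, and `TableMapReduceUtil` only speaks the new one. A minimal sketch of the difference (the import package names are the real Hadoop ones; the driver around them is illustrative):

```java
// Old vs. new MapReduce API, as discussed in the thread.
//
// Old API (won't compile against a Job -- its setOutputPath takes a JobConf):
//   import org.apache.hadoop.mapred.FileOutputFormat;
//
// New API (matches the Job object that TableMapReduceUtil configures):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; // setOutputPath(Job, Path)

public class NewApiOutputExample {
  public static void main(String[] args) throws Exception {
    // With the new-API import in scope, Job is accepted directly.
    Job job = new Job(new Configuration(), "example");
    FileOutputFormat.setOutputPath(job, new Path("/output"));
    System.out.println(FileOutputFormat.getOutputPath(job));
  }
}
```

The practical rule: if the driver builds a `Job`, every format and utility class must come from `org.apache.hadoop.mapreduce.*`; if it builds a `JobConf`, everything must come from `org.apache.hadoop.mapred.*`.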

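Stuti's final fix (the absolute hdfs:// path) works because the job launched from Eclipse ran with the LocalJobRunner (note `job_local_0001` in the log), whose default filesystem is the local one, so `new Path("/stuti2")` resolved to the local disk. A sketch of the two equivalent remedies, assuming the NameNode address master:54310 from the thread:

```java
// Two ways to make the output land in HDFS when the client's default
// filesystem is local. The NameNode URI is the one from this thread.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OutputPathFix {
  public static void main(String[] args) {
    Configuration config = HBaseConfiguration.create();

    // Option 1: fully qualify the output path, as Stuti did.
    Path qualified = new Path("hdfs://master:54310/MR/stuti3");

    // Option 2: point the client's default filesystem at HDFS (the 0.20-era
    // key is "fs.default.name"); plain paths like "/stuti2" then resolve there.
    config.set("fs.default.name", "hdfs://master:54310");
    Path plain = new Path("/stuti2");

    System.out.println(qualified + " and " + plain);
  }
}
```

Option 2 is what picking up a proper core-site.xml on the classpath would do automatically; hard-coding it, as here, is only a sketch for a quick Eclipse-side test.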