RE: MR - Input from Hbase output to HDFS

Stuti Awasthi Sun, 13 Nov 2011 22:24:53 -0800

Hi,
I think the issue is with the filesystem configuration, since the config the 
job picks up is the HBaseConfiguration. When I changed my output directory path 
to an absolute HDFS path:
FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));


The MR job runs successfully and I can see the stuti3 directory in HDFS 
at the desired path.
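
A possible explanation, offered as a sketch rather than a confirmed diagnosis: a path without a scheme is resolved against the client's default filesystem, and in Hadoop 0.20.x that default (fs.default.name) is file:/// unless a core-site.xml on the classpath or an explicit config.set overrides it. The snippet below models that resolution with java.net.URI only. PathQualifierSketch and qualify are hypothetical names for illustration and are not part of Hadoop.

```java
import java.net.URI;

public class PathQualifierSketch {
    // Hypothetical helper: mimics how a scheme-less path picks up the
    // default filesystem, while a fully qualified path is left untouched.
    static URI qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already names its own filesystem
        }
        return URI.create(defaultFs).resolve(p);
    }

    public static void main(String[] args) {
        // Assumed client default when no cluster config is on the classpath.
        System.out.println(qualify("file:///", "/stuti2"));
        // Same path if the default pointed at the NameNode from this thread.
        System.out.println(qualify("hdfs://master:54310", "/stuti2"));
        // An absolute HDFS path ignores the default entirely.
        System.out.println(qualify("file:///", "hdfs://master:54310/MR/stuti3"));
    }
}
```

If this reading is right, calling config.set("fs.default.name", "hdfs://master:54310") before creating the Job should have a similar effect to spelling out the full hdfs:// path, although that has not been verified in this thread.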


-----Original Message-----
From: Stuti Awasthi 
Sent: Monday, November 14, 2011 11:40 AM
To: [email protected]
Subject: RE: MR - Input from Hbase output to HDFS

Hi Joey,
Thanks for pointing this out. After importing "FileOutputFormat" as you 
suggested, I am able to run the MR job from Eclipse (Windows). The only problem 
is that I cannot see the output directory this code creates. HDFS and HBase are 
on a Linux machine.

Code :
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "master");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    Job job = new Job(config, "Hbase_Read_Write");
    job.setJarByClass(ReadWriteDriver.class);
    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);
    TableMapReduceUtil.initTableMapperJob("users", scan,
            ReadWriteMapper.class, Text.class, IntWritable.class, job);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/stuti2"));

After executing this code, the MR job runs successfully, but when I look in 
HDFS no "/stuti2" directory has been created. I also looked for the directory 
in the local filesystem of the Linux machine as well as the Windows machine, 
but I cannot find the output folder anywhere.
        
Eclipse console Output :
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.version=1.6.0_27
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun 
Microsystems Inc.
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.library.path=C:\Program 
Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program
 Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program 
Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
 Files\Java\jdk1.6.0_27;C:\Program 
Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:java.compiler=<NA>
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows 7
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:user.name=stutiawasthi
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:user.home=C:\Users\stutiawasthi
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection to 
server master/10.33.64.235:2181
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection established to 
master/10.33.64.235:2181, initiating session
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment complete on 
server master/10.33.64.235:2181, sessionid = 0x33879243de00ec, negotiated 
timeout = 180000
11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to 
server master/10.33.64.235:2181
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to 
master/10.33.64.235:2181, initiating session
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on 
server master/10.33.64.235:2181, sessionid = 0x33879243de00ed, negotiated 
timeout = 180000
11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to 
server master/10.33.64.235:2181
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to 
master/10.33.64.235:2181, initiating session
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on 
server master/10.33.64.235:2181, sessionid = 0x33879243de00ee, negotiated 
timeout = 180000
11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680 
...............................................
11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is 
done. And is in the process of commiting
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' 
done.
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 1 
segments left of total size: 103 bytes
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is 
done. And is in the process of commiting
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is 
allowed to commit now
11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 
'attempt_local_0001_r_000000_0' to /stuti2
11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' 
done.
11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5


Please suggest.
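
One reading of the console output above, stated as an assumption rather than a confirmed cause: the job ID job_local_0001 and the LocalJobRunner lines suggest the job ran inside the Eclipse JVM instead of on the cluster, and FileSystemCounters lists only FILE_BYTES_* values with no HDFS_BYTES_* entries, which would be consistent with output going to a local filesystem. A small check along these lines can flag that situation from captured log lines. LocalRunCheckSketch and its methods are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;
import java.util.List;

public class LocalRunCheckSketch {
    // True if JobClient reported a job ID of the form job_local_NNNN,
    // the pattern seen in the console output of this thread.
    static boolean ranLocally(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("Running job: job_local_")) {
                return true;
            }
        }
        return false;
    }

    // True if the counters mention bytes written through the hdfs scheme.
    static boolean wroteToHdfs(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("HDFS_BYTES_WRITTEN=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two lines copied from the console output in this message.
        List<String> log = Arrays.asList(
            "11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001",
            "11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343");
        System.out.println("ran locally:   " + ranLocally(log));
        System.out.println("wrote to HDFS: " + wroteToHdfs(log));
    }
}
```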

-----Original Message-----
From: Joey Echeverria [mailto:[email protected]]
Sent: Friday, November 11, 2011 10:38 PM
To: [email protected]
Subject: Re: MR - Input from Hbase output to HDFS

There are two APIs (old and new), and you appear to be mixing them.
The TableMapReduceUtil you are using (the one that takes a Job) only works 
with the new API. The solution is to import the new version of 
FileOutputFormat, which takes a Job:


import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

-Joey

On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <[email protected]> wrote:
> The method "setOutputPath(JobConf, Path)" takes a JobConf as a parameter, not 
> the Job object.
> At least this is the error I'm getting while compiling against the Hadoop 
> 0.20.2 jar in Eclipse.
>
> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>
> -----Original Message-----
> From: Prashant Sharma [mailto:[email protected]]
> Sent: Friday, November 11, 2011 11:20 AM
> To: [email protected]
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Hi Stuti,
> I was wondering why you are not using the job object to set the output path, 
> like this:
>
> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>
>
> thanks
>
> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <[email protected]> wrote:
>
>> Hi Andrei,
>> Well, I am a bit confused. When I use JobConf and associate it with 
>> JobClient to run the job, I get the error "Input directory is not 
>> set".
>> I want my input to come from the HBase table, which I have already 
>> configured with "TableMapReduceUtil.initTableMapperJob", so I don't want 
>> to set an input directory via JobConf.
>> How can I mix these two so that I get input from HBase and write output 
>> to HDFS?
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Andrei Cojocaru [mailto:[email protected]]
>> Sent: Thursday, November 10, 2011 7:09 PM
>> To: [email protected]
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Stuti,
>>
>> I don't see you associating JobConf with Job anywhere.
>> -Andrei
>>
>



--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
