RE: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

Michael Segel Fri, 16 Sep 2011 10:05:55 -0700

Sonal,

Just because you have an m/r job doesn't mean that you need to reduce anything.
You can have a job that contains only a mapper.
Or your job runner can run a series of map-only jobs one after another.


Most, if not all, of the map/reduce jobs where we pull data from HBase don't
require a reducer.

To give you a simple example: if I want to determine the schema of a table where
I am storing some sort of structured data, I just write an m/r job which opens
the table and scans it, counting the occurrences of each column name via
dynamic counters.

There is no need for a reducer.
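
As a rough illustration of the counting idea above, here is a minimal sketch in
plain Java. It is a model only and uses no Hadoop or HBase classes: the
Counters class, the row representation (a map of column name to value) and the
column names are all hypothetical stand-ins. In a real job the counters would
be the framework's own (for example context.getCounter(group, name)) and the
job would be configured with zero reduce tasks (job.setNumReduceTasks(0)).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a map-only job; no Hadoop or HBase classes are used.
class ColumnCountSketch {

    // Hypothetical stand-in for dynamic job counters: counter name -> count.
    static final class Counters {
        private final Map<String, Long> counts = new TreeMap<>();

        void increment(String name) {
            counts.merge(name, 1L, Long::sum);
        }

        Map<String, Long> snapshot() {
            return new TreeMap<>(counts);
        }
    }

    // The "map" step: called once per row. It emits no key/value output and
    // only increments one counter per column name found in the row.
    static void map(Map<String, String> row, Counters counters) {
        for (String column : row.keySet()) {
            counters.increment(column);
        }
    }

    // The "scan": feed every row to map(). There is no shuffle and no reduce.
    static Map<String, Long> countColumns(List<Map<String, String>> rows) {
        Counters counters = new Counters();
        for (Map<String, String> row : rows) {
            map(row, counters);
        }
        return counters.snapshot();
    }

    public static void main(String[] args) {
        // Made-up rows; each map is one row's column name -> value.
        List<Map<String, String>> rows = List.of(
            Map.of("info:name", "a", "info:email", "a@example.com"),
            Map.of("info:name", "b"),
            Map.of("info:name", "c", "info:phone", "555-0100"));
        // Prints {info:email=1, info:name=3, info:phone=1}
        System.out.println(countColumns(rows));
    }
}
```

The result lives entirely in the counters, which is why nothing has to be
shuffled to a reducer: the framework sums counters across map tasks on its own.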

Does that help?


> Date: Fri, 16 Sep 2011 21:41:01 +0530
> Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
> From: [email protected]
> To: [email protected]
> 
> Michel,
> 
> Sorry, can you please help me understand what you mean when you say that, when
> dealing with HBase, you really don't want to use a reducer? Here, HBase is
> being used as the input to the MR job.
> 
> Thanks
> Sonal
> 
> 
> On Fri, Sep 16, 2011 at 2:35 PM, Michel Segel 
> <[email protected]> wrote:
> 
> > I think you need to get a little bit more information.
> > Reducers are expensive.
> > When Thomas says that he is aggregating data, what exactly does he mean?
> > When dealing with HBase, you really don't want to use a reducer.
> >
> > You may want to run two map jobs and it could be that just dumping the
> > output via jdbc makes the most sense.
> >
> > We are starting to see a lot of questions where the OP isn't providing
> > enough information, so the recommendation could be wrong...
> >
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Sep 16, 2011, at 2:22 AM, Sonal Goyal <[email protected]> wrote:
> >
> > > There is a DBOutputFormat class in the org.apache.hadoop.mapreduce.lib.db
> > > package; you could use that. Or you could write to HDFS and then use
> > > something like HIHO[1] to export to the db. I have been working
> > > extensively in this area; you can write to me directly if you need any
> > > help.
> > >
> > > 1. https://github.com/sonalgoyal/hiho
> > >
> > > Best Regards,
> > > Sonal
> > > Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> > > Nube Technologies <http://www.nubetech.co>
> > >
> > > <http://in.linkedin.com/in/sonalgoyal>
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Sep 16, 2011 at 10:55 AM, Steinmaurer Thomas <
> > > [email protected]> wrote:
> > >
> > >> Hello,
> > >>
> > >>
> > >>
> > > >> We are writing an MR-Job to process HBase data and store aggregated
> > > >> data in Oracle. How would you do that in an MR-Job?
> > >>
> > >>
> > >>
> > > >> Currently, for test purposes, we write the result into an HBase table
> > > >> again by using a TableReducer. Is there something like an OracleReducer,
> > > >> RelationalReducer, JDBCReducer or whatever? Or should one simply use
> > > >> plain JDBC code in the reduce step?
> > >>
> > >>
> > >>
> > >> Thanks!
> > >>
> > >>
> > >>
> > >> Thomas
> > >>
> > >>
> > >>
> > >>
> >
