Thanks for the reply, Josh. I understand its function a bit better now.
On Wed, Aug 14, 2013 at 5:50 PM, Josh Wills <[email protected]> wrote:

> Hey Narlin,
>
> DoFns are similar to the Mapper and Reducer classes that you would write
> in classic MapReduce jobs-- they don't spawn MapReduce jobs themselves. The
> Crunch planner will analyze the overall DAG of DoFns, groupByKeys, unions,
> and combineValues operations and compile the DAG into one or more MapReduce
> jobs, where each of the DoFns will be assigned to one of the Mappers or
> Reducers in those jobs. Crunch has its own Mapper and Reducer
> implementations (named CrunchMapper and CrunchReducer, naturally) that are
> responsible for executing the DoFns that are assigned to each phase of the
> job.
>
> In general, you should not need to use mapper and reducer classes when you
> use Crunch, although if you have legacy Mapper and Reducer classes that you
> would like to use in conjunction with the DoFns in a Crunch pipeline, there
> is a collection of methods in org.apache.crunch.lib.MapReduce in Crunch
> 0.7.0 that will wrap a given Mapper or Reducer class inside of a DoFn.
>
> Hope that helps.
>
> Best,
> Josh
>
> On Wed, Aug 14, 2013 at 12:59 PM, Narlin M <[email protected]> wrote:
>
>> I have just recently started using Crunch, having been recommended to use
>> it instead of writing plain map reduce jobs. As I was going through the
>> crunch documentation, some questions came to my mind. Am I correct in
>> saying that the DoFn family of functions will internally spawn map-reduce
>> jobs, so there is no need to write separate mapper or reducer classes? If
>> so, I agree that this will abstract some of the lower level details from
>> the programmer, but at the same time, does it not lower the programmer's
>> control over the processing logic?
>>
>> Also, will there be situations when separate mapper / reducer classes
>> will be required in addition to the DoFn functions?
>>
>> Thanks.
>>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
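
To make Josh's point concrete (DoFns do not spawn jobs themselves; they are assigned to a Mapper or Reducer that executes them), here is a minimal toy sketch in plain Java. It does not use the Crunch library: ToyDoFn, FusedMapPhaseSketch, SPLIT_WORDS, TO_UPPER and runMapPhase are hypothetical names invented for this illustration, and the real planner and CrunchMapper are far more involved. The sketch only shows the idea that two chained functions can run inside one map-side pass over the input, with no job per function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Toy stand-in only; this is NOT Crunch's DoFn class.
// It mirrors the general shape: take one input, emit zero or more outputs.
interface ToyDoFn<S, T> {
    void process(S input, Consumer<T> emitter);
}

class FusedMapPhaseSketch {

    // First function: split a line into words, skipping empty tokens.
    static final ToyDoFn<String, String> SPLIT_WORDS = (line, emitter) -> {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emitter.accept(word);
            }
        }
    };

    // Second function: upper-case each word it receives.
    static final ToyDoFn<String, String> TO_UPPER =
            (word, emitter) -> emitter.accept(word.toUpperCase(Locale.ROOT));

    // Plays the role of a single map task (an assumption for illustration):
    // each record flows through both functions in one pass, because the
    // output of the first is wired directly into the second.
    static List<String> runMapPhase(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            SPLIT_WORDS.process(line, word -> TO_UPPER.process(word, out::add));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello crunch", "", "map reduce");
        System.out.println(runMapPhase(input));
    }
}
```

In this toy model the programmer still controls the processing logic, since it lives entirely in the two functions; only the decision about which phase runs them is delegated.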
