If no mapper or reducer class is set on the JobConf, the code defaults to IdentityMapper and IdentityReducer respectively, which simply pass each key/value pair through unchanged. So the job below still has a map step; it just hands the input records straight to CrawlDbReducer.
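To make the defaulting concrete, here is a minimal plain-Java sketch of the idea. It does not use the Hadoop API: the class IdentityDefaultsSketch, its Mapper and Reducer interfaces, and the run and pair helpers are all hypothetical names made up for illustration, and the summing reducer is only a stand-in, not what CrawlDbReducer does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a map/reduce job. NOT the Hadoop API; every name is hypothetical.
class IdentityDefaultsSketch {

    /** Turns one input pair into zero or more output pairs. */
    interface Mapper {
        List<Map.Entry<String, Integer>> map(String key, Integer value);
    }

    /** Turns all values grouped under one key into zero or more output pairs. */
    interface Reducer {
        List<Map.Entry<String, Integer>> reduce(String key, List<Integer> values);
    }

    static Map.Entry<String, Integer> pair(String key, Integer value) {
        return new SimpleEntry<>(key, value);
    }

    // Pass-through defaults, playing the role of IdentityMapper / IdentityReducer.
    static final Mapper IDENTITY_MAPPER = (k, v) -> Collections.singletonList(pair(k, v));
    static final Reducer IDENTITY_REDUCER = (k, vs) -> {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Integer v : vs) {
            out.add(pair(k, v));
        }
        return out;
    };

    /** Runs map, group-by-key, and reduce; a null mapper or reducer means "not set". */
    static List<Map.Entry<String, Integer>> run(List<Map.Entry<String, Integer>> input,
                                                Mapper mapper, Reducer reducer) {
        Mapper m = (mapper != null) ? mapper : IDENTITY_MAPPER;
        Reducer r = (reducer != null) ? reducer : IDENTITY_REDUCER;

        // Map phase, then group the mapped values by key (sorted by key).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> in : input) {
            for (Map.Entry<String, Integer> e : m.map(in.getKey(), in.getValue())) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Reduce phase: one call per key.
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            output.addAll(r.reduce(g.getKey(), g.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                Arrays.asList(pair("a", 1), pair("b", 2), pair("a", 3));

        // Stand-in reducer that sums the values per key.
        Reducer sum = (k, vs) -> {
            int total = 0;
            for (int v : vs) {
                total += v;
            }
            return Collections.singletonList(pair(k, total));
        };

        // Only a reducer is set, as in the quoted job: the map step is a pass-through.
        System.out.println(run(input, null, sum));
        // Neither is set: every pair comes out unchanged, grouped by key.
        System.out.println(run(input, null, null));
    }
}
```

The point of the sketch is that leaving the mapper unset does not skip the map step; the records still flow through it and reach the reducer grouped by key.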
Dennis Kubes

Charlie Williams wrote:
> I am very new to the Nutch source code, and have been reading over the
> Injector class code. From what I understood of the MapReduce system there
> had to be both a map and reduce step in order for the algorithm to function
> properly. However, in CrawlDb.createJob( Configuration, Path ) a new job is
> created for merging the injected URLs that has no Mapper Class set.
>
> ..
>
> JobConf job = new NutchJob(config);
> job.setJobName("crawldb " + crawlDb);
>
>
> Path current = new Path(crawlDb, CrawlDatum.DB_DIR_NAME);
> if ( FileSystem.get( job ).exists( current ) ) {
>   job.addInputPath( current );
> }
>
> job.setInputFormat( SequenceFileInputFormat.class );
> job.setInputKeyClass( UTF8.class );
> job.setInputValueClass( CrawlDatum.class );
>
> job.setReducerClass( CrawlDbReducer.class );
>
> job.setOutputPath( newCrawlDb);
> job.setOutputFormat( MapFileOutputFormat.class );
> job.setOutputKeyClass( UTF8.class );
> job.setOutputValueClass( CrawlDatum.class );
>
> return job;
>
>
> How does this code function properly?
>
> Is it designed to only run on a single machine and thus does not need a
> mapper function set?
>
> Thanks for any help,
>
> -Charles Williams