[Nutch-general] IdentityReducer while fetching

Vishal Shah Thu, 12 Oct 2006 05:30:42 -0700

Hi,
 
  While running a fetch on 3M URLs with the -noParsing option set, I noticed
that the reduce phase takes a very long time. Since the reducer class in this
case is IdentityReducer, couldn't Hadoop handle this internally by writing the
map output directly to the final output path? Or by simply renaming the temp
output directory to the final output directory?
 
  For the reduce phase, it seems that the copy is unnecessary in this
case. I am unfamiliar with the details of Hadoop, so maybe there is a
strong reason for doing things the way they are done now, or maybe I
am mistaken about how they are done. Could the experts please shed some
light on this?
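
  To make the question concrete, here is a toy sketch in Python of how I
picture the reduce side of a job whose reducer is an identity function. It is
not Hadoop's code: the function names and the byte-sum partitioner are made up
for illustration, and the real framework may differ in the details. In this
model every record comes out unchanged, but the records are regrouped by
partition and sorted by key along the way, which a plain rename of the map
output would not do.

```python
from collections import defaultdict
from itertools import groupby


def partition_for(key, num_reducers):
    # Hypothetical partitioner: a deterministic stand-in for hashing the key.
    return sum(key.encode("utf-8")) % num_reducers


def identity_reduce(key, values):
    # Emit every (key, value) pair unchanged, as an identity reducer would.
    for value in values:
        yield key, value


def run_reduce_phase(map_outputs, num_reducers):
    """Toy model of the reduce side: copy, sort by key, then reduce.

    map_outputs holds one list of (key, value) pairs per map task.
    Returns one output list per reducer.
    """
    # "Copy" step: each reducer collects its share of every map task's output.
    partitions = defaultdict(list)
    for task_output in map_outputs:
        for key, value in task_output:
            partitions[partition_for(key, num_reducers)].append((key, value))

    results = []
    for reducer in range(num_reducers):
        # Sort by key only; the sort is stable, so values keep arrival order.
        records = sorted(partitions[reducer], key=lambda kv: kv[0])
        output = []
        for key, group in groupby(records, key=lambda kv: kv[0]):
            output.extend(identity_reduce(key, (v for _, v in group)))
        results.append(output)
    return results


if __name__ == "__main__":
    maps = [[("b", 1), ("a", 2)], [("c", 3), ("a", 4)]]
    for reducer, output in enumerate(run_reduce_phase(maps, 2)):
        print(reducer, output)
```

  If the real reduce phase works like this model, then the copy and sort are
what turn many unordered map outputs into a few key-ordered files, and the
question becomes whether the fetch output actually needs that ordering.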
 
Thank you,
 
-vishal.
