[EMAIL PROTECTED] wrote:
We concluded that Hadoop does not allow different input/output
types for map and reduce, so we switched to another scheme. This scheme
runs two jobs: the first job contains the map function, the second contains
the reduce task, and each job declares its own input and output classes. The
new map and reduce do the same work as described above.

You can use ObjectWritable to wrap any type of Writable, which lets you mix and match different input/output types easily. The overhead of this wrapping is probably still smaller than submitting a second job just to change the types...
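
For illustration only, a minimal sketch of the idea against the mapred API. This is not the actual Indexer code; the class names (MixedTypes, WrapMapper, UnwrapReducer) and the Text/LongWritable payloads are made up:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MixedTypes {

  // Map side: wrap whatever Writable you produce in an ObjectWritable,
  // so the job can declare a single map output value class.
  public static class WrapMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, ObjectWritable> {
    public void map(Text key, Text value,
        OutputCollector<Text, ObjectWritable> output, Reporter reporter)
        throws IOException {
      // Two values of different Writable types under the same key.
      output.collect(key, new ObjectWritable(value));
      output.collect(key,
          new ObjectWritable(new LongWritable(value.getLength())));
    }
  }

  // Reduce side: unwrap each value and dispatch on its runtime type.
  public static class UnwrapReducer extends MapReduceBase
      implements Reducer<Text, ObjectWritable, Text, Text> {
    public void reduce(Text key, Iterator<ObjectWritable> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      while (values.hasNext()) {
        Writable w = (Writable) values.next().get();
        if (w instanceof LongWritable) {
          output.collect(key, new Text("count: " + ((LongWritable) w).get()));
        } else if (w instanceof Text) {
          output.collect(key, (Text) w);
        }
      }
    }
  }
}

You would then set conf.setMapOutputValueClass(ObjectWritable.class) in the JobConf; the wrapper is precisely what gives all map outputs one declared type.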

Please take a look at Indexer.java, where this trick is used.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



