On 03/18/2011 11:31 AM, Harsh J wrote:
> Probably a small case, in which I would require reading from multiple
> sources in my job (perhaps even process them differently until the Map
> phase), with special reader-schemas for each of my sources.

How would your mapper detect which schema was in use? Would it use something like instanceof? If so, you could simply use a union as the job's schema.

Or would you want a different mapper for each input type? That seems like a higher-level tool, like Hadoop's MultipleInputs. It shouldn't be too hard to build, but I don't think it should be built into the base MapReduce API; rather, it belongs in a layer above it, no?

Doug
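
For illustration, here is a minimal sketch of the instanceof idea: a single map function that receives a datum from a union and branches on its runtime type. It deliberately uses no Avro or Hadoop classes. `UserEvent`, `PageView`, and `map` are hypothetical names standing in for two generated record classes and a mapper body, so treat it as a picture of the dispatch pattern rather than the actual MapReduce API.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionDispatchSketch {

    // Hypothetical stand-ins for two record types read from different sources.
    static final class UserEvent {
        final String user;
        UserEvent(String user) { this.user = user; }
    }

    static final class PageView {
        final String url;
        PageView(String url) { this.url = url; }
    }

    // One mapper body for a job whose input schema is (conceptually) the
    // union [UserEvent, PageView]. The datum arrives as Object, and the
    // mapper picks a branch with instanceof.
    static String map(Object datum) {
        if (datum instanceof UserEvent) {
            return "user:" + ((UserEvent) datum).user;
        } else if (datum instanceof PageView) {
            return "page:" + ((PageView) datum).url;
        }
        // Anything else is not a branch of the assumed union.
        throw new IllegalArgumentException("datum is not in the union: " + datum);
    }

    public static void main(String[] args) {
        List<Object> input = new ArrayList<>();
        input.add(new UserEvent("alice"));
        input.add(new PageView("/index.html"));
        for (Object datum : input) {
            System.out.println(map(datum));
        }
    }
}
```

The alternative raised above, a separate mapper per input type, would move this branching out of the mapper and into the job setup, which is why it looks more like a layer on top of the base API.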
