On 8 Sep 2011, at 23:36, Daniel Dai <[email protected]> wrote:

> It seems like you want to do something like this:
>
> A = xxxxx -- Pig pipeline
> B = MAPREDUCE mahout.jar Store A into '<PATH>/content/reuters/reuters-out'
> seqdirectory –input <PATH>/content/reuters/reuters-out –output
> <PATH>/content/reuters/seqfiles –charset UTF-8
> C = MAPREDUCE mahout.jar seq2sparse –input <PATH>/content/reuters/seqfiles
> –output <PATH>/content/reuters/seqfiles-TF –norm 2 –weight TF
> D = MAPREDUCE mahout.jar Load '<PATH>/content/reuters/seqfiles-TF-IDF'
> seq2sparse –input <PATH>/content/reuters/seqfiles –output
> <PATH>/content/reuters/seqfiles-TF-IDF –norm 2 –weight TFIDF
> E = foreach D generate .... -- Pig pipeline
>
> You only need to interface Pig in the first and last step, but Pig requires
> you to do LOAD/STORE for each job, and that's the problem. If we make
> Store/Load as optional, that will solve your problem, right?

I think so. I'd like to confirm that this really works OK before asking for a change to Pig. But I guess there should be other non-Mahout scenarios that have similar needs.

Can you suggest where to patch Pig to make store/load optional?

Dan
