Yes, it makes sense to change it in Pig anyway. The code is in org.apache.pig.parser.LogicalPlanBuilder.buildNativeOp. You may also need to change the parser to make Load/Store optional. Would you like to give it a try?
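
For reference, here is a rough sketch of what the native MAPREDUCE operator expects today, with both the STORE and the LOAD clause present. The jar name and paths are the placeholders from the quoted message below. The input path for A is hypothetical, the double-hyphen flags assume the en-dashes in the quoted message were originally "--", and passing "seqdirectory" as the first parameter assumes the jar's main class is Mahout's driver. None of this has been tested.

```pig
-- Sketch only: paths and flags come from the quoted thread and are untested.
-- The input location for A is hypothetical; A stands for any Pig pipeline.
A = LOAD '<PATH>/content/reuters/input';

-- Current syntax: STORE hands A to the native job, LOAD reads the job's output.
-- The backquoted string holds the arguments passed to the jar.
B = MAPREDUCE 'mahout.jar'
    STORE A INTO '<PATH>/content/reuters/reuters-out'
    LOAD '<PATH>/content/reuters/seqfiles'
    `seqdirectory --input <PATH>/content/reuters/reuters-out --output <PATH>/content/reuters/seqfiles --charset UTF-8`;
```

With Load/Store made optional, the intermediate steps (C in the quoted example) could drop both clauses and only the first and last jobs would touch Pig relations.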

Daniel

On Thu, Sep 8, 2011 at 2:50 PM, Dan Brickley <[email protected]> wrote:

> On 8 Sep 2011, at 23:36, Daniel Dai <[email protected]> wrote:
>
> > It seems like you want to do something like this:
> >
> > A = xxxxx -- Pig pipeline
> > B = MAPREDUCE mahout.jar Store A into '<PATH>/content/reuters/reuters-out'
> > seqdirectory –input <PATH>/content/reuters/reuters-out –output
> > <PATH>/content/reuters/seqfiles –charset UTF-8
> > C = MAPREDUCE mahout.jar seq2sparse –input <PATH>/content/reuters/seqfiles
> > –output <PATH>/content/reuters/seqfiles-TF –norm 2 –weight TF
> > D = MAPREDUCE mahout.jar Load '<PATH>/content/reuters/seqfiles-TF-IDF'
> > seq2sparse –input<PATH>/content/reuters/seqfiles –output
> > <PATH>/content/reuters/seqfiles-TF-IDF –norm 2 –weight TFIDF
> > E = foreach D generate .... -- Pig pipeline
> >
> > You only need to interface Pig in the first and last step, but Pig requires
> > you to do LOAD/STORE for each job, and that's the problem. If we make
> > Store/Load as optional, that will solve your problem, right?
>
> I think so. I'd like to confirm that this really works ok before asking for
> a change to Pig. But I guess there should be other non-Mahout scenarios that
> have similar needs. Can you suggest where to patch Pig to make store/load
> optional?
>
> Dan
