I think we discussed several of these points on the mailing list. I am not sure I would ever expect there to be a common format across all jobs. They just don't all operate on the same information. Even where two jobs ingest "vectors", it doesn't mean vectors for one are meaningful for another.
If you spot cases where two jobs really ingest the same input but do not share an input format, they could surely be unified. But that's better tackled by identifying the case(s), filing a JIRA, and patching it. As a generalization, I think this is a bridge too far. Another layer of options and metadata specifying what sub-types can be imported and exported with what caveats, etc.? The jobs aren't even consistent in the version of Hadoop they use -- we have some still on 0.19.x.

On Mon, Sep 12, 2011 at 4:14 AM, Lance Norskog <[email protected]> wrote:
> https://cwiki.apache.org/confluence/display/MAHOUT/Import+Export+Sequence+File+Formats
>
> Please have a look; comment or rewrite as you please. It's a wish list of
> what I would want, approaching Mahout either as an experienced user or as a
> newbie.
>
> --
> Lance Norskog
> [email protected]
