Re: Spec for a common import/export service for Mahout jobs

Sean Owen Mon, 12 Sep 2011 00:03:13 -0700

I think we discussed several of these points on the mailing list.

I am not sure I would ever expect there to be a common format across
all jobs. They just don't all operate on the same information. Even
where two jobs ingest "vectors", it doesn't mean vectors for one are
meaningful for another.

If you spot cases where two jobs really ingest the same input but do
not share the same input format, they could surely be unified. But
that's better tackled by identifying the case(s), opening a JIRA, and
patching it.

As for generalizing further, I think this is a bridge too far. Another layer of
options and metadata specifying which sub-types can be imported and
exported, with which caveats, etc.? The jobs aren't even consistent in
the version of Hadoop they use -- we have some still on 0.19.x.

On Mon, Sep 12, 2011 at 4:14 AM, Lance Norskog <[email protected]> wrote:
> https://cwiki.apache.org/confluence/display/MAHOUT/Import+Export+Sequence+File+Formats
>
> Please have a look; comment or rewrite as you please. It's a wish list of
> what I would want, approaching Mahout either as an experienced user or as a
> newbie.
>
> --
> Lance Norskog
> [email protected]
>
