On Tue, Sep 13, 2011 at 6:27 AM, Lance Norskog <[email protected]> wrote:
> Machine learning has quite a few algorithms where data is processed in a way
> foreign to its domain. Running SVD on user/item/preference matrices is a
> great example: this makes no sense whatsoever.

(Why? This is one of the most canonical uses of the SVD. The SVD
operates on a matrix, and a user/item/preference matrix is a matrix.)

> This proposal in no way recommends a "common format" across all jobs. The
> jobs all have their own i/o format, and that would stay. Under this proposal,
> you can ask a job to also emit its data in one of the common formats. The
> semantics don't matter.

In this regard, I would go further than you. There's no need for
jobs to each have their own format. They either emit logically
the same sort of thing or they don't. If they do, it should be in the
same format. I think we could and should change the output then. My
question is, what cases of that do you observe?

I don't think it's a change that touches everything, or even most
jobs, but you've already pointed out 2 changes that do make sense:


> The best justification for this is FPGrowth. It emits a custom object,
> TopKStringPatterns. If I am interested in processing only one aspect of its
> full data structure, I cannot ask it to emit part of that structure. If,
> say, I want to collate the graph of short patterns, I'm stuck. Without
> writing a custom Java program, the output goes nowhere. (I'm really
> interested if anybody has ever done anything with the output.)

I don't know it myself, but it's probably a perfect example of a job
whose output can change. (1)


> Another simpler use case is the system of making Vector files out of Lucene
> analysis output. The Lanczos distributed solver takes matrices instead of
> vectors. What if I want to run the Lucene vector file output through this
> SVD? I have to somehow turn my (named) vectors into a (labeled) matrix.

I smell the need for a utility that can convert "n m-dimensional
vectors" into an "mxn matrix". (2)
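
To make (2) concrete, here is a minimal sketch of such a conversion. It
is plain Python, not Mahout's Java API: the function name
vectors_to_matrix and the dict-of-lists input are hypothetical stand-ins
for a named-vector file and a labeled matrix. It assumes each vector
becomes one column, so n m-dimensional vectors give an m x n matrix,
with the vector names kept as column labels.

```python
from typing import Dict, List, Sequence, Tuple


def vectors_to_matrix(
    named_vectors: Dict[str, Sequence[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Stack n named m-dimensional vectors as the columns of an m x n matrix.

    Hypothetical helper for illustration only; returns (column_labels, rows).
    """
    labels = list(named_vectors)  # insertion order becomes column order
    if not labels:
        return [], []

    m = len(named_vectors[labels[0]])
    for name in labels:
        if len(named_vectors[name]) != m:
            raise ValueError(
                f"vector {name!r} has {len(named_vectors[name])} entries, expected {m}"
            )

    # Row i of the matrix holds component i of every vector.
    rows = [[float(named_vectors[name][i]) for name in labels] for i in range(m)]
    return labels, rows


if __name__ == "__main__":
    labels, rows = vectors_to_matrix({"doc1": [1, 0, 2], "doc2": [0, 3, 4]})
    print(labels)
    for row in rows:
        print(row)
```

If the solver expects one vector per row instead, the same idea applies
with the transpose: emit each named vector as a row and keep the names
as row labels.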
