On Sep 1, 2011, at 10:04 AM, Sean Owen wrote:

> Your input needs to be CSV if you want to use it all as-is. But, it quickly
> creates vectors out of things, so really you can comment out the first
> mapper that creates user vectors, and just wire it to use yours instead. It
> should do all the rest from there.

I could use the --startPhase functionality to skip the first two phases, right?

> On Thu, Sep 1, 2011 at 2:58 PM, Grant Ingersoll <[email protected]> wrote:
>
>> Assuming I've done my own translation (I followed Ted's piece), how do I
>> get this into the rest of the RecJob? Right now, I have a NamedVector (the
>> name is the id of the from email address) and the cells are {0,1} for each
>> message id (1 if that user has interacted with that message id). In looking
>> at the RecommenderJob, it seems like I could skip the first couple of
>> phases, but it also seems like I need a DistributedRowMatrix as input for
>> the next phase (maybePruneAndTranspose). Is my understanding correct? I
>> guess I need to convert my seq. file of NamedVectors to the
>> DistributedRowMatrix?

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
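
To make the conversion question above concrete, here is a minimal sketch in plain Java. It deliberately does not call Mahout, and everything in it is an assumption for illustration: the class InteractionRows and its methods are hypothetical, and each user's {0,1} vector is modeled as the set of message columns holding a 1. It shows the two steps being discussed: re-keying name-keyed rows by a dense integer row index (on the assumption that the matrix input is keyed by integer row ids rather than by names), and transposing user rows into per-message rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class InteractionRows {

    /** Assigns a dense row index (0, 1, 2, ...) to each distinct name, in iteration order. */
    static Map<String, Integer> indexNames(Iterable<String> names) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String name : names) {
            index.putIfAbsent(name, index.size());
        }
        return index;
    }

    /** Re-keys "name -> message columns set to 1" as "row index -> message columns set to 1". */
    static Map<Integer, Set<Integer>> toIntKeyedRows(Map<String, Set<Integer>> byName,
                                                     Map<String, Integer> index) {
        Map<Integer, Set<Integer>> rows = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : byName.entrySet()) {
            rows.put(index.get(e.getKey()), new TreeSet<>(e.getValue()));
        }
        return rows;
    }

    /** Transposes a sparse binary matrix: column index -> row indices that hold a 1. */
    static Map<Integer, Set<Integer>> transpose(Map<Integer, Set<Integer>> rows) {
        Map<Integer, Set<Integer>> cols = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : rows.entrySet()) {
            for (int col : e.getValue()) {
                cols.computeIfAbsent(col, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return cols;
    }
}
```

In a real job the name-to-index map would need to be persisted so that recommendations can be mapped back to email addresses afterwards; the sketch keeps it in memory only.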
