NaN is added for all user-item pairs that already exist in the input, to
make them ineligible for recommendation. That is normal; could that be what
you are seeing?
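
To illustrate why a single NaN can swamp an otherwise good column, here is a
minimal plain-Java sketch. It is not Mahout's actual code: the class and
method names (NanPropagationSketch, naiveSum, sumSkippingNaN) are made up for
illustration, and the preference value of 1.0 is an assumption based on the
input format described below. The similarity values are the ones from the
breakpoint quoted below. It only shows standard IEEE 754 behaviour: any
arithmetic involving NaN yields NaN, so a sum that includes the marker
becomes NaN unless the marker is skipped explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NanPropagationSketch {

  // Naive weighted sum: one NaN entry turns the whole result into NaN.
  public static double naiveSum(Map<Long, Double> column, double pref) {
    double sum = 0.0;
    for (double similarity : column.values()) {
      sum += similarity * pref;
    }
    return sum;
  }

  // Treats NaN as an "excluded" marker and leaves it out of the sum.
  public static double sumSkippingNaN(Map<Long, Double> column, double pref) {
    double sum = 0.0;
    for (double similarity : column.values()) {
      if (!Double.isNaN(similarity)) {
        sum += similarity * pref;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Values copied from the breakpoint output quoted in this thread.
    Map<Long, Double> column = new LinkedHashMap<>();
    column.put(309682L, 0.9566912651062012);
    column.put(42938L, 0.9566912651062012);
    column.put(309672L, Double.NaN);

    System.out.println(naiveSum(column, 1.0));        // prints NaN
    System.out.println(sumSkippingNaN(column, 1.0));  // finite sum of the two good values
  }
}
```

If the real aggregation behaves like naiveSum for a marker it was supposed to
skip, that would match the "everything becomes NaN" symptom, but that is only
a guess until someone checks the step itself.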
On Oct 11, 2011 7:49 PM, "Grant Ingersoll" <[email protected]> wrote:

>
> On Oct 11, 2011, at 12:36 PM, Sean Owen wrote:
>
> > Where is the NaN coming up -- what has this value?
>
> simColumn seems to be the originator in the Aggregate step.  For instance,
> my current breakpoint shows:
> {309682:0.9566912651062012,42938:0.9566912651062012,309672:NaN}
>
> I can also see some in the PartialMultiplyMapper via the
> similarityMatrixColumn.
>
> Is that set by SimilarityMatrixRowWrapperMapper?
> <code>
> /* remove self similarity */
>    similarityMatrixRow.set(key.get(), Double.NaN);
> </code>
>
>
>
> > It should be propagated in some cases but not others. I'm not aware of
> > any changes here.
>
> yeah, me neither.  This is all related to MAHOUT-798.
>
> >
> > Generally small data sets will have this problem of not being able to
> > compute much of anything useful, so NaN might be right here.
> > But you say it was different recently, which seems to rule that out.
>
> I also _believe_ I'm seeing it in a much larger data set on Hadoop, it's
> just that's a whole lot harder to debug.
>
> >
> > On Tue, Oct 11, 2011 at 5:34 PM, Grant Ingersoll <[email protected]> wrote:
> >> I'm running trunk RecommenderJob (via build-asf-email.sh) and am not
> >> getting any recommendations due to NaNs being calculated in the
> >> AggregateAndRecommend step.  I'm not quite sure what is going on as it
> >> seems like this was working as little as two weeks ago (post Sebastian's
> >> big change to RecJob), but I don't see a whole lot of changes in that
> >> part of the code.
> >>
> >> The data is user ids mapping to email thread ids.  My input data is
> >> simply a triple of user id, thread id, 1 (meaning that user participated
> >> in that thread).  It seems like I will have a lot of good values in the
> >> inputs to the AggregateAndRecommend step, except one id will be NaN, and
> >> this then seems to get added in and makes everything NaN (I realize this
> >> is a very naive understanding).  I sense that I should be looking
> >> upstream in the process for a fix, but I am not sure where that is.
> >>
> >> Any ideas where I should be looking to eliminate these NaNs?  If you
> >> want to try this with a small data set, you can get it here:
> >> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >> (but note the companion article is not published yet.)
> >>
> >> Thanks,
> >> Grant
>
>
>
