On Oct 11, 2011, at 12:36 PM, Sean Owen wrote:

> Where is the NaN coming up -- what has this value?

simColumn seems to be the originator in the Aggregate step.  For instance, my 
current breakpoint shows:
{309682:0.9566912651062012,42938:0.9566912651062012,309672:NaN}

I can also see some in the PartialMultiplyMapper via the 
similarityMatrixColumn.  

Is that set by SimilarityMatrixRowWrapperMapper?
<code>
/* remove self similarity */
similarityMatrixRow.set(key.get(), Double.NaN);
</code>
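
For illustration, here is a minimal sketch of why one NaN entry matters, using the three values from the breakpoint above. It is plain Java, not Mahout's actual aggregation code; the class and method names (NanPropagationSketch, naiveSum, sumSkippingNaN, exampleColumn) are hypothetical. It only shows that any sum touching a NaN becomes NaN unless the NaN entries are filtered out first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Mahout.
class NanPropagationSketch {

    // Naive sum over a similarity column: a single NaN entry makes the result NaN,
    // because any arithmetic involving NaN yields NaN.
    static double naiveSum(Map<Integer, Double> column) {
        double sum = 0.0;
        for (double value : column.values()) {
            sum += value;
        }
        return sum;
    }

    // Same sum, but entries marked with NaN (e.g. self similarity) are skipped.
    static double sumSkippingNaN(Map<Integer, Double> column) {
        double sum = 0.0;
        for (double value : column.values()) {
            if (!Double.isNaN(value)) {
                sum += value;
            }
        }
        return sum;
    }

    // The column values seen at the breakpoint.
    static Map<Integer, Double> exampleColumn() {
        Map<Integer, Double> column = new LinkedHashMap<>();
        column.put(309682, 0.9566912651062012);
        column.put(42938, 0.9566912651062012);
        column.put(309672, Double.NaN);
        return column;
    }
}
```

If the aggregation behaves like naiveSum for this column, every downstream value would be NaN; whether the right fix is to skip the NaN entries or to avoid emitting them upstream is a separate question.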



> It should be propagated in some cases but not others. I'm not aware of
> any changes here.

Yeah, me neither.  This is all related to MAHOUT-798.

> 
> Generally small data sets will have this problem of not being able to
> compute much of anything useful, so NaN might be right here.
> But you say it was different recently, which seems to rule that out.

I also _believe_ I'm seeing it in a much larger data set on Hadoop; it's just 
that it's a whole lot harder to debug there.

> 
> On Tue, Oct 11, 2011 at 5:34 PM, Grant Ingersoll <[email protected]> wrote:
>> I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting 
>> any recommendations due to NaNs being calculated in the 
>> AggregateAndRecommend step.  I'm not quite sure what is going on as it seems 
>> like this was working as little as two weeks ago (post Sebastian's big 
>> change to RecJob), but I don't see a whole lot of changes in that part of 
>> the code.
>> 
>> The data is user ids mapping to email thread ids.  My input data is simply 
>> a triple of user id, thread id, 1 (meaning that user participated in that 
>> thread).  It seems like I will have a lot of good values in the inputs to the 
>> AggregateAndRecommend step, except one id will be NaN and this then seems to 
>> get added in and makes everything NaN (I realize this is a very naive 
>> understanding).  I sense that I should be looking upstream in the process 
>> for a fix, but I am not sure where that is.
>> 
>> Any ideas where I should be looking to eliminate these NaNs?  If you want to 
>> try this with a small data set, you can get it here: 
>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout 
>> (but note the companion article is not published yet.)
>> 
>> Thanks,
>> Grant

