The general idea is that as count grows, it should push the result away from 0 and towards 1, or towards -1 if the result is negative. The result also needs to stay in the range [-1,1]. I think those last two requirements explain 80% of the apparent extra fuss there, and are why a simple multiply wouldn't quite work.
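
As a rough illustration of those goals, here is a minimal sketch of one formulation that satisfies them. It is not presented as the exact code in AbstractSimilarity; the class and method names are hypothetical, and it assumes 0 <= count <= num and a result already in [-1,1]. The simple multiply from the question is included for comparison.

```java
// Hypothetical sketch, not the library's code. Assumes 0 <= count <= num
// and that result is already in [-1, 1].
final class WeightingSketch {

    // The simple percentage from the question: it scales the result toward 0
    // when count is small and never moves it beyond its original value.
    static double simpleMultiply(double result, int count, int num) {
        double scaleFactor = (double) count / (double) (num + 1);
        return result * scaleFactor;
    }

    // Shrinks the gap between the result and the nearest extreme (+1 or -1).
    // A larger count gives a smaller remaining gap, so the result moves away
    // from 0, and it cannot leave [-1, 1] because the gap never goes negative.
    static double pushTowardExtreme(double result, int count, int num) {
        double remaining = 1.0 - (double) count / (double) (num + 1);
        if (result < 0.0) {
            return -1.0 + remaining * (1.0 + result);
        }
        return 1.0 - remaining * (1.0 - result);
    }

    public static void main(String[] args) {
        int num = 9;
        for (int count = 0; count <= num; count += 3) {
            System.out.println("count=" + count
                + " multiply=" + simpleMultiply(0.5, count, num)
                + " push=" + pushTowardExtreme(0.5, count, num)
                + " pushNegative=" + pushTowardExtreme(-0.5, count, num));
        }
    }
}
```

With count = 0 the second method leaves the result unchanged, and as count approaches num the result approaches +1 or -1 depending on its sign, whereas the simple multiply only ever pulls the result back toward 0.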
I imagine you could write a different, slightly simpler, and possibly more principled formulation that still meets those goals. The weighting system is a little arbitrary.

On Mon, Oct 29, 2012 at 2:59 PM, yamo93 <[email protected]> wrote:
> Hi all,
>
> I have a question on the formula used for weighted similarities in the
> class AbstractSimilarity.
>
> I expected to find a simple percentage, as
> double scaleFactor = (double) count / (double) (num + 1);
> return result * scaleFactor;
>
> But the code is more complex.
>
> What are the benefits of this approach ?
>
> Rgds,
> Yann.
>
