2013/11/29 Andreas Hjortgaard Danielsen <andrea...@gmail.com>:
> Hi,
>
> It might be worth noting that Lucene uses the same implementation:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Same as what? The current master or @larsmans' suggested fix?

> And Gensim has an option for choosing an addition constant (although the
> default is 0).
> https://github.com/piskvorky/gensim/blob/develop/gensim/models/tfidfmodel.py
>
> Could this be some numerical trick?

Honestly, I don't remember well how we ended up with the current
implementation. I just remember that we introduced bugs at some point
(negative values and zero division errors). The current state might still
be buggy in some respects, as the last bugfix might not be the "correct"
way to do it.

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
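
For readers following the thread, the two failure modes mentioned above (zero division and negative values) can be illustrated with a small sketch. It compares three commonly seen idf formulas in plain Python. The function names are hypothetical, and none of these is claimed to be the exact code in scikit-learn master, the suggested fix, Lucene, or Gensim. It only shows why an additive constant might be used as a numerical safeguard.

```python
import math


def idf_plain(n_docs, df):
    # Hypothetical helper: textbook idf, log(n / df).
    # Raises ZeroDivisionError when a term has df == 0.
    return math.log(n_docs / df)


def idf_plus_one_in_denominator(n_docs, df):
    # Hypothetical helper: log(n / (df + 1)).
    # Avoids the zero division, but goes negative when df == n_docs.
    return math.log(n_docs / (df + 1))


def idf_smoothed(n_docs, df):
    # Hypothetical helper: log((1 + n) / (1 + df)) + 1.
    # Defined for df == 0 and never below 1 while 0 <= df <= n_docs.
    return math.log((1 + n_docs) / (1 + df)) + 1.0


if __name__ == "__main__":
    n_docs = 4
    for df in (0, 1, 4):
        try:
            plain = idf_plain(n_docs, df)
        except ZeroDivisionError:
            plain = float("nan")  # term appears in no document
        print(df, plain,
              idf_plus_one_in_denominator(n_docs, df),
              idf_smoothed(n_docs, df))
```

With `df == n_docs` (a term present in every document) the second variant is negative, while the smoothed variant stays at exactly 1; with `df == 0` only the first variant fails.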