Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Drew Farris
Scott, Based on the dictionary output, it looks like the process of generating vectors from your tokenized text is not working properly. The only term that's making it into your dictionary is 'java' - everything else is being filtered out. Furthermore, your tf vectors have a single dimension

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding that I got past my problem. It was the min freq that was killing me. Forgot about that parameter. Thank you for your assist. Hope to be able to return the favor. Am on the hook to update documentation for Mahout already
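
The symptom in this thread (a dictionary containing only 'java' and tf vectors with a single dimension) is what a minimum term-frequency cutoff produces when most terms occur fewer times than the threshold. The following is a minimal Python sketch of that effect, not Mahout's implementation: the function names `build_dictionary` and `tf_vector`, the `min_freq` parameter name, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_dictionary(tokenized_docs, min_freq):
    """Assign an index to every term whose corpus-wide count is >= min_freq.

    Hypothetical helper: it only imitates the idea of a minimum-frequency
    filter and makes no claim about how Mahout builds its dictionary.
    """
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    kept = sorted(term for term, count in counts.items() if count >= min_freq)
    return {term: index for index, term in enumerate(kept)}


def tf_vector(doc, dictionary):
    """Sparse term-frequency vector (index -> count) over dictionary terms."""
    counts = Counter(tok for tok in doc if tok in dictionary)
    return {dictionary[term]: count for term, count in counts.items()}


# Made-up corpus: only 'java' appears in more than one document.
docs = [
    ["java", "mahout", "vector"],
    ["java", "hadoop", "cluster"],
    ["java", "tfidf", "dictionary"],
]

strict = build_dictionary(docs, min_freq=2)  # drops every term seen once
loose = build_dictionary(docs, min_freq=1)   # keeps every term

print(strict)                      # dictionary under the strict cutoff
print(tf_vector(docs[0], strict))  # tf vector with a single dimension
print(len(loose))                  # dictionary size with the cutoff relaxed
```

With the cutoff at 2, only 'java' survives and every document collapses to a one-dimensional vector; lowering the cutoff to 1 restores the full vocabulary.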

Re: generic latent variable recommender question

2014-01-26 Thread Ted Dunning
On Sun, Jan 26, 2014 at 9:36 AM, Pat Ferrel p...@occamsmachete.com wrote: I think I’ll leave dithering out until it goes live because it would seem to make the eyeball test easier. I doubt all these experiments will survive. With anti-flood if you turn the epsilon parameter to 1 (makes

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Suneel Marthi
Scott, FYI... 0.9 Release is not official yet. The project trunk's still at 0.9-SNAPSHOT. Please feel free to update the documentation. On Sunday, January 26, 2014 1:34 PM, Scott C. Cote scottcc...@gmail.com wrote: Drew, I'm sorry - I'm derelict (as opposed to dirichlet) in responding

Re: Problem converting tokenized documents into TFIDF vectors

2014-01-26 Thread Scott C. Cote
I understand that it is not official. Am just trying to provide another test opportunity for the 0.9 release. Scott On 1/26/14 1:05 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Scott, FYI... 0.9 Release is not official yet. The project trunk's still at 0.9-SNAPSHOT. Please feel free to

Re: generic latent variable recommender question

2014-01-26 Thread Tevfik Aytekin
Thanks for the answers. Actually, I worked on a similar issue: increasing the diversity of top-N lists (http://link.springer.com/article/10.1007%2Fs10844-013-0252-9). Clustering-based approaches produce good results and they are very fast compared to some optimization-based techniques. Also it