Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-22 Thread mw
Also, I noticed that something must be wrong with the variance calculation, since the file in stdcalc seems to be empty:

    root@test:[/opt/sparse/stdcalc] # ll
    total 20K
    drwxr-xr-x 2 tomcat7 tomcat7 4.0K Apr 22 11:02 .
    drwxr-xr-x 9 tomcat7 tomcat7 4.0K Apr 22 11:02 ..
    -rw-r--r-- 1 tomcat7
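To illustrate why an empty stdcalc output could matter, here is a minimal Python sketch. It assumes (this is not confirmed in the thread) that the variance step computes a standard deviation of document frequencies, which is then turned into an upper document-frequency cutoff. The names `std_dev` and `prune_high_df` and the `sigma * std_dev` rule are hypothetical and are not Mahout's API or its exact formula. If the statistics input is empty, the spread comes out as 0, the cutoff collapses to 0, and every term is pruned. That would produce empty tfidf vectors even though the tf vectors are populated.

```python
import math
from typing import Dict, List


def std_dev(values: List[float]) -> float:
    """Population standard deviation of `values`; returns 0.0 for an empty input."""
    if not values:
        return 0.0  # no data was read, so the computed spread is simply zero
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def prune_high_df(doc_freqs: Dict[str, int], stats_input: List[float], sigma: float) -> Dict[str, int]:
    """Keep terms whose document frequency is at most sigma * std_dev(stats_input).

    Hypothetical cutoff rule for illustration only; Mahout's exact formula may differ.
    `stats_input` stands in for the values the variance step reads (e.g. from stdcalc).
    """
    max_df = sigma * std_dev(stats_input)
    return {term: df for term, df in doc_freqs.items() if df <= max_df}


doc_freqs = {"hadoop": 40, "mahout": 25, "vector": 10}
# Statistics taken from the real document frequencies: only the outlier is pruned.
print(prune_high_df(doc_freqs, list(doc_freqs.values()), 3.0))  # {'mahout': 25, 'vector': 10}
# Statistics taken from an empty file: the cutoff is 0, so every term is pruned.
print(prune_high_df(doc_freqs, [], 3.0))  # {}
```

The point of separating `stats_input` from `doc_freqs` is to mimic a pipeline where the statistics are read from a different file than the one holding the term counts, so one can be empty while the other is not.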

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Mahout 0.10.0

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread Suneel Marthi
What Mahout version are you running?

SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Hello, I am trying to get tfidf vectors from a corpus of 100k documents. I noticed that the tfidf sequence file is empty, while the tf vectors are not. Here is the log from SparseVectorsFromSequenceFiles:

    INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is:
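As a rough illustration of how tf vectors can be non-empty while every tfidf vector comes out empty, here is a minimal Python sketch of a tf to tf-idf conversion step. It assumes the plain weighting tf * ln(N / df) and a hypothetical `[min_df, max_df]` document-frequency filter; `to_tfidf` is an illustrative name, not Mahout's implementation, and Mahout's own weighting may differ. If the filter removes every term, nothing survives the conversion.

```python
import math
from typing import Dict, List


def to_tfidf(tf_vectors: List[Dict[str, int]], min_df: int = 1,
             max_df: float = math.inf) -> List[Dict[str, float]]:
    """Turn term-frequency vectors into tf-idf vectors, weighting each term by tf * ln(N / df).

    Terms whose document frequency lies outside [min_df, max_df] are dropped
    (a hypothetical filter), so a document can end up with an empty tf-idf
    vector even though its tf vector has entries.
    """
    n_docs = len(tf_vectors)
    # Document frequency: the number of documents in which each term occurs.
    df: Dict[str, int] = {}
    for vec in tf_vectors:
        for term, count in vec.items():
            if count > 0:
                df[term] = df.get(term, 0) + 1
    kept = {term for term, d in df.items() if min_df <= d <= max_df}
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in vec.items() if term in kept}
        for vec in tf_vectors
    ]


tf = [{"a": 2, "b": 1}, {"a": 1, "c": 3}]
print(to_tfidf(tf))            # "a" occurs in every document, so its weight is 0.0
print(to_tfidf(tf, max_df=0))  # [{}, {}]: the filter removed every term
```

With this particular idf, a term present in all N documents gets weight 0.0 but still appears in the output; only the document-frequency filter actually empties the vectors.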