Re: mahout 1.0 on EMR with spark item-similarity

2015-04-21 Thread Pat Ferrel
Gave you the wrong schema entries for the advice about queries. Check the Solr documentation, which always trumps my guesses. To use token phrases do the following: NOTICE NO TOKENIZER OR ANALYZER USED--> > indexed="true"/> My bad, this is for multi-valued fields, see ab

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Mahout 0.10.0 On 04/21/2015 02:05 PM, Suneel Marthi wrote: What's the Mahout Version# u r running with? On Tue, Apr 21, 2015 at 6:37 AM, mw wrote: Hello, I am trying to get tfidf vectors from a corpus of 100k documents. I noticed that tfidf sequence file is empty, while the tf vectors are n

Re: SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread Suneel Marthi
What's the Mahout Version# u r running with? On Tue, Apr 21, 2015 at 6:37 AM, mw wrote: > Hello, > > I am trying to get tfidf vectors from a corpus of 100k documents. I > noticed that tfidf sequence file is empty, while the tf vectors are not. > > Here is the log from SparseVectorsFromSequenceFi

SparseVectorsFromSequenceFiles tfidf fail

2015-04-21 Thread mw
Hello, I am trying to get tfidf vectors from a corpus of 100k documents. I noticed that tfidf sequence file is empty, while the tf vectors are not. Here is the log from SparseVectorsFromSequenceFiles: INFO org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1