It's EncodedVectorsFromSequenceFiles.java, I believe.
------
Robin Anil
On Tue, Jul 31, 2012 at 6:05 PM, Eric Friedman <[email protected]> wrote:

> Can you point me to the class I should look at to see how this is done?
>
> On Tue, Jul 31, 2012 at 10:49 AM, Robin Anil <[email protected]> wrote:
>
> > You can pass in any vector (not just a TF-IDF vector). For example, the
> > asf-email example script uses vectors generated with the randomized
> > encoding.
> > ------
> > Robin Anil
> >
> >
> > On Tue, Jul 31, 2012 at 12:26 PM, Sean Owen <[email protected]> wrote:
> >
> >> I don't know this code too well, but I believe there is simply a step
> >> in front that vectorizes text with TF-IDF. The results are simple
> >> vectors. You could just inject your vectors (i.e. real-valued
> >> attributes) at that stage and skip the TF-IDF. It may need a little
> >> hacking.
> >>
> >> On Tue, Jul 31, 2012 at 6:21 PM, Eric Friedman <[email protected]>
> >> wrote:
> >> > All of the examples that I've found for training NB classifiers seem
> >> > to have textual data as input. Is there a way to build a classifier
> >> > with more general attributes?
> >> >
> >> > I found this JIRA ticket
> >> > (https://issues.apache.org/jira/browse/MAHOUT-286), but it's been
> >> > closed as a duplicate of
> >> > https://issues.apache.org/jira/browse/MAHOUT-155, which doesn't seem
> >> > to address the underlying question.
> >> >
> >> > I know that I can do this with Weka, but not at scale. Is Mahout
> >> > only able to build textual classifiers?
> >> >
> >> > Thanks,
> >> > Eric
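To make the "randomized encoding" idea concrete: instead of building a dictionary of terms, each named attribute is hashed into a slot of a fixed-size vector, so arbitrary real-valued attributes (not just words) can become a vector. The sketch below is a hypothetical, plain-Java illustration only; the class name `HashedAttributeEncoder`, the use of `String.hashCode()`, and the single-probe, sum-on-collision scheme are assumptions for illustration, not Mahout's encoder API or its actual hashing scheme.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified illustration of hashed ("randomized") feature
 * encoding for real-valued attributes. NOT Mahout's encoder API; it only
 * sketches mapping named attributes into a fixed-size vector without a
 * dictionary.
 */
public class HashedAttributeEncoder {

    private final int cardinality;

    public HashedAttributeEncoder(int cardinality) {
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive");
        }
        this.cardinality = cardinality;
    }

    /** Returns the vector slot that an attribute name hashes to. */
    public int indexOf(String attributeName) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(attributeName.hashCode(), cardinality);
    }

    /** Encodes named numeric attributes; values of colliding names are summed. */
    public double[] encode(Map<String, Double> attributes) {
        double[] vector = new double[cardinality];
        for (Map.Entry<String, Double> e : attributes.entrySet()) {
            vector[indexOf(e.getKey())] += e.getValue();
        }
        return vector;
    }

    public static void main(String[] args) {
        // One example record with non-textual attributes (made-up values)
        Map<String, Double> row = new LinkedHashMap<>();
        row.put("age", 42.0);
        row.put("income", 55000.0);
        row.put("visits", 3.0);

        HashedAttributeEncoder encoder = new HashedAttributeEncoder(16);
        System.out.println(Arrays.toString(encoder.encode(row)));
    }
}
```

The trade-off of hashing is that unrelated attributes can collide in one slot; a larger cardinality makes that less likely. Also note that a multinomial-style Naive Bayes generally assumes nonnegative, count-like feature values, so raw real-valued attributes may need scaling or binning first.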
