Sorry, I didn't read your message properly. The random forest code is quite different, so what I suggested is not applicable.
The DataConverter converts a String to a Vector wrapped by Instance. With this you can create your training set I think.

On Mon, Feb 3, 2014 at 10:09 PM, Frank Scholten <[email protected]> wrote:

> Have a look at OnlineLogisticRegressionTest.iris().
>
> Here List.subList() is used in combination with Collections.shuffle() to
> make the train and test dataset split.
>
> So you could first read the dataset in a list and then use this trick.
>
> I just pushed an example to Github that also uses this approach but I
> wrapped this logic into a utility
>
> See: https://github.com/frankscholten/mahout-sgd-bank-marketing and
>
> https://github.com/frankscholten/mahout-sgd-bank-marketing/blob/master/src/main/java/bankmarketing/util/TrainAndTestSetUtil.java
>
> Cheers,
>
> Frank
>
>
> On Mon, Feb 3, 2014 at 10:01 PM, j.barrett Strausser <
> [email protected]> wrote:
>
>> Two part question.
>>
>> 1. String Descriptor for input data
>>
>> Can anyone confirm my reasoning on the following -
>>
>> I believe the below code does the following. It says the first column is
>> the feature to be predicted (is a label) all other columns are to be used
>> in the tree construction e.g. as variable to split on.
>>
>> val descriptor = "L N N"
>> val trainDataValues = fileAsStringArray("myTrainFile.csv");
>> val data = DataLoader.loadData(DataLoader.generateDataset(descriptor,
>> false, trainDataValues), trainDataValues);
>>
>> Where my "myTrainFile.csv has a form like
>>
>> "A", .45,.55
>> ...
>> ...
>> "B" 33.3, 22.3
>>
>>
>>
>> 2. String Descriptor for input data
>>
>> I'm now provided a new file "myTestData.csv"
>>
>> This data has no labels, but is otherwise the same as above. So if I
>> attempt to create a dataset an error will be thrown with complain of no
>> label.
>>
>> All I'm interested in is being able to call forest.classify(..., ...) but
>> I'm not sure how to correctly construct my training dataset.
>>
>> I cannot simply split the original dataset as is done in most examples.
>>
>>
>> Any examples showing test data construction independent of the original
>> training set would be appreciated.
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
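
For reference, the List.subList() plus Collections.shuffle() split mentioned in the quoted message can be sketched roughly as below. This is a minimal illustration in plain Java, not the code from OnlineLogisticRegressionTest or TrainAndTestSetUtil: the names TrainTestSplit, Split and split, the fixed seed, and the 80/20 ratio are all assumptions made for the example. It only helps when labelled rows are available to split, so it does not cover the unlabelled test file case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of Mahout.
final class TrainTestSplit {

    // Simple holder for the two parts of the split.
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // Shuffles a copy of the rows and cuts it at trainFraction.
    static <T> Split<T> split(List<T> rows, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        // Copy first so the caller's list keeps its original order.
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));

        int cut = (int) Math.round(shuffled.size() * trainFraction);
        // subList() returns views, so copy them into independent lists.
        List<T> train = new ArrayList<>(shuffled.subList(0, cut));
        List<T> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return new Split<>(train, test);
    }

    public static void main(String[] args) {
        // Stand-in for lines read from a CSV file.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        Split<String> s = split(rows, 0.8, 42L);
        System.out.println("train=" + s.train.size() + " test=" + s.test.size());
        // prints "train=8 test=2"
    }
}
```

Passing an explicit seed keeps the split reproducible between runs, which makes it easier to compare models trained on the same partition.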
