Re: Data(Set) creation for train and test.

Frank Scholten Mon, 03 Feb 2014 13:25:58 -0800

Sorry, I didn't read your message properly. The random forest code is quite
different, and what I suggested is not applicable.


The DataConverter converts a String to a Vector wrapped in an Instance. I
think you can use it to create your test set.
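
The idea is to convert each test line on its own instead of loading the test file as a labeled dataset. The snippet below is a plain-Java stand-in, not Mahout's DataConverter. It reads the descriptor from the quoted question ("L N N"), reports which columns the descriptor marks as numerical features, and parses an unlabeled test line into a double[]. It assumes the unlabeled line holds only the feature columns, in the same order as in the training file. The class and method names (UnlabeledLineSketch, featureColumns, toFeatures) are hypothetical, and only the L and N descriptor tokens are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical plain-Java stand-in (NOT Mahout's DataConverter): turns one
 * unlabeled CSV line into a feature array, using the training descriptor to
 * decide how many feature columns to expect.
 */
public class UnlabeledLineSketch {

    /** Returns the column indexes marked "N" (numerical) in a descriptor such as "L N N". */
    static List<Integer> featureColumns(String descriptor) {
        String[] tokens = descriptor.trim().split("\\s+");
        List<Integer> columns = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("N")) {
                columns.add(i);
            }
        }
        return columns;
    }

    /**
     * Parses an unlabeled line. Assumption: the line contains only the feature
     * columns (no label column), in the same order as the training file.
     */
    static double[] toFeatures(String descriptor, String line) {
        int expected = featureColumns(descriptor).size();
        String[] tokens = line.trim().split("\\s*,\\s*");
        if (tokens.length != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " values but got " + tokens.length + ": " + line);
        }
        double[] features = new double[expected];
        for (int i = 0; i < expected; i++) {
            features[i] = Double.parseDouble(tokens[i]);
        }
        return features;
    }

    public static void main(String[] args) {
        // Descriptor from the question: column 0 is the label, columns 1 and 2 are numerical.
        String descriptor = "L N N";
        System.out.println(featureColumns(descriptor));                          // [1, 2]
        System.out.println(Arrays.toString(toFeatures(descriptor, ".45, .55"))); // [0.45, 0.55]
    }
}
```

A line that still carries the label (for example `"A", .45, .55`) is rejected rather than silently misread, because its column count no longer matches the feature columns.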



On Mon, Feb 3, 2014 at 10:09 PM, Frank Scholten <[email protected]> wrote:

> Have a look at OnlineLogisticRegressionTest.iris().
>
> In that test, Collections.shuffle() is combined with List.subList() to
> split the data into a training set and a test set.
>
> So you could first read the dataset into a list and then use this trick.
>
> I just pushed an example to GitHub that also uses this approach, but I
> wrapped this logic in a utility class.
>
> See: https://github.com/frankscholten/mahout-sgd-bank-marketing and
> https://github.com/frankscholten/mahout-sgd-bank-marketing/blob/master/src/main/java/bankmarketing/util/TrainAndTestSetUtil.java
>
> Cheers,
>
> Frank
>
>
> On Mon, Feb 3, 2014 at 10:01 PM, j.barrett Strausser <[email protected]> wrote:
>
>> Two-part question.
>>
>> 1. String Descriptor for input data
>>
>> Can anyone confirm my reasoning on the following?
>>
>> I believe the code below does the following: it declares the first column
>> as the label (the value to be predicted), and all other columns as
>> numerical attributes used in tree construction, i.e. as variables to split on.
>>
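>> import org.apache.mahout.classifier.df.data.DataLoader
>>
>> val descriptor = "L N N" // label, then two numerical attributes
>> val trainDataValues = fileAsStringArray("myTrainFile.csv") // my helper: one String per line
>> // regression = false: the label is categorical, so this is classification
>> val dataset = DataLoader.generateDataset(descriptor, false, trainDataValues)
>> val data = DataLoader.loadData(dataset, trainDataValues)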
>>
>> where myTrainFile.csv has a form like:
>>
>> "A", .45,.55
>> ...
>> ...
>> "B", 33.3, 22.3
>>
>>
>>
>> 2. Test data without labels
>>
>> I've now been given a new file, myTestData.csv.
>>
>> This data has no labels but is otherwise the same as above, so if I
>> attempt to create a dataset from it, an error is thrown complaining that
>> there is no label.
>>
>> All I want is to be able to call forest.classify(..., ...), but I'm not
>> sure how to correctly construct my test dataset.
>>
>> I cannot simply split the original dataset as is done in most examples.
>>
>>
>> Any examples showing test data construction independent of the original
>> training set would be appreciated.
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>
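
Frank's shuffle-and-subList suggestion from the quoted reply can be sketched in plain Java as follows. This is a minimal sketch, not the code from OnlineLogisticRegressionTest or TrainAndTestSetUtil; the names (TrainTestSplitSketch, split, trainFraction) and the fixed seed are hypothetical. It shuffles a copy of the rows with a seeded Random, then cuts the list at a fraction using List.subList(). As the asker points out, this only applies when the test rows come from the same labeled file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of a train/test split via Collections.shuffle() + List.subList(). */
public class TrainTestSplitSketch {

    /** Holds the two halves of the split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the rows with a seeded Random, then cuts it with subList().
     * trainFraction is the share of rows that go to the training set.
     */
    static <T> Split<T> split(List<T> rows, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(rows);        // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed)); // seeded for a reproducible split
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        // subList() returns views backed by 'shuffled'; copy them into independent lists.
        return new Split<>(new ArrayList<>(shuffled.subList(0, cut)),
                           new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            lines.add("row-" + i);
        }
        Split<String> s = split(lines, 0.8, 42L);
        System.out.println(s.train.size() + " train, " + s.test.size() + " test"); // 8 train, 2 test
    }
}
```

Copying the subList() views is a deliberate choice: a view becomes invalid once its backing list is structurally modified, so independent copies are safer to pass around.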
