Hi,

I am trying to use a random forest on a dataset that includes string values.
The training data is a CSV file, and some of its columns hold categorical
values stored as strings.

I have read the scikit-learn preprocessing issues, and it seems I should use
DictVectorizer to encode my categorical string values after putting the rows
into dict format. But I am not sure how to use the resulting output in the
random forest code.
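
From what I understand of the documentation, the encoding step might look
roughly like the sketch below. The column names ("color", "size") and the toy
rows are made up for illustration and are not from my real data. It assumes
each row is a dict, that DictVectorizer one-hot encodes the string values and
passes numeric values through, and that the same fitted vectorizer is reused
on the test rows.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows: one dict per sample (column names are invented).
train_rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.5},
    {"color": "red", "size": 3.0},
    {"color": "green", "size": 0.5},
]
target = [1, 0, 1, 0]
test_rows = [
    {"color": "blue", "size": 2.0},
    {"color": "purple", "size": 1.0},  # category not seen during fit
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train_rows)  # learns the feature columns
X_test = vec.transform(test_rows)        # reuses the same columns

# Small forest just for the sketch; random_state makes it repeatable.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X_train, target)
predicted_probs = rf.predict_proba(X_test)

print(vec.feature_names_)
print(predicted_probs)
```

If this is right, the output of fit_transform would simply take the place of
the raw feature lists passed to rf.fit, and the output of transform would
take the place of the raw test rows passed to rf.predict_proba. Categories
that appear only in the test rows get no column of their own.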

Here is basically what I do for the random forest:

from sklearn.ensemble import RandomForestClassifier

import csv_io  # local helper module for delimited files (not part of scikit-learn)


def main():
    # read in the training file
    train = csv_io.read_data("train.csv")
    # set the training responses (first column of each row)
    target = [x[0] for x in train]
    # set the training features (remaining columns of each row)
    train = [x[1:] for x in train]
    # read in the test file
    realtest = csv_io.read_data("test.csv")

    # random forest code
    rf = RandomForestClassifier(n_estimators=150, min_samples_split=2,
                                n_jobs=-1)
    # fit the training data
    print('fitting the model')
    rf.fit(train, target)
    # run model against test data
    predicted_probs = rf.predict_proba(realtest)

    # keep the probability of the second class, formatted as text
    predicted_probs = ["%f" % x[1] for x in predicted_probs]
    csv_io.write_delimited_file("random_forest_solution.csv",
                                predicted_probs)

    print('Random Forest Complete! You Rock! Submit '
          'random_forest_solution.csv to Kaggle')


if __name__ == "__main__":
    main()

I would be grateful for any help.

-- 
Oğuz Yarımtepe
http://about.me/oguzy
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
