Hi,
I am trying to use a random forest on a dataset that also includes string
values. The training data is a CSV file, and some of its columns hold
categorical values stored as strings.
I have read the scikit-learn preprocessing issues, and it seems I should use
DictVectorizer to encode the categorical string values after putting them
in dict format. But I am not sure how to use the resulting output in the
random forest code.
Here is basically what I do for the random forest:
import csv_io  # the CSV helper module used below (not part of scikit-learn)
from sklearn.ensemble import RandomForestClassifier


def main():
    # read in the training file
    train = csv_io.read_data("train.csv")
    # set the training responses (first column of each row)
    target = [x[0] for x in train]
    # set the training features (remaining columns)
    train = [x[1:] for x in train]
    # read in the test file
    realtest = csv_io.read_data("test.csv")
    # random forest code
    rf = RandomForestClassifier(n_estimators=150, min_samples_split=2,
                                n_jobs=-1)
    # fit the training data
    print('fitting the model')
    rf.fit(train, target)
    # run model against test data
    predicted_probs = rf.predict_proba(realtest)
    # keep only the probability of the second class, formatted as text
    predicted_probs = ["%f" % x[1] for x in predicted_probs]
    csv_io.write_delimited_file("random_forest_solution.csv",
                                predicted_probs)
    print('Random Forest Complete! You Rock! Submit '
          'random_forest_solution.csv to Kaggle')


if __name__ == "__main__":
    main()
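From the documentation, my understanding of the DictVectorizer step is
roughly the sketch below. The rows, the column names ("color", "size") and
the small target list are made up for illustration and are not my real data.
I am assuming each row is first turned into a dict, and that the same fitted
vectorizer has to be reused on the test rows so the columns line up.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows: one dict per sample, strings for categorical columns
train_rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.5},
    {"color": "red", "size": 3.0},
    {"color": "green", "size": 0.5},
]
target = [1, 0, 1, 0]
test_rows = [
    {"color": "blue", "size": 2.0},
    {"color": "red", "size": 1.5},
]

# String values become one 0/1 column per category ("color=red", ...);
# numeric values are passed through unchanged.
vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train_rows)
# Reuse the fitted vectorizer so test columns match the training columns
X_test = vec.transform(test_rows)

rf = RandomForestClassifier(n_estimators=150, min_samples_split=2,
                            n_jobs=-1, random_state=0)
rf.fit(X_train, target)

# Probability of the second class for each test row, formatted as text
predicted_probs = ["%f" % p[1] for p in rf.predict_proba(X_test)]
print(vec.feature_names_)
print(predicted_probs)
```

If this is right, X_train and X_test would simply replace train and realtest
in my code above, but I would like confirmation that this is the intended
usage.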
I would be happy if someone could help.
--
Oğuz Yarımtepe
http://about.me/oguzy