Hi, I'm a graduate student using sklearn for some data work.
While handling the data with the decision tree module, I ran into an
inconvenience:
Neither the classification tree nor the regression tree supports categorical
features. That means the decision tree models can only accept continuous
features.
For example, a categorical feature such as an app name (google, facebook)
can't be fed into the model, because it can't be converted to a continuous
value properly. And there is no corresponding algorithm for splitting on
discrete features in the decision tree module.
However, the CART algorithm itself accounts for categorical features. So I
have modified the decision tree module based on CART and applied the new
model to my own work. The results show that support for categorical features
does improve performance, so I think it is a much-needed addition for
decision trees.
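For illustration, here is a minimal sketch of the kind of categorical split
that CART describes. It is not the modification mentioned above and not
sklearn code; it assumes a regression target with squared-error impurity,
and the helper name best_categorical_split is hypothetical. The idea is to
order the categories by their mean target value and then only try the
splits that are contiguous in that order, instead of every possible subset.

```python
from collections import defaultdict


def best_categorical_split(categories, targets):
    """Hypothetical helper: best binary split of one categorical feature.

    Assumes a regression target and squared-error impurity.
    Returns (set of categories sent left, total squared error).
    """
    # Group the target values by category.
    groups = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)

    # Order categories by mean target. Under squared error, the best
    # two-way partition is contiguous in this order, so only k - 1
    # candidate splits need to be checked for k categories.
    ordered = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))

    def sse(values):
        # Sum of squared errors around the mean of `values`.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_left, best_score = None, float("inf")
    for i in range(1, len(ordered)):
        left = [y for c in ordered[:i] for y in groups[c]]
        right = [y for c in ordered[i:] for y in groups[c]]
        score = sse(left) + sse(right)
        if score < best_score:
            best_left, best_score = set(ordered[:i]), score
    return best_left, best_score


# Toy data (made up for this example): app name vs. a numeric target.
apps = ["google", "facebook", "google", "twitter", "facebook", "twitter"]
y = [1.0, 5.0, 1.2, 5.1, 4.9, 5.3]
left, score = best_categorical_split(apps, y)
print(left, score)
```

On this toy data the split separates google from the other two apps. If the
feature has only one category, the sketch returns (None, inf), meaning no
split is possible.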
I'd be very glad to contribute this to the sklearn community, but I'm new
here and not yet familiar with the procedure.
Could you give some suggestions or comments on this new feature? Or has anyone
already worked on this feature? Thank you so much.


Best wishes!
--
Yang Li  +86 188 1821 2371
Shanghai Jiao Tong University
School of Electronic Information and Electrical Engineering F1203026
800 Dongchuan Road, Minhang District, Shanghai 200240

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn