Hi,

I'm completely new to sklearn, so it's entirely likely that I'm just
misunderstanding something very fundamental here.  I thought that the
inputs for GradientBoostingClassifier would be the same as for other
classifiers (LinearSVC, MultinomialNB, etc.), but when I run the code
below, GradientBoostingClassifier raises the following error:

Traceback (most recent call last):
  File "classify.py", line 27, in <module>
    c = dict[label].fit(training_set,votes)
  File "../lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.py", line 633, in fit
    return super(GradientBoostingClassifier, self).fit(X, y)
  File "../lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.py", line 417, in fit
    X = np.asfortranarray(X, dtype=DTYPE)
  File "../lib/python2.7/site-packages/numpy/core/numeric.py", line 359, in asfortranarray
    return array(a, dtype, copy=False, order='F', ndmin=1)
ValueError: setting an array element with a sequence.


This leads me to assume that I'm not preparing my inputs to
GradientBoostingClassifier correctly, but so far I haven't been able to
find any information that points me in the right direction.

Can anyone shed some light on where I've gone astray?  Thanks.

Brian


from sklearn.datasets import fetch_20newsgroups

from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

dict = { 'N Bayes': MultinomialNB(),
           'SVC':     LinearSVC(),
#          'DTC':     DecisionTreeClassifier(),
           'GBC':     GradientBoostingClassifier() }

def load_training_set():
   myset = fetch_20newsgroups(subset='train')
   training_documents = myset.data[:200]
   votes_for_training_set = myset.target[:200]
   return training_documents, votes_for_training_set

documents, votes = load_training_set()
vect = TfidfVectorizer(min_n=1, max_n=3)
training_set = vect.fit_transform(documents)

for label in dict.keys():
   print "Trying " + label + " .... "
   c = dict[label].fit(training_set,votes)
   print "  DONE."
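To test the suspicion about inputs, here is a minimal sketch in Python 3, assuming a newer scikit-learn release than the one in the traceback. TfidfVectorizer.fit_transform returns a scipy.sparse matrix rather than a dense NumPy array, and the traceback shows GradientBoostingClassifier.fit passing X straight to np.asfortranarray, which presumably cannot handle sparse input in that version. The sketch checks the type and then fits on a dense copy made with .toarray(). The four-document corpus and its labels are hypothetical stand-ins for fetch_20newsgroups (to avoid a download), n_estimators=10 is an arbitrary small value, and ngram_range=(1, 3) is the newer spelling of min_n=1, max_n=3.

```python
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus and labels standing in for fetch_20newsgroups
documents = [
    "the cat sat on the mat",
    "dogs chase cats around the yard",
    "stocks fell sharply today",
    "markets rallied on strong earnings",
]
votes = [0, 0, 1, 1]

# Word 1- to 3-grams, as in the original script
vect = TfidfVectorizer(ngram_range=(1, 3))
training_set = vect.fit_transform(documents)

# The vectorizer output is a scipy.sparse matrix, not a NumPy array
print(type(training_set), sp.issparse(training_set))

# Make a dense copy for an estimator that only accepts dense arrays
dense_training_set = training_set.toarray()
clf = GradientBoostingClassifier(n_estimators=10).fit(dense_training_set, votes)
print(clf.predict(dense_training_set))
```

Densifying is cheap for 200 short documents, but a TF-IDF matrix with 1- to 3-grams over a full corpus can be very large once stored densely, so it is worth checking the matrix shape before calling .toarray().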

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general