2012/8/8 Philipp Singer <[email protected]>:
> Hey!
>
> The problem seems to be the following:
>
> The TfidfVectorizer gives you back a sparse matrix (scipy.sparse).
>
> I think the GradientBoostingClassifier can't work directly with sparse
> matrices, whereas LinearSVC and MultinomialNB can.
>
> So you can try it again with:
>
> training_set.toarray()

Hey Brian,

This is correct. I apologize for the poor error message (I need to fix that).

By the way, our decision trees (and thus RandomForest and GBRT) are not
well suited to sparse data (or, more generally, to many features with few
split points each), so you might want to consider different techniques.
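
To make the workaround concrete, here is a minimal sketch of fitting on a
dense array. The four-document corpus and the names dense_training_set and
clf are made up for illustration (the original script uses
fetch_20newsgroups), and n_estimators=10 is only there to keep it quick.
Newer scikit-learn releases may accept sparse input here directly, so treat
this as a sketch for the situation described in this thread, not as the
definitive fix.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus so the sketch runs without downloading 20 newsgroups.
documents = [
    "the rocket launch was delayed by weather",
    "nasa plans a new mission to orbit the moon",
    "the goalie stopped every shot in the hockey game",
    "the team won the hockey playoff in overtime",
]
votes = [0, 0, 1, 1]

vect = TfidfVectorizer()
training_set = vect.fit_transform(documents)  # scipy.sparse matrix

# Convert to a dense NumPy array before fitting.
# This allocates n_documents * n_features floats, so it only suits small inputs.
dense_training_set = training_set.toarray()

# Hypothetical settings: few estimators and a fixed seed, purely for a fast demo.
clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(dense_training_set, votes)

# Anything passed to predict() needs the same dense conversion.
predictions = clf.predict(vect.transform(documents).toarray())
```

Note that toarray() materializes every zero, so with 1- to 3-gram features
on a larger corpus the dense matrix can get large. Capping the vocabulary
(for example with TfidfVectorizer's max_features) is one way to keep it
manageable.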

best,
 Peter

>
> HTH
> Philipp
>
> On 08.08.2012 22:40, Brian Wingenroth wrote:
>> Hi,
>>
>> I'm completely new to sklearn, so it's entirely likely that I'm just
>> misunderstanding something very fundamental here.  I thought that the
>> inputs for the GradientBoostingClassifier would be the same as for other
>> classifiers (LinearSVC, MultinomialNB, etc.), but when trying to run the
>> code below, the GradientBoostingClassifier gives me the following error:
>>
>> Traceback (most recent call last):
>>   File "classify.py", line 27, in <module>
>>     c = dict[label].fit(training_set,votes)
>>   File "../lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.py", line 633, in fit
>>     return super(GradientBoostingClassifier, self).fit(X, y)
>>   File "../lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.py", line 417, in fit
>>     X = np.asfortranarray(X, dtype=DTYPE)
>>   File "../lib/python2.7/site-packages/numpy/core/numeric.py", line 359, in asfortranarray
>>     return array(a, dtype, copy=False, order='F', ndmin=1)
>> ValueError: setting an array element with a sequence.
>>
>>
>> This leads me to assume that I'm failing to prepare my inputs to the
>> GBClassifier correctly, but have so far been unable to find information
>> that will correct my course.
>>
>> Can anyone shed some light on where I've gone astray?  Thanks.
>>
>> Brian
>>
>>
>> from sklearn.datasets import fetch_20newsgroups
>>
>> from sklearn.feature_extraction.text import TfidfVectorizer
>>
>> from sklearn.naive_bayes import MultinomialNB
>> from sklearn.svm import LinearSVC
>> from sklearn.tree import DecisionTreeClassifier
>> from sklearn.ensemble import GradientBoostingClassifier
>>
>> dict = { 'N Bayes': MultinomialNB(),
>>             'SVC':     LinearSVC(),
>> #          'DTC':     DecisionTreeClassifier(),
>>             'GBC':     GradientBoostingClassifier() }
>>
>> def load_training_set():
>>     myset = fetch_20newsgroups(subset='train')
>>     training_documents = myset.data[:200]
>>     votes_for_training_set = myset.target[:200]
>>     return training_documents, votes_for_training_set
>>
>> documents, votes = load_training_set()
>> vect = TfidfVectorizer(min_n=1, max_n=3)
>> training_set = vect.fit_transform(documents)
>>
>> for label in dict.keys():
>>     print "Trying " + label + " .... "
>>     c = dict[label].fit(training_set,votes)
>>     print "  DONE."
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>



-- 
Peter Prettenhofer
