Re: [scikit-learn] numpy integration with random forrest implementation

Sebastian Raschka Sat, 21 Jan 2017 10:27:07 -0800

Hi, Carlton,
It sounds like you are looking for multilabel classification, where your target 
array has the shape [n_samples, n_outputs]? If the output shape is consistent 
(i.e., all output label arrays have 13 columns), you should be fine. Otherwise, 
you could use the MultiLabelBinarizer 
(http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer).
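
For illustration, here is a minimal sketch of what the MultiLabelBinarizer does 
with label sets of varying length. The label names and the `label_sets` 
variable are made up for this example and are not taken from Carlton's data:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets of varying length, one per sample.
label_sets = [("a", "c"), ("b",), ("a", "b", "c"), ()]

# fit_transform learns the set of distinct labels and returns a binary
# indicator matrix with one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

print(mlb.classes_)  # the distinct labels, in sorted order
print(Y.shape)       # (n_samples, n_distinct_labels)
print(Y)
```

The result is a fixed-width [n_samples, n_outputs] array, which is the target 
shape mentioned above.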


Also, the RandomForestClassifier should support multilabel classification.
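
A rough sketch of how that could look, assuming each set is a pair of arrays 
with shapes (x, 2050) and (x, 13) that can simply be stacked row-wise. The 
random data, the row counts, and all variable names below are placeholders for 
illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Placeholder data: three "sets" with different row counts but the same
# number of columns (2050 inputs, 13 binary outputs), as in the thread.
row_counts = [30, 45, 25]
X_parts = [rng.rand(n, 2050) for n in row_counts]
Y_parts = [rng.randint(0, 2, size=(n, 13)) for n in row_counts]

# Stack the sets row-wise; each row is then one training sample.
X = np.vstack(X_parts)  # shape (100, 2050)
Y = np.vstack(Y_parts)  # shape (100, 13)

# A 2-D target of shape [n_samples, n_outputs] is accepted directly.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, Y)

pred = clf.predict(X[:5])
print(pred.shape)  # one row of 13 outputs per input row
```

If the 13 target columns are continuous values rather than class labels, 
RandomForestRegressor accepts the same [n_samples, n_outputs] target shape and 
may be the better fit.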

Best,
Sebastian

> On Jan 21, 2017, at 12:59 PM, Carlton Banks <[email protected]> wrote:
> 
> Most of the machine learning libraries I’ve tried have an option to just give 
> the dimensions…
> In this case my input consists of a numpy.ndarray with shape (x, 2050) and the 
> output is a numpy.ndarray with shape (x, 13).
> x is different for each set…
> But within each set, the number of columns is consistent.
> 
> Column consistency is usually enough for most library tools I’ve worked with…
> But is this not the case here?
>> On 21 Jan 2017, at 18:42, Jacob Schreiber <[email protected]> wrote:
>> 
>> I don't understand what you mean. Does each sample have a fixed number of 
>> features or not?
>> 
>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks <[email protected]> wrote:
>> Thanks for the response!
>> 
>> If you view it as 1D, then yes… it has variable length. In 2D, the number 
>> of columns will always be constant for both the input and the output. 
>> 
>>> On 21 Jan 2017, at 18:25, Jacob Schreiber <[email protected]> wrote:
>>> 
>>> If what you're saying is that you have a variable length input, then most 
>>> sklearn classifiers won't work on this data. They expect a fixed feature 
>>> set. Perhaps you could try extracting a set of informative features to 
>>> feed into the classifier?
>>> 
>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks <[email protected]> wrote:
>>> Hi guys..
>>> 
>>> I am currently working on an ASR project in which the objective is to 
>>> substitute part of the general ASR framework with some form of neural 
>>> network, to see whether the tested part improves in any way.
>>> 
>>> I started working with the feature extraction and tried to make a neural 
>>> network (NN) that could create MFCC features. I already know what the 
>>> desired output is supposed to be, so the problem boils down to a simple
>>> input-output mapping. The problem here is that my NN doesn’t seem to perform 
>>> that well, and I seem to get a pretty large error for some reason.
>>> 
>>> I therefore wanted to give random forest a try, and see whether it could 
>>> provide me a better result.
>>> 
>>> I am currently storing my input and output in numpy.ndarrays, in which the 
>>> number of input and output columns is consistent throughout all the 
>>> examples, but the number of rows changes depending on the length of the 
>>> audio file.
>>> 
>>> Is it possible with the random forest implementation in scikit-learn to 
>>> train a random forest to map an input to an output, given they are stored 
>>> in numpy.ndarrays?
>>> Or do I have to do it in a different way? And if so, how?
>>> 
>>> kind regards
>>> 
>>> Carl truz
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
