Hi Carlton,

it sounds like you are looking for multilabel classification, where your target array has the shape [n_samples, n_outputs]. If the output shape is consistent (i.e., all output label arrays have 13 columns), you should be fine; otherwise, you could use the MultiLabelBinarizer (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer).
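
A minimal sketch of the setup discussed in this thread, assuming the 13 output columns are binary labels. The data is random and purely hypothetical (three made-up "files" with different row counts), and the variable names are illustrative only. It shows that arrays with a varying number of rows can be stacked along the row axis as long as the column counts match, and that MultiLabelBinarizer is only needed when the labels arrive as variable-length sets rather than as a fixed-width array.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

rng = np.random.RandomState(0)

# Hypothetical stand-in data: three "files" with different numbers of rows
# but the same number of columns (2050 inputs, 13 binary outputs).
rows_per_file = [40, 25, 60]
X_parts = [rng.rand(n, 2050) for n in rows_per_file]
Y_parts = [rng.randint(0, 2, size=(n, 13)) for n in rows_per_file]

# Stack along the row axis; every row is treated as one sample.
X = np.vstack(X_parts)  # shape (125, 2050)
Y = np.vstack(Y_parts)  # shape (125, 13)

# A 2D target of shape [n_samples, n_outputs] is accepted directly.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, Y)
pred = clf.predict(X[:5])  # shape (5, 13)

# Only needed if labels come as variable-length sets instead of 13 columns.
label_sets = [(0, 3), (1,), (2, 3, 12)]
mlb = MultiLabelBinarizer(classes=list(range(13)))
Y_from_sets = mlb.fit_transform(label_sets)  # shape (3, 13)
```

If the 13 output columns are real-valued rather than class labels, RandomForestRegressor accepts a 2D target of the same shape.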
Also, the RandomForestClassifier should support multilabel classification.

Best,
Sebastian

> On Jan 21, 2017, at 12:59 PM, Carlton Banks <[email protected]> wrote:
>
> Most of the machine learning libraries I've tried have an option where you
> just give the dimensions…
> In this case my input consists of a numpy.ndarray with shape (x, 2050) and
> the output is a numpy.ndarray with shape (x, 13).
> x is different for each set…
> But within each set the number of columns is consistent.
>
> Column consistency is usually enough for most library tools I've worked with…
> But is this not the case here?
>
>> On Jan 21, 2017, at 18:42, Jacob Schreiber <[email protected]> wrote:
>>
>> I don't understand what you mean. Does each sample have a fixed number of
>> features or not?
>>
>> On Sat, Jan 21, 2017 at 9:35 AM, Carlton Banks <[email protected]> wrote:
>> Thanks for the response!
>>
>> If you see it in 1d then yes… it has variable length. In 2d the number of
>> columns will always be constant, both for the input and the output.
>>
>>> On Jan 21, 2017, at 18:25, Jacob Schreiber <[email protected]> wrote:
>>>
>>> If what you're saying is that you have a variable-length input, then most
>>> sklearn classifiers won't work on this data. They expect a fixed feature
>>> set. Perhaps you could try extracting a set of informative features to
>>> feed into the classifier?
>>>
>>> On Sat, Jan 21, 2017 at 3:18 AM, Carlton Banks <[email protected]> wrote:
>>> Hi guys,
>>>
>>> I am currently working on an ASR project in which the objective is to
>>> substitute part of the general ASR framework with some form of neural
>>> network, to see whether the tested part improves in any way.
>>>
>>> I started working with the feature extraction and tried to make a neural
>>> network (NN) that could create MFCC features. I already know what the
>>> desired output is supposed to be, so the problem boils down to a simple
>>> input-output mapping. The problem here is that my NN doesn't seem to
>>> perform that well, and I seem to get a pretty large error for some reason.
>>>
>>> I therefore wanted to give random forest a try, and see whether it could
>>> provide me with a better result.
>>>
>>> I am currently storing my input and output in numpy.ndarrays, in which the
>>> number of input and output columns is consistent throughout all the
>>> examples, but the number of rows changes depending on the length of the
>>> audio file.
>>>
>>> Is it possible with the random forest implementation in scikit-learn to
>>> train a random forest to map an input to an output, given that they are
>>> stored in numpy.ndarrays?
>>> Or do I have to do it in a different way? And if so, how?
>>>
>>> Kind regards,
>>>
>>> Carl truz
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
