Sebastian,

A few days ago I asked a very similar question and got this link in
response:

 https://github.com/scikit-learn/scikit-learn/issues/2034


I think you could try something similar to what is discussed there.
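
One way this could look is sketched below. This is only a minimal sketch, not necessarily what the linked issue proposes: DenseTransformer is a hypothetical helper name, the labels y are made up for illustration, and RandomForestClassifier stands in for the random forest mentioned in the question. The extra numeric columns are appended with np.hstack outside the text pipeline, which is an assumption about how data1 and data2 line up (one row of data2 per sentence).

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline


class DenseTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical helper: turn a SciPy sparse matrix into a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; present only so the class works inside a Pipeline.
        return self

    def transform(self, X):
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Data from the question; the labels are invented for this example.
data1 = ['This is the first sentence',
         'This is the second sentence',
         'This is the third sentence']
data2 = np.array([[0], [1], [0]])
y = np.array([0, 1, 0])

# Text-only steps, ending with the sparse-to-dense conversion.
text_features = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('change_to_dense', DenseTransformer()),
])

X_text = text_features.fit_transform(data1)

# Append the numeric columns to the dense text features, row by row.
X_all = np.hstack([X_text, data2])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_all, y)
predictions = clf.predict(X_all)
```

At prediction time the same two steps would be repeated with text_features.transform(...) on the new sentences before stacking their numeric columns. Keeping the stacking outside the pipeline is the simplest option here; doing it inside a single pipeline would need a custom step that receives both inputs.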


Best,

Zoraida.-

On 21/08/14 18:48, "Sebastian Okser" <seo...@utu.fi> wrote:

>I am trying to use a Pipeline that combines CountVectorizer,
>TfidfTransformer and a random forest. However, the output of the second
>step is a sparse matrix and the random forest requires a dense one. How
>can I add a step that converts the matrix from sparse to dense, using
>something along the lines of data.toarray()? Additionally, I would like
>to append some extra features to the dataset after the text has been
>processed. How can I create a step for this (normally I would use
>something like hstack)? My code is as follows:
>
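>from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
>from sklearn.multiclass import OneVsRestClassifier
>from sklearn.pipeline import Pipeline
>from sklearn.svm import SVC
>
>pipeline = Pipeline([
>    ('vect', CountVectorizer()),
>    ('tfidf', TfidfTransformer()),
>    ('clf', OneVsRestClassifier(SVC(probability=True))),
>])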
>I would like to adjust this somehow to the following:
>
>pipeline = Pipeline([
>    ('vect', CountVectorizer()),
>    ('tfidf', TfidfTransformer()),
>    ('change_to_dense', SOME HOW CHANGE TO DENSE),
>    ('add_more_data', SOME HOW ADD FEATURES),
>    ('clf', OneVsRestClassifier(SVC(probability=True))),
>])
>
>My first dataset, let's call it data1, is just an array of sentences. Below
>is an example:
>
>data1 = ['This is the first sentence',
>         'This is the second sentence',
>         'This is the third sentence']
>
>The second dataset is numerical data of the following form:
>
>data2 = array([[0],
>               [1],
>               [0]])
>
>
>Thanks!
>--------------------------------------------------------------------------
>----
>Slashdot TV.
>Video for Nerds.  Stuff that matters.
>http://tv.slashdot.org/
>_______________________________________________
>Scikit-learn-general mailing list
>Scikit-learn-general@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

