On the other hand I can't seem to replicate your error.

On 30 August 2014 21:56, Joel Nothman <[email protected]> wrote:

> That's not a solution I'm happy with :s
>
>
> On 30 August 2014 21:35, Lakomkin Egor <[email protected]> wrote:
>
>> Joel,
>>
>> Thank you for your reply. I fixed the problem by defining my own
>> transformer, which does the same job as Binarizer but produces a sparse
>> matrix.
>>
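For readers following along, here is a minimal sketch of what such a transformer might look like. The class name `SparseBinarizer`, its `threshold` parameter, and the choice to treat a flat list of n values as n samples with one feature each are all assumptions for illustration; the transformer Egor actually wrote is not shown in this thread.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class SparseBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: 1 where value > threshold, as a sparse column."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn.
        return self

    def transform(self, X):
        # Assumption: a flat list of n values means n samples, one feature each.
        column = np.asarray(X, dtype=float).reshape(-1, 1)
        return sparse.csr_matrix((column > self.threshold).astype(np.float64))


flags = [1, 0, 3, 0]
out = SparseBinarizer().fit(flags).transform(flags)
print(out.shape)  # (4, 1)
```

The explicit `reshape(-1, 1)` is the design choice that matters here: it guarantees one row per sample, so the output has the same number of rows as the text pipelines.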
>> Regards, Egor
>>
>>
>> 2014-08-30 18:07 GMT+08:00 Joel Nothman <[email protected]>:
>>
>>> I cannot immediately tell why this doesn't work.
>>>
>>> Firstly, I assume (and hope) it has nothing to do with
>>> transformer_weights. Check that removing this still results in the error.
>>>
>>> The error implies that the transformers (pipelines) are producing data
>>> with different numbers of rows. To find out which one is responsible, you
>>> could add a DebugTransformer like this to each pipeline:
>>>
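>>> from sklearn.base import TransformerMixin
>>>
>>> class DebugTransformer(TransformerMixin):
>>>     def __init__(self, name):
>>>         self.name = name
>>>
>>>     def transform(self, X):
>>>         # Report the shape this step receives, then pass X through unchanged.
>>>         print(self.name, 'got', X.shape)
>>>         return X
>>>
>>>     def fit(self, X, y=None):
>>>         # Nothing to learn.
>>>         return self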
>>>
>>> and at least check the shapes directly.
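As a usage sketch, the idea above could be wired into a pipeline as follows. `ShapeLogger` is a hypothetical variant of DebugTransformer that uses `np.shape` so that it also accepts plain Python lists (which have no `.shape` attribute); the step names and the toy input are assumptions, not taken from the original code.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer


class ShapeLogger(TransformerMixin, BaseEstimator):
    """Hypothetical variant of DebugTransformer that also accepts plain lists."""

    def __init__(self, name):
        self.name = name

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # np.shape works for lists as well as arrays and sparse matrices.
        print(self.name, 'got', np.shape(X))
        return X


# Toy input: 3 samples with 1 feature each, already in column form.
flags = np.array([[0.0], [2.0], [5.0]])

pipe = Pipeline([
    ('before', ShapeLogger('before Binarizer')),
    ('binarize', Binarizer()),
    ('after', ShapeLogger('after Binarizer')),
])
out = pipe.fit_transform(flags)
```

Placing one logger before and one after a step shows whether that step changes the number of rows.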
>>>
>>> - Joel
>>>
>>>
>>>
>>> On 30 August 2014 12:48, Lakomkin Egor <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have heterogeneous data with text and binary features, which I am
>>>> trying to handle with FeatureUnion. I use HashingVectorizer for the text
>>>> data and Binarizer for the integer data (I only need to know whether the
>>>> value of the feature is > 0).
>>>>
>>>> The problem is that the naive code I wrote did not work out of the
>>>> box. Is there an example of using text and binary data together in a
>>>> FeatureUnion?
>>>>
>>>> I have included the error and the code/structure of the FeatureUnion
>>>> that I tried below. Thanks in advance for your help!
>>>>
>>>> Platform: Windows 7, 64-bit, scikit-learn: 0.15.1
>>>>
>>>> The error:
>>>>
>>>>     X_batch = transformer.transform(X_batch)
>>>>   File "C:\Anaconda\lib\site-packages\sklearn\pipeline.py", line 384, in transform
>>>>     Xs = sparse.hstack(Xs).tocsr()
>>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 453, in hstack
>>>>     return bmat([blocks], format=format, dtype=dtype)
>>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 567, in bmat
>>>>     raise ValueError('blocks[%d,:] has incompatible row dimensions' % i)
>>>> ValueError: blocks[0,:] has incompatible row dimensions
>>>>
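To illustrate what this ValueError means in general, here is a small sketch using only NumPy and SciPy. It shows one way mismatched row counts can arise: a flat list of n flags interpreted as a single row of n columns instead of n rows of one column. Whether that is what happens in the setup above is an assumption; the toy matrices below are stand-ins, not the real pipeline outputs.

```python
import numpy as np
from scipy import sparse

n_samples = 3
# Stand-in for a HashingVectorizer output: one row per sample.
text_block = sparse.csr_matrix(np.ones((n_samples, 4)))

flags = [1, 0, 1]
as_row = sparse.csr_matrix(np.atleast_2d(flags))                  # shape (1, 3)
as_column = sparse.csr_matrix(np.asarray(flags).reshape(-1, 1))   # shape (3, 1)

try:
    sparse.hstack([text_block, as_row])
    row_stack_failed = False
except ValueError as exc:
    # Row counts differ (3 vs 1), so horizontal stacking is rejected.
    row_stack_failed = True
    print('hstack failed:', exc)

# With one row per sample, the blocks line up.
combined = sparse.hstack([text_block, as_column]).tocsr()
print(combined.shape)  # (3, 5)
```

The exact wording of the exception message depends on the SciPy version, but in each case the blocks passed to `hstack` must agree on the number of rows.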
>>>> The data that I feed to the transformer (in batches) has the form
>>>> [{'title': ..., 'description': '', 'phone_flag': 1}, ..]
>>>>
>>>> FeatureUnion structure that I use:
>>>>
>>>> transformer = FeatureUnion([
>>>>     ('description', Pipeline([
>>>>         ('get', GetItemTransformer('description')),
>>>>         ('vectorize', HashingVectorizer(encoding='utf-8',
>>>>                                         n_features=N_TEXT_FEATURES,
>>>>                                         analyzer=analyzer)),
>>>>     ])),
>>>>     ('title', Pipeline([
>>>>         ('get', GetItemTransformer('title')),
>>>>         ('vectorize', HashingVectorizer(encoding='utf-8',
>>>>                                         n_features=N_TEXT_FEATURES,
>>>>                                         analyzer=analyzer)),
>>>>     ])),
>>>>     ('flag', Pipeline([
>>>>         ('get', GetItemTransformer('phone_flag')),
>>>>         ('vectorize', Binarizer()),
>>>>     ])),
>>>> ], transformer_weights={'title': 2.0, 'description': 1.0})
>>>>
>>>>
>>>> GetItemTransformer
>>>>
>>>> class GetItemTransformer(TransformerMixin):
>>>>     def __init__(self, field):
>>>>         self.field = field
>>>>
>>>>     def transform(self, X):
>>>>         # Pull one field out of each record in a list of dicts.
>>>>         if isinstance(X, list):
>>>>             return [x[self.field] for x in X]
>>>>         raise Exception("Not supported")
>>>>
>>>>     def fit(self, X, y=None, **fit_params):
>>>>         return self
>>>>
>>>> Regards, Egor
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Slashdot TV.
>>>> Video for Nerds.  Stuff that matters.
>>>> http://tv.slashdot.org/
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
