That's not a solution I'm happy with :s
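
For anyone reading this in the archive, here is a minimal sketch of the kind of workaround Egor describes below. It assumes the 'phone_flag' values reach the transformer as a flat Python list of integers, which is what GetItemTransformer appears to return. SparseBinarizer and its threshold parameter are hypothetical names, not taken from Egor's code or from scikit-learn. The idea is to reshape the list into a single column, one row per sample, so that it lines up with the HashingVectorizer outputs when FeatureUnion stacks the blocks.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class SparseBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: flat list of numbers -> sparse 0/1 column."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Stateless: nothing to learn from the data.
        return self

    def transform(self, X):
        # Force shape (n_samples, 1) so there is one row per sample.
        values = np.asarray(X, dtype=float).reshape(-1, 1)
        # 1.0 where the value exceeds the threshold, 0.0 elsewhere.
        return sparse.csr_matrix((values > self.threshold).astype(float))


flags = [1, 0, 3, 0]
out = SparseBinarizer().transform(flags)
print(out.shape)
print(out.toarray().ravel())
```

Something like this could stand in for Binarizer() in the 'flag' pipeline. That the row mismatch comes from the flat list is only a guess, so checking the shapes directly is still worthwhile.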

On 30 August 2014 21:35, Lakomkin Egor <[email protected]> wrote:

> Joel,
>
> Thank you for your reply. I fixed the problem by defining my own
> transformer, which does the same job as Binarizer but produces a sparse
> matrix.
>
> Regards, Egor
>
>
> 2014-08-30 18:07 GMT+08:00 Joel Nothman <[email protected]>:
>
>> I cannot immediately tell why this doesn't work.
>>
>> Firstly, I assume (and hope) it has nothing to do with
>> transformer_weights. Check that removing this still results in the error.
>>
>> The error implies that the transformers (pipelines) are producing data with
>> different numbers of rows. To check this, you could add a DebugTransformer
>> to each pipeline:
>>
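>> from sklearn.base import TransformerMixin
>>
>> class DebugTransformer(TransformerMixin):
>>     """Pass-through step that prints the shape of the data it receives."""
>>     def __init__(self, name):
>>         self.name = name
>>
>>     def transform(self, X):
>>         # Plain lists have no .shape, so fall back to their length.
>>         shape = getattr(X, 'shape', (len(X),))
>>         print(self.name, 'got', shape)
>>         return X
>>
>>     def fit(self, X, y=None):
>>         return self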
>>
>> and at least check the shapes directly.
>>
>> - Joel
>>
>>
>>
>> On 30 August 2014 12:48, Lakomkin Egor <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I have heterogeneous data with text and binary features, and I am trying
>>> to handle it with FeatureUnion. I use HashingVectorizer for the text data
>>> and Binarizer for the integer data (I only need to know whether the value
>>> of the feature is greater than 0).
>>>
>>> The problem is that the naive code I wrote did not work out of the box.
>>> Is there an example of combining text and binary data in a FeatureUnion?
>>>
>>> I have included the error and the FeatureUnion code I tried below.
>>> Thanks in advance for any help!
>>>
>>> Platform: Windows 7, 64-bit, scikit-learn : 0.15.1
>>> The error:
>>> X_batch = transformer.transform(X_batch)
>>>   File "C:\Anaconda\lib\site-packages\sklearn\pipeline.py", line 384, in transform
>>>     Xs = sparse.hstack(Xs).tocsr()
>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 453, in hstack
>>>     return bmat([blocks], format=format, dtype=dtype)
>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 567, in bmat
>>>     raise ValueError('blocks[%d,:] has incompatible row dimensions' % i)
>>> ValueError: blocks[0,:] has incompatible row dimensions
>>>
>>> The data I feed to the transformer (in batches) is a list of dicts of the
>>> form [ {'title' : ..., 'description' : '', 'phone_flag' : 1}, .. ]
>>>
>>> FeatureUnion structure that I use:
>>>
>>> transformer = FeatureUnion([
>>>         ('description', Pipeline([
>>>                 ('get', GetItemTransformer('description')),
>>>                 ('vectorize', HashingVectorizer(encoding='utf-8',
>>>                     n_features=N_TEXT_FEATURES, analyzer=analyzer)),
>>>             ])
>>>         ),
>>>         ('title', Pipeline([
>>>                 ('get', GetItemTransformer('title')),
>>>                 ('vectorize', HashingVectorizer(encoding='utf-8',
>>>                     n_features=N_TEXT_FEATURES, analyzer=analyzer)),
>>>             ])
>>>         ),
>>>         ('flag', Pipeline([
>>>                 ('get', GetItemTransformer('phone_flag')),
>>>                 ('vectorize', Binarizer()),
>>>             ])
>>>         ),
>>>     ], transformer_weights={'title': 2.0, 'description': 1.0})
>>>
>>>
>>> GetItemTransformer:
>>>
>>> class GetItemTransformer(TransformerMixin):
>>>     """Extract a single field from each dict in a list of records."""
>>>     def __init__(self, field):
>>>         self.field = field
>>>
>>>     def transform(self, X):
>>>         # Only plain lists of dicts are supported.
>>>         if isinstance(X, list):
>>>             return [x[self.field] for x in X]
>>>         raise TypeError("Not supported")
>>>
>>>     def fit(self, X, y=None, **fit_params):
>>>         return self
>>>
>>> Regards, Egor
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Slashdot TV.
>>> Video for Nerds.  Stuff that matters.
>>> http://tv.slashdot.org/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>