On the other hand, I can't seem to replicate your error.
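
One possible explanation, offered as a guess that nothing in this thread confirms: GetItemTransformer('phone_flag') returns a flat list with one value per sample, and a flat list is treated as a single row when it is turned into a sparse matrix. The flag block would then have 1 row while the text blocks have one row per sample. The sketch below reproduces that kind of mismatch with numpy and scipy alone; the sizes and the stand-in for the HashingVectorizer output are made up for illustration.

```python
import numpy as np
from scipy import sparse

n_samples, n_text_features = 3, 8  # illustrative sizes, not from the thread

# Stand-in for a HashingVectorizer output: one row per sample.
text_block = sparse.csr_matrix(np.ones((n_samples, n_text_features)))

# A flat list of flags, one value per sample (assumed pipeline input).
flags = [1, 0, 3]

# A 1-D input becomes a single row; np.atleast_2d makes that explicit.
flat_block = sparse.csr_matrix(np.atleast_2d(flags))
print(flat_block.shape)  # (1, 3)

# Stacking 3 rows next to 1 row fails with a ValueError.
mismatch_error = None
try:
    sparse.hstack([text_block, flat_block])
except ValueError as exc:
    mismatch_error = exc
    print("hstack failed:", exc)

# Reshaping the flags into a column gives one row per sample again.
column_block = sparse.csr_matrix(np.asarray(flags, dtype=float).reshape(-1, 1))
combined = sparse.hstack([text_block, column_block]).tocsr()
print(combined.shape)  # (3, 9)
```

If that is what is happening, reshaping the flags into a column before they reach Binarizer may be enough.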
On 30 August 2014 21:56, Joel Nothman <[email protected]> wrote:
> That's not a solution I'm happy with :s
>
>
> On 30 August 2014 21:35, Lakomkin Egor <[email protected]> wrote:
>
>> Joel,
>>
>> Thank you for your reply. I fixed the problem by defining my own
>> transformer that does the same job as Binarizer but produces a sparse
>> matrix.
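
A transformer along those lines might look like the sketch below. The class name SparseBinarizer and its threshold parameter are hypothetical, since the actual code was not posted. It avoids scikit-learn imports so that it runs on its own with numpy and scipy.

```python
import numpy as np
from scipy import sparse


class SparseBinarizer:
    """Hypothetical stand-in for the custom transformer described above."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # One value per sample in, one row per sample out.
        column = np.asarray(X, dtype=float).reshape(-1, 1)
        # Values above the threshold map to 1.0, everything else to 0.0.
        return sparse.csr_matrix((column > self.threshold).astype(np.float64))

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


flags = SparseBinarizer().transform([1, 0, 3])
print(flags.shape)  # (3, 1)
```

Inside a real Pipeline one would probably also inherit from TransformerMixin, as the other classes in this thread do.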
>>
>> Regards, Egor
>>
>>
>> 2014-08-30 18:07 GMT+08:00 Joel Nothman <[email protected]>:
>>
>>> I cannot immediately tell why this doesn't work.
>>>
>>> Firstly, I assume (and hope) it has nothing to do with
>>> transformer_weights. Check that removing this still results in the error.
>>>
>>> The error implies that the transformers (pipelines) are producing data
>>> of different shapes. Adding a DebugTransformer like this one to each
>>> pipeline may help:
>>>
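>>> from sklearn.base import TransformerMixin
>>>
>>> class DebugTransformer(TransformerMixin):
>>>     def __init__(self, name):
>>>         self.name = name
>>>
>>>     def transform(self, X):
>>>         # Report the shape of whatever reaches this step, then pass it on.
>>>         print(self.name, 'got', X.shape)
>>>         return X
>>>
>>>     def fit(self, X, y=None):
>>>         return self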
>>>
>>> and at least check the shapes directly.
>>>
>>> - Joel
>>>
>>>
>>>
>>> On 30 August 2014 12:48, Lakomkin Egor <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have heterogeneous data with text and binary features, and I am
>>>> trying to handle it with FeatureUnion. I use HashingVectorizer for text
>>>> data and Binarizer for integer data (I only need to know whether the
>>>> value of the feature is > 0).
>>>>
>>>> The problem is that the naive code I wrote did not work out of the
>>>> box. Is there an example of using text and binary data together in
>>>> FeatureUnion?
>>>>
>>>> I have included the error and the FeatureUnion code that I tried
>>>> below. Thanks in advance for your help!
>>>>
>>>> Platform: Windows 7, 64-bit, scikit-learn 0.15.1
>>>> The error:
>>>>     X_batch = transformer.transform(X_batch)
>>>>   File "C:\Anaconda\lib\site-packages\sklearn\pipeline.py", line 384, in transform
>>>>     Xs = sparse.hstack(Xs).tocsr()
>>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 453, in hstack
>>>>     return bmat([blocks], format=format, dtype=dtype)
>>>>   File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 567, in bmat
>>>>     raise ValueError('blocks[%d,:] has incompatible row dimensions' % i)
>>>> ValueError: blocks[0,:] has incompatible row dimensions
>>>>
>>>> The data that I feed to the transformer (in batches) has the form
>>>> [{'title': ..., 'description': '', 'phone_flag': 1}, ..]
>>>>
>>>> FeatureUnion structure that I use:
>>>>
>>>> transformer = FeatureUnion([
>>>>     ('description', Pipeline([
>>>>         ('get', GetItemTransformer('description')),
>>>>         ('vectorize', HashingVectorizer(encoding='utf-8',
>>>>             n_features=N_TEXT_FEATURES, analyzer=analyzer)),
>>>>     ])),
>>>>     ('title', Pipeline([
>>>>         ('get', GetItemTransformer('title')),
>>>>         ('vectorize', HashingVectorizer(encoding='utf-8',
>>>>             n_features=N_TEXT_FEATURES, analyzer=analyzer)),
>>>>     ])),
>>>>     ('flag', Pipeline([
>>>>         ('get', GetItemTransformer('phone_flag')),
>>>>         ('vectorize', Binarizer()),
>>>>     ])),
>>>> ], transformer_weights={'title': 2.0, 'description': 1.0})
>>>>
>>>>
>>>> GetItemTransformer
>>>>
>>>> class GetItemTransformer(TransformerMixin):
>>>>     def __init__(self, field):
>>>>         self.field = field
>>>>
>>>>     def transform(self, X):
>>>>         # Pull one field out of each record in a list of dicts.
>>>>         if isinstance(X, list):
>>>>             return [x[self.field] for x in X]
>>>>         raise Exception("Not supported")
>>>>
>>>>     def fit(self, X, y=None, **fit_params):
>>>>         return self
>>>>
>>>> Regards, Egor
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Slashdot TV.
>>>> Video for Nerds. Stuff that matters.
>>>> http://tv.slashdot.org/
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>