Re: [Scikit-learn-general] Subclassing vectorizers

2016-03-23 Thread Fred Mailhot

Thanks very much everyone; seems to be working now!



Re: [Scikit-learn-general] Subclassing vectorizers

2016-03-22 Thread Sebastian Raschka

Hah, and I just wanted to write regarding the VotingClassifier -- I remember my
struggle quite well when I tried to make it pipeline and GridSearch
compatible until I figured that one out :P


Re: [Scikit-learn-general] Subclassing vectorizers

2016-03-22 Thread Joel Nothman

And I lied that none of the scikit-learn estimators define their own
get_params. Of course the following do: VotingClassifier, Kernel (and
subclasses), Pipeline and FeatureUnion
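
Composite estimators like the ones listed above hold part of their tunable state in nested estimators, which is why they need their own get_params. The sketch below illustrates the idea with hypothetical toy classes (`Leaf` and `ToyPipeline` are made up for this example and are not scikit-learn code); the `name__param` key format mirrors scikit-learn's convention for nested parameters.

```python
class Leaf(object):
    """Hypothetical simple estimator with one constructor parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}


class ToyPipeline(object):
    """Hypothetical composite holding a list of (name, estimator) pairs."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        out = {"steps": self.steps}
        if deep:
            for name, est in self.steps:
                # Expose the step itself, then each of its parameters
                # under a "<step name>__<parameter name>" key.
                out[name] = est
                for key, value in est.get_params(deep=True).items():
                    out["%s__%s" % (name, key)] = value
        return out
```

With keys of this shape, a search over `clf__alpha` can address a parameter of the step named `clf` without knowing anything else about the composite.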


Re: [Scikit-learn-general] Subclassing vectorizers

2016-03-22 Thread Joel Nothman

something like the following may suffice:

def get_params(self, deep=True):
    out = super(WordCooccurrenceVectorizer, self).get_params(deep=deep)
    out['w2v_clusters'] = self.w2v_clusters
    return out
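
Note that the snippet above reads `self.w2v_clusters`, whereas the `__init__` in the original post stores the value as `self.w2v_cluster_path`; the two names would need to agree. An alternative that avoids overriding get_params is to spell out the constructor arguments and store each one under its own name. The following is only a sketch of that pattern: `ToyVectorizer` is a hypothetical stand-in with two arguments, not TfidfVectorizer, and its get_params/set_params are simplified assumptions about the behaviour described in this thread.

```python
import inspect


class ToyVectorizer(object):
    """Hypothetical stand-in for a base estimator; not TfidfVectorizer."""

    def __init__(self, ngram_range=(1, 1), binary=False):
        self.ngram_range = ngram_range
        self.binary = binary

    def get_params(self, deep=True):
        # Read each constructor argument back from the attribute
        # that carries the same name.
        names = [n for n in inspect.signature(type(self).__init__).parameters
                 if n != "self"]
        return {n: getattr(self, n) for n in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter %s" % name)
            setattr(self, name, value)
        return self


class ToyCooccurrenceVectorizer(ToyVectorizer):
    # Every parent argument is repeated explicitly, plus the new one.
    def __init__(self, ngram_range=(1, 1), binary=False, w2v_clusters=None):
        super(ToyCooccurrenceVectorizer, self).__init__(
            ngram_range=ngram_range, binary=binary)
        # Store the argument unmodified, under the argument's own name.
        self.w2v_clusters = w2v_clusters
```

Because the new argument appears in the signature, the inherited get_params and set_params pick it up with no override. The cost is the one the original post asks about: the parent's arguments have to be listed again in the subclass.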


Re: [Scikit-learn-general] Subclassing vectorizers

2016-03-22 Thread Joel Nothman

Hi Fred,

We use the __init__ signature to get the list of parameters that (a) can be
set by grid search; (b) need to be copied to a cloned instance of the
estimator (with any fitted model discarded) in constructing ensembles,
cross validation, etc. While none of the scikit-learn library of estimators
do this, in practice you can overload get_params to define your own
parameter listing. See
http://scikit-learn.org/stable/developers/contributing.html#get-params-and-set-params
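
To make the reasoning above concrete, here is a simplified, standard-library-only sketch of signature-based parameter discovery and cloning. It is an illustration under assumptions, not scikit-learn's actual code: `get_param_names`, `clone`, `Explicit` and `Opaque` are hypothetical names chosen for this example.

```python
import inspect


def get_param_names(cls):
    """Toy version of listing an estimator's parameters from __init__."""
    params = [p for p in inspect.signature(cls.__init__).parameters.values()
              if p.name != "self"]
    for p in params:
        # *args / **kwargs carry no parameter names, so there is
        # nothing to list, set, or copy.
        if p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            raise RuntimeError("%s uses varargs in __init__" % cls.__name__)
    return sorted(p.name for p in params)


def clone(obj):
    """Rebuild an unfitted copy from the constructor parameters alone."""
    cls = type(obj)
    return cls(**{name: getattr(obj, name) for name in get_param_names(cls)})


class Explicit(object):
    def __init__(self, max_df=1.0, w2v_clusters=None):
        self.max_df = max_df
        self.w2v_clusters = w2v_clusters


class Opaque(object):
    def __init__(self, *args, **kwargs):
        self.w2v_clusters = kwargs.pop("w2v_clusters", None)
```

Calling `get_param_names(Explicit)` yields the two argument names, while `get_param_names(Opaque)` raises, which is the same kind of failure as the RuntimeError reported in the original post. The `clone` helper copies only what the signature names, so anything learned during fitting is left behind.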


[Scikit-learn-general] Subclassing vectorizers

2016-03-22 Thread Fred Mailhot

Hello list,

Firstly, thanks for this incredible package; I use it daily at work. Now on
to the meat: I'm trying to subclass TfidfVectorizer and running into
issues. I want to specify an extra param for __init__() that points to a
file that gets used in build_analyzer(). Skipping irrelevant bits, I've got
the following:

#==
class WordCooccurrenceVectorizer(TfidfVectorizer):

    ### override __init__ to add w2v_clusters arg
    # see http://stackoverflow.com/questions/2215923/avoid-specifying-all-arguments-in-a-subclass
    # for explanation of syntax
    def __init__(self, *args, **kwargs):
        try:
            self.w2v_cluster_path = kwargs.pop("w2v_clusters")
        except KeyError:
            pass
        super(WordCooccurrenceVectorizer, self).__init__(*args, **kwargs)

    def build_analyzer(self):
        preprocess = self.build_preprocessor()
        stopwords = self.get_stop_words()
        w2v_clusters = self.load_w2v_clusters()
        tokenize = self.build_tokenizer()
        return lambda doc: self._nwise(
            tokenize(preprocess(self.decode(doc))), stopwords, w2v_clusters)
    [...]
#==

I can instantiate this, but when I want to inspect it, I get the following
(this is in ipython, in a script it just hangs):

#==
In [2]: vec = WordCooccurrenceVectorizer(ngram_range=(2,2),
stop_words="english", max_df=0.5, min_df=1, max_features=1,
w2v_clusters="clusters.20160322_1803.w2v", binary=True)

In [3]: vec
Out[3]:
---
RuntimeError                              Traceback (most recent call last)
/Users/fredmailhot/anaconda/envs/csai_experiments/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
    697                 type_pprinters=self.type_printers,
    698                 deferred_pprinters=self.deferred_printers)
--> 699             printer.pretty(obj)
    700             printer.flush()
    701             return stream.getvalue()

[...]

/Users/fredmailhot/anaconda/envs/csai_experiments/lib/python2.7/site-packages/sklearn/base.pyc in _get_param_names(cls)
    193                                    " %s with constructor %s doesn't "
    194                                    " follow this convention."
--> 195                                    % (cls, init_signature))
    196         # Extract and sort argument names excluding 'self'
    197         return sorted([p.name for p in parameters])

RuntimeError: scikit-learn estimators should always specify their
parameters in the signature of their __init__ (no varargs). <class
'cooc_vectorizer.WordCooccurrenceVectorizer'> with constructor (<self>,
*args, **kwargs) doesn't  follow this convention.

In [4]:
#==

The error is clear enough -- I can't use *args and **kwargs in a sklearn
estimator's __init__() -- but I'm not sure what the correct way is to do
what I need to do. Do I literally need to specify all of the __init__
params in my subclass and then pass them on to the __init__ of super()? If
so, what's the reason for setting this up this way?


Thanks for any pointers/guidance,
Fred.
--
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
