Hah, and I just wanted to write regarding the VotingClassifier — I remember my
struggle quite well when I tried to make it pipeline- and GridSearch-compatible
until I figured that one out :P
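For the archives, a minimal sketch of that combination, assuming current
scikit-learn import paths (the estimator choices and grid values here are
illustrative only, not from this thread):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

eclf = VotingClassifier(estimators=[('lr', LogisticRegression(max_iter=1000)),
                                    ('dt', DecisionTreeClassifier())],
                        voting='soft')

# get_params(deep=True) exposes the sub-estimators' parameters under
# prefixed names, which is what lets GridSearchCV reach inside the ensemble
grid = GridSearchCV(eclf,
                    param_grid={'lr__C': [0.1, 1.0, 10.0],
                                'dt__max_depth': [2, 4, None]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)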
> On Mar 23, 2016, at 12:34 AM, Joel Nothman wrote:
And I lied that none of the scikit-learn estimators define their own
get_params. Of course the following do: VotingClassifier, Kernel (and
subclasses), Pipeline and FeatureUnion.
On 23 March 2016 at 15:04, Joel Nothman wrote:
something like the following may suffice:
def get_params(self, deep=True):
    # start from the parameters the base class discovers by introspection
    out = super(WordCooccurrenceVectorizer, self).get_params(deep=deep)
    # add the extra constructor argument so grid search and clone() see it
    out['w2v_clusters'] = self.w2v_clusters
    return out
On 23 March 2016 at 15:01, Joel Nothman wrote:
Hi Fred,
We use the __init__ signature to get the list of parameters that (a) can be
set by grid search; (b) need to be copied to a cloned instance of the
estimator (with any fitted model discarded) when constructing ensembles,
cross-validation, etc. While none of the scikit-learn library of estimators
defines its own get_params, you can override it in a subclass to report any
extra parameters.
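For concreteness, a minimal sketch of that convention (the class and
parameter names here are invented for illustration):

from sklearn.base import BaseEstimator, clone

class MyEstimator(BaseEstimator):
    def __init__(self, alpha=1.0):
        # store every constructor argument untouched under the same name;
        # the default get_params/set_params and clone() rely on this
        self.alpha = alpha

est = MyEstimator(alpha=0.5)
print(clone(est).get_params())  # {'alpha': 0.5}, rebuilt from __init__ args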
Hello list,
Firstly, thanks for this incredible package; I use it daily at work. Now on
to the meat: I'm trying to subclass TfidfVectorizer and running into
issues. I want to specify an extra param for __init__() that points to a
file that gets used in build_analyzer(). Skipping irrelevant bits,
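The snippet itself is cut off in the archive; a plausible reconstruction,
reusing the WordCooccurrenceVectorizer and w2v_clusters names from the
replies above, might look like:

from sklearn.feature_extraction.text import TfidfVectorizer

class WordCooccurrenceVectorizer(TfidfVectorizer):
    def __init__(self, w2v_clusters=None, **kwargs):
        # note: anything swallowed by **kwargs is invisible to the default
        # get_params, which introspects only this __init__'s signature
        super(WordCooccurrenceVectorizer, self).__init__(**kwargs)
        self.w2v_clusters = w2v_clusters

    def build_analyzer(self):
        analyzer = super(WordCooccurrenceVectorizer, self).build_analyzer()
        # load the file named by self.w2v_clusters here and wrap the
        # returned analyzer with whatever extra processing is needed
        return analyzer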
>
> - In tree-based models, not handling categorical variables as such hurts us a lot.
> There's a PR to fix that; it still needs a bit of love:
> https://github.com/scikit-learn/scikit-learn/pull/4899
>
This is a conversation moved from
https://github.com/scikit-learn/scikit-learn/pull/4899.
Unfortunately, the most important parameters to adjust to maximize
accuracy are often those controlling the randomness in the algorithm,
i.e. max_features, for which this strategy is not possible.
That being said, in the case of boosting, I think this strategy would
be worth automating, e.g. to
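The message is truncated here, but if "this strategy" means scoring every
intermediate ensemble size out of a single fit (feasible for boosting, where
stages are built sequentially, but not for a randomness parameter like
max_features), a sketch of the idea using staged_predict:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting
# stage, so a single fit evaluates every candidate n_estimators up to 200
val_scores = [np.mean(pred == y_val)
              for pred in clf.staged_predict(X_val)]
best_n_estimators = int(np.argmax(val_scores)) + 1
print(best_n_estimators)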