Thanks Andreas; I found the lambda buried deep in an imported class of my custom transformer. As it turns out, the *dill* package appears to be able to pickle lambdas without a hiccup, so I'm going with that for model persistence.
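For anyone hitting the same error, here is a minimal sketch of how a buried lambda like this could be tracked down without stepping through the debugger. It assumes Python 3 and uses only the standard library. `find_unpicklable`, `Preprocessor` and `Transformer` are hypothetical names; the two classes are simplified stand-ins for TextPreprocess and CustomTransformer, whose real attributes are not shown in this thread.

```python
import pickle


def find_unpicklable(obj, path="obj", _seen=None):
    """Return attribute paths under `obj` that the stdlib pickle rejects."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return []  # already visited; avoids looping on reference cycles
    _seen.add(id(obj))

    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []  # this whole subtree pickles fine
    except (pickle.PicklingError, AttributeError, TypeError):
        pass  # something in here fails; narrow it down below

    # Collect the children to inspect: container items or instance attributes.
    if isinstance(obj, dict):
        children = {f"{path}[{k!r}]": v for k, v in obj.items()}
    elif isinstance(obj, (list, tuple, set)):
        children = {f"{path}[{i}]": v for i, v in enumerate(obj)}
    else:
        attrs = getattr(obj, "__dict__", {})
        children = {f"{path}.{k}": v for k, v in attrs.items()}

    culprits = []
    for child_path, child in children.items():
        culprits.extend(find_unpicklable(child, child_path, _seen))

    # If no child is to blame, the object itself is the culprit.
    return culprits or [path]


class Preprocessor:
    """Hypothetical stand-in for an imported helper class."""

    def __init__(self):
        self.pattern = "abc"
        self.normalize = lambda s: s.lower()  # the buried lambda


class Transformer:
    """Hypothetical stand-in for a custom transformer wrapping the helper."""

    def __init__(self):
        self.param_file = "params.txt"
        self.custom = Preprocessor()


culprits = find_unpicklable(Transformer(), "transformer")
print(culprits)
```

The helper only follows instance attributes and built-in containers, so it is a diagnostic aid, not a complete object-graph walker. Once the offending attribute is known, the options are the ones discussed in this thread: move the function to module level, or serialize with dill, whose `dumps`/`dump` functions mirror those of `pickle`.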
Thanks again,
FM.

On 5 April 2016 at 16:25, Andreas Mueller <t3k...@gmail.com> wrote:

> What's the type of self.custom?
>
> Also, you can step into the debugger to see which function it is that can
> not be pickled.
>
> On 04/05/2016 04:14 PM, Fred Mailhot wrote:
>
> Hi all,
>
> I've got a pipeline with some custom transformers that's not pickling, and
> I'm not sure why. I've had this previously when using custom preprocessors
> & tokenizers with CountVectorizers. I dealt with it then by defining the
> custom bits at the module level.
>
> I assumed I could avoid that by creating custom transformers that directly
> subclass TransformerMixin and importing them to the module where the
> pipeline is defined.
>
> The transformer is implemented like this:
>
> ==============================
> [...imports...]
> from text_preprocess import TextPreprocess
>
> class CustomTransformer(TransformerMixin):
>
>     def __init__(self, param_file_1="params.txt"):
>         self.pattern_file = pattern_file
>
>         self.custom = TextPreprocess(self.param_file)
>
>     def transform(self, X, *_):
>         if isinstance(X, basestring):
>             X = [X]
>         return ["%s %s" % (x, " ".join([item["rewrite"] for item in
>                 self.custom.match(x)["info"] if "rewrite" in item]))
>                 for x in X]
>
>     def fit(self, *_):
>         return self
> ==============================
>
> the full pipeline look like this:
>
> ==============================
> cm = CustomTransformer()
>
> vec = FeatureUnion([("char_ng",
>                      CountVectorizer(analyzer="char_wb", tokenizer=string.split,
>                                      ngram_range=(3, 5), max_features=None, min_df=1,
>                                      max_df=0.5, stop_words=None, binary=False)),
>                     ("word_ng",
>                      CountVectorizer(analyzer="word", ngram_range=(2, 3),
>                                      max_features=5000, min_df=1, max_df=0.5,
>                                      stop_words="english", binary=False))])
>
> pipeline = Pipeline([("custom", cm), ("vec", vec),
>                      ("lr", LogisticRegressionCV(scoring="f1_macro"))])
> ==============================
>
> And I get the following error when I fit & dump:
>
> ==============================
> In [62]: pipeline.fit(docs, [0, 0, 0, 1])
> Out[62]:
> Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at 0x113dd2310>), ('vec', FeatureUnion(n_jobs=1, transformer_list=[('char_ng',
> CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
>         ...None,
>         refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
>         verbose=0))])
>
> In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
>
> ---------------------------------------------------------------------------
> PicklingError                             Traceback (most recent call last)
> <ipython-input-63-99a63544716d> in <module>()
> ----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
>
> PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
> ==============================
>
> Any pointers would be appreciated. There are hints here and there on SO,
> but most point to the solution I referred to above...
>
> Thanks!
> Fred.
------------------------------------------------------------------------------
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general