Thanks Andreas; I found the lambda buried deep in an imported
class used by my custom transformer. As it turns out, the *dill* package appears
to be able to pickle lambdas without a hiccup, so I'm going with that for
model persistence.
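
Roughly, the dill-based persistence looks like this (the file name is just a
placeholder):

==============================
import dill

# Serialize the fitted pipeline; dill can handle lambdas and other
# closures that the standard pickle module refuses.
with open("pipeline.dill", "wb") as f:
    dill.dump(pipeline, f)

# ...and load it back later for prediction.
with open("pipeline.dill", "rb") as f:
    pipeline = dill.load(f)
==============================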

Thanks again,
FM.

On 5 April 2016 at 16:25, Andreas Mueller <t3k...@gmail.com> wrote:

> What's the type of self.custom?
>
> Also, you can step into the debugger to see which function it is that
> cannot be pickled.
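>
> A rough sketch of another way to narrow it down, without the debugger, is to
> try pickling each step on its own and see which one raises (this assumes the
> pipeline object from the message below):
>
> ==============================
> import pickle
>
> # Try to pickle each pipeline step separately; the step that raises
> # PicklingError is the one holding the unpicklable function.
> for name, step in pipeline.named_steps.items():
>     try:
>         pickle.dumps(step, pickle.HIGHEST_PROTOCOL)
>     except pickle.PicklingError as e:
>         print("%s: %s" % (name, e))
> ==============================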
>
> On 04/05/2016 04:14 PM, Fred Mailhot wrote:
>
> Hi all,
>
> I've got a pipeline with some custom transformers that won't pickle, and
> I'm not sure why. I've hit this before when using custom preprocessors
> & tokenizers with CountVectorizer, and I dealt with it then by defining the
> custom bits at the module level.
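>
> (Roughly, the module-level workaround I mean looks like this; the function
> name is just an example:)
>
> ==============================
> # tokenizers.py -- defined at module top level so pickle can resolve it
> # by its qualified name, unlike a lambda or a nested function.
> def whitespace_tokenize(text):
>     return text.split()
>
> # elsewhere:
> vec = CountVectorizer(tokenizer=whitespace_tokenize)
> ==============================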
>
> I assumed I could avoid that by creating custom transformers that directly
> subclass TransformerMixin and importing them into the module where the
> pipeline is defined.
>
> The transformer is implemented like this:
>
> ==============================
> [...imports...]
> from text_preprocess import TextPreprocess
>
>
> class CustomTransformer(TransformerMixin):
>
>     def __init__(self, param_file="params.txt"):
>         self.param_file = param_file
>         self.custom = TextPreprocess(self.param_file)
>
>     def transform(self, X, *_):
>         if isinstance(X, basestring):
>             X = [X]
>         # Append any "rewrite" strings returned by TextPreprocess to each document.
>         return ["%s %s" % (x, " ".join([item["rewrite"] for item in
>                                         self.custom.match(x)["info"]
>                                         if "rewrite" in item]))
>                 for x in X]
>
>     def fit(self, *_):
>         return self
> ==============================
>
> The full pipeline looks like this:
>
> ==============================
> cm = CustomTransformer()
>
> vec = FeatureUnion([("char_ng",
>                      CountVectorizer(analyzer="char_wb", tokenizer=string.split,
>                                      ngram_range=(3, 5), max_features=None,
>                                      min_df=1, max_df=0.5, stop_words=None,
>                                      binary=False)),
>                     ("word_ng",
>                      CountVectorizer(analyzer="word", ngram_range=(2, 3),
>                                      max_features=5000, min_df=1, max_df=0.5,
>                                      stop_words="english", binary=False))])
>
> pipeline = Pipeline([("custom", cm), ("vec", vec),
>                      ("lr", LogisticRegressionCV(scoring="f1_macro"))])
> ==============================
>
> And I get the following error when I fit & dump:
>
> ==============================
> In [62]: pipeline.fit(docs, [0, 0, 0, 1])
> Out[62]:
> Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at
> 0x113dd2310>), ('vec', FeatureUnion(n_jobs=1, transformer_list=[('char_ng',
> CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
>   ...None,
>            refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
>            verbose=0))])
>
> In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
>
> ---------------------------------------------------------------------------
> PicklingError                             Traceback (most recent call last)
> <ipython-input-63-99a63544716d> in <module>()
> ----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
>
> PicklingError: Can't pickle <type 'function'>: attribute lookup
> __builtin__.function failed
> ==============================
>
> Any pointers would be appreciated. There are hints here and there on SO,
> but most point to the solution I referred to above...
>
> Thanks!
> Fred.
>
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
