Hi all, I've got a pipeline with some custom transformers that's not pickling, and I'm not sure why. I've had this previously when using custom preprocessors & tokenizers with CountVectorizers. I dealt with it then by defining the custom bits at the module level.
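To illustrate the module-level point, here is a minimal sketch. It is written for Python 3 and is not the code from the pipeline below; `is_picklable`, `module_level_tokenizer` and `inline_tokenizer` are hypothetical names used only for this example. pickle stores functions by reference (module name plus function name), so a function that can be imported by name pickles, while a lambda does not.

```python
import pickle
import string


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function that can be imported by name from a module is pickled by reference.
module_level_tokenizer = string.capwords

# A lambda has no importable name, so pickle cannot look it up again.
inline_tokenizer = lambda text: text.split()

print(is_picklable(module_level_tokenizer))  # True
print(is_picklable(inline_tokenizer))        # False
```

The same check can be applied to any attribute stored on a custom transformer, which is one way to see which object is holding a function that pickle cannot resolve.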
I assumed I could avoid that by creating custom transformers that directly subclass TransformerMixin and importing them into the module where the pipeline is defined. The transformer is implemented like this:

==============================
[...imports...]
from text_preprocess import TextPreprocess

class CustomTransformer(TransformerMixin):

    def __init__(self, param_file="params.txt"):
        self.param_file = param_file
        self.custom = TextPreprocess(self.param_file)

    def transform(self, X, *_):
        # Accept a single string as well as a list of strings
        if isinstance(X, basestring):
            X = [X]
        # Append the "rewrite" values of all matches to each document
        return ["%s %s" % (x, " ".join([item["rewrite"] for item in
                self.custom.match(x)["info"] if "rewrite" in item])) for x in X]

    def fit(self, *_):
        return self
==============================

The full pipeline looks like this:

==============================
cm = CustomTransformer()
vec = FeatureUnion([("char_ng",
                     CountVectorizer(analyzer="char_wb", tokenizer=string.split,
                                     ngram_range=(3, 5), max_features=None, min_df=1,
                                     max_df=0.5, stop_words=None, binary=False)),
                    ("word_ng",
                     CountVectorizer(analyzer="word", ngram_range=(2, 3),
                                     max_features=5000, min_df=1, max_df=0.5,
                                     stop_words="english", binary=False))])
pipeline = Pipeline([("custom", cm), ("vec", vec),
                     ("lr", LogisticRegressionCV(scoring="f1_macro"))])
==============================

And I get the following error when I fit & dump:

==============================
In [62]: pipeline.fit(docs, [0, 0, 0, 1])
Out[62]:
Pipeline(steps=[('custom', <cm_transformer.CustomTransformer object at 0x113dd2310>), ('vec', FeatureUnion(n_jobs=1,
       transformer_list=[('char_ng', CountVectorizer(analyzer='char_wb', binary=False, decode_error=u'strict',
        ...None,
        refit=True, scoring='f1_macro', solver='lbfgs', tol=0.0001,
        verbose=0))])

In [63]: pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-63-99a63544716d> in <module>()
----> 1 pickle.dump(pipeline, open("test_pl_dump.pkl", "wb"), pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
==============================

Any pointers would be appreciated. There are hints here and there on SO, but most point to the solution I referred to above...

Thanks!
Fred.
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general