I have a text analytics pipeline that performs a sequence of steps (e.g. tokenization, part-of-speech tagging, etc.) on a line of text. I have wrapped the whole pipeline up in a simple interface that lets me call it from Scala as a POJO - i.e. I instantiate the pipeline, pass it a string, and get back some objects. Now I would like to do the same thing for items in a Spark RDD via a map transformation. Unfortunately, my pipeline is not serializable, so I get a NotSerializableException when I try this. I also experimented with Kryo to see if that could help, and ended up with a "missing no-arg constructor" exception on a class I have no control over. It seems the Spark framework expects me to be able to serialize my pipeline, which I can't (or at least don't think I can at first glance).
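Roughly, the failing pattern looks like this (`TextPipeline` is an illustrative stand-in for my real class, which comes from a library I don't control):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Stand-in for my pipeline; the real class is NOT serializable.
class TextPipeline {
  def process(line: String): Seq[String] = line.split("\\s+").toSeq // placeholder
}

val sc = new SparkContext(new SparkConf().setAppName("pipeline-demo"))
val pipeline = new TextPipeline() // instantiated on the driver

val lines = sc.textFile("hdfs:///path/to/corpus.txt")

// Spark pulls `pipeline` into the task closure and tries to serialize it,
// which is where the NotSerializableException is thrown.
val results = lines.map(line => pipeline.process(line))
```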
Is there a workaround for this scenario? I can imagine a few possible solutions, but they all seem a bit dubious to me, so I thought I would ask for direction before wandering off on my own. Perhaps a better understanding of serialization strategies would help me get the pipeline to serialize after all. Or perhaps there is a way to instantiate the pipeline on demand on the worker nodes through some kind of factory call.
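Concretely, the factory idea I have in mind would look something like the sketch below - construct the pipeline on each worker instead of shipping it from the driver (again, `TextPipeline` is a stand-in, and I'm not sure this is the idiomatic way to do it in Spark):

```scala
// Build one pipeline per partition on the worker itself, so nothing
// non-serializable ever crosses the wire. Only the closure's code is shipped.
val results = lines.mapPartitions { iter =>
  val pipeline = new TextPipeline() // constructed locally on the executor
  iter.map(line => pipeline.process(line))
}
```

Is something like this the recommended approach, or is there a cleaner pattern (e.g. a lazily initialized singleton per JVM) for non-serializable resources?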
Any advice is appreciated. Thanks, Philip
