Would pipelining include model export?  I didn't see that in the
documentation.

Is there a way people are currently doing this?



On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <[email protected]> wrote:

> We are working on the pipeline features, which would make this
> procedure much easier in MLlib. This is still a WIP and the main JIRA
> is at:
>
> https://issues.apache.org/jira/browse/SPARK-1856
>
> Best,
> Xiangrui
>
> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
> <[email protected]> wrote:
> > Hello,
> >
> > I have been prototyping a text classification model that my company would
> > like to eventually put into production.  Our technology stack is
> > currently Java-based, but we would like to be able to build our models in
> > Spark/MLlib and then export something like a PMML file that can be used
> > for model scoring in real time.
> >
> > I have been using scikit-learn, where I can take the training data,
> > convert the text data into a sparse format, and then take the other
> > features and use the dictionary vectorizer to one-hot encode the
> > categorical variables.  All of those things seem to be possible in
> > MLlib, but I am still puzzled about how they can be packaged so that
> > incoming data is first turned into feature vectors and then evaluated.
> >
> > Are there any best practices for this type of thing in Spark?  I hope
> > this is clear, but if anything is confusing, please let me know.
> >
> > Thanks,
> >
> > Chirag
>
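
For illustration only, the packaging question in the original message
(incoming data is first made into feature vectors and then evaluated) can
be sketched in plain Python. This is not the Spark/MLlib pipeline API and
not PMML: the function names, the JSON bundle layout, the sample records,
and the hand-set weights are all assumptions made up for the example. The
idea it shows is that the feature index, the field names, and the model
weights are exported together, so the scoring side never needs a separate
copy of the featurization logic's state.

```python
import json
import math
from collections import Counter


def raw_features(record, text_field, categorical_fields):
    """Turn one raw record into a sparse {feature_name: value} mapping."""
    # Bag-of-words counts for the text field.
    feats = Counter("tok=" + t for t in record[text_field].lower().split())
    # One-hot encoding for each categorical field.
    for field in categorical_fields:
        feats[f"{field}={record[field]}"] = 1
    return feats


def build_index(records, text_field, categorical_fields):
    """Assign a column index to every feature name seen in training."""
    names = sorted({name
                    for r in records
                    for name in raw_features(r, text_field, categorical_fields)})
    return {name: i for i, name in enumerate(names)}


def score(bundle, record):
    """Featurize a raw record and apply a logistic model in one step."""
    feats = raw_features(record, bundle["text_field"],
                         bundle["categorical_fields"])
    z = bundle["bias"]
    for name, value in feats.items():
        col = bundle["index"].get(name)  # features unseen in training are ignored
        if col is not None:
            z += bundle["weights"][col] * value
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical training records.
train = [
    {"body": "cheap pills now", "source": "web"},
    {"body": "meeting notes attached", "source": "email"},
]
index = build_index(train, "body", ["source"])

# Real weights would come from training; these are placeholders.
weights = [0.0] * len(index)
weights[index["tok=cheap"]] = 2.0
weights[index["source=email"]] = -1.0

bundle = {"text_field": "body", "categorical_fields": ["source"],
          "index": index, "weights": weights, "bias": 0.0}

# Export and re-import: everything needed for scoring travels together.
exported = json.dumps(bundle)
restored = json.loads(exported)
```

Calling score(restored, new_record) on the consuming side then handles
both featurization and evaluation from the single exported string.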
