Would pipelining include model export? I didn't see that in the documentation.
Are there ways that this is being done currently?

On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <[email protected]> wrote:
> We are working on the pipeline features, which would make this
> procedure much easier in MLlib. This is still a WIP and the main JIRA
> is at:
>
> https://issues.apache.org/jira/browse/SPARK-1856
>
> Best,
> Xiangrui
>
> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
> <[email protected]> wrote:
> > Hello,
> >
> > I have been prototyping a text classification model that my company would
> > like to eventually put into production. Our technology stack is currently
> > Java based but we would like to be able to build our models in Spark/MLlib
> > and then export something like a PMML file which can be used for model
> > scoring in real-time.
> >
> > I have been using scikit learn where I am able to take the training data
> > convert the text data into a sparse data format and then take the other
> > features and use the dictionary vectorizer to do one-hot encoding for the
> > other categorical variables. All of those things seem to be possible in
> > mllib but I am still puzzled about how that can be packaged in such a way
> > that the incoming data can be first made into feature vectors and then
> > evaluated as well.
> >
> > Are there any best practices for this type of thing in Spark? I hope this
> > is clear but if there are any confusions then please let me know.
> >
> > Thanks,
> >
> > Chirag
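
To make the packaging question in the quoted message concrete, here is a minimal sketch in plain Python of bundling the featurization state (token vocabulary and one-hot index) together with the model parameters in one serializable artifact, so that a scoring service can turn raw records into feature vectors and evaluate them. This is not MLlib's pipeline API and not PMML. Every name in it (`fit_featurizer`, `featurize`, `score`, the artifact layout) is hypothetical, the JSON format is an assumption chosen only for illustration, and the weights are placeholders rather than a trained model.

```python
import json
import math


def fit_featurizer(texts, categories):
    """Build a token vocabulary and a one-hot index from training data."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    # One-hot slots for categorical values are placed after the token slots.
    cat_index = {}
    for row in categories:
        for name, value in sorted(row.items()):
            cat_index.setdefault(f"{name}={value}", len(vocab) + len(cat_index))
    return {"vocab": vocab, "cat_index": cat_index}


def featurize(artifact, text, cats):
    """Turn one raw record into a sparse vector stored as {index: value}."""
    vec = {}
    for tok in text.lower().split():
        idx = artifact["vocab"].get(tok)
        if idx is not None:  # tokens unseen at training time are dropped
            vec[idx] = vec.get(idx, 0.0) + 1.0
    for name, value in cats.items():
        idx = artifact["cat_index"].get(f"{name}={value}")
        if idx is not None:
            vec[idx] = 1.0
    return vec


def score(artifact, text, cats):
    """Featurize a raw record, then apply a logistic model from the artifact."""
    vec = featurize(artifact, text, cats)
    z = artifact["intercept"] + sum(artifact["weights"][i] * v for i, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy training data (made up for illustration).
texts = ["cheap pills now", "meeting at noon"]
cats = [{"source": "web"}, {"source": "email"}]

artifact = fit_featurizer(texts, cats)
n_features = len(artifact["vocab"]) + len(artifact["cat_index"])
artifact["weights"] = [0.0] * n_features  # placeholder, not trained weights
artifact["intercept"] = 0.0

# The whole artifact round-trips through JSON, so a separate scoring
# service could load it without access to the training code's state.
restored = json.loads(json.dumps(artifact))
print(featurize(restored, "cheap cheap unknown", {"source": "email"}))
print(score(restored, "cheap cheap unknown", {"source": "email"}))
```

The point of the sketch is only that the vocabulary and category index have to travel with the weights. Exporting the model coefficients alone would leave the scoring side unable to rebuild the same feature vectors from incoming data.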
