We are working on pipeline features that should make this procedure
much easier in MLlib. This is still a work in progress; the main JIRA
is at:

https://issues.apache.org/jira/browse/SPARK-1856
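
To make the idea concrete, here is a minimal pure-Python sketch of what such a pipeline bundles: the feature transformations (hashed text plus one-hot encoded categorical fields) and the model live in one object, so scoring takes raw records. This is not the MLlib API tracked in the JIRA. Every name below (`hash_text`, `OneHotEncoder`, `TextClassifierPipeline`) is hypothetical, and the perceptron is only a stand-in for a real classifier.

```python
import zlib
from collections import Counter


def hash_text(text, num_buckets):
    """Hashing trick: map tokens to a sparse {index: count} vector."""
    counts = Counter()
    for token in text.lower().split():
        # crc32 is stable across processes, unlike the built-in hash().
        counts[zlib.crc32(token.encode("utf-8")) % num_buckets] += 1
    return dict(counts)


class OneHotEncoder:
    """Learns a (field, value) -> index dictionary from training rows."""

    def __init__(self):
        self.index = {}

    def fit(self, rows):
        for row in rows:
            for field, value in sorted(row.items()):
                self.index.setdefault((field, value), len(self.index))
        return self

    def transform(self, row, offset):
        # Values not seen during fit are dropped.
        return {offset + self.index[(f, v)]: 1.0
                for f, v in row.items() if (f, v) in self.index}


class TextClassifierPipeline:
    """Bundles featurization and a perceptron (hypothetical stand-in model)."""

    def __init__(self, num_buckets=64, epochs=50):
        self.num_buckets = num_buckets
        self.epochs = epochs
        self.encoder = OneHotEncoder()
        self.weights = {}
        self.bias = 0.0

    def featurize(self, text, categories):
        # Text buckets use indices [0, num_buckets); one-hot slots follow.
        features = hash_text(text, self.num_buckets)
        features.update(self.encoder.transform(categories, self.num_buckets))
        return features

    def _margin(self, features):
        return self.bias + sum(self.weights.get(i, 0.0) * x
                               for i, x in features.items())

    def fit(self, records, labels):
        self.encoder.fit(cats for _, cats in records)
        for _ in range(self.epochs):
            for (text, cats), label in zip(records, labels):
                feats = self.featurize(text, cats)
                error = label - (1 if self._margin(feats) > 0 else 0)
                if error:
                    for i, x in feats.items():
                        self.weights[i] = self.weights.get(i, 0.0) + error * x
                    self.bias += error
        return self

    def predict(self, text, categories):
        # Raw input goes through the same transformations used in training.
        return 1 if self._margin(self.featurize(text, categories)) > 0 else 0


# Made-up toy data: (text, categorical fields) pairs with 0/1 labels.
records = [
    ("cheap pills buy now", {"source": "web"}),
    ("meeting agenda attached", {"source": "internal"}),
    ("buy cheap watches", {"source": "web"}),
    ("project status meeting", {"source": "internal"}),
]
labels = [1, 0, 1, 0]
model = TextClassifierPipeline().fit(records, labels)
```

Because the fitted dictionary and the weights travel together, `model.predict(text, categories)` is the only call a scoring service would need; exporting that whole object, for example as PMML, is a separate problem this sketch does not address.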

Best,
Xiangrui

On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
<chirag.lakh...@gmail.com> wrote:
> Hello,
>
> I have been prototyping a text classification model that my company would
> like to eventually put into production.  Our technology stack is currently
> Java-based, but we would like to build our models in Spark/MLlib and then
> export something like a PMML file that can be used for real-time model
> scoring.
>
> I have been using scikit-learn, where I can take the training data,
> convert the text into a sparse format, and then use the dictionary
> vectorizer to one-hot encode the other categorical variables.  All of
> those things seem to be possible in MLlib, but I am still puzzled about
> how they can be packaged so that incoming data is first turned into
> feature vectors and then evaluated.
>
> Are there any best practices for this type of thing in Spark?  I hope this
> is clear, but if anything is confusing, please let me know.
>
> Thanks,
>
> Chirag
