How about Spark? It includes many common machine learning algorithms and supports a Java API.

On Jun 13, 2016 01:26, "Gaurav gupta" <gupta.gaurav0...@gmail.com> wrote:
> Hi All,
>
> Could you please guide me on how to *create and execute* machine
> learning/statistical models (regression, decision tree, k-means
> clustering, naive Bayes, scorecard/linear/logistic regression, etc., and
> GBM, GLM) in a *Java/JVM-based application* (in production)?
>
> We have an ETL-style Java-based product in which one can do most of the
> data preparation steps for machine learning, such as data ingestion from
> JDBC, files, HDFS, NoSQL, etc., plus joins and aggregations (which are
> required for feature engineering). We now want to add analytics
> capabilities using machine learning/statistical modeling.
>
> Right now we are using JPMML-Evaluator
> <https://github.com/jpmml/jpmml-evaluator> to score models created in
> PMML format using R and Python (and KNIME), but this needs three separate
> and unconnected steps:
>
> 1- Prepare the data in our Java/JVM application and save the sample data
> (training and test) in a CSV file or in a DB. *<JAVA/JVM BASED
> application>*
>
> 2- Create a machine learning model in R or Python (or KNIME) and export
> it in PMML 4.2 format. *<NON JAVA BASED>*
>
> 3- Import/deploy the PMML in our Java-based application and use the JPMML
> evaluator to execute it in production. *<JAVA BASED>*
>
> I am sure this is a common problem in machine learning, as Java is
> generally preferred over Python or R in production. Could you suggest
> better approaches to *create* as well as *execute* a Python/scikit-learn
> based machine learning model in a JVM-based application?
>
> What are your thoughts on achieving steps #2 and #3 more seamlessly in a
> JVM-based application, without compromising *performance and usability*?
>
> 1- Call a Java program which internally calls the Python scikit-learn
> script
> <http://stackoverflow.com/questions/12738827/how-can-i-call-scikit-learn-classifiers-from-java>
> (under the hood) to create a model in PMML
> <https://github.com/jpmml/jpmml-sklearn> and then use the JPMML
> evaluator. To the user it will look like a single JVM-based application
> (better usability). I am not sure what the limitations and shortcomings
> of using PMML are, as not all features are supported in jpmml-sklearn
> <https://github.com/jpmml/jpmml-sklearn>.
>
> 2- Call a Java program which internally calls the Python script and does
> the model creation as well as execution in an external Python
> environment, serializing the model and the results to a file/CSV or an
> in-memory DB (or a cache, like Hazelcast) from where the parent Java
> application will fetch the results. From what I have read, I can't use
> Jython for executing scikit-learn models
> <http://stackoverflow.com/questions/12738827/how-can-i-call-scikit-learn-classifiers-from-java>.
>
> 3- Can I use Jep <https://github.com/mrj0/jep> (Embed Python in Java) to
> embed CPython in the JVM? Has anybody tried it for scikit-learn models?
>
> Alternatively, I could explore using Mahout or Weka, which are Java-based
> machine learning libraries, in my JVM-based application. (I need to
> support both Windows and non-Windows platforms.)
>
> I am also exploring H2O.ai, which is Java based. Has anybody tried it?
>
> Regards
> Gaurav
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
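For what it's worth, option 2 in the quoted message (the Java application launches an external Python process and reads the results back from a file) can be sketched with nothing but the JDK. This is only a rough illustration under assumptions: the class name `ExternalPythonScorer`, the script name `train_and_score.py`, and its convention of taking an input CSV and an output CSV as its two arguments are all hypothetical and not taken from any existing library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs an external Python script and reads back its output file.
final class ExternalPythonScorer {

    // Builds the command line. The "script inputCsv outputCsv" argument order
    // is an assumed convention for the hypothetical Python script.
    static List<String> buildCommand(String pythonExe, Path script, Path inputCsv, Path outputCsv) {
        return Arrays.asList(pythonExe, script.toString(), inputCsv.toString(), outputCsv.toString());
    }

    // Launches the script, waits up to timeoutSeconds, and returns the lines
    // the script wrote to outputCsv.
    static List<String> score(String pythonExe, Path script, Path inputCsv, Path outputCsv,
                              long timeoutSeconds) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(pythonExe, script, inputCsv, outputCsv));
        // Forward the script's stdout/stderr to the JVM's own streams so the
        // child cannot block on a full pipe and its errors stay visible.
        builder.inheritIO();
        Process process = builder.start();

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Python process did not finish within " + timeoutSeconds + " s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("Python process failed with exit code " + process.exitValue());
        }
        return Files.readAllLines(outputCsv);
    }
}
```

A call would then look like `ExternalPythonScorer.score("python", Paths.get("train_and_score.py"), Paths.get("train.csv"), Paths.get("scores.csv"), 600)`. Every call pays the Python start-up cost and a round trip through the file system, so this is probably better suited to batch scoring than to low-latency, per-record scoring.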