Hello Spark/PMML enthusiasts,

It's pretty trivial to integrate the JPMML-Evaluator library with Spark. In brief, take the following steps in your Spark application code:

1) Create a Java Map ("arguments") that represents the input data record. You need to specify a key-value mapping for every active MiningField. The key type is org.dmg.pmml.FieldName. The value type can be String or any Java primitive data type that can be converted to the requested PMML type.
2) Obtain an instance of org.jpmml.evaluator.Evaluator. Invoke its #evaluate(Map<FieldName, ?>) method using the argument map created in step 1.
3) Process the Java Map ("results") that represents the output data record.
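To make the three steps concrete outside of Spark, here is a minimal sketch that runs without JPMML on the classpath. Everything in it is an assumption made for illustration: RecordEvaluator is a hypothetical stand-in for org.jpmml.evaluator.Evaluator, plain String keys stand in for FieldName, the field names "x1", "x2" and "y" are made up, and the toy model is a hand-written formula rather than a PMML document.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for org.jpmml.evaluator.Evaluator (String keys instead of FieldName).
interface RecordEvaluator {
    Map<String, ?> evaluate(Map<String, ?> arguments);
}

class EvaluationSketch {

    // Step 1: build the "arguments" map, one entry per active field.
    static Map<String, String> toArguments(String[] activeFields, String csvLine) {
        String[] cells = csvLine.split(",", -1);
        if (cells.length != activeFields.length) {
            throw new IllegalArgumentException(
                "Expected " + activeFields.length + " cells, got " + cells.length);
        }
        Map<String, String> arguments = new LinkedHashMap<>();
        for (int i = 0; i < activeFields.length; i++) {
            // Values stay as Strings; the evaluator converts them to the requested type.
            arguments.put(activeFields[i], cells[i].trim());
        }
        return arguments;
    }

    // Steps 2 and 3: evaluate the record, then pick the target value out of the "results" map.
    static Object score(RecordEvaluator evaluator, String[] activeFields,
                        String targetField, String csvLine) {
        Map<String, String> arguments = toArguments(activeFields, csvLine);
        Map<String, ?> results = evaluator.evaluate(arguments);
        return results.get(targetField);
    }

    public static void main(String[] args) {
        String[] activeFields = {"x1", "x2"};
        // Toy model y = x1 + 2 * x2, used only so that the sketch has something to evaluate.
        RecordEvaluator toyModel = arguments -> {
            double x1 = Double.parseDouble(String.valueOf(arguments.get("x1")));
            double x2 = Double.parseDouble(String.valueOf(arguments.get("x2")));
            return Collections.singletonMap("y", x1 + 2 * x2);
        };
        System.out.println(score(toyModel, activeFields, "y", "1.5, 2.0"));
    }
}
```

With the real library, the same shape should carry over by swapping the stand-in interface for an Evaluator instance and the String keys for FieldName instances.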
Putting it all together:

    JavaRDD<Map<FieldName, String>> arguments = ...

    final ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance()); // See the JPMML-Evaluator documentation

    JavaRDD<Map<FieldName, ?>> results = arguments.flatMap(new FlatMapFunction<Map<FieldName, String>, Map<FieldName, ?>>(){

        @Override
        public Iterable<Map<FieldName, ?>> call(Map<FieldName, String> arguments){
            // Evaluate one input record and emit exactly one output record
            Map<FieldName, ?> result = modelEvaluator.evaluate(arguments);

            return Collections.<Map<FieldName, ?>>singletonList(result);
        }
    });

Of course, it's not very elegant to be using JavaRDD<Map<K, V>> here. Maybe someone can give me a hint about making it look and feel more Spark-y?

Also, I would like to refute the earlier comment by @pacoid that JPMML-Evaluator compares poorly against Augustus and Zementis products. First, JPMML-Evaluator fully supports PMML specification versions 3.0 through 4.2. I would specifically stress the support for PMML 4.2, which was released just a few months ago. Second, JPMML is open source. Perhaps its licensing terms could be more liberal, but it's nevertheless the most open and approachable way of bringing Java and PMML together.

VR

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pmml-with-augustus-tp7313p7412.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.