Hello Spark/PMML enthusiasts,

It's pretty trivial to integrate the JPMML-Evaluator library with Spark. In
brief, take the following steps in your Spark application code:
1) Create a Java Map ("arguments") that represents the input data record.
You need to specify a key-value mapping for every active MiningField. The
key type is org.dmg.pmml.FieldName. The value type can be String or any
Java primitive data type that can be converted to the requested PMML data
type.
2) Obtain an instance of org.jpmml.evaluator.Evaluator. Invoke its
#evaluate(Map<FieldName, ?>) method using the argument map created in step
1.
3) Process the Java Map ("results") that represents the output data record.
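
To make the "map in, map out" shape of these three steps concrete, here is a
minimal sketch in plain Java. It does not use JPMML-Evaluator at all: the
RecordScorer interface, the ScoringSketch class, the field names "x1", "x2"
and "y", and the toy model (a simple sum) are hypothetical stand-ins, and
plain String keys take the place of FieldName. A real application would call
Evaluator#evaluate at the marked spot instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the evaluator of step 2. This is NOT the
// JPMML-Evaluator API; it only mimics its "map in, map out" contract.
interface RecordScorer {
    Map<String, Object> evaluate(Map<String, ?> arguments);
}

class ScoringSketch {

    // Step 1: one key-value pair per active field (field names are made up).
    static Map<String, Object> buildArguments(double x1, double x2) {
        Map<String, Object> arguments = new LinkedHashMap<>();
        arguments.put("x1", x1);                 // a boxed primitive value
        arguments.put("x2", String.valueOf(x2)); // a String value works too
        return arguments;
    }

    // Converts a String or Number value to double, roughly like the type
    // conversion a real evaluator performs on raw input values.
    static double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return Double.parseDouble(String.valueOf(value));
    }

    // Step 2: a toy "model" that outputs the sum of its two inputs.
    // A real application would delegate to Evaluator#evaluate here.
    static final RecordScorer SCORER = arguments -> {
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("y", toDouble(arguments.get("x1")) + toDouble(arguments.get("x2")));
        return results;
    };

    public static void main(String[] args) {
        Map<String, Object> arguments = buildArguments(1.5, 2.0);
        Map<String, Object> results = SCORER.evaluate(arguments);
        // Step 3: process the output record.
        System.out.println("y = " + results.get("y"));
    }
}
```

Because scoring is a pure function from one map to another, it can be applied
to each record of an RDD independently.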

Putting it all together:
JavaRDD<Map<FieldName, String>> arguments = ...
// See the JPMML-Evaluator documentation
final ModelEvaluator<?> modelEvaluator =
        (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());
JavaRDD<Map<FieldName, ?>> results = arguments.flatMap(
        new FlatMapFunction<Map<FieldName, String>, Map<FieldName, ?>>(){

        @Override
        public Iterable<Map<FieldName, ?>> call(Map<FieldName, String> arguments){
                Map<FieldName, ?> result = modelEvaluator.evaluate(arguments);
                return Collections.<Map<FieldName, ?>>singletonList(result);
        }
});

Of course, it's not very elegant to be using JavaRDD<Map<K, V>> here.
Maybe someone can give me a hint about making it look and feel more Spark-y?

Also, I would like to refute the earlier comment by @pacoid that
JPMML-Evaluator compares poorly against Augustus and Zementis products.
First, JPMML-Evaluator fully supports PMML specification versions 3.0
through 4.2. I would specifically stress the support for PMML 4.2, which was
released just a few months ago. Second, JPMML is open source. Perhaps its
licensing terms could be more liberal, but it's nevertheless the most open
and approachable way of bringing Java and PMML together.


VR



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pmml-with-augustus-tp7313p7412.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.