[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

vruusmann Thu, 28 Apr 2016 13:24:45 -0700

Github user vruusmann commented on the pull request:

    https://github.com/apache/spark/pull/9207#issuecomment-215551480
  
    The main difference between PMML and PFA is the abstraction level. PMML is 
a high-level language (more similar to modeling languages such as UML), where 
you're supposed to express the computation logic using a fixed vocabulary of ML 
domain-specific concepts. PFA is a medium- to low-level language (more similar 
to regular programming languages), where you can work around some PMML 
limitations by embedding arbitrary computation logic (e.g., doing processing in 
loops).
    
    I think that PMML's "high-levelness" and fixed vocabulary are a feature (not 
a bug that needs fixing by introducing another standard). The upside is that 
PMML processing and maintenance can be heavily automated. It is possible to 
load a PMML document, apply all sorts of transformations to it (e.g., 
standardization, optimization), and have a guarantee that the transformation 
output is functionally identical to the input. I'm specifically expanding my 
work in that direction; the Visitor API of the JPMML-Model library is the main 
tool here.
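    To illustrate the kind of function-preserving transformation described 
above, here is a minimal Python sketch. It is NOT the JPMML-Model Visitor API 
(which is Java); the `Visitor` and `ZeroCoefficientCleaner` classes and the 
`evaluate` helper are hypothetical, and the PMML fragment is trimmed (no XML 
namespace) for brevity. The visitor walks the document and drops regression 
terms whose coefficient is zero, and scoring the same row before and after 
yields the same prediction.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free PMML-like fragment (illustrative only).
PMML_DOC = """
<PMML version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="0.0"/>
      <NumericPredictor name="x3" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


class Visitor:
    """Depth-first visitor: calls visit_<Tag>(element) if such a method exists."""

    def visit(self, element):
        handler = getattr(self, "visit_" + element.tag, None)
        if handler is not None:
            handler(element)
        # Copy the child list so handlers may safely remove children.
        for child in list(element):
            self.visit(child)


class ZeroCoefficientCleaner(Visitor):
    """Hypothetical optimization: removes terms that contribute nothing."""

    def visit_RegressionTable(self, table):
        for predictor in table.findall("NumericPredictor"):
            if float(predictor.get("coefficient")) == 0.0:
                table.remove(predictor)


def evaluate(root, row):
    """Scores one row: intercept + sum(coefficient * value)."""
    table = root.find("RegressionModel/RegressionTable")
    result = float(table.get("intercept"))
    for predictor in table.findall("NumericPredictor"):
        result += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return result


if __name__ == "__main__":
    root = ET.fromstring(PMML_DOC.strip())
    row = {"x1": 1.0, "x2": 4.0, "x3": 2.0}
    before = evaluate(root, row)
    ZeroCoefficientCleaner().visit(root)
    after = evaluate(root, row)
    print(before, after)  # both predictions are equal
```

    Because the transformation only touches terms that are provably inert, 
the output document is smaller but scores identically; this is the property 
that a fixed, high-level vocabulary makes easy to check.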
    
    The difference in abstraction levels means that it's possible to translate 
PMML to PFA (but not the other way around). So, if you're thinking about adding 
PFA support to Apache Spark, then it might be sensible to first translate Spark 
ML Pipelines to PMML, and then translate PMML to PFA. The same PMML-to-PFA 
converter would be usable with all PMML documents regardless of the original 
modeling software (e.g., R, Scikit-Learn, Apache Spark).
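    As a rough sketch of why the PMML-to-PFA direction is straightforward, the 
Python snippet below translates a (trimmed, namespace-free) PMML 
RegressionModel into a PFA-style JSON document built from `+` and `*` calls 
and `input.<field>` references. The function names `regression_pmml_to_pfa` 
and `eval_expr` are hypothetical, only this one model type is handled, and 
`eval_expr` is a toy evaluator for the emitted subset, not a PFA engine. 
Going the other way would require recognizing arbitrary PFA logic as one of 
PMML's fixed model types, which is why the reverse translation is not 
generally possible.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed, namespace-free PMML-like fragment (illustrative only).
PMML_DOC = """
<PMML version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="3.0"/>
      <NumericPredictor name="x2" coefficient="-1.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def regression_pmml_to_pfa(pmml_text):
    """Hypothetical translator: PMML linear regression -> PFA-style dict."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("RegressionModel/RegressionTable")
    predictors = table.findall("NumericPredictor")
    # One double-typed input field per PMML predictor.
    fields = [{"name": p.get("name"), "type": "double"} for p in predictors]
    # intercept + c1*x1 + c2*x2 ..., expressed as nested binary "+" calls.
    expr = float(table.get("intercept"))
    for p in predictors:
        term = {"*": [float(p.get("coefficient")), "input." + p.get("name")]}
        expr = {"+": [expr, term]}
    return {
        "input": {"type": "record", "name": "Input", "fields": fields},
        "output": "double",
        "action": [expr],
    }


def eval_expr(expr, record):
    """Toy evaluator for the '+', '*', literal and input.<field> subset above."""
    if isinstance(expr, (int, float)):
        return float(expr)
    if isinstance(expr, str) and expr.startswith("input."):
        return record[expr[len("input."):]]
    op, args = next(iter(expr.items()))
    left, right = (eval_expr(a, record) for a in args)
    return left + right if op == "+" else left * right


if __name__ == "__main__":
    pfa = regression_pmml_to_pfa(PMML_DOC)
    print(json.dumps(pfa, indent=2))
    print(eval_expr(pfa["action"][0], {"x1": 2.0, "x2": 1.0}))
```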

