[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

selvinsource Sun, 26 Apr 2015 23:17:36 -0700

Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/3062#issuecomment-96516889
  
    @mengxr for SVM, I manually tried what you suggested and it looks good.
    
    I loaded the example below into JPMML and evaluated it as a Classification map; 
indeed, the intercept on the NO category acts as the threshold when 
`normalizationMethod = none`.
    Here is the example:
    <code>
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <PMML xmlns="http://www.dmg.org/PMML-4_2">
        <Header description="linear SVM: if predicted value &gt; 0, the outcome is positive, or negative otherwise">
            <Application name="Apache Spark MLlib" version="1.4.0-SNAPSHOT"/>
            <Timestamp>2015-04-27T06:58:22</Timestamp>
        </Header>
        <DataDictionary numberOfFields="10">
            <DataField name="field_0" optype="continuous" dataType="double"/>
            <DataField name="field_1" optype="continuous" dataType="double"/>
            <DataField name="field_2" optype="continuous" dataType="double"/>
            <DataField name="field_3" optype="continuous" dataType="double"/>
            <DataField name="field_4" optype="continuous" dataType="double"/>
            <DataField name="field_5" optype="continuous" dataType="double"/>
            <DataField name="field_6" optype="continuous" dataType="double"/>
            <DataField name="field_7" optype="continuous" dataType="double"/>
            <DataField name="field_8" optype="continuous" dataType="double"/>
            <DataField name="target" optype="categorical" dataType="string"/>
        </DataDictionary>
        <RegressionModel modelName="linear SVM: if predicted value &gt; 0, the outcome is positive, or negative otherwise" functionName="classification" normalizationMethod="none">
            <MiningSchema>
                <MiningField name="field_0" usageType="active"/>
                <MiningField name="field_1" usageType="active"/>
                <MiningField name="field_2" usageType="active"/>
                <MiningField name="field_3" usageType="active"/>
                <MiningField name="field_4" usageType="active"/>
                <MiningField name="field_5" usageType="active"/>
                <MiningField name="field_6" usageType="active"/>
                <MiningField name="field_7" usageType="active"/>
                <MiningField name="field_8" usageType="active"/>
                <MiningField name="target" usageType="target"/>
            </MiningSchema>
            <RegressionTable intercept="-1.2973802920137774" targetCategory="1">
                <NumericPredictor name="field_0" coefficient="-0.0818303650185629"/>
                <NumericPredictor name="field_1" coefficient="0.5609579878511747"/>
                <NumericPredictor name="field_2" coefficient="0.1382792114252377"/>
                <NumericPredictor name="field_3" coefficient="0.07497131265977852"/>
                <NumericPredictor name="field_4" coefficient="-0.47760356523751296"/>
                <NumericPredictor name="field_5" coefficient="0.3817837986572615"/>
                <NumericPredictor name="field_6" coefficient="-0.23753782335208481"/>
                <NumericPredictor name="field_7" coefficient="0.2548602390316011"/>
                <NumericPredictor name="field_8" coefficient="-0.10271528637619945"/>
            </RegressionTable>
            <RegressionTable intercept="0.0" targetCategory="0"/>
        </RegressionModel>
    </PMML>
    </code>
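
To make the threshold behaviour concrete, here is a minimal Python sketch of how a classification `RegressionModel` with `normalizationMethod="none"` can be scored by hand. It is not JPMML code: the function names `score_categories` and `predict` are hypothetical, and sending ties to category "0" is an assumption taken from the header description ("if predicted value > 0, the outcome is positive, or negative otherwise").

```python
# Intercept and coefficients copied from the RegressionTable for targetCategory="1".
INTERCEPT_1 = -1.2973802920137774
COEFFICIENTS_1 = [
    -0.0818303650185629,
    0.5609579878511747,
    0.1382792114252377,
    0.07497131265977852,
    -0.47760356523751296,
    0.3817837986572615,
    -0.23753782335208481,
    0.2548602390316011,
    -0.10271528637619945,
]
# The targetCategory="0" table has no predictors, so its score is its intercept.
INTERCEPT_0 = 0.0


def score_categories(features, intercept_0=INTERCEPT_0):
    """Return the raw (un-normalized) score of each target category."""
    margin = INTERCEPT_1 + sum(c * x for c, x in zip(COEFFICIENTS_1, features))
    return {"1": margin, "0": intercept_0}


def predict(features, intercept_0=INTERCEPT_0):
    """Pick the category with the larger raw score (ties go to "0", an assumption)."""
    scores = score_categories(features, intercept_0)
    return "1" if scores["1"] > scores["0"] else "0"


if __name__ == "__main__":
    sample = [0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(score_categories(sample))
    print(predict(sample))                   # category-0 intercept 0.0 as threshold
    print(predict(sample, intercept_0=5.0))  # a higher threshold flips the outcome
```

Because category "0" always scores exactly its intercept, comparing the two scores is the same as comparing the SVM margin against that intercept, which is why it behaves like a threshold.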
    
    However, I noticed that if the SVM model threshold is set to None, the model 
simply outputs the margin (which is what the PMML exporter currently implements). 
    My question is: should we support both? If `threshold = None`, export as 
regression (as implemented now); if `threshold != None`, export as 
binary classification (as you suggested). What do you think?
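
The two-way proposal could be sketched roughly as follows. This is only an illustration in Python, not the Spark MLlib exporter: `plan_svm_export` and the dictionary keys are hypothetical, and placing the threshold in the intercept of the category "0" table is an assumption based on the example above.

```python
def plan_svm_export(intercept, threshold):
    """Describe how a hypothetical exporter might lay out the PMML model."""
    if threshold is None:
        # No threshold: the model outputs the raw margin, so export a regression.
        return {"functionName": "regression", "intercept": intercept}
    # Threshold set: export a binary classification in which the category "0"
    # table has no predictors and carries the threshold as its intercept.
    return {
        "functionName": "classification",
        "normalizationMethod": "none",
        "interceptByCategory": {"1": intercept, "0": threshold},
    }


if __name__ == "__main__":
    print(plan_svm_export(-1.2973802920137774, None))
    print(plan_svm_export(-1.2973802920137774, 0.0))
```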




