[jira] [Commented] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-28 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144021#comment-16144021 ]

Apache Spark commented on SPARK-21685:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/18982

> Params isSet in scala Transformer triggered by _setDefault in pyspark
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21685
>                 URL: https://issues.apache.org/jira/browse/SPARK-21685
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: Ratan Rai Sur
>
> I'm trying to write a PySpark wrapper for a Transformer whose transform
> method includes the line
> {code:java}
> require(!(isSet(outputNodeName) && isSet(outputNodeIndex)),
>   "Can't set both outputNodeName and outputNodeIndex")
> {code}
> This should only throw an exception when both of these parameters are
> explicitly set.
> In the PySpark wrapper for the Transformer, there is this line in __init__:
> {code:java}
> self._setDefault(outputNodeIndex=0)
> {code}
> Here is the line in the main Python script showing how the model is configured:
> {code:java}
> cntkModel = CNTKModel() \
>     .setInputCol("images") \
>     .setOutputCol("output") \
>     .setModelLocation(spark, model.uri) \
>     .setOutputNodeName("z")
> {code}
> As you can see, only setOutputNodeName is explicitly set, but the exception
> is still thrown.
> If you need more context,
> https://github.com/RatanRSur/mmlspark/tree/default-cntkmodel-output
> is the branch with the code. The tracked files I'm referring to are:
> src/cntk-model/src/main/scala/CNTKModel.scala
> notebooks/tests/301 - CIFAR10 CNTK CNN Evaluation.ipynb
> The PySpark wrapper code is autogenerated.
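To make the set-vs-default distinction above concrete, here is a minimal sketch
of the Python-side Params behaviour. The Example class is hypothetical and
written against pyspark.ml.param only (it is not part of mmlspark): _setDefault
gives a param a value without marking it as set, so isSet stays False and the
require shown above should not fire.

{code:python}
# Hypothetical minimal Params subclass, for illustration only.
from pyspark.ml.param import Param, Params

class Example(Params):
    def __init__(self):
        super(Example, self).__init__()
        self.outputNodeIndex = Param(self, "outputNodeIndex", "index of the output node")
        self.outputNodeName = Param(self, "outputNodeName", "name of the output node")
        self._setDefault(outputNodeIndex=0)

ex = Example()
print(ex.isSet(ex.outputNodeIndex))         # False: a default is not "set"
print(ex.hasDefault(ex.outputNodeIndex))    # True
print(ex.getOrDefault(ex.outputNodeIndex))  # 0
ex._set(outputNodeName="z")
print(ex.isSet(ex.outputNodeName))          # True: explicitly set
{code}

Since the Python side keeps this distinction, a Scala-side isSet that returns
true for outputNodeIndex points at the Python-to-Java param transfer rather
than the Python Params machinery itself.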






[jira] [Commented] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-17 Thread Bryan Cutler (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131512#comment-16131512 ]

Bryan Cutler commented on SPARK-21685:
--------------------------------------

I believe the problem is that during the call to transform, the PySpark model
does not differentiate between set and default params, and sets them all in
Java. I have submitted a fix at https://github.com/apache/spark/pull/18982;
could you try it and see if it works for you?
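For readers following along, here is a hedged sketch of the direction such a
fix could take; the linked PR is the authoritative change, and this is only an
illustration written against the JavaParams internals of pyspark.ml.wrapper
(_make_java_param_pair, _java_obj) plus a Java-side setDefault reached through
PythonUtils.toSeq. The idea: copy explicitly set params to Java with set, and
move defaults across as defaults, so the Java side's isSet stays false for
them.

{code:python}
from pyspark import SparkContext

def _transfer_params_to_java(self):
    """Copy Python params to the companion Java object while keeping the
    set-vs-default distinction, instead of set()-ing everything."""
    pair_defaults = []
    for param in self.params:
        if self.isSet(param):
            # Explicitly set in Python: set it in Java too.
            pair = self._make_java_param_pair(param, self._paramMap[param])
            self._java_obj.set(pair)
        if self.hasDefault(param):
            # Only a default in Python: record it as a Java default.
            pair = self._make_java_param_pair(param, self._defaultParamMap[param])
            pair_defaults.append(pair)
    if pair_defaults:
        sc = SparkContext._active_spark_context
        self._java_obj.setDefault(sc._jvm.PythonUtils.toSeq(pair_defaults))
{code}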




[jira] [Commented] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-11 Thread Ratan Rai Sur (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123738#comment-16123738 ]

Ratan Rai Sur commented on SPARK-21685:
---------------------------------------

The Python wrapper is generated, so I've pasted it here so you don't have to
build it:


{code:java}
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in the project root for information.

import sys
if sys.version >= '3':
    basestring = str

from pyspark.ml.param.shared import *
from pyspark import keyword_only
from pyspark.ml.util import JavaMLReadable, JavaMLWritable
from pyspark.ml.wrapper import JavaTransformer, JavaEstimator, JavaModel
from pyspark.ml.common import inherit_doc
from mmlspark.Utils import *

@inherit_doc
class _CNTKModel(ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer):
    """
    The ``CNTKModel`` evaluates a pre-trained CNTK model in parallel.  The
    ``CNTKModel`` takes a path to a model and automatically loads and
    distributes the model to workers for parallel evaluation using CNTK's
    java bindings.

    The ``CNTKModel`` loads the pretrained model into the ``Function`` class
    of CNTK.  One can decide which node of the CNTK Function computation
    graph to evaluate by passing in the name of the output node with the
    output node parameter.  Currently the ``CNTKModel`` supports single
    input single output models.

    The ``CNTKModel`` takes an input column which should be a column of
    spark vectors and returns a column of spark vectors representing the
    activations of the selected node.  By default, the CNTK model defaults
    to using the model's first input and first output node.

    Args:
        inputCol (str): The name of the input column (undefined)
        inputNode (int): index of the input node (default: 0)
        miniBatchSize (int): size of minibatches (default: 10)
        model (object): Array of bytes containing the serialized CNTKModel (undefined)
        outputCol (str): The name of the output column (undefined)
        outputNodeIndex (int): index of the output node (default: 0)
        outputNodeName (str): name of the output node (undefined)
    """

    @keyword_only
    def __init__(self, inputCol=None, inputNode=0, miniBatchSize=10,
                 model=None, outputCol=None, outputNodeIndex=0, outputNodeName=None):
        super(_CNTKModel, self).__init__()
        self._java_obj = self._new_java_obj("com.microsoft.ml.spark.CNTKModel")
        self.inputCol = Param(self, "inputCol", "inputCol: The name of the input column (undefined)")
        self.inputNode = Param(self, "inputNode", "inputNode: index of the input node (default: 0)")
        self._setDefault(inputNode=0)
        self.miniBatchSize = Param(self, "miniBatchSize", "miniBatchSize: size of minibatches (default: 10)")
        self._setDefault(miniBatchSize=10)
        self.model = Param(self, "model", "model: Array of bytes containing the serialized CNTKModel (undefined)")
        self.outputCol = Param(self, "outputCol", "outputCol: The name of the output column (undefined)")
        self.outputNodeIndex = Param(self, "outputNodeIndex", "outputNodeIndex: index of the output node (default: 0)")
        self._setDefault(outputNodeIndex=0)
        self.outputNodeName = Param(self, "outputNodeName", "outputNodeName: name of the output node (undefined)")
        if hasattr(self, "_input_kwargs"):
            kwargs = self._input_kwargs
        else:
            kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, inputCol=None, inputNode=0, miniBatchSize=10,
                  model=None, outputCol=None, outputNodeIndex=0, outputNodeName=None):
        """
        Set the (keyword only) parameters

        Args:
            inputCol (str): The name of the input column (undefined)
            inputNode (int): index of the input node (default: 0)
            miniBatchSize (int): size of minibatches (default: 10)
            model (object): Array of bytes containing the serialized CNTKModel (undefined)
            outputCol (str): The name of the output column (undefined)
            outputNodeIndex (int): index of the output node (default: 0)
            outputNodeName (str): name of the output node (undefined)
        """
        if hasattr(self, "_input_kwargs"):
            kwargs = self._input_kwargs
        else:
            kwargs = self.__init__._input_kwargs
        return self._set(**kwargs)

    def setInputCol(self, value):
        """
        Args:
            inputCol (str): The name of the input column (undefined)
        """
        self._set(inputCol=value)
        return self

    def getInputCol(self):
        """
        Returns:
            str: The name of the input column (undefined)
        """
        return self.getOrDefault(self.inputCol)

    def setInputNode(self, value):
        """
{code}

(The remainder of the generated wrapper is truncated in the digest.)
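One detail worth noting in the generated wrapper above: pyspark's @keyword_only
records only the kwargs the caller actually passes (signature defaults are not
captured), so _CNTKModel() leaves _input_kwargs empty and setParams() marks
nothing as explicitly set. A small sketch of that behaviour, with a hypothetical
Demo class and assuming Spark 2.2+, where _input_kwargs is stored on the
instance:

{code:python}
# Hypothetical demo class; only @keyword_only comes from pyspark.
from pyspark import keyword_only

class Demo(object):
    @keyword_only
    def __init__(self, outputNodeIndex=0, outputNodeName=None):
        # _input_kwargs holds only caller-supplied kwargs, not signature defaults.
        print(self._input_kwargs)

Demo()                    # prints {}
Demo(outputNodeName="z")  # prints {'outputNodeName': 'z'}
{code}

So the Python side never confuses signature defaults with explicit sets; the
defaults only become "set" when the params are transferred to the Java object.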

[jira] [Commented] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-10 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122492#comment-16122492 ]

Joseph K. Bradley commented on SPARK-21685:
-------------------------------------------

Could you please point to more info, such as the Python wrappers you are 
calling?  I don't see enough info here to identify the problem.
