[jira] [Commented] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123738#comment-16123738 ]

Ratan Rai Sur commented on SPARK-21685:
---

The Python wrapper is autogenerated, so I've pasted it here so you don't have to build it:

{code:python}
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in the project root for information.

import sys

if sys.version >= '3':
    basestring = str

from pyspark.ml.param.shared import *
from pyspark import keyword_only
from pyspark.ml.util import JavaMLReadable, JavaMLWritable
from pyspark.ml.wrapper import JavaTransformer, JavaEstimator, JavaModel
from pyspark.ml.common import inherit_doc
from mmlspark.Utils import *


@inherit_doc
class _CNTKModel(ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer):
    """
    The ``CNTKModel`` evaluates a pre-trained CNTK model in parallel. The
    ``CNTKModel`` takes a path to a model and automatically loads and
    distributes the model to workers for parallel evaluation using CNTK's
    Java bindings.

    The ``CNTKModel`` loads the pretrained model into the ``Function`` class
    of CNTK. One can decide which node of the CNTK Function computation graph
    to evaluate by passing in the name of the output node with the output
    node parameter. Currently the ``CNTKModel`` supports single-input,
    single-output models.

    The ``CNTKModel`` takes an input column, which should be a column of
    Spark vectors, and returns a column of Spark vectors representing the
    activations of the selected node. By default, the ``CNTKModel`` uses the
    model's first input node and first output node.

    Args:
        inputCol (str): The name of the input column (undefined)
        inputNode (int): index of the input node (default: 0)
        miniBatchSize (int): size of minibatches (default: 10)
        model (object): Array of bytes containing the serialized CNTKModel (undefined)
        outputCol (str): The name of the output column (undefined)
        outputNodeIndex (int): index of the output node (default: 0)
        outputNodeName (str): name of the output node (undefined)
    """

    @keyword_only
    def __init__(self, inputCol=None, inputNode=0, miniBatchSize=10, model=None,
                 outputCol=None, outputNodeIndex=0, outputNodeName=None):
        super(_CNTKModel, self).__init__()
        self._java_obj = self._new_java_obj("com.microsoft.ml.spark.CNTKModel")
        self.inputCol = Param(self, "inputCol", "inputCol: The name of the input column (undefined)")
        self.inputNode = Param(self, "inputNode", "inputNode: index of the input node (default: 0)")
        self._setDefault(inputNode=0)
        self.miniBatchSize = Param(self, "miniBatchSize", "miniBatchSize: size of minibatches (default: 10)")
        self._setDefault(miniBatchSize=10)
        self.model = Param(self, "model", "model: Array of bytes containing the serialized CNTKModel (undefined)")
        self.outputCol = Param(self, "outputCol", "outputCol: The name of the output column (undefined)")
        self.outputNodeIndex = Param(self, "outputNodeIndex", "outputNodeIndex: index of the output node (default: 0)")
        self._setDefault(outputNodeIndex=0)
        self.outputNodeName = Param(self, "outputNodeName", "outputNodeName: name of the output node (undefined)")
        if hasattr(self, "_input_kwargs"):
            kwargs = self._input_kwargs
        else:
            kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, inputCol=None, inputNode=0, miniBatchSize=10, model=None,
                  outputCol=None, outputNodeIndex=0, outputNodeName=None):
        """
        Set the (keyword only) parameters

        Args:
            inputCol (str): The name of the input column (undefined)
            inputNode (int): index of the input node (default: 0)
            miniBatchSize (int): size of minibatches (default: 10)
            model (object): Array of bytes containing the serialized CNTKModel (undefined)
            outputCol (str): The name of the output column (undefined)
            outputNodeIndex (int): index of the output node (default: 0)
            outputNodeName (str): name of the output node (undefined)
        """
        if hasattr(self, "_input_kwargs"):
            kwargs = self._input_kwargs
        else:
            kwargs = self.__init__._input_kwargs
        return self._set(**kwargs)

    def setInputCol(self, value):
        """
        Args:
            inputCol (str): The name of the input column (undefined)
        """
        self._set(inputCol=value)
        return self

    def getInputCol(self):
        """
        Returns:
            str: The name of the input column (undefined)
        """
        return self.getOrDefault(self.inputCol)

    def setInputNode(self, value):
        """
{code}
[jira] [Updated] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ratan Rai Sur updated SPARK-21685:
--

Description:

I'm trying to write a PySpark wrapper for a Transformer whose transform method includes the line

{code:scala}
require(!(isSet(outputNodeName) && isSet(outputNodeIndex)), "Can't set both outputNodeName and outputNodeIndex")
{code}

This should only throw an exception when both of these parameters are explicitly set.

In the PySpark wrapper for the Transformer, there is this line in __init__:

{code:python}
self._setDefault(outputNodeIndex=0)
{code}

Here is the line in the main Python script showing how it is configured:

{code:python}
cntkModel = CNTKModel().setInputCol("images").setOutputCol("output").setModelLocation(spark, model.uri).setOutputNodeName("z")
{code}

As you can see, only setOutputNodeName is explicitly set, but the exception is still thrown.

If you need more context, https://github.com/RatanRSur/mmlspark/tree/default-cntkmodel-output is the branch with the code. The tracked files I'm referring to are:
src/cntk-model/src/main/scala/CNTKModel.scala
notebooks/tests/301 - CIFAR10 CNTK CNN Evaluation.ipynb
The PySpark wrapper code is autogenerated.

was: (previous revision of the description; identical except that the file list also included src/src/main/resources/mmlspark/_CNTKModel.py)

> Params isSet in scala Transformer triggered by _setDefault in pyspark
> ----------------------------------------------------------------------
>
> Key: SPARK-21685
> URL: https://issues.apache.org/jira/browse/SPARK-21685
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.0
> Reporter: Ratan Rai Sur

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
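One plausible mechanism behind the reported symptom (a simplified, pure-Python sketch, not the actual pyspark or mmlspark source; all class and function names here are illustrative): if the wrapper transfers the *merged* parameter map, which includes defaults, to the JVM object via a plain set() call, then a value that was only ever defaulted on the Python side arrives on the Scala side looking explicitly set.

```python
class PyParams:
    """Python side: tracks explicit sets and defaults separately."""
    def __init__(self):
        self._paramMap = {}          # explicitly set values
        self._defaultParamMap = {}   # values from _setDefault

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)

    def _set(self, **kwargs):
        self._paramMap.update(kwargs)

    def extractParamMap(self):
        # Defaults first; explicit sets win on conflict.
        merged = dict(self._defaultParamMap)
        merged.update(self._paramMap)
        return merged


class JavaParams:
    """Stand-in for the Scala side: set() marks a param as explicitly set."""
    def __init__(self):
        self._set_params = {}

    def set(self, name, value):
        self._set_params[name] = value

    def isSet(self, name):
        return name in self._set_params


def transfer_params_to_java(py, java):
    # Transferring the merged map promotes defaults to explicit sets
    # on the JVM side -- the symptom reported in this issue.
    for name, value in py.extractParamMap().items():
        java.set(name, value)


py = PyParams()
py._setDefault(outputNodeIndex=0)   # default only, never set explicitly
py._set(outputNodeName="z")         # explicitly set by the user

java = JavaParams()
transfer_params_to_java(py, java)

# Both now look explicitly set on the JVM side, so a Scala-side
# require(!(isSet(outputNodeName) && isSet(outputNodeIndex)), ...) would fire.
print(java.isSet("outputNodeIndex") and java.isSet("outputNodeName"))  # True
```

Under this reading, the fix would be for the transfer step to distinguish defaults from explicit sets (calling something like setDefault on the JVM side for defaulted params) rather than folding them together.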
[jira] [Created] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark
Ratan Rai Sur created SPARK-21685:
--

Summary: Params isSet in scala Transformer triggered by _setDefault in pyspark
Key: SPARK-21685
URL: https://issues.apache.org/jira/browse/SPARK-21685
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.1.0
Reporter: Ratan Rai Sur

I'm trying to write a PySpark wrapper for a Transformer whose transform method includes the line

{code:scala}
require(!(isSet(outputNodeName) && isSet(outputNodeIndex)), "Can't set both outputNodeName and outputNodeIndex")
{code}

This should only throw an exception when both of these parameters are explicitly set.

In the PySpark wrapper for the Transformer, there is this line in __init__:

{code:python}
self._setDefault(outputNodeIndex=0)
{code}

Here is the line in the main Python script showing how it is configured:

{code:python}
cntkModel = CNTKModel().setInputCol("images").setOutputCol("output").setModelLocation(spark, model.uri).setOutputNodeName("z")
{code}

As you can see, only setOutputNodeName is explicitly set, but the exception is still thrown.

If you need more context, https://github.com/RatanRSur/mmlspark/tree/default-cntkmodel-output is the branch with the code; the files I'm referring to are CNTKModel.scala and _CNTKModel.py.
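To make the intended semantics of the require check concrete, here is a small pure-Python analogue (the class and its internals are hypothetical, not pyspark's Params implementation): a default value should satisfy getOrDefault without counting as "set", so the validation fires only when both parameters were set explicitly.

```python
class ModelParams:
    """Toy analogue of ML Params with separate default and explicit maps."""
    def __init__(self):
        self._explicit = {}
        self._defaults = {"outputNodeIndex": 0}  # a default, not an explicit set

    def set(self, name, value):
        self._explicit[name] = value

    def isSet(self, name):
        # Mirrors the intent of Scala's Params.isSet: defaults do NOT count.
        return name in self._explicit

    def getOrDefault(self, name):
        return self._explicit.get(name, self._defaults.get(name))

    def validate(self):
        # Analogue of:
        # require(!(isSet(outputNodeName) && isSet(outputNodeIndex)), ...)
        if self.isSet("outputNodeName") and self.isSet("outputNodeIndex"):
            raise ValueError("Can't set both outputNodeName and outputNodeIndex")


p = ModelParams()
p.set("outputNodeName", "z")
p.validate()                               # passes: outputNodeIndex is only defaulted
print(p.getOrDefault("outputNodeIndex"))   # 0

p.set("outputNodeIndex", 1)                # now BOTH are explicit...
try:
    p.validate()                           # ...so validation correctly fails
except ValueError as e:
    print(e)
```

The bug report is that the real pipeline behaves like the second case even when the user only mirrors the first.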
[jira] [Created] (SPARK-17726) Allow RDD.pipe to take script contents
Ratan Rai Sur created SPARK-17726:
--

Summary: Allow RDD.pipe to take script contents
Key: SPARK-17726
URL: https://issues.apache.org/jira/browse/SPARK-17726
Project: Spark
Issue Type: New Feature
Reporter: Ratan Rai Sur
Priority: Minor

I have a need to run arbitrary shell scripts that get passed to me as strings. I was wondering if there could be a flag indicating that the string should be written to a temp file and then executed. I took a look at PipedRDD.scala and would be willing to take a crack at making the changes, but I wanted to get some feedback on why this can't or shouldn't happen. Thanks!
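Outside of Spark, the requested behavior can be sketched in a few lines (a POSIX-only illustration; the function name and any RDD.pipe flag it would correspond to are hypothetical): write the script string to a temporary file, mark it executable, and pipe lines through it, much as RDD.pipe would do per partition.

```python
import os
import stat
import subprocess
import tempfile


def pipe_through_script(lines, script_contents):
    """Run each input line through a shell script provided as a string."""
    # Materialize the script contents as an executable temp file.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_contents)
        path = f.name
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    try:
        # Feed the lines to the script's stdin and collect its stdout.
        proc = subprocess.run(
            [path],
            input="\n".join(lines) + "\n",
            capture_output=True,
            text=True,
            check=True,
        )
        return proc.stdout.splitlines()
    finally:
        os.unlink(path)


script = "#!/bin/sh\ntr 'a-z' 'A-Z'\n"
print(pipe_through_script(["hello", "world"], script))
```

A flag on RDD.pipe could do exactly this setup on each executor before invoking the existing piping machinery.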