[GitHub] [spark] huaxingao commented on a change in pull request #26480: [SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols

2019-11-12 Thread GitBox
huaxingao commented on a change in pull request #26480: 
[SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols
URL: https://github.com/apache/spark/pull/26480#discussion_r345356303
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala
 ##
 @@ -22,23 +22,29 @@ import java.util.Locale
 import org.apache.spark.annotation.Since
 import org.apache.spark.ml.Transformer
 import org.apache.spark.ml.param._
-import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.param.shared.{HasInputCol, HasInputCols, HasOutputCol, HasOutputCols}
 import org.apache.spark.ml.util._
 import org.apache.spark.sql.{DataFrame, Dataset}
 import org.apache.spark.sql.functions.{col, udf}
-import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}
 
 /**
  * A feature transformer that filters out stop words from input.
  *
+ * Since 3.0.0,
 
 Review comment:
   Sorry, I accidentally broke the line, but I would prefer to keep it. When 
other features added multi-column support, ```since xxx``` was added to the doc, 
so I am just trying to be consistent with them. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #26480: [SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols

2019-11-12 Thread GitBox
huaxingao commented on a change in pull request #26480: 
[SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols
URL: https://github.com/apache/spark/pull/26480#discussion_r345356363
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala
 ##
 @@ -142,16 +165,40 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
 terms.filter(s => !lowerStopWords.contains(toLower(s)))
   }
 }
-val metadata = outputSchema($(outputCol)).metadata
-dataset.select(col("*"), t(col($(inputCol))).as($(outputCol), metadata))
+
+val (inputColNames, outputColNames) = getInOutCols()
+val ouputCols = inputColNames.map { inputColName =>
 
 Review comment:
   Tried this. It doesn't work :(





[GitHub] [spark] huaxingao commented on a change in pull request #26480: [SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols

2019-11-11 Thread GitBox
huaxingao commented on a change in pull request #26480: 
[SPARK-29808][ML][PYTHON] StopWordsRemover should support multi-cols
URL: https://github.com/apache/spark/pull/26480#discussion_r345028011
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala
 ##
 @@ -51,6 +57,14 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
   @Since("1.5.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("3.0.0")
+  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
+
+  /** @group setParam */
+  @Since("3.0.0")
+  def setOutputCols(value: Array[String]): this.type = set(outputCols, value)
+
 
 Review comment:
   I am debating whether I should add 
```stopWordsArray/caseSensitiveArray/localArray```. It seems to me that users 
will use the same set of ```stopWords``` for all columns, so there is no need 
to add those. 
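
   To illustrate the trade-off, here is a minimal sketch in plain Scala (no 
Spark dependency) of one shared stop-word set being applied to every 
input/output column pair. The object name, the ```Row``` alias, and the 
```removeStopWords``` helper are hypothetical and exist only for this example; 
only the filter expression mirrors the one in the diff, and the 
```Locale.ROOT``` default is an assumption made here for reproducibility.

```scala
import java.util.Locale

object SharedStopWordsSketch {
  // Hypothetical stand-in for a DataFrame row: column name -> tokens.
  type Row = Map[String, Seq[String]]

  // Applies ONE stop-word set to every (input, output) column pair.
  // Per-column settings (the "xxxArray" params) are deliberately absent.
  def removeStopWords(
      row: Row,
      inputCols: Seq[String],
      outputCols: Seq[String],
      stopWords: Set[String],
      caseSensitive: Boolean = false,
      locale: Locale = Locale.ROOT): Row = {
    require(inputCols.length == outputCols.length,
      "inputCols and outputCols must have the same length")

    // Build the filter once; it is reused for all columns.
    val filter: Seq[String] => Seq[String] =
      if (caseSensitive) {
        terms => terms.filter(s => !stopWords.contains(s))
      } else {
        val lowerStopWords = stopWords.map(_.toLowerCase(locale))
        terms => terms.filter(s => !lowerStopWords.contains(s.toLowerCase(locale)))
      }

    // Always read from the original row so outputs never feed later inputs.
    inputCols.zip(outputCols).foldLeft(row) { case (acc, (in, out)) =>
      acc + (out -> filter(row(in)))
    }
  }
}
```

   With this shape, a caller passes two arrays of column names and a single 
```stopWords``` set. Supporting per-column settings would mean turning each of 
```stopWords```, ```caseSensitive```, and ```locale``` into an array that must 
stay aligned with the column arrays, which is the extra surface area being 
weighed above.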

