[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17995 @yanboliang I updated this PR and reverted the changes to `setSolver` in GLR and LiR. Thanks for your review. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17995 **[Test build #78607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78607/testReport)** for PR 17995 at commit [`28941f3`](https://github.com/apache/spark/commit/28941f39187380b9f7ca6a49d24fbee8a759a505).
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123931580

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala ---
```scala
@@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams extends PredictorParams
   final val solver: Param[String] = new Param[String](this, "solver",
     "The solver algorithm for optimization. Supported options: " +
       s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. (Default l-bfgs)",
-    ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers))
+    (value: String) => MultilayerPerceptronClassifier.supportedSolvers
+      .contains(value.toLowerCase(Locale.ROOT)))
```
--- End diff --

I think it's a good idea.
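The case-insensitive validator in the diff above can be sketched in isolation. This is a hypothetical standalone version, not the PR's exact code: the `supportedSolvers` values below are assumed (they live in `MultilayerPerceptronClassifier` in the real code), and `Locale.ROOT` is used so case mapping does not depend on the JVM's default locale.

```scala
import java.util.Locale

object CaseInsensitiveParamSketch {
  // Supported options stored lowercase, mirroring the PR's supportedSolvers array (assumed values).
  val supportedSolvers: Array[String] = Array("l-bfgs", "gd")

  // Validator: lowercase the user-supplied value with a fixed locale before the
  // membership test, so "L-BFGS" and "l-bfgs" are both accepted.
  val isValidSolver: String => Boolean =
    (value: String) => supportedSolvers.contains(value.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    assert(isValidSolver("L-BFGS")) // mixed case accepted
    assert(isValidSolver("gd"))
    assert(!isValidSolver("newton")) // unsupported solver rejected
  }
}
```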
[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18334#discussion_r123930251

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
```scala
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import java.net.URI
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable}
+import org.apache.spark.sql.internal.SessionState
+
+
+object CommandUtils extends Logging {
+
+  /**
+   * Update statistics (currently only sizeInBytes) after changing data by commands.
+   */
+  def updateTableStats(
+      sparkSession: SparkSession,
+      table: CatalogTable,
+      newTableSize: Option[BigInt] = None,
+      newRowCount: Option[BigInt] = None): Unit = {
+    if (sparkSession.sessionState.conf.autoStatsUpdate && table.stats.nonEmpty) {
+      val catalog = sparkSession.sessionState.catalog
+      val newTable = catalog.getTableMetadata(table.identifier)
+      val newSize = newTableSize.getOrElse(
+        CommandUtils.calculateTotalSize(sparkSession.sessionState, newTable))
+      catalog.alterTableStats(table.identifier,
+        CatalogStatistics(sizeInBytes = newSize, rowCount = newRowCount))
```
--- End diff --

since we are protected by a flag, can we be more aggressive and auto update all stats?
[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18419 LGTM
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123930006

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
```scala
@@ -0,0 +1,58 @@
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   * Returns the bucket number into which
+   * the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   *                 to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   *                 to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *                  a constant indicating the number of buckets
+   * @return a long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   *         the range [minValue, maxValue]. For example:
+   *         widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+    if (numBucket <= 0) {
+      throw new AnalysisException(
+        s"The num of bucket must be greater than 0, but got $numBucket")
+    }
```
--- End diff --

If `minValue == maxValue`, then `lower == upper`, so the result is `numBucket + 1L`:
```scala
val lower: Double = Math.min(minValue, maxValue)
val upper: Double = Math.max(minValue, maxValue)
val result: Long = if (expr < lower) {
  0
} else if (expr >= upper) {
  numBucket + 1L
} else {
  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
}
if (minValue > maxValue) (numBucket - result) + 1 else result
```
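Putting the snippet from the comment above into a self-contained function makes the edge cases easy to check. This is a sketch assuming the reviewer's formula is the intended semantics (values below the range map to bucket 0, values at or above it to `numBucket + 1`); `WidthBucketSketch` is an illustrative name, not the PR's `MathUtils`:

```scala
object WidthBucketSketch {
  // Maps expr into one of numBucket equal-width buckets over [minValue, maxValue].
  // Bucket 0 is "below the range"; bucket numBucket + 1 is "at or above the range".
  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
    require(numBucket > 0, s"The num of bucket must be greater than 0, but got $numBucket")
    val lower = math.min(minValue, maxValue)
    val upper = math.max(minValue, maxValue)
    val result =
      if (expr < lower) 0L
      else if (expr >= upper) numBucket + 1L
      else (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
    // For a descending range (minValue > maxValue) the bucket numbering is mirrored.
    if (minValue > maxValue) (numBucket - result) + 1 else result
  }

  def main(args: Array[String]): Unit = {
    assert(widthBucket(0, 1, 1, 1) == 0L)   // below the (degenerate) range -> bucket 0
    assert(widthBucket(20, 1, 1, 1) == 2L)  // at/above the range -> numBucket + 1
    assert(widthBucket(5.0, 0.0, 10.0, 5) == 3L) // falls in the third of five buckets
  }
}
```

Note how the degenerate `minValue == maxValue` case discussed in the comment is covered: `lower == upper`, so any `expr >= upper` lands in `numBucket + 1`.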
[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18419#discussion_r123929987

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
```scala
@@ -118,4 +125,19 @@ object SQLExecution {
       sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
     }
   }
+
+  /**
+   * Wrap an action which may have nested execution id. This method can be used to run an execution
+   * inside another execution, e.g., `CacheTableCommand` need to call `Dataset.collect`.
```
--- End diff --

nit: All Spark jobs in the body won't be tracked in UI.
[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18334#discussion_r123929887

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
```scala
@@ -165,6 +167,22 @@ private[sql] trait SQLTestUtils
   }

   /**
+   * Creates the specified number of temporary directories, which are then passed to `f` and will
+   * be deleted after `f` returns.
+   */
+  protected def withTempPaths(numPaths: Int)(f: Seq[File] => Unit): Unit = {
+    val files = mutable.Buffer[File]()
```
--- End diff --

nit: we can just create an array as we know the size.
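The reviewer's suggestion (allocate an array of the known size instead of growing a mutable buffer) could look like the sketch below. This is a minimal standalone version, not Spark's actual `SQLTestUtils` code: it uses `java.nio.file.Files` for temp-dir creation and a naive cleanup, where the real trait presumably uses Spark's own utilities.

```scala
import java.io.File
import java.nio.file.Files

object WithTempPathsSketch {
  // Creates `numPaths` temporary directories, passes them to `f`, and deletes
  // them after `f` returns. Array.fill is used since the count is known upfront.
  def withTempPaths(numPaths: Int)(f: Seq[File] => Unit): Unit = {
    val files = Array.fill[File](numPaths)(
      Files.createTempDirectory("spark-test").toFile)
    try f(files.toSeq) finally files.foreach { dir =>
      // Naive cleanup: delete direct children, then the directory itself.
      Option(dir.listFiles()).foreach(_.foreach(_.delete()))
      dir.delete()
    }
  }

  def main(args: Array[String]): Unit = {
    withTempPaths(3) { paths =>
      assert(paths.length == 3)
      assert(paths.forall(_.isDirectory)) // each path exists while f runs
    }
  }
}
```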
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #78606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78606/testReport)** for PR 17084 at commit [`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17084 Jenkins, test this please
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17084 the pip packaging failure seems to be unrelated to the code... let me try this again
[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18419#discussion_r123928964

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
```scala
@@ -118,4 +125,19 @@ object SQLExecution {
       sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
     }
   }
+
+  /**
+   * Wrap an action which may have nested execution id. This method can be used to run an execution
+   * inside another execution, e.g., `CacheTableCommand` need to call `Dataset.collect`.
+   */
+  def ignoreNestedExecutionId[T](sparkSession: SparkSession)(body: => T): T = {
```
--- End diff --

Although we ignore the nested execution id, the job stages and metrics created by the body here will still be recorded into the `SQLExecutionUIData` referred to by the current execution id. But it looks like that should be fine.
[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18334#discussion_r123928827

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
```scala
+object CommandUtils extends Logging {
+
+  /**
+   * Update statistics (currently only sizeInBytes) after changing data by commands.
+   */
+  def updateTableStats(
+      sparkSession: SparkSession,
+      table: CatalogTable,
+      newTableSize: Option[BigInt] = None,
+      newRowCount: Option[BigInt] = None): Unit = {
+    if (sparkSession.sessionState.conf.autoStatsUpdate && table.stats.nonEmpty) {
+      val catalog = sparkSession.sessionState.catalog
+      val newTable = catalog.getTableMetadata(table.identifier)
+      val newSize = newTableSize.getOrElse(
+        CommandUtils.calculateTotalSize(sparkSession.sessionState, newTable))
+      catalog.alterTableStats(table.identifier,
+        CatalogStatistics(sizeInBytes = newSize, rowCount = newRowCount))
```
--- End diff --

so we never auto update column stats?
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11994 **[Test build #78605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78605/testReport)** for PR 11994 at commit [`dd981ba`](https://github.com/apache/spark/commit/dd981ba1db4066109d61af1cfb18a06819b4bed5).
[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18368 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78596/ Test FAILed.
[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18368 Merged build finished. Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78604/testReport)** for PR 18405 at commit [`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).
[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18368 **[Test build #78596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78596/testReport)** for PR 18368 at commit [`fc2b7c0`](https://github.com/apache/spark/commit/fc2b7c02fab7f570ae3ca080ae1c2c9502300de7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 Jenkins, retest this please.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78598/ Test FAILed.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Merged build finished. Test FAILed.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #78598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78598/testReport)** for PR 17084 at commit [`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17758: [SPARK-20460][SPARK-21144][SQL] Make it more consistent ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17758 **[Test build #78603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78603/testReport)** for PR 17758 at commit [`3f56d04`](https://github.com/apache/spark/commit/3f56d04c7131fe833a3efbf56e7318e2c08f79dc).
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Merged build finished. Test PASSed.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78597/ Test PASSed.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #78597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78597/testReport)** for PR 17084 at commit [`cf59c62`](https://github.com/apache/spark/commit/cf59c62f272ade192dfbf28ab53881251ea0d95e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics @Since("2.2.0") (`
[GitHub] spark issue #17758: [SPARK-20460][SPARK-21144][SQL] Make it more consistent ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17758 Jenkins, retest this please.
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18334 **[Test build #78602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78602/testReport)** for PR 18334 at commit [`5a43594`](https://github.com/apache/spark/commit/5a43594fb8a2fb2885c4d268140f28827a65ff5a).
[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18419 **[Test build #78600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78600/testReport)** for PR 18419 at commit [`0795c16`](https://github.com/apache/spark/commit/0795c16b4beaf70430e8dc62f135f99ac801960e).
[GitHub] spark issue #18418: [SPARK-19104][SQL] Lambda variables should work when par...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18418 **[Test build #78601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78601/testReport)** for PR 18418 at commit [`bd0221a`](https://github.com/apache/spark/commit/bd0221a6b745be938ade7596658e788dbddbab91).
[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18419 cc @rdblue @gatorsmile
[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/18419 [SPARK-20213][SQL][follow-up] introduce SQLExecution.ignoreNestedExecutionId ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/18064, to work around the nested SQL execution id issue, we introduced several internal methods in `Dataset`, like `collectInternal`, `countInternal`, `showInternal`, etc., to avoid nested execution ids. However, this approach has poor extensibility: every time we hit another nested execution id case, we may need to add more internal methods to `Dataset`. Our goal is to ignore the nested execution id in some cases, and we can achieve it with a better approach by introducing `SQLExecution.ignoreNestedExecutionId`. Whenever we find a place that needs to ignore the nested execution id, we can just wrap the action with `SQLExecution.ignoreNestedExecutionId`, which is more extensible than the previous approach. The idea comes from https://github.com/apache/spark/pull/17540/files#diff-ab49028253e599e6e74cc4f4dcb2e3a8R57 by @rdblue ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark follow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18419.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18419 commit 0795c16b4beaf70430e8dc62f135f99ac801960e Author: Wenchen Fan Date: 2017-06-26T04:36:59Z introduce SQLExecution.ignoreNestedExecutionId
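The wrapping approach described above can be sketched independently of Spark's internals with a thread-local flag. The Python below is purely illustrative — all names are hypothetical analogues, not Spark's actual `SQLExecution` API:

```python
import threading

_state = threading.local()

def _current_id():
    # The execution id tracked for the current thread, if any.
    return getattr(_state, "execution_id", None)

def with_execution_id(exec_id, action):
    # Run `action` under `exec_id`; nesting is an error unless the caller
    # opted out via ignore_nested_execution_id below.
    if _current_id() is not None and not getattr(_state, "ignore_nested", False):
        raise RuntimeError("nested execution id detected")
    prev = _current_id()
    _state.execution_id = exec_id
    try:
        return action()
    finally:
        _state.execution_id = prev

def ignore_nested_execution_id(action):
    # Rough analogue of SQLExecution.ignoreNestedExecutionId: any execution
    # started while `action` runs is allowed to nest.
    prev = getattr(_state, "ignore_nested", False)
    _state.ignore_nested = True
    try:
        return action()
    finally:
        _state.ignore_nested = prev
```

The extensibility point in the PR description follows: any new nesting case is handled by wrapping the action, with no new per-method `*Internal` variants.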
[GitHub] spark pull request #18366: [SPARK-20889][SparkR] Grouped documentation for S...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18366#discussion_r123925897 --- Diff: R/pkg/R/functions.R --- @@ -635,20 +652,16 @@ setMethod("dayofyear", column(jc) }) -#' decode -#' -#' Computes the first argument into a string from a binary using the provided character set -#' (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). +#' @details +#' \code{decode}: Computes the first argument into a string from a binary using the provided +#' character set. #' -#' @param x Column to compute on. -#' @param charset Character set to use +#' @param charset Character set to use (one of "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", +#'"UTF-16LE", "UTF-16"). --- End diff -- Not a big deal, since both places contain the same information, so this is just a weak opinion: it would be nicer to match this to Scala/Python too IMHO, but leaving it as is also works for me.
[GitHub] spark pull request #18366: [SPARK-20889][SparkR] Grouped documentation for S...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18366#discussion_r123926400 --- Diff: R/pkg/R/functions.R --- @@ -1503,18 +1491,12 @@ setMethod("skewness", column(jc) }) -#' soundex -#' -#' Return the soundex code for the specified expression. -#' -#' @param x Column to compute on. +#' @details +#' \code{soundex}: Returns the soundex code for the specified expression. #' -#' @rdname soundex -#' @name soundex -#' @family string functions -#' @aliases soundex,Column-method +#' @rdname column_string_functions +#' @aliases soundex soundex,Column-method #' @export -#' @examples \dontrun{soundex(df$c)} --- End diff -- It looks like this example was missed.
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18334 retest this please
[GitHub] spark issue #18418: [SPARK-19104][SQL] Lambda variables should work when par...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18418 **[Test build #78599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78599/testReport)** for PR 18418 at commit [`c417e22`](https://github.com/apache/spark/commit/c417e229f4563a6ee857d7ee55582e3b6ca2ed6b).
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78593/ Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Merged build finished. Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNull method when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78593/testReport)** for PR 18405 at commit [`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18418 [SPARK-19104][SQL] Lambda variables should work when parent expression splits generated codes ## What changes were proposed in this pull request? When an expression using lambda variables splits the generated code, the local variables generated for the lambda variables can't be accessed in the generated functions. This patch fixes the issue by adding the lambda variables to the function parameter lists. ## How was this patch tested? Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-19104 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18418 commit c417e229f4563a6ee857d7ee55582e3b6ca2ed6b Author: Liang-Chi Hsieh Date: 2017-06-26T04:26:00Z Add lambda variables into the parameters of functions generated by splitExpressions.
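The fix described in the PR — passing otherwise-inaccessible local variables into split-out helper functions as parameters — can be illustrated with a toy code generator. This Python sketch is a hypothetical analogue of the idea, not Spark's actual `splitExpressions`:

```python
def gen_split_code(statements, lambda_vars, chunk_size=2):
    # Emit one helper function per chunk of statements. The fix corresponds
    # to listing `lambda_vars` as parameters of every helper; without them,
    # names like `elem` would be unbound inside the generated functions.
    params = ", ".join(lambda_vars)
    acc = lambda_vars[0]  # treat the first variable as the accumulator
    funcs, calls = [], []
    for n, i in enumerate(range(0, len(statements), chunk_size)):
        body = "; ".join(statements[i:i + chunk_size])
        funcs.append(f"def _split_{n}({params}):\n    {body}\n    return {acc}")
        calls.append(f"{acc} = _split_{n}({params})")
    return "\n".join(funcs) + "\n" + "\n".join(calls)
```

Executing the generated source shows the split functions can still read the "lambda variables" because they arrive as arguments rather than as out-of-scope locals.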
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #78598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78598/testReport)** for PR 17084 at commit [`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17084 Jenkins, test this please
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17084#discussion_r123925697 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryConfusionMatrix.scala --- @@ -22,22 +22,22 @@ package org.apache.spark.mllib.evaluation.binary */ private[evaluation] trait BinaryConfusionMatrix { /** number of true positives */ - def numTruePositives: Long + def numTruePositives: Double --- End diff -- good idea, updated the names of the variables
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17084#discussion_r123925523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -146,11 +160,13 @@ class BinaryClassificationMetrics @Since("1.3.0") ( private lazy val ( cumulativeCounts: RDD[(Double, BinaryLabelCounter)], confusions: RDD[(Double, BinaryConfusionMatrix)]) = { -// Create a bin for each distinct score value, count positives and negatives within each bin, -// and then sort by score values in descending order. -val counts = scoreAndLabels.combineByKey( - createCombiner = (label: Double) => new BinaryLabelCounter(0L, 0L) += label, - mergeValue = (c: BinaryLabelCounter, label: Double) => c += label, +// Create a bin for each distinct score value, count weighted positives and +// negatives within each bin, and then sort by score values in descending order. +val counts = scoreAndLabelsWithWeights.combineByKey( + createCombiner = (labelAndWeight: (Double, Double)) => +new BinaryLabelCounter(0L, 0L) += (labelAndWeight._1, labelAndWeight._2), --- End diff -- updated, thanks!
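The change under review accumulates weighted rather than unweighted label counts per score bin. A minimal sketch of such a counter — a hypothetical Python analogue of the Scala `BinaryLabelCounter`, not Spark's actual implementation:

```python
class BinaryLabelCounter:
    # Accumulates weighted positive/negative counts within one score bin,
    # replacing the earlier unweighted long counters.
    def __init__(self):
        self.weighted_pos = 0.0
        self.weighted_neg = 0.0

    def add(self, label, weight=1.0):
        # A label above 0.5 counts as positive; its weight is accumulated.
        if label > 0.5:
            self.weighted_pos += weight
        else:
            self.weighted_neg += weight
        return self  # allow chaining, mirroring Scala's `+=` style
```

With `weight=1.0` as the default, the unweighted behaviour falls out as a special case, which is the backward-compatibility argument running through this review.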
[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17451#discussion_r123925184 --- Diff: python/pyspark/ml/feature.py --- @@ -2869,6 +2869,18 @@ def findSynonyms(self, word, num): word = _convert_to_vector(word) return self._call_java("findSynonyms", word, num) +@since("2.2.0") +def findSynonymsTuple(self, word, num): --- End diff -- ```findSynonymsTuple``` -> ```findSynonymsArray```, we should keep the same function name and return type as in Scala.
[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17451#discussion_r123925233 --- Diff: python/pyspark/ml/feature.py --- @@ -2869,6 +2869,18 @@ def findSynonyms(self, word, num): word = _convert_to_vector(word) return self._call_java("findSynonyms", word, num) +@since("2.2.0") +def findSynonymsTuple(self, word, num): +""" +Find "num" number of words closest in similarity to "word". +word can be a string or vector representation. +Returns an array with two fields word and similarity (which +gives the cosine similarity). +""" +if not isinstance(word, basestring): +word = _convert_to_vector(word) +return self._call_java("findSynonymsTuple", word, num) + --- End diff -- We need to convert the result back to an array of tuples, which would be consistent with the Scala output.
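The conversion the reviewer asks for — turning the JVM stub's two parallel lists back into an array of (word, similarity) tuples — is essentially a zip. A hedged sketch (helper name hypothetical, not part of PySpark's API):

```python
def to_synonym_tuples(words, similarities):
    # The JVM stub returns two parallel lists; zip them back into
    # [(word, cosine_similarity), ...] so the Python result matches the
    # shape of Scala's findSynonymsArray output.
    if len(words) != len(similarities):
        raise ValueError("words and similarities must have the same length")
    return list(zip(words, similarities))
```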
[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17451#discussion_r123925086 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -274,6 +274,31 @@ class Word2VecModel private[ml] ( wordVectors.findSynonyms(word, num) } + /** + * Find "num" number of words whose vector representation is most similar to the supplied vector. + * If the supplied vector is the vector representation of a word in the model's vocabulary, + * that word will be in the results. + * @return a tuple of the words list and the cosine similarities list between the synonyms given + * word vector. + */ + @Since("2.2.0") + def findSynonymsTuple(vec: Vector, num: Int): (Array[String], Array[Double]) = { +val result = findSynonymsArray(vec, num) +(result.map(e => e._1), result.map(e => e._2)) + } + + /** + * Find "num" number of words closest in similarity to the given word, not + * including the word itself. + * @return a tuple of the words list and the cosine similarities list between the synonyms given + * word vector. + */ + @Since("2.2.0") + def findSynonymsTuple(word: String, num: Int): (Array[String], Array[Double]) = { --- End diff -- Ditto, should be private.
[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17451#discussion_r123925064 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -274,6 +274,31 @@ class Word2VecModel private[ml] ( wordVectors.findSynonyms(word, num) } + /** + * Find "num" number of words whose vector representation is most similar to the supplied vector. + * If the supplied vector is the vector representation of a word in the model's vocabulary, + * that word will be in the results. + * @return a tuple of the words list and the cosine similarities list between the synonyms given + * word vector. + */ + @Since("2.2.0") + def findSynonymsTuple(vec: Vector, num: Int): (Array[String], Array[Double]) = { --- End diff -- This should be private. Meanwhile, add an annotation to clarify that this is only a Java stub for the Python bindings.
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17084#discussion_r123925437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -41,13 +41,27 @@ import org.apache.spark.sql.DataFrame *partition boundaries. */ @Since("1.0.0") -class BinaryClassificationMetrics @Since("1.3.0") ( -@Since("1.3.0") val scoreAndLabels: RDD[(Double, Double)], -@Since("1.3.0") val numBins: Int) extends Logging { +class BinaryClassificationMetrics @Since("2.2.0") ( +val numBins: Int, +@Since("2.2.0") val scoreAndLabelsWithWeights: RDD[(Double, (Double, Double))]) + extends Logging { require(numBins >= 0, "numBins must be nonnegative") /** + * Retrieves the score and labels (for binary compatibility). + * @return The score and labels. + */ + @Since("1.0.0") --- End diff -- good catch, updated version for both: 1.) def scoreAndLabels: RDD[(Double, Double)] = 2.) def this(@Since("1.3.0") scoreAndLabels: RDD[(Double, Double)], @Since("1.3.0") numBins: Int) = ... to 1.3.0
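The compatibility shape under discussion — a new weighted primary constructor plus the old unweighted constructor and accessor kept intact — can be sketched as a hypothetical Python analogue (not the Scala class itself; names and shapes are illustrative):

```python
class BinaryClassificationMetrics:
    # New primary input: numBins plus (score, (label, weight)) records.
    def __init__(self, num_bins, score_labels_weights):
        self.num_bins = num_bins
        self.score_labels_weights = score_labels_weights

    # Old-style constructor preserved for compatibility: plain
    # (score, label) pairs, with every weight defaulting to 1.0.
    @classmethod
    def from_score_and_labels(cls, score_and_labels, num_bins=0):
        weighted = [(s, (l, 1.0)) for s, l in score_and_labels]
        return cls(num_bins, weighted)

    # Old accessor preserved for compatibility: drop the weights again.
    @property
    def score_and_labels(self):
        return [(s, l) for s, (l, _) in self.score_labels_weights]
```

The old entry points delegate to the weighted representation, so existing callers see the same API while the internals work with weights throughout.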
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17084#discussion_r123925177 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -77,12 +87,16 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va SchemaUtils.checkNumericType(schema, $(labelCol)) // TODO: When dataset metadata has been implemented, check rawPredictionCol vector length = 2. -val scoreAndLabels = - dataset.select(col($(rawPredictionCol)), col($(labelCol)).cast(DoubleType)).rdd.map { -case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label) -case Row(rawPrediction: Double, label: Double) => (rawPrediction, label) +val scoreAndLabelsWithWeights = + dataset.select(col($(rawPredictionCol)), col($(labelCol)).cast(DoubleType), +if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else col($(weightCol))) --- End diff -- added a check for numeric type and a cast to Double, similar to labelCol
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17084#discussion_r123924688 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -36,12 +36,18 @@ import org.apache.spark.sql.types.DoubleType @Since("1.2.0") @Experimental class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String) - extends Evaluator with HasRawPredictionCol with HasLabelCol with DefaultParamsWritable { + extends Evaluator with HasRawPredictionCol with HasLabelCol +with HasWeightCol with DefaultParamsWritable { @Since("1.2.0") def this() = this(Identifiable.randomUID("binEval")) /** + * Default number of bins to use for binary classification evaluation. + */ + val defaultNumberOfBins = 1000 --- End diff -- It seemed like a good default value: for graphing the ROC curve it's not too large for most plots, but not so small that the graph would be jagged. The user can always specify a value to override the default. However, it's usually not a good idea to sort over the entire set of label/score values: the dataset will probably be very large, the operation will be very slow, and the visualization won't look any different, so the default should encourage the user to down-sample to a bounded number of bins.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #78597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78597/testReport)** for PR 17084 at commit [`cf59c62`](https://github.com/apache/spark/commit/cf59c62f272ade192dfbf28ab53881251ea0d95e).
[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18174#discussion_r123924134 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -123,6 +126,8 @@ this.inMemSorter = new ShuffleInMemorySorter( this, initialSize, conf.getBoolean("spark.shuffle.sort.useRadixSort", true)); this.peakMemoryUsedBytes = getMemoryUsage(); +this.diskWriteBufferSize = +conf.getInt("spark.shuffle.spill.diskWriteBufferSize", DISK_WRITE_BUFFER_SIZE); --- End diff -- @cloud-fan thanks for reviewing it, and thank you for your advice. I tried to fix it; however, org.apache.spark.internal.config cannot be imported in the ShuffleExternalSorter.java class. That fix would modify org.apache.spark.internal.config and affect other code, so I suggest making it in a separate PR. Do you agree?
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18334 Merged build finished. Test FAILed.
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18334 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78592/ Test FAILed.
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18334 **[Test build #78592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78592/testReport)** for PR 18334 at commit [`5a43594`](https://github.com/apache/spark/commit/5a43594fb8a2fb2885c4d268140f28827a65ff5a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18174#discussion_r123923859 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -82,6 +82,9 @@ /** The buffer size to use when writing spills using DiskBlockObjectWriter */ private final int fileBufferSizeBytes; + /** The buffer size to use when writes the sorted records to an on-disk file */ --- End diff -- @jiangxb1987 thanks for reviewing it. The UnsafeSorterSpillWriter changes have been updated; please review them.
[GitHub] spark issue #18417: [INFRA] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18417 Sure.
[GitHub] spark issue #18417: [INFRA] Close stale PRs
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/18417 Can you please keep 17084 open? Thanks!
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17084 yes, will update the PR, thanks for the ping
[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18368 **[Test build #78596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78596/testReport)** for PR 18368 at commit [`fc2b7c0`](https://github.com/apache/spark/commit/fc2b7c02fab7f570ae3ca080ae1c2c9502300de7).
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18346 ping @cloud-fan any more feedback on this?
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78595/ Test FAILed.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11994 **[Test build #78595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78595/testReport)** for PR 11994 at commit [`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11994 Merged build finished. Test FAILed.
[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/18368 test this please
[GitHub] spark pull request #18235: [SPARK-21012][Submit] Add glob support for resour...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/18235#discussion_r123922136 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -310,33 +310,28 @@ object SparkSubmit extends CommandLineUtils { RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose) } -// In client mode, download remote files. -if (deployMode == CLIENT) { - val hadoopConf = new HadoopConfiguration() - args.primaryResource = Option(args.primaryResource).map(downloadFile(_, hadoopConf)).orNull - args.jars = Option(args.jars).map(downloadFileList(_, hadoopConf)).orNull - args.pyFiles = Option(args.pyFiles).map(downloadFileList(_, hadoopConf)).orNull - args.files = Option(args.files).map(downloadFileList(_, hadoopConf)).orNull -} +val hadoopConf = new HadoopConfiguration() +val targetDir = Files.createTempDirectory("tmp").toFile --- End diff -- From my understanding, currently no code is responsible for deleting it; let me check the code.
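The cleanup concern raised above can be sketched as follows. This is only an illustration of one way to ensure such a temp directory is removed, not the PR's actual fix (Spark has its own `ShutdownHookManager` utility for this purpose):

```scala
import java.io.File
import java.nio.file.Files

object TempDirCleanupSketch {
  // Recursively delete a directory tree (no symlink handling; sketch only).
  def deleteRecursively(f: File): Unit = {
    Option(f.listFiles).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Create a temp dir the way the diff does, and register a JVM shutdown
  // hook so it is deleted at exit even if no other code cleans it up.
  def createManagedTempDir(): File = {
    val targetDir = Files.createTempDirectory("tmp").toFile
    sys.addShutdownHook(deleteRecursively(targetDir))
    targetDir
  }
}
```

The design point is that whoever creates the directory should also own its lifetime; deferring deletion to a shutdown hook is the simplest ownership model when downloaded files must live for the whole submission.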
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11994 **[Test build #78595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78595/testReport)** for PR 11994 at commit [`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).
[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18305 @sethah I will take a look in a few days after clearing some backlog; thanks for your patience.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/11994 Jenkins, retest this please.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9518 Merged build finished. Test PASSed.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78588/ Test PASSed.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9518 **[Test build #78588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78588/testReport)** for PR 9518 at commit [`1d50f6f`](https://github.com/apache/spark/commit/1d50f6f5237ca01f7611677795d19e4975244316). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123921193 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams extends PredictorParams final val solver: Param[String] = new Param[String](this, "solver", "The solver algorithm for optimization. Supported options: " + s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. (Default l-bfgs)", - ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers)) +(value: String) => MultilayerPerceptronClassifier.supportedSolvers + .contains(value.toLowerCase(Locale.ROOT))) --- End diff -- What do you think of adding a new function in ```object ParamValidators``` as ``` def inStringArray(allowed: Array[String]): String => Boolean = { (value: String) => allowed.contains(value.toLowerCase(java.util.Locale.ROOT)) } ``` to facilitate similar checks here and elsewhere.
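The proposed helper is small enough to sketch end to end. The object name below is hypothetical (the suggestion is to put it in the real `object ParamValidators`), and note an assumption the proposal relies on: the `allowed` values must themselves be stored lower-cased for the check to be truly case-insensitive.

```scala
import java.util.Locale

object ParamValidatorsSketch {
  // Case-insensitive membership check: lower-case the candidate value
  // and compare it against allowed values that are already lower case.
  def inStringArray(allowed: Array[String]): String => Boolean =
    (value: String) => allowed.contains(value.toLowerCase(Locale.ROOT))
}

// e.g. for a solver param (assuming supported solvers stored as lower case):
val isValidSolver: String => Boolean =
  ParamValidatorsSketch.inStringArray(Array("l-bfgs", "gd"))
```

With this shape, each param declaration shrinks to a single `ParamValidators.inStringArray(...)` call instead of repeating the `toLowerCase(Locale.ROOT)` lambda at every call site.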
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -313,7 +313,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val * @group setParam */ @Since("2.0.0") - def setSolver(value: String): this.type = set(solver, value) + def setSolver(value: String): this.type = { +require("irls" == value.toLowerCase(Locale.ROOT), + s"Solver $value was not supported. Supported options: irls") +set(solver, value) + } --- End diff -- Actually we can't do this, since MLlib supports setting params through other entry points. For now we can leave it as is, until #16028 is resolved.
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919667 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -128,7 +130,8 @@ private[feature] trait ChiSqSelectorParams extends Params final val selectorType = new Param[String](this, "selectorType", "The selector type of the ChisqSelector. " + "Supported options: " + OldChiSqSelector.supportedSelectorTypes.mkString(", "), - ParamValidators.inArray[String](OldChiSqSelector.supportedSelectorTypes)) +(value: String) => OldChiSqSelector.supportedSelectorTypes.map(_.toLowerCase(Locale.ROOT)) --- End diff -- Supported selector types should always be stored in lower case; please update the corresponding code snippet in ```mllib.feature.ChiSqSelector``` from: ``` private[spark] val NumTopFeatures: String = "numTopFeatures" .. ``` to ``` private[spark] val NumTopFeatures: String = "numTopFeatures".toLowerCase .. ```
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919494 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -48,10 +50,11 @@ final class RegressionEvaluator @Since("1.4.0") (@Since("1.4.0") override val ui * @group param */ @Since("1.4.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("mse", "rmse", "r2", "mae")) -new Param(this, "metricName", "metric name in evaluation (mse|rmse|r2|mae)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" +" evaluation (mse|rmse|r2|mae)", +(value: String) => Array("mse", "rmse", "r2", "mae") --- End diff -- Ditto.
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123888757 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -46,11 +48,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va * @group param */ @Since("1.2.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("areaUnderROC", "areaUnderPR")) -new Param( - this, "metricName", "metric name in evaluation (areaUnderROC|areaUnderPR)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" +" evaluation (areaUnderROC|areaUnderPR)", +(value: String) => Array("areaunderroc", "areaunderpr").contains( + value.toLowerCase(Locale.ROOT))) --- End diff -- Could we organize as ``` val AreaUnderROC: String = "areaUnderROC".toLowerCase val AreaUnderPR: String = "areaUnderPR".toLowerCase val supportedMetricNames = Set(AreaUnderROC, AreaUnderPR) ``` in ```object BinaryClassificationEvaluator```? This should be clearer.
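The suggested organization can be sketched like this (a hypothetical companion object, not the actual Spark source). Deriving the canonical lower-cased names once from the public spellings avoids hard-coding strings like "areaunderroc" inline:

```scala
import java.util.Locale

object BinaryClassificationEvaluatorSketch {
  // Canonical (lower-cased) metric names, derived once from the public spellings.
  val AreaUnderROC: String = "areaUnderROC".toLowerCase(Locale.ROOT)
  val AreaUnderPR: String = "areaUnderPR".toLowerCase(Locale.ROOT)
  val supportedMetricNames: Set[String] = Set(AreaUnderROC, AreaUnderPR)

  // Validator: lower-case the user's value, then check set membership.
  def isValidMetric(value: String): Boolean =
    supportedMetricNames.contains(value.toLowerCase(Locale.ROOT))
}
```

The same constants can then back both the param validator and the metric dispatch in `evaluate`, so the two can never drift apart.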
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920437 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -45,7 +47,7 @@ private[feature] trait ImputerParams extends Params with HasInputCols { final val strategy: Param[String] = new Param(this, "strategy", s"strategy for imputation. " + s"If ${Imputer.mean}, then replace missing values using the mean value of the feature. " + s"If ${Imputer.median}, then replace missing values using the median value of the feature.", -ParamValidators.inArray[String](Array(Imputer.mean, Imputer.median))) +(value: String) => Array(Imputer.mean, Imputer.median).contains(value.toLowerCase(Locale.ROOT))) --- End diff -- Ditto.
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123919458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -44,12 +46,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid * @group param */ @Since("1.5.0") - val metricName: Param[String] = { -val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision", - "weightedRecall", "accuracy")) -new Param(this, "metricName", "metric name in evaluation " + - "(f1|weightedPrecision|weightedRecall|accuracy)", allowedParams) - } + val metricName: Param[String] = new Param[String](this, "metricName", "metric name in" +" evaluation (f1|weightedPrecision|weightedRecall|accuracy)", +(value: String) => Array("f1", "weightedprecision", "weightedrecall", "accuracy") + .contains(value.toLowerCase(Locale.ROOT))) --- End diff -- Ditto.
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r123920352 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -70,6 +71,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override va setDefault(metricName -> "areaUnderROC") + private def getFormattedMetricName = --- End diff -- Is this really necessary?
[GitHub] spark issue #18414: [SPARK-21169] [core] Make sure to update application sta...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18414 @srini-daruna I think I already addressed this issue in SPARK-12552, here is the [code](https://github.com/srini-daruna/spark/blob/b3ea3358a7bf55cedaa5cd7d08860bc625e83cd2/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L552). Did you test with the SPARK-12552 fix included or not? Also, is that fix not enough to address the problem?
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11994 Merged build finished. Test FAILed.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78594/ Test FAILed.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11994 **[Test build #78594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78594/testReport)** for PR 11994 at commit [`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set collec...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18416 cc @cloud-fan I'd like to hear your opinion about this `Set` support. Can you provide some insights?
[GitHub] spark pull request #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18416#discussion_r123920728 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -992,6 +1123,128 @@ case class ExternalMapToCatalyst private( } } +object ExternalSetToCatalystArray { + private val curId = new java.util.concurrent.atomic.AtomicInteger() + + def apply( + inputSet: Expression, + elementType: DataType, + elementConverter: Expression => Expression, + elementNullable: Boolean): ExternalSetToCatalystArray = { +val id = curId.getAndIncrement() +val elementName = "ExternalSetToCatalystArray_element" + id +val elementIsNull = "ExternalSetToCatalystArray_element_isNull" + id + +ExternalSetToCatalystArray( + elementName, + elementIsNull, + elementType, + elementConverter(LambdaVariable(elementName, elementIsNull, elementType, elementNullable)), + inputSet +) + } +} + +/** + * Converts a Scala/Java set object into catalyst array format, by applying the converter when + * iterate the set. + * + * @param element the name of the set element variable that used when iterate the set, and used as + *input for the `elementConverter` + * @param elementIsNull the nullability of the element variable that used when iterate the set, and + *used as input for the `elementConverter` + * @param elementType the data type of the element variable that used when iterate the set, and + * used as input for the `elementConverter` + * @param elementConverter A function that take the `element` as input, and converts it to catalyst + * array format. + * @param child An expression that when evaluated returns the input set object. 
+ */ +case class ExternalSetToCatalystArray private( +element: String, +elementIsNull: String, +elementType: DataType, +elementConverter: Expression, +child: Expression) + extends UnaryExpression with NonSQLExpression { + + override def foldable: Boolean = false + + override def dataType: ArrayType = ArrayType( +elementType = elementConverter.dataType, containsNull = elementConverter.nullable) + + override def eval(input: InternalRow): Any = +throw new UnsupportedOperationException("Only code-generated evaluation is supported") + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val inputSet = child.genCode(ctx) +val genElementConverter = elementConverter.genCode(ctx) +val length = ctx.freshName("length") +val index = ctx.freshName("index") + +val iter = ctx.freshName("iter") +val (defineIterator, defineElement) = child.dataType match { + case ObjectType(cls) if classOf[java.util.Set[_]].isAssignableFrom(cls) => +val javaIteratorCls = classOf[java.util.Iterator[_]].getName --- End diff -- I'd prefer to leave Java `Set` support to another PR.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123918251 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19'); -- positive/negative select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11); + +-- width_bucket +select width_bucket(5.35, 0.024, 10.06, 5); +select width_bucket(5.35, 0.024, 10.06, -5); --- End diff -- add a case for wrong input type: `select width_bucket(5.35, 0.024, 10.06, 0.5);`
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123918240 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created --- End diff -- nit: id -> is
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123919502 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue]. For example: + * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2. --- End diff -- Let's remove these examples in the description, they are just corner cases. 
My previous comment was just to make sure both ends should be included.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123919350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue]. For example: + * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2. 
+ */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} --- End diff -- Do we consider minValue == maxValue and numBucket > 1 valid input or not? Please also add a test case for this.
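For readers following the thread, the bucketing semantics under discussion can be sketched as a standalone helper (hypothetical code, not the PR's implementation): values below the range fall into underflow bucket 0, values at or above the maximum fall into overflow bucket `numBucket + 1`, and everything else maps into one of `numBucket` equal-width buckets. Note that the `minValue == maxValue` case raised above never reaches the division, since such inputs always hit one of the boundary branches.

```scala
object WidthBucketSketch {
  // Hypothetical helper mirroring the semantics discussed in the review:
  // bucket 0 for underflow, numBucket + 1 for overflow, otherwise the
  // 1-based index of the equal-width bucket over [minValue, maxValue).
  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
    require(numBucket > 0, s"The number of buckets must be greater than 0, but got $numBucket")
    if (expr < minValue) 0L
    else if (expr >= maxValue) numBucket + 1
    else (((expr - minValue) / (maxValue - minValue)) * numBucket).toLong + 1
  }
}
```

Under this sketch the corner cases quoted in the scaladoc come out as described: `widthBucket(0, 1, 1, 1)` is 0 (underflow) and `widthBucket(20, 1, 1, 1)` is 2 (overflow), while `widthBucket(5.35, 0.024, 10.06, 5)` lands in bucket 3.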
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78593/testReport)** for PR 18405 at commit [`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123920115 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); --- End diff -- Yes, this is what I worry about.
[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11994 **[Test build #78594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78594/testReport)** for PR 11994 at commit [`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 Jenkins, retest this please.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123919660 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ExpectsInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = dataType match { +case NullType => Seq(dataType, TypeCollection(StringType, IntegerType)) +case DateType => Seq(dataType, StringType) +case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType) +case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType), --- End diff -- Add this case to show all 
supported type: ``` > select trunc(false, 'MON'); Error in query: cannot resolve 'trunc(false, 'MON')' due to data type mismatch: argument 1 requires (date or double or decimal) type, however, 'false' is of boolean type.; line 1 pos 7; 'Project [unresolvedalias(trunc(false, MON), None)] +- OneRowRelation$ ```
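The numeric branch of `trunc` described in the usage string above can be sketched with `BigDecimal` rounding toward zero (a hypothetical helper, not the PR's codegen): a positive `scale` keeps that many decimal places, and a negative `scale` zeroes out digits to the left of the decimal point.

```scala
import scala.math.BigDecimal.RoundingMode

object TruncSketch {
  // Hypothetical helper: truncate `value` toward zero at `scale` decimal
  // places. RoundingMode.DOWN discards the excess digits without rounding
  // to nearest, which is what distinguishes trunc from round.
  def truncNumber(value: BigDecimal, scale: Int): BigDecimal =
    value.setScale(scale, RoundingMode.DOWN)
}
```

For example, `truncNumber(BigDecimal("1234567891.1234567891"), 4)` yields `1234567891.1234`, and with scale `-4` the value truncates to `1234560000`.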
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Merged build finished. Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18405 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78590/ Test FAILed.
[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18405 **[Test build #78590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78590/testReport)** for PR 18405 at commit [`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826). * This patch **fails due to an unknown error code, -10**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set collec...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18416 Currently I can't think of possible issues with serializing `Set` as an array, but comments pointing out any possible issues are welcome.
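To make the trade-off in that comment concrete, a minimal sketch of round-tripping a `Set` through an array representation might look like this (hypothetical `encode`/`decode` names, not the PR's encoder code). Element order in the array is unspecified, but set equality survives the round trip.

```scala
object SetAsArraySketch {
  // Hypothetical encode/decode pair: a Set is stored as an array of its
  // elements (in no guaranteed order) and rebuilt with toSet on read.
  def encode[T](s: Set[T]): Array[Any] = s.toArray[Any]
  def decode[T](a: Array[Any]): Set[T] = a.iterator.map(_.asInstanceOf[T]).toSet
}
```

One consequence of this scheme is that any ordering a caller observes in the serialized array is an implementation detail and must not be relied upon, which is presumably the kind of subtle issue the comment invites feedback on.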