[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457

**[Test build #66869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66869/consoleFull)** for PR 15457 at commit [`9f7db6f`](https://github.com/apache/spark/commit/9f7db6f0ea0831669e92ff2fe5231085e4e71895).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15416: [SPARK-17849] [SQL] Fix NPE problem when using grouping ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15416

**[Test build #3337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3337/consoleFull)** for PR 15416 at commit [`69f6e4f`](https://github.com/apache/spark/commit/69f6e4f1bc37afd6b3ca529c8b0f0afec891459a).

* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user loneknightpy commented on the issue: https://github.com/apache/spark/pull/15285 @tdas Based on our offline discussion, I added a file size cache for the compressed log files.
[GitHub] spark issue #15416: [SPARK-17849] [SQL] Fix NPE problem when using grouping ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15416

**[Test build #3337 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3337/consoleFull)** for PR 15416 at commit [`69f6e4f`](https://github.com/apache/spark/commit/69f6e4f1bc37afd6b3ca529c8b0f0afec891459a).
[GitHub] spark pull request #15452: minor doc fix for Row.scala
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15452
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15414

**[Test build #66872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66872/consoleFull)** for PR 15414 at commit [`7e2d501`](https://github.com/apache/spark/commit/7e2d501c951d6a3f7250156619979d29c080dc4b).
[GitHub] spark issue #15452: minor doc fix for Row.scala
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15452 Merging in master/branch-2.0.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66865/
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test PASSed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285

**[Test build #66865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66865/consoleFull)** for PR 15285 at commit [`bd47bd4`](https://github.com/apache/spark/commit/bd47bd46962f6e7ee0bdf1bdfa5e777a506dd506).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15432 Given that the different databases diverge (they don't even have the same function names), I think it's fine to just have null be treated as 0, like Hive/MySQL.
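The convention endorsed here amounts to coalescing the seed before seeding the generator. A rough sketch with hypothetical helper names, in plain Python rather than Spark's Scala expressions:

```python
import random

def resolve_seed(seed):
    # Hive/MySQL convention from the discussion above: a NULL (None)
    # seed behaves exactly like an explicit seed of 0.
    return 0 if seed is None else int(seed)

def rand(seed=None):
    """Toy stand-in for rand(seed): first value from a seeded generator."""
    return random.Random(resolve_seed(seed)).random()
```

Under this sketch `rand(None)` and `rand(0)` return the same value, while `rand(1)` starts a different stream.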
[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15456#discussion_r83146972

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -104,6 +104,8 @@ object SparkSubmit {
   /___/ .__/\_,_/_/ /_/\_\   version %s
      /_/
    """.format(SPARK_VERSION))
+    printStream.println("Using Scala %s (%s, Java %s)".format(
--- End diff --

Thanks Reynold for your comments. I will change it.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432

Oh, strictly speaking, PostgreSQL does not ignore the `NULL`; it unsets the seed.

```sql
postgres=# SELECT setseed(0);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.840187716763467
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.394382926635444
(1 row)

postgres=# SELECT setseed(null);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.783099223393947
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.798440033104271
(1 row)

postgres=# SELECT setseed(0);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.840187716763467
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.394382926635444
(1 row)
```
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15230 LGTM except one minor comment
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r83146607

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.util.Random
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+/**
+ * :: Experimental ::
+ * Params for [[LSH]].
+ */
+@Since("2.1.0")
+private[ml] trait LSHParams extends HasInputCol with HasOutputCol {
+  /**
+   * Param for the dimension of LSH OR-amplification.
+   *
+   * In this implementation, we use LSH OR-amplification to reduce the false negative rate. The
+   * higher the dimension is, the lower the false negative rate.
+   * @group param
+   */
+  @Since("2.1.0")
+  final val outputDim: IntParam = new IntParam(this, "outputDim", "output dimension, where" +
+    "increasing dimensionality lowers the false negative rate, and decreasing dimensionality" +
+    " improves the running performance", ParamValidators.gt(0))
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getOutputDim: Int = $(outputDim)
+
+  /**
+   * Transform the Schema for LSH
+   * @param schema The schema of the input dataset without [[outputCol]]
+   * @return A derived schema with [[outputCol]] added
+   */
+  @Since("2.1.0")
+  protected[this] final def validateAndTransformSchema(schema: StructType): StructType = {
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Model produced by [[LSH]].
+ */
+@Experimental
+@Since("2.1.0")
+private[ml] abstract class LSHModel[T <: LSHModel[T]] extends Model[T] with LSHParams {
+  self: T =>
+
+  @Since("2.1.0")
+  override def copy(extra: ParamMap): T = defaultCopy(extra)
+
+  /**
+   * The hash function of LSH, mapping a predefined KeyType to a Vector
+   * @return The mapping of LSH function.
+   */
+  @Since("2.1.0")
+  protected[this] val hashFunction: Vector => Vector
+
+  /**
+   * Calculate the distance between two different keys using the distance metric corresponding
+   * to the hashFunction
+   * @param x One input vector in the metric space
+   * @param y One input vector in the metric space
+   * @return The distance between x and y
+   */
+  @Since("2.1.0")
+  protected[ml] def keyDistance(x: Vector, y: Vector): Double
+
+  /**
+   * Calculate the distance between two different hash Vectors.
+   *
+   * @param x One of the hash vectors
+   * @param y Another hash vector
+   * @return The distance between hash vectors x and y
+   */
+  @Since("2.1.0")
+  protected[ml] def hashDistance(x: Vector, y: Vector): Double
+
+  @Since("2.1.0")
+  override def transform(dataset: Dataset[_]): DataFrame = {
+    transformSchema(dataset.schema, logging = true)
+    val transformUDF = udf(hashFunction, new VectorUDT)
+    dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol))))
+  }
+
+  @Since("2.1.0")
+  override def transformSchema(schema: StructType): StructType = {
+    validateAndTransformSchema(schema)
+  }
+
+  /**
+   * Given a large dataset and an item, approximately find at most k items which have the closest
+   * distance to the item. If the [[outputCol]] is missing, the method will transform the data; if
+   * the [[outputCol]] exists, it
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 Thanks for review! @rxin @cloud-fan
[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15427
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15427 LGTM, merging to master!
[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15272 hm, looks like another legitimate failing test too
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14702 @tejasapatil looks like there is a legitimate failing test.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15458 LGTM pending Jenkins.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15458

**[Test build #66871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66871/consoleFull)** for PR 15458 at commit [`e821f1a`](https://github.com/apache/spark/commit/e821f1a9d19215fe180ffcbd8183aabd1185a316).
[GitHub] spark pull request #15458: [SPARK-17899][SQL] add a debug mode to keep raw t...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/15458

[SPARK-17899][SQL] add a debug mode to keep raw table properties in HiveExternalCatalog

## What changes were proposed in this pull request?

Currently `HiveExternalCatalog` will filter out the Spark SQL internal table properties, e.g. `spark.sql.sources.provider`, `spark.sql.sources.schema`, etc. This is reasonable for external users, as they don't want to see these internal properties in `DESC TABLE`. However, as Spark developers, sometimes we do want to see the raw table properties. This PR adds a new internal SQL conf, `spark.sql.debug`, to enable debug mode and keep these raw table properties. This config can also be used in similar places where we want to retain debug information in the future.

## How was this patch tested?

new test in MetastoreDataSourcesSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark debug

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15458.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15458

commit e821f1a9d19215fe180ffcbd8183aabd1185a316
Author: Wenchen Fan
Date: 2016-10-13T04:21:01Z

    add a debug mode to keep raw table properties in HiveExternalCatalog
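The filtering this conf toggles can be sketched in a few lines. The dictionary, prefix constant, and function below are hypothetical illustrations in plain Python, not Spark's actual implementation:

```python
# Internal table properties are hidden from users unless a debug flag
# (modelled on the new spark.sql.debug conf) asks to keep them raw.
INTERNAL_PREFIX = "spark.sql."

def visible_properties(props, debug=False):
    """Return the table properties a user should see."""
    if debug:
        # Debug mode: keep the raw table properties untouched.
        return dict(props)
    # Normal mode: drop internal "spark.sql." properties before display.
    return {k: v for k, v in props.items()
            if not k.startswith(INTERNAL_PREFIX)}
```

For example, `visible_properties({"spark.sql.sources.provider": "parquet", "owner": "alice"})` keeps only `owner`, while passing `debug=True` returns everything.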
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15458 cc @yhuai
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432

Now we have at least four options for when a user sets `NULL` as the seed for `rand`:

1. Hive/MySQL - `NULL` is equivalent to `0`
2. DB2 - when the seed is `NULL`, `rand` returns `NULL`
3. PostgreSQL - when the seed is `NULL`, ignore it.
4. SparkSQL - does not allow it.

I do not have a strong opinion. Maybe @rxin needs to make a decision.
[GitHub] spark pull request #15437: [SPARK-17876] Write StructuredStreaming WAL to a ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15437
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432

Ah, PostgreSQL seems to unset the seed.

```
postgres=# SELECT setseed(0), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.840187716763467 | 0.394382926635444
(1 row)

postgres=# SELECT setseed(0), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.840187716763467 | 0.394382926635444
(1 row)

postgres=# SELECT setseed(null), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.783099223393947 | 0.798440033104271
(1 row)

postgres=# SELECT setseed(null), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.911647357512265 | 0.197551369201392
(1 row)
```
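The behaviour in the session above can be mimicked with any seedable PRNG. A small sketch using Python's standard `random` module (a hypothetical helper, not PostgreSQL internals):

```python
import random

def stream(seed=None, n=2):
    """First n values from a generator, seeded like setseed(seed).

    Passing None mimics setseed(null) as observed above: the seed is
    effectively unset, so the values are not reproducible across calls.
    """
    rng = random.Random(seed)  # Random(None) seeds from OS entropy
    return [rng.random() for _ in range(n)]
```

With a fixed seed, repeated calls replay the same pair of values, matching the repeated 0.840187716763467 / 0.394382926635444 rows in the psql session; with `None`, each call produces a fresh pair.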
[GitHub] spark issue #15437: [SPARK-17876] Write StructuredStreaming WAL to a stream ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15437 LGTM. Thanks! Merging to master and 2.0.
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15456 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66862/
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15456 Merged build finished. Test PASSed.
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15456

**[Test build #66862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66862/consoleFull)** for PR 15456 at commit [`98e7015`](https://github.com/apache/spark/commit/98e70150f26ee6d1fd0e587b59ba7467d70dcfe3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15365: [SPARK-17157][SPARKR]: Add multiclass logistic re...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15365#discussion_r83143464 --- Diff: R/pkg/R/mllib.R --- @@ -647,6 +654,195 @@ setMethod("predict", signature(object = "KMeansModel"), predict_internal(object, newData) }) +#' Logistic Regression Model +#' +#' Fits an logistic regression model against a Spark DataFrame. It supports "binomial": Binary logistic regression +#' with pivoting; "multinomial": Multinomial logistic (softmax) regression without pivoting, similar to glmnet. +#' Users can print, make predictions on the produced model and save the model to the input path. +#' +#' @param data SparkDataFrame for training +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param regParam the regularization parameter. Default is 0.0. +#' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. +#'For alpha = 1, it is an L1 penalty. For 0 < alpha < 1, the penalty is a combination +#'of L1 and L2. Default is 0.0 which is an L2 penalty. +#' @param maxIter maximum iteration number. +#' @param tol convergence tolerance of iterations. +#' @param fitIntercept whether to fit an intercept term. Default is TRUE. +#' @param family the name of family which is a description of the label distribution to be used in the model. +#' Supported options: +#' - "auto": Automatically select the family based on the number of classes: +#' If numClasses == 1 || numClasses == 2, set to "binomial". +#' Else, set to "multinomial". +#' - "binomial": Binary logistic regression with pivoting. +#' - "multinomial": Multinomial logistic (softmax) regression without pivoting. +#' Default is "auto". +#' @param standardization whether to standardize the training features before fitting the model. 
The coefficients +#'of models will be always returned on the original scale, so it will be transparent for +#'users. Note that with/without standardization, the models should be always converged +#'to the same solution when no regularization is applied. Default is TRUE, same as glmnet. +#' @param threshold in binary classification, in range [0, 1]. If the estimated probability of class label 1 +#' is > threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 +#' more often; a low threshold encourages the model to predict 1 more often. Note: Setting this with +#' threshold p is equivalent to setting thresholds (Array(1-p, p)). When threshold is set, any user-set +#' value for thresholds will be cleared. If both threshold and thresholds are set, then they must be +#' equivalent. Default is 0.5. +#' @param thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. +#' Array must have length equal to the number of classes, with values > 0, excepting that at most one +#' value may be 0. The class with largest value p/t is predicted, where p is the original probability +#' of that class and t is the class's threshold. Note: When thresholds is set, any user-set +#' value for threshold will be cleared. If both threshold and thresholds are set, then they must be +#' equivalent. Default is NULL. +#' @param weightCol The weight column name. +#' @param aggregationDepth depth for treeAggregate (>= 2). If the dimensions of features or the number of partitions +#' are large, this param could be adjusted to a larger size. Default is 2. +#' @param ... additional arguments passed to the method. 
+#' @return \code{spark.logit} returns a fitted logistic regression model +#' @rdname spark.logit +#' @aliases spark.logit,SparkDataFrame,formula-method +#' @name spark.logit +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' # binary logistic regression +#' label <- c(1.0, 1.0, 1.0, 0.0, 0.0) +#' feature <- c(1.1419053, 0.9194079, -0.9498666, -1.1069903, 0.2809776) +#' binary_data <- as.data.frame(cbind(label, feature)) +#' binary_df <- suppressWarnings(createDataFrame(binary_data)) +#' blr_model <- spark.logit(binary_df, label ~ feature, threshold = 1.0) +#' blr_predict <- collect(select(predict(blr_model,
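The threshold/thresholds semantics documented above can be sketched in plain Python (an illustrative sketch, not Spark's implementation; the function names are hypothetical):

```python
def predict_class(probabilities, thresholds):
    """Pick the class maximizing p/t, as described for `thresholds`."""
    # Scale each class probability by the inverse of its threshold.
    ratios = [p / t for p, t in zip(probabilities, thresholds)]
    return max(range(len(ratios)), key=lambda i: ratios[i])

def binary_thresholds(threshold):
    """A binary `threshold` of p is equivalent to `thresholds` = [1 - p, p]."""
    return [1.0 - threshold, threshold]

# With threshold 0.5, class 1 is predicted iff P(1) > 0.5.
print(predict_class([0.4, 0.6], binary_thresholds(0.5)))  # 1
# A high threshold (0.8) makes the model predict class 0 more often.
print(predict_class([0.4, 0.6], binary_thresholds(0.8)))  # 0
```

The p/t rule reduces exactly to "predict 1 if P(1) > threshold" in the binary case, which is why setting threshold p and thresholds [1-p, p] are documented as equivalent.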
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 What is the behavior of `PostgreSQL`? Treating `NULL` as zero?
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Not urgent, but in my experience such PRs tend to be put on hold. So, I am trying to fix only the problem specified in the JIRA rather than fixing others together. @srowen said "I'm not even sure that's a bug.." but "... reasonable to try to follow it.". At least, none of the implementations in DB2, MySQL and PostgreSQL throw an exception; each defines its own behaviour.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/12933 I just tried running it locally and I'm getting the same error. It seems like with your change that test is simply declining the offer.
[GitHub] spark issue #15449: [SPARK-17884][SQL] To resolve Null pointer exception whe...
Github user priyankagargnitk commented on the issue: https://github.com/apache/spark/pull/15449 Thanks rxin
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 First, we do not strictly follow Hive; you can easily find many such differences in Spark. I do not think this is an urgent JIRA, right? Like what @srowen replied in the JIRA, he does not think this is a bug. The existing output message looks reasonable to me too. ``` Input argument to rand must be an integer literal.;; line 1 pos 0 ``` Setting the seed as `null` also looks weird to me. DB2 and Oracle have free versions to download. You can easily install the docker versions. You can also google their documentation. What we need to do first is an investigation, to save the time of all the other reviewers; otherwise, they have to do it too.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66860/ Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66860/consoleFull)** for PR 15408 at commit [`b74fb36`](https://github.com/apache/spark/commit/b74fb36de321fd03b48f0a6b9b772589df3d84b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Strictly, the JIRA describes handling `null`, and we might not have to generalize the cases further. > it will failed when do select rand(null) Also, I would like to add the edge cases here, but I'd like to avoid the PR being put on hold. As not all the things have a standard to follow, we can define the behaviour here. I don't have access to Oracle and DB2. Do you think the Hive, PostgreSQL and MySQL examples are not enough?
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66870/consoleFull)** for PR 9766 at commit [`8171b85`](https://github.com/apache/spark/commit/8171b8515107ea66fa277c52823167d206b4756a).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 Unfortunately, not all the things have a standard to follow. That is why I suggested you do some research about it. Oracle, for example, does not have such a function in its SQL function list: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm Since you are doing the change in `rand`, I think you can check whether the existing `rand` behaves as expected and add the missing test cases if needed. This JIRA is just trying to cover an edge case of a seed number. Why not check whether we appropriately handle all the cases? Then we do not need to submit more small fixes for `rand`, right?
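As a hedged sketch of the behaviour under discussion (illustrative Python, not Spark's implementation; the function name is hypothetical): databases that accept a NULL seed typically fall back to a default, e.g. treating NULL as seed 0, rather than raising an error.

```python
import random

def rand_with_seed(seed=None):
    """Illustrative: treat a missing/NULL seed as 0, mirroring the
    fallback behaviour being discussed for rand(null)."""
    effective_seed = 0 if seed is None else seed
    if not isinstance(effective_seed, int):
        # Keep the existing error for genuinely invalid inputs.
        raise TypeError("Input argument to rand must be an integer literal.")
    return random.Random(effective_seed).random()

# Under this convention, rand(null) and rand(0) agree deterministically.
assert rand_with_seed(None) == rand_with_seed(0)
```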
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83141382 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -616,6 +617,44 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat client.getPartition(db, table, spec) } + override def listPartitionsByFilter( + db: String, + table: String, + predicates: Seq[Expression]): Seq[CatalogTablePartition] = withClient { +val catalogTable = client.getTable(db, table) +val partitionColumnNames = catalogTable.partitionColumnNames.toSet +val nonPartitionPruningPredicates = predicates.filterNot { + _.references.map(_.name).toSet.subsetOf(partitionColumnNames) +} + +if (nonPartitionPruningPredicates.nonEmpty) { +sys.error("Expected only partition pruning predicates: " + + predicates.reduceLeft(And)) +} + +val partitionSchema = catalogTable.partitionSchema + +if (predicates.nonEmpty) { + val clientPrunedPartitions = +client.getPartitionsByFilter(catalogTable, predicates) + val boundPredicate = +InterpretedPredicate.create(predicates.reduce(And).transform { + case att: AttributeReference => +val index = partitionSchema.indexWhere(_.name == att.name) --- End diff -- I tested this with unit tests from two test suites on two branches. The first test suite was `SQLQuerySuite` from the Hive codebase, specifically the test "SPARK-10562: partition by column with mixed case name". The second test suite was (a modified) `ParquetMetastoreSuite`. I modified the name of the partition column in the partitioned tables in the latter suite from `p` to `pQ`. The two branches on which I tested were this PR and commit 8d33e1e from the master branch. The first test suite passed on both branches. I guess that's to be expected since our Jenkins bot has been reporting it as passed. The second suite failed (as modified) on both branches. In both branches, Spark SQL failed to find the partitions on-disk. 
This makes me wonder: 1. Is this a known/accepted limitation? 1. If unknown, is this an acceptable limitation or a bug to be fixed? The best I found regarding support for mixed-case partition columns was in https://issues.apache.org/jira/browse/SPARK-10562. Unlike in the first test (which uses the `saveAsTable` method), the tables in `ParquetMetastoreSuite` are built with SQL DDL.
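The failure mode described (partitions not found on-disk for a mixed-case column name like `pQ`) is consistent with a case-sensitive name lookup, as in the diff's `indexWhere(_.name == att.name)`. A hedged, illustrative Python sketch of the difference (not Spark's code; the helper names are hypothetical, and the assumption is that the metastore reports the column lower-cased):

```python
def index_where_case_sensitive(schema, name):
    """Mirror of indexWhere(_.name == att.name): exact match only."""
    for i, field in enumerate(schema):
        if field == name:
            return i
    return -1

def index_where_case_insensitive(schema, name):
    """A case-insensitive resolver, as an analyzer typically uses."""
    for i, field in enumerate(schema):
        if field.lower() == name.lower():
            return i
    return -1

schema = ["pQ"]  # partition column declared with mixed case in DDL
# Assumption: the metastore may hand back the column as "pq":
print(index_where_case_sensitive(schema, "pq"))    # -1: no match, partition lost
print(index_where_case_insensitive(schema, "pq"))  # 0: partition found
```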
[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457 **[Test build #66869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66869/consoleFull)** for PR 15457 at commit [`9f7db6f`](https://github.com/apache/spark/commit/9f7db6f0ea0831669e92ff2fe5231085e4e71895).
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66868/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66868/consoleFull)** for PR 9766 at commit [`00f65cd`](https://github.com/apache/spark/commit/00f65cde80b15c174183a52707643642a2bcf7b8). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 So, based on my interpretation of this and how this can actually work, we need to have:

```scala
sealed trait LogisticRegressionSummary
sealed trait LogisticRegressionTrainingSummary
class MulticlassLogisticRegressionSummary extends LogisticRegressionSummary
class MulticlassLogisticRegressionTrainingSummary extends MulticlassLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
class BinaryLogisticRegressionSummary extends MulticlassLogisticRegressionSummary
class BinaryLogisticRegressionTrainingSummary extends BinaryLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
```

Then, in `LogisticRegressionModel` we have:

```scala
def summary: LogisticRegressionTrainingSummary
def binarySummary: BinaryLogisticRegressionTrainingSummary = summary match {
  case b: BinaryLogisticRegressionTrainingSummary => b
  case _ => throw new Exception()
}
```

And we avoid downcasting in the summary case since `MulticlassLogisticRegressionSummary` only implements the methods defined in the trait. Otherwise, we would have to downcast to get access to those methods. Then if the summary is binary, you can just call `binarySummary`. Anyway, I got this to compile, and if there is some other way, I'm not seeing it. Would really like to get some clarification from @jkbradley. Not sure if @feynmanliang is still involved with Spark.
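The accessor pattern sethah describes (a general `summary` plus a `binarySummary` that fails fast on multiclass models instead of forcing users to downcast) can be sketched language-neutrally; here is an illustrative Python analogue of that design, not Spark's actual classes:

```python
class LogisticRegressionSummary:
    """Base summary type (analogous to the sealed trait)."""

class MulticlassSummary(LogisticRegressionSummary):
    """Summary for multiclass models."""

class BinarySummary(MulticlassSummary):
    """Binary summary specializes the multiclass one."""

class LogisticRegressionModel:
    def __init__(self, summary):
        self._summary = summary

    @property
    def summary(self):
        # Always available, typed as the base summary.
        return self._summary

    @property
    def binary_summary(self):
        # Narrow the type here, so users never downcast themselves;
        # error out when the model is multiclass.
        if isinstance(self._summary, BinarySummary):
            return self._summary
        raise TypeError("binary_summary is only available for binary models")

model = LogisticRegressionModel(BinarySummary())
print(type(model.binary_summary).__name__)  # BinarySummary
```

The design choice mirrors the discussion: the narrowing happens once, inside the model, rather than at every call site.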
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66868/consoleFull)** for PR 9766 at commit [`00f65cd`](https://github.com/apache/spark/commit/00f65cde80b15c174183a52707643642a2bcf7b8).
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/15457 [SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability ## What changes were proposed in this pull request? This patch annotates all the remaining APIs in SQL (excluding streaming) with InterfaceStability. ## How was this patch tested? N/A - just annotation change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-17830-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15457 commit 5f51cbb02d90f16477802601fa93b18664a57dfa Author: Reynold Xin / Date: 2016-10-13T03:45:24Z [SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability
[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457 **[Test build #66867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66867/consoleFull)** for PR 15457 at commit [`5f51cbb`](https://github.com/apache/spark/commit/5f51cbb02d90f16477802601fa93b18664a57dfa).
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15457#discussion_r83140489 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UDF1.java --- @@ -19,14 +19,12 @@ import java.io.Serializable; -// ** --- End diff -- I can't find FunctionRegistration anymore, so deleting this.
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15457#discussion_r83140501 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UDF1.java --- @@ -19,14 +19,12 @@ import java.io.Serializable; -// ** -// THIS FILE IS AUTOGENERATED BY CODE IN -// org.apache.spark.sql.api.java.FunctionRegistration -// ** +import org.apache.spark.annotation.InterfaceStability; /** * A Spark SQL UDF that has 1 arguments. */ +@InterfaceStability.Stable public interface UDF1<T1, R> extends Serializable { - public R call(T1 t1) throws Exception; + R call(T1 t1) throws Exception; --- End diff -- Methods in a Java interface are public by default.
[GitHub] spark pull request #15455: [SPARK-16827] [Branch-2.0] Avoid reporting spill ...
Github user dafrista closed the pull request at: https://github.com/apache/spark/pull/15455
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 So I've been reading through some of the history with logistic regression summaries. There was a lot of discussion on how to design the abstractions for this, [here](https://github.com/apache/spark/pull/7538) and [here](https://github.com/apache/spark/pull/8197). I'm reposting some of the relevant snippets (I will comment on them in a follow up): "We'll need to use traits to fix the multiple inheritance issue:"

```scala
sealed trait LogisticRegressionSummary
sealed trait LogisticRegressionTrainingSummary
class BinaryLogisticRegressionSummary extends LogisticRegressionSummary
class BinaryLogisticRegressionTrainingSummary extends BinaryLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
```

"Are we planning to have a MulticlassLogisticRegressionSummary inheriting from LogisticRegressionSummary in the future? Because without that I'm unable to understand how using a trait would help, since there is no access to the predictions dataframe." "Yes, MulticlassLogisticRegressionSummary should be analogous to the binary version, with both inheriting from LogisticRegressionSummary." ... "Synced with @jkbradley offline. Summary: We should not require end users to perform any sort of downcasting in the stabilized API. This is OK for now since the API is still experimental. Eventually we could provide two methods, a `summary : LogisticRegressionSummary` and a `binarySummary : BinaryLogisticRegressionSummary` which errors when called on a multiclass LRModel. This will be easy to implement because `summary` is returning the base LogisticRegressionSummary class, so it will not require any public API change."
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66866/consoleFull)** for PR 9766 at commit [`18fa6e3`](https://github.com/apache/spark/commit/18fa6e3bb00c5a81a3d44364b3644e35263bedbd). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66866/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66866/consoleFull)** for PR 9766 at commit [`18fa6e3`](https://github.com/apache/spark/commit/18fa6e3bb00c5a81a3d44364b3644e35263bedbd).
[GitHub] spark issue #15455: [SPARK-16827] [Branch-2.0] Avoid reporting spill metrics...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15455 Merging in. Thanks. Can you also close the PR? GitHub won't close it automatically because it is not merged into the master branch.
[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15427#discussion_r83140093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1878,17 +1878,25 @@ class Dataset[T] private[sql]( def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan { val resolver = sparkSession.sessionState.analyzer.resolver val allColumns = queryExecution.analyzed.output -val groupCols = colNames.map { colName => - allColumns.find(col => resolver(col.name, colName)).getOrElse( +val groupCols = colNames.flatMap { colName => + // It is possible there are multiple columns with the same name, + // so we call filter instead of find. + val cols = allColumns.filter(col => resolver(col.name, colName)) + if (cols.isEmpty) { throw new AnalysisException( --- End diff -- My thought is: When a user mistakenly gives a wrong column to `Dataset.drop`, it can be easily found out. But for `Dataset.dropDuplicates`, it might be harder to figure out that duplicate rows are still there. So throwing an explicit exception seems more appropriate to me.
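The `find` vs `filter` distinction in the diff can be sketched in illustrative Python (not Spark's code; names are hypothetical): `find` keeps at most one match and silently drops further columns of the same name, while `filter` keeps every matching column and still lets the caller raise on a truly missing name:

```python
def resolve_columns(all_columns, requested, resolver=str.__eq__):
    """Collect every column matching each requested name; raise when a name
    matches nothing (analogous to the AnalysisException in the diff)."""
    resolved = []
    for name in requested:
        matches = [c for c in all_columns if resolver(c, name)]
        if not matches:
            raise ValueError(f'Cannot resolve column name "{name}"')
        resolved.extend(matches)  # filter semantics: keep all matches
    return resolved

# Two columns named "a" (e.g. after a self-join): both are kept,
# so dropDuplicates can group on both rather than ambiguously on one.
print(resolve_columns(["a", "a", "b"], ["a"]))  # ['a', 'a']
```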
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66861/ Test FAILed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Merged build finished. Test FAILed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66861/consoleFull)** for PR 14690 at commit [`59fecdf`](https://github.com/apache/spark/commit/59fecdf1e889c218ac81cdf73ba3e46142d052e6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Initially, this JIRA was only handling `null` as a seed. If you are both worried about the change here, I would like to make the PR smaller, as suggested initially.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 That is a great reference. However, is this function described in a standard? I guess it is different for each database implementation. For example, > The result can be null; if the argument is null, the result is the null value. MySQL treats it as 0 rather than returning a `null` value. Also, I gave references to both MySQL and Hive in the PR description. Can we define the behaviour here? Do we have a target DBMS to follow? I guess it is usually Hive, PostgreSQL and MySQL, as I recall. In the case of PostgreSQL, there seem to be two functions for this, `random()` and `setseed()`. This works differently from MySQL and also from DB2 (assuming from the comment you left). So, I got rid of this. I think I have checked enough other examples. Do we usually have such explanations and tests for all the DBMSs, Oracle, MySQL, SQL Server, Hive, DB2, Informix and PostgreSQL, plus mentions in the ANSI standard? It could be problematic if we don't comply with a standard that all other implementations follow, but I think it'd be fine if other databases have different implementations. I do look at other PRs from time to time and try to make mine sensible, but I don't think we always have references from all the other DBMSs and explanations from the ANSI standard. It is hard to change this again, and that is why I am asking for review.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66865/consoleFull)** for PR 15285 at commit [`bd47bd4`](https://github.com/apache/spark/commit/bd47bd46962f6e7ee0bdf1bdfa5e777a506dd506).
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15414 Thanks, I'll take a more detailed look in the next couple of days. Let's also wait and see if we can get @yanboliang or @jkbradley to give an opinion.
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r83138864 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -111,6 +111,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat s"as table property keys may not start with '$DATASOURCE_PREFIX' or '$STATISTICS_PREFIX':" + s" ${invalidKeys.mkString("[", ", ", "]")}") } +// External users are not allowed to set/switch the table type. +if (table.properties.contains("EXTERNAL")) { --- End diff -- I tried Hive. Hive only accepts `EXTERNAL` if users want to change the table type. That means, if users write it as `external` or `ExterRnal`, Hive just treats it as a regular property key.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r83057213 --- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.Dataset +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ + +class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + + import PredictorSuite._ + + test("should support all NumericType labels and not support other types") { +val df = spark.createDataFrame(Seq( + (0, Vectors.dense(0, 2, 3)), + (1, Vectors.dense(0, 3, 9)), + (0, Vectors.dense(0, 2, 6)) +)).toDF("label", "features") + +val types = + Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0)) + +val predictor = new MockPredictor() + +types.foreach { t => + predictor.fit(df.select(col("label").cast(t), col("features"))) +} + +intercept[IllegalArgumentException] { + predictor.fit(df.select(col("label").cast(StringType), col("features"))) +} + } +} + +object PredictorSuite { + + class MockPredictor(override val uid: String) +extends Predictor[Vector, MockPredictor, MockPredictionModel] { + +def this() = this(Identifiable.randomUID("mockpredictor")) + +override def train(dataset: Dataset[_]): MockPredictionModel = { + require(dataset.schema("label").dataType == DoubleType) + new MockPredictionModel(uid) +} + +override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra) --- End diff -- change the copy methods to throw NotImplementedError
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r83138671 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -111,6 +111,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat s"as table property keys may not start with '$DATASOURCE_PREFIX' or '$STATISTICS_PREFIX':" + s" ${invalidKeys.mkString("[", ", ", "]")}") } +// External users are not allowed to set/switch the table type. +if (table.properties.contains("EXTERNAL")) { --- End diff -- should we be case-insensitive here? e.g. `external`, `ExteRNal`, etc. are all not allowed
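The case-insensitive variant being suggested in this comment could look like the following sketch; `hasExternalProperty` is a made-up helper name for illustration, not code from the PR:

```scala
// Hedged sketch of a case-insensitive check on the "EXTERNAL" property key,
// in contrast to the case-sensitive `table.properties.contains("EXTERNAL")`
// shown in the diff above.
def hasExternalProperty(properties: Map[String, String]): Boolean =
  properties.keys.exists(_.equalsIgnoreCase("EXTERNAL"))
```

Note that, per the follow-up in this thread, Hive itself only recognizes the exact key `EXTERNAL`, so the case-sensitive check actually matches Hive's behavior.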
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66864/consoleFull)** for PR 15285 at commit [`60cc130`](https://github.com/apache/spark/commit/60cc130790d8b9f5531bd7290b5c40e419e3016f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66864/ Test FAILed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66859/ Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66859/consoleFull)** for PR 15307 at commit [`cafbeb7`](https://github.com/apache/spark/commit/cafbeb72f064295a6d9b07c31515e59f14c17305). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class AssertOnLastQueryStatus(condition: StreamingQueryStatus => Unit)`
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66864/consoleFull)** for PR 15285 at commit [`60cc130`](https://github.com/apache/spark/commit/60cc130790d8b9f5531bd7290b5c40e419e3016f).
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66863/ Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66863/consoleFull)** for PR 15285 at commit [`89b9acd`](https://github.com/apache/spark/commit/89b9acd8642640f987837caa3df23b68a043b43f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15414 @sethah I have made some modifications according to the comments
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66863/consoleFull)** for PR 15285 at commit [`89b9acd`](https://github.com/apache/spark/commit/89b9acd8642640f987837caa3df23b68a043b43f).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 Let me show you an example: https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_bif_rand.html This is the official document of `rand` in DB2 z/OS. Below is what it says about the input parameter: 1. If numeric-expression is specified, it is used as the seed value. The argument must be an expression that returns a value of a built-in integer data type (SMALLINT or INTEGER), and the value must be between 0 and 2,147,483,646. 2. The result can be null; if the argument is null, the result is the null value. 3. RAND(0) is processed the same as RAND().
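The two null-seed behaviors being debated in this thread (DB2-style null propagation vs MySQL-style null-as-zero) can be contrasted in a small sketch; both function names are invented for illustration and are not Spark APIs:

```scala
// DB2 z/OS semantics per the quoted document: a null seed yields a null result.
def randDb2Style(seed: Option[Long]): Option[Double] =
  seed.map(s => new scala.util.Random(s).nextDouble())

// MySQL semantics per the comment above: a null seed is treated as 0.
def randMySqlStyle(seed: Option[Long]): Double =
  new scala.util.Random(seed.getOrElse(0L)).nextDouble()
```

This sketch only models the null handling; it does not model DB2's further rule that RAND(0) behaves like RAND().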
[GitHub] spark pull request #15402: [SPARK-17835][ML][MLlib] Optimize NaiveBayes mlli...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15402
[GitHub] spark issue #15402: [SPARK-17835][ML][MLlib] Optimize NaiveBayes mllib wrapp...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15402 Merged into master. Thanks for review. @zhengruifeng
[GitHub] spark pull request #15406: [Spark-17745][ml][PySpark] update NB python api -...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15406
[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api - add we...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15406 LGTM2, merged into master. Thanks!
[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83134970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends // /** --- End diff -- I can turn it on, but it would make the function less readable, especially for the following statements, where it goes beyond the line length limit. ``` case 14 => register(name, udf.asInstanceOf[UDF13[_, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType) ```
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66860/consoleFull)** for PR 15408 at commit [`b74fb36`](https://github.com/apache/spark/commit/b74fb36de321fd03b48f0a6b9b772589df3d84b9).
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15456 **[Test build #66862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66862/consoleFull)** for PR 15456 at commit [`98e7015`](https://github.com/apache/spark/commit/98e70150f26ee6d1fd0e587b59ba7467d70dcfe3).
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66858/ Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15253 @zsxwing, would you mind taking a look at this fix for the 1.6 branch? Thanks a lot.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66858/consoleFull)** for PR 15307 at commit [`00a7415`](https://github.com/apache/spark/commit/00a741519e07fdda6dc2e4161e0f0d4382ef7c0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15456

[SPARK-17686][Core] Support printing out scala and java version with spark-submit --version command

## What changes were proposed in this pull request?

In our universal gateway service we need to pass different jars to Spark according to the Scala version. Currently we can only find out which Scala version a Spark build depends on after launching the application, which makes it hard for us to support different Scala + Spark combinations and pick the right jars. So here I propose to print out the Scala version along with the Spark version in `spark-submit --version`, so that users can leverage this output to make the choice without needing to launch an application.

## How was this patch tested?

Manually verified in a local environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-17686

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15456.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15456

commit 98e70150f26ee6d1fd0e587b59ba7467d70dcfe3
Author: jerryshao
Date: 2016-10-13T02:07:46Z

    print out scala and java version with --version command
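The gateway-service workflow described above (read the Scala version from the `spark-submit --version` banner, then pick matching jars) can be sketched in Python. The banner line format and the jar directory layout below are assumptions for illustration, not the exact output this PR produces:

```python
import re
from typing import Optional

def scala_version_from_banner(banner: str) -> Optional[str]:
    """Extract the Scala binary version (e.g. '2.11') from a version banner.

    Assumes the banner contains a line like 'Using Scala version 2.11.8'
    (a hypothetical format; the PR's actual output may differ)."""
    match = re.search(r"Using Scala version (\d+\.\d+)(?:\.\d+)?", banner)
    return match.group(1) if match else None

def pick_jar_dir(banner: str) -> str:
    """Choose a jar directory keyed by Scala binary version.

    The /opt/gateway/jars layout is made up for this sketch."""
    version = scala_version_from_banner(banner)
    if version is None:
        raise ValueError("could not determine Scala version from banner")
    return f"/opt/gateway/jars/scala-{version}"

banner = (
    "Welcome to Spark version 2.1.0\n"
    "Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_102\n"
)
print(pick_jar_dir(banner))  # /opt/gateway/jars/scala-2.11
```

The point is only that, once the version is printed before any application launches, jar selection becomes a cheap string match instead of a trial launch.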
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66861/consoleFull)** for PR 14690 at commit [`59fecdf`](https://github.com/apache/spark/commit/59fecdf1e889c218ac81cdf73ba3e46142d052e6).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 (Oh, I am making a comment via my phone. Sorry for occasionally closing and reopening here..)
[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...
GitHub user HyukjinKwon reopened a pull request: https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null/long as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as input in Scala/SQL and `LongType` as input in SQL, treating such values as `0`. So, this PR includes both changes below:

- `null` support

  It seems MySQL also accepts this:

  ```sql
  mysql> select rand(0);
  +---------------------+
  | rand(0)             |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)

  mysql> select rand(NULL);
  +---------------------+
  | rand(NULL)          |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)
  ```

  and so does Hive, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694). So the code below:

  ```scala
  spark.range(1).selectExpr("rand(null)").show()
  ```

  prints..

  **Before**

  ```
  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
  ```

  **After**

  ```
  +-----------------------+
  |rand(CAST(NULL AS INT))|
  +-----------------------+
  |    0.13385709732307427|
  +-----------------------+
  ```

- `LongType` support in SQL

  In addition, this makes the function consistently accept `LongType` in both Scala and SQL. In more detail, the code below:

  ```scala
  spark.range(1).select(rand(1), rand(1L)).show()
  spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
  ```

  prints..
  **Before**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at
  ```

  **After**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+
  ```

## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15432

commit 7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb
Author: hyukjinkwon
Date: 2016-10-11T09:21:18Z

    rand/randn allows null as input seed

commit 6f8f3f33f9b67d77285048bfd7d794990e072b8a
Author: hyukjinkwon
Date: 2016-10-12T12:23:56Z

    Use ExpectsInputTypes and allow LongType and IntegerType

commit a99f674ff9b9cebb730a1e290c0fa05af8627f1d
Author: hyukjinkwon
Date: 2016-10-12T14:31:11Z

    Override constructor
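The seed-handling rule the PR describes (a `null` seed is treated as `0`; both int and long seeds are accepted) can be modeled in a few lines of plain Python. This is an illustrative sketch of the coercion rule only, not Spark's actual Catalyst implementation:

```python
def normalize_seed(seed):
    """Model the rand/randn seed coercion described in SPARK-17854:
    a null (None) seed falls back to 0, and integer seeds of any width
    (IntegerType or LongType) are accepted as-is."""
    if seed is None:
        return 0
    # Python 3 ints are unbounded, so one check covers both int and long seeds.
    if isinstance(seed, int):
        return seed
    raise TypeError(
        f"rand/randn seed must be an integer or null, got {type(seed).__name__}"
    )

print(normalize_seed(None))   # 0 -- null seed falls back to 0
print(normalize_seed(1))      # 1
print(normalize_seed(2**40))  # 1099511627776 -- long-range seed accepted
```

Before this PR only an integer literal was accepted, which is why both `rand(null)` and `rand(1L)` raised the `AnalysisException` shown above.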