[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16782 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73713/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16782 Merged build finished. Test PASSed.
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17129 **[Test build #73717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73717/testReport)** for PR 17129 at commit [`0d84296`](https://github.com/apache/spark/commit/0d84296ca09423121ed8707eb0c083516bb1440c).
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/16782 I think this is ready for a final review @jkbradley @davies - thanks!
[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r103813679 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/ChiSquareSuite.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.stat + +import java.util.Random + +import org.apache.spark.{SparkException, SparkFunSuite} +import org.apache.spark.ml.feature.LabeledPoint +import org.apache.spark.ml.linalg.{Vector, Vectors} +import org.apache.spark.ml.util.DefaultReadWriteTest +import org.apache.spark.ml.util.TestingUtils._ +import org.apache.spark.mllib.util.MLlibTestSparkContext + +class ChiSquareSuite + extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + + import testImplicits._ + + test("test DataFrame of labeled points") { +// labels: 1.0 (2 / 6), 0.0 (4 / 6) +// feature1: 0.5 (1 / 6), 1.5 (2 / 6), 3.5 (3 / 6) +// feature2: 10.0 (1 / 6), 20.0 (1 / 6), 30.0 (2 / 6), 40.0 (2 / 6) +val data = Seq( + LabeledPoint(0.0, Vectors.dense(0.5, 10.0)), + LabeledPoint(0.0, Vectors.dense(1.5, 20.0)), + LabeledPoint(1.0, Vectors.dense(1.5, 30.0)), + LabeledPoint(0.0, Vectors.dense(3.5, 30.0)), + LabeledPoint(0.0, Vectors.dense(3.5, 40.0)), + LabeledPoint(1.0, Vectors.dense(3.5, 40.0))) +for (numParts <- List(2, 4, 6, 8)) { + val df = spark.createDataFrame(sc.parallelize(data, numParts)) + val chi = ChiSquare.test(df, "features", "label") + val (pValues: Vector, degreesOfFreedom: Array[Int], statistics: Vector) = +chi.select("pValues", "degreesOfFreedom", "statistics") + .as[(Vector, Array[Int], Vector)].head() + assert(pValues ~== Vectors.dense(0.6873, 0.6823) relTol 1e-4) + assert(degreesOfFreedom === Array(2, 3)) + assert(statistics ~== Vectors.dense(0.75, 1.5) relTol 1e-4) +} + } + + test("large number of features (SPARK-3087)") { +// Test that the right number of results is returned +val numCols = 1001 +val sparseData = Array( + LabeledPoint(0.0, Vectors.sparse(numCols, Seq((100, 2.0, + LabeledPoint(0.1, Vectors.sparse(numCols, Seq((200, 1.0) +val df = spark.createDataFrame(sparseData) +val chi = ChiSquare.test(df, "features", "label") +val (pValues: Vector, degreesOfFreedom: Array[Int], statistics: Vector) = + chi.select("pValues", 
"degreesOfFreedom", "statistics") +.as[(Vector, Array[Int], Vector)].head() +assert(pValues.size === numCols) +assert(degreesOfFreedom.length === numCols) +assert(statistics.size === numCols) +assert(pValues(1000) !== null) // SPARK-3087 + } + + test("fail on continuous features or labels") { +// Detect continuous features or labels +val random = new Random(11L) +val continuousLabel = + Seq.fill(10)(LabeledPoint(random.nextDouble(), Vectors.dense(random.nextInt(2 --- End diff -- can the special value that is above the max categorical limit of 1 be refactored to a constant?
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #3588 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3588/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f).
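For readers checking the `ChiSquareSuite` expectations quoted in the PR 17110 review above (statistics 0.75 and 1.5, degrees of freedom 2 and 3), the following is a minimal sketch of the Pearson chi-squared computation on plain Scala collections. It does not use Spark or the `ChiSquare.test` API from the patch; the object and method names (`ChiSquareSketch`, `pearson`) are hypothetical and exist only for illustration.

```scala
object ChiSquareSketch {
  // Pearson chi-squared statistic and degrees of freedom for one categorical
  // feature against a categorical label, given (feature, label) pairs.
  // Hypothetical helper for illustration; not part of Spark.
  def pearson(pairs: Seq[(Double, Double)]): (Double, Int) = {
    val n = pairs.size.toDouble
    // Marginal counts per feature value and per label value.
    val featureCounts = pairs.groupBy(_._1).map { case (f, ps) => f -> ps.size.toDouble }
    val labelCounts = pairs.groupBy(_._2).map { case (l, ps) => l -> ps.size.toDouble }
    // Observed count for each (feature, label) cell that actually occurs.
    val observed = pairs.groupBy(identity).map { case (k, ps) => k -> ps.size.toDouble }
    // Sum (observed - expected)^2 / expected over every cell of the table.
    val statistic = (for {
      (f, fc) <- featureCounts.toSeq
      (l, lc) <- labelCounts.toSeq
    } yield {
      val expected = fc * lc / n
      val obs = observed.getOrElse((f, l), 0.0)
      (obs - expected) * (obs - expected) / expected
    }).sum
    val degreesOfFreedom = (featureCounts.size - 1) * (labelCounts.size - 1)
    (statistic, degreesOfFreedom)
  }
}
```

Applied to the six labeled points in the test, `pearson` gives 0.75 with 2 degrees of freedom for the first feature and 1.5 with 3 degrees of freedom for the second. For 2 degrees of freedom the p-value is exp(-0.75 / 2) ≈ 0.6873, which agrees with the first asserted p-value.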
[GitHub] spark issue #17106: [SPARK-19775][SQL] Remove an obsolete `partitionBy().ins...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17106 Merged to master
[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16954 Merged build finished. Test PASSed.
[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16954 **[Test build #73712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73712/testReport)** for PR 16954 at commit [`3b4bb90`](https://github.com/apache/spark/commit/3b4bb90deb34e6c1bb1671c76b66c83741937578). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73712/ Test PASSed.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16944 **[Test build #73720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73720/testReport)** for PR 16944 at commit [`281bc6d`](https://github.com/apache/spark/commit/281bc6d53fbd0c0b5a99224d700b7d929397f090). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17067 **[Test build #73722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73722/testReport)** for PR 17067 at commit [`5594eb0`](https://github.com/apache/spark/commit/5594eb0864376bbac617bf744755330f1e7bff49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17067 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73722/ Test PASSed.
[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17067 Merged build finished. Test PASSed.
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16782 **[Test build #73713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73713/testReport)** for PR 16782 at commit [`e578320`](https://github.com/apache/spark/commit/e5783209dff55a6010ca17da819542f7a1cdb12c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17081 retest this please
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Merged build finished. Test FAILed.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73716/ Test FAILed.
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Merged build finished. Test PASSed.
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73714/ Test PASSed.
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #73714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73714/testReport)** for PR 17100 at commit [`65b9596`](https://github.com/apache/spark/commit/65b9596c229ac2b62ecdfeb98e541d2ea92e078d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17106: [SPARK-19775][SQL] Remove an obsolete `partitionB...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17106
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #3588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3588/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17124: [SPARK-19779][SS]Delete needless tmp file after restart ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17124 **[Test build #3589 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3589/testReport)** for PR 17124 at commit [`5600776`](https://github.com/apache/spark/commit/5600776066e083655fe328915b56936775273e15).
[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17130 **[Test build #73719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73719/testReport)** for PR 17130 at commit [`fdce240`](https://github.com/apache/spark/commit/fdce2404688fee1b22154258de5d85f0cee8aa4b).
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17081 retest this please
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73715/testReport)** for PR 17081 at commit [`f1da0a4`](https://github.com/apache/spark/commit/f1da0a4cf457f4efb6128beca3c08ccf95ef37a0).
[GitHub] spark issue #17031: [SPARK-19702][MESOS] Increase default refuse_seconds tim...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 Your understanding is correct. You must set refuse_seconds for all your frameworks to some value N, such that N >= #frameworks. So for this change, if some operator is running >120 frameworks, they may need to configure this value. However, I'm not aware of any Mesos cluster on Earth running that many frameworks.
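The rule of thumb in the comment above (refuse_seconds set to some N with N >= number of frameworks) can be written as a one-line check. This is only a sketch of that inequality; `RefuseSecondsSketch` and `isSufficient` are hypothetical names, not Spark or Mesos APIs, and the value 120 is taken from the ">120 frameworks" remark in the comment.

```scala
object RefuseSecondsSketch {
  // Hypothetical helper: true when refuse_seconds (N) is at least the number
  // of frameworks registered on the cluster, as the comment recommends.
  def isSufficient(refuseSeconds: Int, numFrameworks: Int): Boolean =
    refuseSeconds >= numFrameworks
}
```

Under this check a value of 120 covers a cluster with up to 120 frameworks, and an operator running more than that would need a larger setting.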
[GitHub] spark issue #17124: [SPARK-19779][SS]Delete needless tmp file after restart ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17124 **[Test build #3589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3589/testReport)** for PR 17124 at commit [`5600776`](https://github.com/apache/spark/commit/5600776066e083655fe328915b56936775273e15). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103838737 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -77,6 +77,10 @@ trait CodegenSupport extends SparkPlan { */ final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery { this.parent = parent + +// to track the existence of apply() call in the current produce-consume cycle +// if apply is not called (e.g. in aggregation), we can skip shoudStop in the inner-most loop +parent.shouldStopRequired = false --- End diff -- Do we need this? The default value of `shouldStopRequired` is already false.
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17129 @mgummelt I merged this to 2.1 (and so you will have to close this PR manually) but the cherry-pick to 2.0.x doesn't succeed either, and it's non-trivial. If you're willing to evaluate the conflict and resolve it for 2.0 I can merge that too.
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r103822448 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -240,12 +240,13 @@ class FPGrowthModel private[ml] ( val predictUDF = udf((items: Seq[_]) => { if (items != null) { val itemset = items.toSet -brRules.value.flatMap(rule => - if (items != null && rule._1.forall(item => itemset.contains(item))) { +brRules.value.flatMap { rule => --- End diff -- Nit, while we're here -- why change this bit? Or if simplifying, what about ``` brRules.value.filter(_._1.forall(itemset.contains)).flatMap(_._2.filter(!itemset.contains(_))) ```
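To make the review suggestion above concrete, here is a small sketch comparing the branching `flatMap` form with a filter-then-`flatMap` form on plain Scala collections. It assumes rules are (antecedent, consequent) pairs of `Seq[String]`; the `Rule` alias, the object name and both method names are hypothetical, and no Spark broadcast or UDF is involved.

```scala
object RuleFilterSketch {
  // Hypothetical stand-in for the broadcast rules: (antecedent, consequent).
  type Rule = (Seq[String], Seq[String])

  // Branching form: decide inside flatMap whether a rule applies.
  def predictWithIf(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { rule =>
      if (rule._1.forall(itemset.contains)) {
        // Keep only consequents the input does not already contain.
        rule._2.filter(item => !itemset.contains(item))
      } else {
        Seq.empty[String]
      }
    }
  }

  // Filter-then-flatMap form: first keep rules whose antecedent is fully
  // contained in the input, then emit their unseen consequents.
  def predictWithFilter(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.filter(_._1.forall(itemset.contains)).flatMap(_._2.filter(!itemset.contains(_)))
  }
}
```

Both forms return the same items for the same input; the second trades the explicit `if`/`else` for a `filter` stage at the cost of one intermediate collection.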
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Merged build finished. Test PASSed.
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73718/testReport)** for PR 17059 at commit [`3050f6e`](https://github.com/apache/spark/commit/3050f6eeda769127196e8d1ad4b432b92af0ea7c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73718/ Test PASSed.
[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...
Github user skambha commented on the issue: https://github.com/apache/spark/pull/17067
- Changed the SQLQueryTestSuite framework to mask the exprId, so that the negative test cases can also be added using this framework.
- Added negative test cases to the SQLQueryTestSuite framework and removed the Hive-specific test suite. I will add the Hive table test case as part of the PR with the actual code changes.
- Synced up with the latest code; one test output, inner-join.sql.out, needed a comment to be updated, so I have updated that as well.
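The exprId masking mentioned above can be illustrated with a minimal sketch. It assumes expression IDs are rendered as `#` followed by digits (for example `i1#12`) and that a fixed placeholder `#x` is acceptable in golden files; the helper name `mask_expr_ids` and the sample messages are hypothetical and are not taken from the PR.

```python
import re

# Assumption: expression IDs appear as '#' followed by digits, e.g. "i1#12".
_EXPR_ID = re.compile(r"#\d+")


def mask_expr_ids(text: str) -> str:
    """Replace every '#<digits>' with the fixed placeholder '#x'."""
    return _EXPR_ID.sub("#x", text)


# Two runs of the same failing query differ only in the generated IDs.
first_run = "cannot resolve '`mydb1.t1.i1`' given input columns: [i1#12]"
second_run = "cannot resolve '`mydb1.t1.i1`' given input columns: [i1#57]"

# After masking, both outputs compare equal, so a golden file stays stable.
assert mask_expr_ids(first_run) == mask_expr_ids(second_run)
```

With the IDs masked, error messages from negative test cases become deterministic, which is what allows them to be stored as expected output.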
[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17067 **[Test build #73722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73722/testReport)** for PR 17067 at commit [`5594eb0`](https://github.com/apache/spark/commit/5594eb0864376bbac617bf744755330f1e7bff49).
[GitHub] spark pull request #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegi...
Github user mgummelt closed the pull request at: https://github.com/apache/spark/pull/17129
[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 @steveloughran thanks for the comments. @marmbrus @zsxwing it'd be great if you could share some thoughts!
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73718/testReport)** for PR 17059 at commit [`3050f6e`](https://github.com/apache/spark/commit/3050f6eeda769127196e8d1ad4b432b92af0ea7c).
[GitHub] spark issue #17106: [SPARK-19775][SQL] Remove an obsolete `partitionBy().ins...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17106 Thank you for merging, @srowen .
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73716/testReport)** for PR 17081 at commit [`f79f12c`](https://github.com/apache/spark/commit/f79f12c552ee1721295c347744fc5f92f048c74b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16944 **[Test build #73720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73720/testReport)** for PR 16944 at commit [`281bc6d`](https://github.com/apache/spark/commit/281bc6d53fbd0c0b5a99224d700b7d929397f090).
[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17130 Merged build finished. Test FAILed.
[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73719/ Test FAILed.
[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17130 **[Test build #73719 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73719/testReport)** for PR 17130 at commit [`fdce240`](https://github.com/apache/spark/commit/fdce2404688fee1b22154258de5d85f0cee8aa4b). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaFPGrowthExample `
[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16909 **[Test build #73723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73723/testReport)** for PR 16909 at commit [`173b5d5`](https://github.com/apache/spark/commit/173b5d57d180603133ebebd1c64dad424aa8d61a).
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16944 Merged build finished. Test FAILed.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16944 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73720/ Test FAILed.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73724/testReport)** for PR 17081 at commit [`a8c1dea`](https://github.com/apache/spark/commit/a8c1deab0fc8e59863bf4a3d3b551f77fbebbc6d).
[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r103813169
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquare.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
+import org.apache.spark.mllib.regression.{LabeledPoint => OldLabeledPoint}
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.functions.col
+
+
+/**
+ * :: Experimental ::
+ *
+ * Chi-square hypothesis testing for categorical data.
+ *
+ * See <a href="http://en.wikipedia.org/wiki/Chi-squared_test">Wikipedia</a> for more information
+ * on the Chi-squared test.
+ */
+@Experimental
+@Since("2.2.0")
+object ChiSquare {
+
+  /** Used to construct output schema of tests */
+  private case class ChiSquareResult(
+      pValues: Vector,
+      degreesOfFreedom: Array[Int],
+      statistics: Vector)
+
+  /**
+   * Conduct Pearson's independence test for every feature against the label across the input RDD.
+   * For each feature, the (feature, label) pairs are converted into a contingency matrix for which
+   * the Chi-squared statistic is computed. All label and feature values must be categorical.
+   *
+   * The null hypothesis is that the occurrence of the outcomes is statistically independent.
+   *
+   * @param dataset DataFrame of categorical labels and categorical features.
+   *                Real-valued features will be treated as categorical for each distinct value.
+   * @param featuresCol Name of features column in dataset, of type `Vector` (`VectorUDT`)
+   * @param labelCol Name of label column in dataset, of any numerical type
+   * @return DataFrame containing the test result for every feature against the label.
+   *         This DataFrame will contain a single Row with the following fields:
+   *          - `pValues: Vector`
+   *          - `degreesOfFreedom: Array[Int]`
+   *          - `statistics: Vector`
+   *         Each of these fields has one value per feature.
+   */
+  @Since("2.2.0")
+  def test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame = {
+    val spark = dataset.sparkSession
+    import spark.implicits._
+
+    SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
+    SchemaUtils.checkNumericType(dataset.schema, labelCol)
+    val rdd = dataset.select(col(labelCol).cast("double"), col(featuresCol)).as[(Double, Vector)]
+      .rdd.map { case (label, features) => OldLabeledPoint(label, OldVectors.fromML(features)) }
+    val testResults = OldStatistics.chiSqTest(rdd)
--- End diff --
it would be nice to optimize this in the future -- since we have schema, if the label and features have been converted to categorical, we can get the unique values right away instead of having to re-generate the maps for distinct labels and features
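The Scaladoc in the diff above describes converting (feature, label) pairs into a contingency matrix and computing the Chi-squared statistic on it. The following is a minimal, Spark-free sketch of that computation for a single feature. It is not the `chiSqTest` implementation; the function names `contingency_matrix` and `chi_squared_statistic` and the sample data are hypothetical. The first function also shows the distinct-value scan that the review comment suggests could be skipped when the categories are already known from the schema.

```python
from collections import Counter


def contingency_matrix(pairs):
    """Build a feature-by-label count matrix from (feature, label) pairs.

    The distinct feature and label values are rediscovered by scanning the
    data; this is the step that known categorical metadata could replace.
    """
    counts = Counter(pairs)
    features = sorted({f for f, _ in pairs})
    labels = sorted({l for _, l in pairs})
    return [[counts[(f, l)] for l in labels] for f in features]


def chi_squared_statistic(observed):
    """Return (Pearson chi-squared statistic, degrees of freedom).

    Assumes every row and column has a non-zero total, which holds for a
    matrix built by contingency_matrix from observed pairs.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical data: one binary feature against a binary label.
pairs = [(0.0, 0.0)] * 10 + [(0.0, 1.0)] * 20 + [(1.0, 0.0)] * 20 + [(1.0, 1.0)] * 10
statistic, dof = chi_squared_statistic(contingency_matrix(pairs))
```

The p-value would then be obtained from the Chi-squared distribution with `dof` degrees of freedom, which is omitted here to keep the sketch free of external dependencies.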
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17129 2.1 is sufficient. Thanks for the merge.
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17123 **[Test build #3590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3590/testReport)** for PR 17123 at commit [`b3f98b6`](https://github.com/apache/spark/commit/b3f98b66e63c9c61c69a1429819feb236fad56c7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103837174
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
     val input = ctx.freshName("input")
     // Right now, Range is only used when there is one upstream.
     ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
+
+    val localIdx = ctx.freshName("localIdx")
+    val localEnd = ctx.freshName("localEnd")
+    val range = ctx.freshName("range")
+    // we need to place consume() before calling isShouldStopRequired
+    val body = consume(ctx, Seq(ev))
+    val shouldStop = if (isShouldStopRequired) {
--- End diff --
`isShouldStopRequired` complicates the logic. Is it necessary? How much improvement does it bring?
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #73714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73714/testReport)** for PR 17100 at commit [`65b9596`](https://github.com/apache/spark/commit/65b9596c229ac2b62ecdfeb98e541d2ea92e078d).
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73716/testReport)** for PR 17081 at commit [`f79f12c`](https://github.com/apache/spark/commit/f79f12c552ee1721295c347744fc5f92f048c74b).
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16782 **[Test build #73709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73709/testReport)** for PR 16782 at commit [`8dafc20`](https://github.com/apache/spark/commit/8dafc20fd2bbbe9678fa44f7216982fdd0955c14). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class KeywordOnlyTests(unittest.TestCase):` * `class Wrapped(object):` * `class Setter(object):`
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16782 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73709/ Test PASSed.
[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16782 Build finished. Test PASSed.
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #73721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73721/testReport)** for PR 15505 at commit [`b2b1eec`](https://github.com/apache/spark/commit/b2b1eec3c41873eb217cf041f3cf6d71d4cfa265).
[GitHub] spark issue #17125: [SPARK-19211][SQL] Explicitly prevent Insert into View o...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/17125 cc @gatorsmile @cloud-fan Please have a look at this when you have time, thanks!
[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/17099 ok
[GitHub] spark pull request #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegi...
GitHub user mgummelt opened a pull request: https://github.com/apache/spark/pull/17129
[SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores
See JIRA
Unit tests, Mesos/Spark integration tests
cc skonto susanxhuynh
Author: Michael Gummelt
Closes #17045 from mgummelt/SPARK-19373-registered-resources.
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mesosphere/spark SPARK-19373-registered-resources-2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17129.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #17129
commit 0d84296ca09423121ed8707eb0c083516bb1440c
Author: Michael Gummelt
Date: 2017-02-28T23:10:55Z
[SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores
See JIRA
Unit tests, Mesos/Spark integration tests
cc skonto susanxhuynh
Author: Michael Gummelt
Closes #17045 from mgummelt/SPARK-19373-registered-resources.
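The difference between basing the ratio on registered cores and basing it on accepted cores can be sketched as follows. This is an illustration only, under the assumption that the readiness check compares a core count against the maximum core count scaled by the configured ratio; the function name `sufficient_resources` and all numbers are hypothetical and are not taken from the patch.

```python
def sufficient_resources(cores: int, max_cores: int, min_ratio: float) -> bool:
    """Return True once `cores` reaches the configured fraction of max_cores."""
    return cores >= max_cores * min_ratio


# Hypothetical situation: offers for 8 cores have been accepted, but executors
# holding only 2 of those cores have registered with the driver so far.
max_cores = 8
min_ratio = 0.8
accepted_cores = 8
registered_cores = 2

# Counting accepted cores reports readiness before executors are usable.
ready_by_accepted = sufficient_resources(accepted_cores, max_cores, min_ratio)

# Counting registered cores waits until executors have actually registered.
ready_by_registered = sufficient_resources(registered_cores, max_cores, min_ratio)
```

In this scenario the accepted-cores check passes while the registered-cores check does not, which is the gap the PR title refers to.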
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17129 @srowen As discussed here https://github.com/apache/spark/pull/17045#issuecomment-283192230, this is the backport of SPARK-19373 into branch-2.1
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73717/ Test PASSed.
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17129 Merged build finished. Test PASSed.
[GitHub] spark issue #17129: [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17129 **[Test build #73717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73717/testReport)** for PR 17129 at commit [`0d84296`](https://github.com/apache/spark/commit/0d84296ca09423121ed8707eb0c083516bb1440c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17123 **[Test build #3590 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3590/testReport)** for PR 17123 at commit [`b3f98b6`](https://github.com/apache/spark/commit/b3f98b66e63c9c61c69a1429819feb236fad56c7).
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73715/testReport)** for PR 17081 at commit [`f1da0a4`](https://github.com/apache/spark/commit/f1da0a4cf457f4efb6128beca3c08ccf95ef37a0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17130

[SPARK-19791] [ML] Add doc and example for fpgrowth

## What changes were proposed in this pull request?

Add a new section for fpm
Add Example for FPGrowth in Scala and Java

## How was this patch tested?

local doc generation.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark fpmdoc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17130.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17130

commit fdce2404688fee1b22154258de5d85f0cee8aa4b
Author: Yuhao Yang
Date: 2017-03-01T23:47:53Z

fpm doc
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73715/ Test FAILed.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Merged build finished. Test FAILed.
[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17113 @tgravescs, the main scenario is the external shuffle service becoming unavailable, which can happen in a work-preserving + NM failure situation. Mesos with an external standalone shuffle service could hit the same issue. For scenarios like a rolling upgrade, I agree that NM unavailability is short-lived and the issue can recover by itself. One scenario I'm simulating is NM failure. In my test, when the NM fails, the RM detects the failure only after 10 minutes by default. Until then, executors on that NM can still run tasks and Spark doesn't blacklist those containers, so re-issued tasks can still fail. `FetchFailed` immediately aborts the running stage and re-issues the parent stage, so configurations such as the number of failed tasks per stage may not be very useful here. My thinking is therefore to blacklist these executors/nodes immediately after a fetch failure. This proposal may have problems in other scenarios, which is why I opened it here for comments. If you don't think a fix is necessary, I can close it. @markhamstra, this patch targets the master branch, and all the investigation and changes are based on master.
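The proposal in the comment above (exclude every executor on a node as soon as a fetch failure is seen, rather than waiting for a per-stage failed-task threshold) can be sketched roughly as follows. This is a toy model under stated assumptions: `FetchFailureBlacklist`, `Executor`, and `schedulable` are hypothetical names chosen for illustration and are not Spark's blacklist API.

```scala
// Hypothetical model of the proposal, not Spark's actual blacklist classes.
object FetchFailureBlacklist {
  final case class Executor(id: String, host: String)

  // After a fetch failure against `failedHost`, return the executors that are
  // still schedulable: everything on the failed host is excluded at once,
  // without waiting for a per-stage failed-task count to be reached.
  def schedulable(executors: Seq[Executor], failedHost: String): Seq[Executor] =
    executors.filterNot(_.host == failedHost)
}
```

With executors on two hosts, a fetch failure on one host leaves only the executors on the other host available for the re-issued tasks.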
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052 \cc @zsxwing
[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17080 \cc @srowen
[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17099
[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17099 Thanks! Merging to master/2.1
[GitHub] spark issue #17099: [SPARK-19766][SQL] Constant alias columns in INNER JOIN ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17099 @stanzhai Could you submit another PR to backport it to Spark 2.0?
[GitHub] spark issue #17093: [SPARK-19761][SQL]create InMemoryFileIndex with an empty...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17093 Thanks! Merging to master.
[GitHub] spark issue #17121: [SPARK-19787][ML] Changing the default parameter of regP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17121 Can one of the admins verify this patch?
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17123 Can one of the admins verify this patch?
[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17122 **[Test build #73689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73689/testReport)** for PR 17122 at commit [`47f405c`](https://github.com/apache/spark/commit/47f405c32ffac9b0356050c0d6bbb8c0ea5e0f51).
[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17119 **[Test build #73690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73690/testReport)** for PR 17119 at commit [`dccac9a`](https://github.com/apache/spark/commit/dccac9a02e6191d09782d8a97d7d9a4ab0edc92e).
[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17120 **[Test build #73691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73691/testReport)** for PR 17120 at commit [`aeb10d1`](https://github.com/apache/spark/commit/aeb10d100a24ca644745fb8b26985b584fd5118e).
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #73695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73695/testReport)** for PR 9571 at commit [`d8ae876`](https://github.com/apache/spark/commit/d8ae876505de9599480929905f88612dcbc3905b).
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17059 **[Test build #73693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73693/testReport)** for PR 17059 at commit [`a1e32aa`](https://github.com/apache/spark/commit/a1e32aa3b600841118060cdb3a299b6569438816).
[GitHub] spark issue #17116: [SPARK-18890][CORE](try 2) Move task serialization from ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17116 **[Test build #73692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73692/testReport)** for PR 17116 at commit [`1c26e8c`](https://github.com/apache/spark/commit/1c26e8c98317ec6f97c00da3262050959e1d6910).
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #73694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73694/testReport)** for PR 15505 at commit [`335b7b9`](https://github.com/apache/spark/commit/335b7b937a7aaa355a6810b9e8d8080732f19078).
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73695/ Test FAILed.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #73695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73695/testReport)** for PR 9571 at commit [`d8ae876`](https://github.com/apache/spark/commit/d8ae876505de9599480929905f88612dcbc3905b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571

Style police. FWIW, I think the lines that failed were already >100 chars; they just got indented slightly more.

```
Scalastyle checks failed at following occurrences:
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:275: File line length exceeds 100 characters
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:283: File line length exceeds 100 characters
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:284: File line length exceeds 100 characters
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:498: File line length exceeds 100 characters
```

will fix
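The Scalastyle failures quoted above come from a 100-character line limit. As a rough illustration of how a small indentation change can push a line over that limit, here is a minimal sketch of such a check. `LineLengthCheck` and `longLines` are hypothetical names and this is not Scalastyle's own implementation.

```scala
// Hypothetical helper, not part of Scalastyle: report the 1-based numbers of
// lines whose length exceeds a limit (100 characters in the failures above).
object LineLengthCheck {
  def longLines(source: String, maxLength: Int = 100): Seq[Int] =
    source.split("\n", -1).toSeq.zipWithIndex.collect {
      case (line, idx) if line.length > maxLength => idx + 1
    }
}
```

A 99-character line passes this check, but the same line indented by two more spaces is 101 characters long and is reported.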
[GitHub] spark pull request #17093: [SPARK-19761][SQL]create InMemoryFileIndex with a...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17093
[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17120

-1, non-binding.

I understand the rationale for this, to aid migration from s3/s3n to s3a. But given that the need is scheme independence, you should use the full path from `Path.toUri().getPath()` instead of `getName()`, which checks only the filename. If you match only on name, the two files

```
s3a://bucket/incoming/dataset.avro
s3a://bucket/2015/12/dataset.avro
```

will be mistaken for the same file, even though they aren't. If this scenario arises, someone will end up fielding support calls about missing data or, worse, incorrect query results. If you use the full path, that problem goes away and only the scheme and the filesystem/bucket name are ignored in the comparison.
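The difference between matching on the file name and matching on the full path can be sketched as follows. This is an illustration only: `java.net.URI` stands in for Hadoop's `Path` so that the snippet has no Hadoop dependency, and `SeenFileKeys`, `nameKey`, and `pathKey` are hypothetical names that do not appear in the PR.

```scala
import java.net.URI

// Sketch only: java.net.URI stands in for Hadoop's Path, and both helper
// names are hypothetical.
object SeenFileKeys {
  // Last path segment only, analogous to matching on the file name.
  def nameKey(uri: String): String = {
    val path = new URI(uri).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  // Full path without scheme or bucket, so s3n:// and s3a:// URIs for the
  // same object still produce the same key.
  def pathKey(uri: String): String = new URI(uri).getPath
}
```

For the two example files from the comment, `nameKey` returns the same value for both, while `pathKey` tells them apart and still treats an `s3n://` and an `s3a://` URI for the same object as equal.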
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Merged build finished. Test FAILed.
[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16909 **[Test build #73723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73723/testReport)** for PR 16909 at commit [`173b5d5`](https://github.com/apache/spark/commit/173b5d57d180603133ebebd1c64dad424aa8d61a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17099: [SPARK-19766][SQL] Constant alias columns in INNE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17099#discussion_r103844650

    --- Diff: sql/core/src/test/resources/sql-tests/results/inner-join.sql.out ---
    @@ -0,0 +1,68 @@
    +-- Automatically generated by SQLQueryTestSuite
    +-- Number of queries: 13
    --- End diff --

Actually, this number is wrong. Next time, please do not manually change this file. You should run the command to generate the file. @stanzhai
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/15505 @kayousterhout It takes some time to update the test report.
[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r103850385

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -365,17 +385,66 @@ object TypeCoercion {
       }

       /**
    -   * Convert the value and in list expressions to the common operator type
    -   * by looking at all the argument types and finding the closest one that
    -   * all the arguments can be cast to. When no common operator type is found
    -   * the original expression will be returned and an Analysis Exception will
    -   * be raised at type checking phase.
    +   * Handles type coercion for both IN expression with subquery and IN
    +   * expressions without subquery.
    +   * 1. In the first case, find the common type by comparing the left hand side
    +   *    expression types against corresponding right hand side expression derived
    +   *    from the subquery expression's plan output. Inject appropriate casts in the
    +   *    LHS and RHS side of IN expression.
    +   *
    +   * 2. In the second case, convert the value and in list expressions to the
    +   *    common operator type by looking at all the argument types and finding
    +   *    the closest one that all the arguments can be cast to. When no common
    +   *    operator type is found the original expression will be returned and an
    +   *    Analysis Exception will be raised at the type checking phase.
        */
       object InConversion extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
           // Skip nodes who's children have not been resolved yet.
           case e if !e.childrenResolved => e

    +      // Handle type casting required between value expression and subquery output
    +      // in IN subquery.
    +      case i @ In(a, Seq(ListQuery(sub, children, exprId))) if !i.resolved =>
    +        // lhs is the value expression of IN subquery.
    --- End diff --

`lhs` -> `LHS`. Please correct all the similar cases in comments.
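The doc comment in the diff above describes finding "the closest [type] that all the arguments can be cast to" and leaving the expression unchanged when no such type exists. A toy version of that idea is sketched below. It is not Spark's `TypeCoercion` code: the `DType` hierarchy, `wider`, and `commonType` are hypothetical, and the widening order (Int, then Long, then Double, with String incompatible) is an assumption made for the example.

```scala
// Toy model, not Spark's TypeCoercion: three numeric types ordered by width
// plus a string type that has no common type with them in this sketch.
object InListCoercion {
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  case object DoubleT extends DType
  case object StringT extends DType

  private val numericOrder: Seq[DType] = Seq(IntT, LongT, DoubleT)

  // Closest type that both inputs can be cast to, if one exists.
  def wider(a: DType, b: DType): Option[DType] =
    if (a == b) Some(a)
    else if (numericOrder.contains(a) && numericOrder.contains(b))
      Some(numericOrder(math.max(numericOrder.indexOf(a), numericOrder.indexOf(b))))
    else None

  // Common type for the value expression and every element of the IN list.
  // None models "return the original expression and let type checking fail".
  def commonType(types: Seq[DType]): Option[DType] = types match {
    case Seq() => None
    case head +: rest =>
      rest.foldLeft(Option(head)) { (acc, t) => acc.flatMap(wider(_, t)) }
  }
}
```

In this model, an IN list mixing `IntT`, `LongT`, and `DoubleT` coerces to `DoubleT`, while a list mixing `IntT` and `StringT` has no common type.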
[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r103852025

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -83,29 +95,150 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
       }

       /**
    -   * Given a predicate expression and an input plan, it rewrites
    -   * any embedded existential sub-query into an existential join.
    -   * It returns the rewritten expression together with the updated plan.
    -   * Currently, it does not support null-aware joins. Embedded NOT IN predicates
    -   * are blocked in the Analyzer.
    +   * Given a predicate expression and an input plan, it rewrites any embedded existential sub-query
    +   * into an existential join. It returns the rewritten expression together with the updated plan.
    +   * Currently, it does not support NOT IN nested inside a NOT expression. This case is blocked in
    +   * the Analyzer.
        */
       private def rewriteExistentialExpr(
           exprs: Seq[Expression],
           plan: LogicalPlan): (Option[Expression], LogicalPlan) = {
         var newPlan = plan
         val newExprs = exprs.map { e =>
           e transformUp {
    -        case PredicateSubquery(sub, conditions, nullAware, _) =>
    -          // TODO: support null-aware join
    +        case Exists(sub, conditions, exprId) =>
    --- End diff --

`case Exists(sub, conditions, _)`
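The suggestion `case Exists(sub, conditions, _)` replaces a named binding that is presumably unused in the case body with the wildcard `_`. A self-contained sketch of the same idiom follows. The `Exists` and `Literal` case classes here are hypothetical stand-ins with simplified field types, not Catalyst's expression classes.

```scala
// Hypothetical stand-ins for Catalyst expression nodes; only the shape matters.
object WildcardPattern {
  sealed trait Expr
  case class Exists(sub: String, conditions: Seq[String], exprId: Long) extends Expr
  case class Literal(value: Int) extends Expr

  // exprId is never used in the body, so it is matched with `_` rather than
  // being bound to a name.
  def describe(e: Expr): String = e match {
    case Exists(sub, conditions, _) => s"EXISTS($sub) with ${conditions.size} condition(s)"
    case Literal(v) => s"literal $v"
  }
}
```

Using `_` makes it clear to the reader which extracted fields the case body actually depends on.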